from pyspark.sql import functions as F

# One aggregate per feature, each renamed with alias()
data = data.groupby(columns_for_groupby).agg(
    F.first('AGE').alias('AGE'),
    F.first('WEIGHT').alias('WEIGHT'),
    F.first('MOBILE').alias('MOBILE'),
    F.sum('HEIGHT').alias('HEIGHT_SUM'),
    F.collect_set('WORK_EXP').alias('WORK_EXP_LIST'),
    F.countDistinct('PLANT').alias('PLANT_COUNT'),
    F.first('DATE').alias('DATE'),
)
My code is currently written in this format. I have many features that fall under first, sum, collect_set, and countDistinct respectively. I want to write a function that takes one list of columns for each of these aggregations (first, sum, collect_set, countDistinct), applies them in a single groupby, and returns the resulting DataFrame with each aggregated column renamed accordingly. I am pretty new to PySpark; any help would be appreciated.
Thanks
Fight the dragon too long and you become the dragon yourself; gaze into the abyss too long and the abyss will gaze back...