group by - How to write a function with multiple agg like sum, first, collect_set, count_distinct in pyspark?

asked Jan 27, 2021 in Technique[技术] by 深蓝 (71.8m points)

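from pyspark.sql import functions as F

# One aggregate per feature, each aliased to its output column name.
# Using the F. prefix throughout avoids shadowing Python's built-in sum.
data = data.groupby(columns_for_groupby).agg(
    F.first('AGE').alias('AGE'),
    F.first('WEIGHT').alias('WEIGHT'),
    F.first('MOBILE').alias('MOBILE'),
    F.sum('HEIGHT').alias('HEIGHT_SUM'),
    F.collect_set('WORK_EXP').alias('WORK_EXP_LIST'),
    F.countDistinct('PLANT').alias('PLANT_COUNT'),
    F.first('DATE').alias('DATE'))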

My code is currently written in this format. I have many features, each of which falls under one of first, sum, collect_set, or F.countDistinct. I want to write a function that takes a dataframe, the groupby columns, and a list of columns for each of these aggregations, then applies the groupby and renames the output columns accordingly. I am pretty new to PySpark, so any help would be appreciated. Thanks.
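One way such a function could look is sketched below. It is only an illustration under a few assumptions: the names build_agg_spec and group_and_aggregate are hypothetical, and the output suffixes (_SUM, _LIST, _COUNT) are inferred from the aliases in the snippet above. The column lists are first turned into plain (function name, column, alias) triples, which are then mapped onto pyspark.sql.functions.

```python
def build_agg_spec(first_cols=(), sum_cols=(), set_cols=(), distinct_cols=()):
    """Return (function_name, source_column, output_alias) triples.

    Naming mirrors the aliases in the question (an assumption): first()
    keeps the column name, the others get _SUM, _LIST and _COUNT suffixes.
    """
    spec = [("first", c, c) for c in first_cols]
    spec += [("sum", c, f"{c}_SUM") for c in sum_cols]
    spec += [("collect_set", c, f"{c}_LIST") for c in set_cols]
    spec += [("countDistinct", c, f"{c}_COUNT") for c in distinct_cols]
    return spec


def group_and_aggregate(df, group_cols, **col_lists):
    """Group df by group_cols and apply the aggregations in col_lists."""
    # Imported lazily so build_agg_spec can be used without Spark installed.
    from pyspark.sql import functions as F

    # Look up each aggregate by name and alias its result column.
    exprs = [
        getattr(F, fn)(col).alias(alias)
        for fn, col, alias in build_agg_spec(**col_lists)
    ]
    if not exprs:
        raise ValueError("at least one aggregation column is required")
    return df.groupBy(*group_cols).agg(*exprs)


# Hypothetical usage with the columns from the question:
# data = group_and_aggregate(
#     data, columns_for_groupby,
#     first_cols=["AGE", "WEIGHT", "MOBILE", "DATE"],
#     sum_cols=["HEIGHT"],
#     set_cols=["WORK_EXP"],
#     distinct_cols=["PLANT"],
# )
```

Keeping the spec as plain tuples makes the renaming rules easy to inspect or change in one place. Note that the output columns come out grouped by aggregation type rather than in the original order.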

1 Answer

...
