Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
173 views
in Technique[技术] by (71.8m points)

python - How to Sum by Column in Pandas DF and Remove Additional Rows

I have a dataframe in the form:

    Sales House Station Day Date    Time    Daypart  Total          Unique Key
0   CARLTON     CARLTON Mon 3AUG20  1213    DAYTIME  0              CARLTON_ 3AUG20
1   CARLTON     CARLTON Mon 3AUG20  2307    POSTPEAK 30             CARLTON_ 3AUG20
2   CARLTON     CARLTON Tue 4AUG20  1015    COFFEE   30             NaN
3   CARLTON     CARLTON Tue 4AUG20  1027    COFFEE   30             CARLTON_ 4AUG20
4   CARLTON     CARLTON Wed 5AUG20  1310    DAYTIME  30             CARLTON_ 5AUG20

The Unique Key column is just a column I have added to try make this process easier (correct me if I am wrong please!). Essentially I would like to sum the Total column by using the Unique Key column, but also remove the extra rows associated with the Unique Key and only leaving one..

As an example, the above df would come out as the below. In this instance there is a match for row 1 and row 2, which the Total row should be summed, and then row 2 removed..

    Sales House Station Day Date    Time    Daypart  Total          Unique Key
0   CARLTON     CARLTON Mon 3AUG20  1213    DAYTIME  30             CARLTON_ 3AUG20
1   CARLTON     CARLTON Tue 4AUG20  1015    COFFEE   30             NaN
2   CARLTON     CARLTON Tue 4AUG20  1027    COFFEE   30             CARLTON_ 4AUG20
3   CARLTON     CARLTON Wed 5AUG20  1310    DAYTIME  30             CARLTON_ 5AUG20

Is there a way to easily do this?

question from:https://stackoverflow.com/questions/65917608/how-to-sum-by-column-in-pandas-df-and-remove-additional-rows

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Seems like you need df.groupby() method.

I would try doing this in three steps:

aggregated = df.groupby(['Station', 'Date'])['Total'].sum().reset_index() # Getting sum
df = df.drop_duplicates(['Station', 'Date'])                              # Removing duplicated rows
df = df.drop('Total', axis=1).merge(aggregated, on=['Station', 'Date'])   # Merge back

Edited according to the comment (added df = df.drop_duplicates(['Station', 'Date'])) line in order to remove duplicates.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...