Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
555 views
in Technique[技术] by (71.8m points)

python - How to create rolling percentage for groupby DataFrame

I am trying to calculate the percent change by month for each product. Here is what I have so far. I have this working for a DataFrame involving a single product. I am stumped on how to get the calculation applied to a result set that contains many products and many months.

Example dataframe:

product_desc    activity_month    prod_count
product_a       1/1/2014          53
product_b       1/1/2014          42
product_c       1/1/2014          38
product_a       2/1/2014          26
product_b       2/1/2014          48
product_c       2/1/2014          39
product_a       3/1/2014          41
product_b       3/1/2014          35
product_c       3/1/2014          50

What I need to get out is the dataframe with a percentage change by product_desc by month added to it:

product_desc    activity_month   prod_count pct_change
product_a       1/1/2014         53 
product_a       2/1/2014         26         0.490566038
product_a       3/1/2014         41         1.576923077
product_b       1/1/2014         42 
product_b       2/1/2014         48         1.142857143
product_b       3/1/2014         35         0.729166667
product_c       1/1/2014         38 
product_c       2/1/2014         39         1.026315789
product_c       3/1/2014         50         1.282051282

I can calculate this on a dataframe with a single product_desc with this:

df['change_rate1'] = df['prod_count'].shift(-1)/df['prod_count']
df['pct_change'] = df['change_rate1'].shift(1)
df = df.drop('change_rate1',1)

Here is what I am trying now:

df_grouped = df.groupby(['product_desc','activity_month'])

for product_desc, activity_month in df_grouped:
   df['change_rate1'] = df_grouped['prod_count'].shift(-1)/df_grouped['prod_count']

However, I get back a 'NotImplementedError' on the last line in the for statement.

Any advice on how to get this calculated correctly is appreciated.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Well it looks like within groups, there is one observation per month and you want the percent change from one month to the next. You can do that with a groupby/apply by grouping on 'product_desc' and then using the built in pct_change() method:

>>> df['pct_ch'] = df.groupby('product_desc')['prod_count'].pct_change() + 1

Note, I added 1 to the pct_change() method because it computes the net percent change. I'll print out a sorted version so it matches your expected output:

>>> df.sort('product_desc')

  product_desc activity_month  prod_count    pct_ch
0    product_a     2014-01-01          53       NaN
3    product_a     2014-02-01          26  0.490566
6    product_a     2014-03-01          41  1.576923
1    product_b     2014-01-01          42       NaN
4    product_b     2014-02-01          48  1.142857
7    product_b     2014-03-01          35  0.729167
2    product_c     2014-01-01          38       NaN
5    product_c     2014-02-01          39  1.026316
8    product_c     2014-03-01          50  1.282051

On older versions of pandas you might have to do:

>>> df['pct_ch'] = df.groupby('product_desc')['prod_count'].apply(lambda x: x.pct_change() + 1)

Or you could use shift as you suggest with a small modification:

>>> df['pct_ch'] = df['prod_count'] / df.groupby('product_desc')['prod_count'].shift(1)
>>> df.sort('product_desc')

  product_desc activity_month  prod_count    pct_ch
0    product_a     2014-01-01          53       NaN
3    product_a     2014-02-01          26  0.490566
6    product_a     2014-03-01          41  1.576923
1    product_b     2014-01-01          42       NaN
4    product_b     2014-02-01          48  1.142857
7    product_b     2014-03-01          35  0.729167
2    product_c     2014-01-01          38       NaN
5    product_c     2014-02-01          39  1.026316
8    product_c     2014-03-01          50  1.282051

You don't need to refer to df['prod_count'] within a groupby, you're not doing anything to that column.

On older versions of pandas you might have to do:

>>> df['pct_ch'] = df.groupby('product_desc')['prod_count'].apply(lambda x: x/x.shift(1))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...