Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Pandas Group Weighted Average of Multiple Columns

Say I have the following dataframe:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'category': ['a', 'a', 'b', 'b'],
...                    'var1': np.random.randint(0, 100, 4),
...                    'var2': np.random.randint(0, 100, 4),
...                    'weights': np.random.randint(0, 10, 4)})
>>> df
  category  var1  var2  weights
0        a    37    36        7
1        a    47    20        1
2        b    33     7        6
3        b    16     6        8

I can calculate the weighted average of 'var1' per category like this:

>>> Grouped=df.groupby('category')
>>> GetWeightAvg=lambda g: np.average(g['var1'], weights=g['weights'])
>>> Grouped.apply(GetWeightAvg)
category
a    38.250000
b    23.285714
dtype: float64
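
For reference, np.average with weights computes sum(w * x) / sum(w). Here is a minimal sketch that checks the result for category 'a' by hand, using the two rows printed in the frame above. The variable names are illustrative only.

```python
import numpy as np

# Rows of category 'a' from the sample frame above.
var1 = np.array([37, 47])
weights = np.array([7, 1])

# Weighted mean written out explicitly: sum(w * x) / sum(w).
manual = (var1 * weights).sum() / weights.sum()

# The same quantity via np.average.
via_numpy = np.average(var1, weights=weights)

print(manual, via_numpy)  # both equal 38.25
```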

However, 'var1' is hard-coded in that function. Is there a way to write the function so that I can specify which column (or both columns) to average at the time I apply it to the grouped object?

Just as I can get an unweighted average of both columns like this:

>>> Grouped[['var1','var2']].mean()
          var1  var2
category            
a         42.0  28.0
b         24.5   6.5

I'm wondering if there is a parallel way to do that with weighted averages.



1 Answer


With g = df.groupby('category'), you can apply a function that returns both weighted averages as a Series:

In [11]: g.apply(lambda x: pd.Series(np.average(x[["var1", "var2"]], weights=x["weights"], axis=0), ["var1", "var2"]))
Out[11]:
               var1       var2
category
a         38.250000  34.000000
b         23.285714   6.428571

You could write this a little more cleanly as a function that takes the columns as an argument:

In [21]: def weighted(x, cols, w="weights"):
             return pd.Series(np.average(x[cols], weights=x[w], axis=0), cols)

In [22]: g.apply(weighted, ["var1", "var2"])
Out[22]:
               var1       var2
category
a         38.250000  34.000000
b         23.285714   6.428571
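
As an alternative that avoids apply, the same numbers can be obtained with plain column arithmetic: scale each value column by the weights, sum within each group, and divide by the group's total weight. The following is only a sketch, not part of the answer above. The helper name weighted_means and its by/w parameters are hypothetical, the frame is rebuilt from the values printed in the question, and it assumes no missing values and that no group's weights sum to zero.

```python
import pandas as pd

# Sample frame with the values shown in the question.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "var1": [37, 47, 33, 16],
    "var2": [36, 20, 7, 6],
    "weights": [7, 1, 6, 8],
})

def weighted_means(frame, cols, by="category", w="weights"):
    """Hypothetical helper: per-group weighted mean of `cols`."""
    # Scale each value column by its row weight.
    weighted = frame[cols].mul(frame[w], axis=0)
    # Sum the weighted values and the weights within each group.
    numer = weighted.groupby(frame[by]).sum()
    denom = frame.groupby(by)[w].sum()
    # Divide each group's weighted sums by that group's total weight.
    return numer.div(denom, axis=0)

result = weighted_means(df, ["var1", "var2"])
print(result)
```

Passing a single-element list such as ["var1"] gives the weighted average for just that column.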
