I have the following dataframe:
import pandas as pd
import numpy as np
np.random.seed(123)
n = 10
df = pd.DataFrame({"val": np.random.randint(1, 10, n),
                   "cat": np.random.choice(["X", "Y", "Z"], n)})
val cat
0 3 Z
1 3 X
2 7 Y
3 2 Z
4 4 Y
5 7 X
6 2 X
7 1 X
8 2 X
9 1 Y
I want to know what percentage of the total val column sum each category (X, Y, and Z) accounts for. I can aggregate df like this:
total_sum = df.val.sum()
# 32
s = df.groupby("cat").val.sum().div(total_sum) * 100
# this is the desired result in % of total val
cat
X 46.875 #15/32
Y 37.500 #12/32
Z 15.625 #5/32
Name: val, dtype: float64
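For reference, the same numbers can be reproduced in a single chain without the separate total_sum variable. This is only a sketch: the frame is typed in literally from the printout above so that it does not depend on the random seed, and the lambda passed to pipe is my own shorthand, not a dedicated pandas function.

```python
import pandas as pd

# Literal copy of the frame printed above (no dependence on the random seed).
df = pd.DataFrame({
    "val": [3, 3, 7, 2, 4, 7, 2, 1, 2, 1],
    "cat": ["Z", "X", "Y", "Z", "Y", "X", "X", "X", "X", "Y"],
})

# Sum val per category, then divide each group sum by the total of all group sums.
s = df.groupby("cat")["val"].sum().pipe(lambda g: g / g.sum() * 100)
print(s)
```

The group sums add up to the column total, so dividing by g.sum() is equivalent to dividing by df.val.sum().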
However, I find it rather surprising that pandas seemingly does not have a percentage/frequency function, something like df.groupby("cat").val.freq() instead of df.groupby("cat").val.sum() or df.groupby("cat").val.mean(). I assumed this is a common operation, and Series.value_counts has implemented this with normalize=True, but for groupby aggregation I cannot find anything similar. Am I missing something here, or is there indeed no out-of-the-box function?
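To make the comparison concrete, here is a sketch of the difference as I understand it: value_counts(normalize=True) gives each category's share of the rows, whereas I am after each category's share of the val sum. The name group_share is hypothetical; it is the wrapper I would otherwise have to write myself, not an existing pandas method.

```python
import pandas as pd

# Literal copy of the frame from the question.
df = pd.DataFrame({
    "val": [3, 3, 7, 2, 4, 7, 2, 1, 2, 1],
    "cat": ["Z", "X", "Y", "Z", "Y", "X", "X", "X", "X", "Y"],
})

# Share of ROWS per category (what value_counts normalizes), in percent.
row_share = df["cat"].value_counts(normalize=True) * 100


def group_share(frame, by, col):
    """Hypothetical helper: percentage of the total of `col` held by each group of `by`."""
    sums = frame.groupby(by)[col].sum()
    return sums / sums.sum() * 100


# Share of the val SUM per category, in percent.
val_share = group_share(df, "cat", "val")

print(row_share)
print(val_share)
```

The two results differ because X holds half of the rows but a bit less than half of the val total.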