My data:
df = pd.DataFrame(
{
"City": ["NY", "NY", "LA", "LA", "MIA", "MIA"],
"Mall": ["A", "A", "B", "B", "C", "C"],
"Category": ["Milk", "Egg", "Egg", "Beef", "Egg", "Orange"],
"Price": [5, 10, 4, 9, 6, 11]
}
)
df
df = df.set_index(['City', 'Mall', 'Category'])
df
It looks like:
Price
City Mall Category
NY A Milk 5
Egg 10
LA B Egg 4
Beef 9
MIA C Egg 6
Orange 11
(The data must be in multi-index form)
At first I tried the following calculation and it worked out perfectly:
df['Price_diff'] = df['Price'].groupby(level=[0,1]).pct_change(periods=-1)
df
>>>
Price Price_diff
City Mall Category
NY A Milk 5 -0.500000
Egg 10 NaN
LA B Egg 4 -0.555556
Beef 9 NaN
MIA C Egg 6 -0.454545
Orange 11 NaN
Then, for each city, mall, and category, I wanted to compare the price difference with egg. I wrote:
df['Price_diff'] = df['Price'].sub(df['Price'].where(df['Category'].eq('Egg')).groupby(level=[0,1]).transform('first'))
This is when I got the error message:
KeyError: 'Category'
Why is this happening and what should I do?
Expected outcome:
Price Price_diff
City Mall Category
NY A Milk 5 -5
Egg 10 0
LA B Egg 4 0
Beef 9 5
MIA C Egg 6 0
Orange 11 5