Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
317 views
in Technique[技术] by (71.8m points)

python - Calculating in dataframe - why am I getting KeyError here?

My data:

df = pd.DataFrame(
    {
        "City": ["NY", "NY", "LA", "LA", "MIA", "MIA"],
        "Mall": ["A", "A", "B", "B", "C", "C"],
        "Category": ["Milk", "Egg", "Egg", "Beef", "Egg", "Orange"],
        "Price": [5, 10, 4, 9, 6, 11]
    }
)
df
df = df.set_index(['City', 'Mall', 'Category'])
df

It looks like:

                            Price
City    Mall    Category    
NY      A       Milk        5
                Egg         10
LA      B       Egg         4
                Beef        9
MIA     C       Egg         6
                Orange      11

(The data must be in multi-index form)

At first I tried the following calculation and it worked out perfectly:

df['Price_diff'] = df['Price'].groupby(level=[0,1]).pct_change(periods=-1)
df

>>>
                           Price    Price_diff
City    Mall    Category        
NY      A       Milk       5        -0.500000
                Egg        10       NaN
LA      B       Egg        4        -0.555556
                Beef       9        NaN
MIA     C       Egg        6        -0.454545
                Orange     11       NaN

Then, for each city, mall, and category, I wanted to compare the price difference with egg. I wrote:

df['Price_diff'] = df['Price'].sub(df['Price'].where(df['Category'].eq('Egg')).groupby(level=[0,1]).transform('first'))

This is when I got the error message:

KeyError: 'Category'

Why is this happening and what should I do?

Expected outcome:

                            Price   Price_diff
City    Mall    Category    
NY      A       Milk        5       -5
                Egg         10      0   
LA      B       Egg         4       0
                Beef        9       5
MIA     C       Egg         6       0
                Orange      11      5

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Okay. So as Category is an index in the dataframe, it cannot be accessed by using df["Category"]. Instead, you could use df.index.get_level_values(2). The code is as follows:

df['Price_diff'] = df['Price'].sub(df['Price'].where(df.index.get_level_values(2) == 'Egg').groupby(level=[0,1]).transform('first'))
df

The output is:

                    Price  Price_diff
City Mall Category                   
NY   A    Milk          5        -5.0
          Egg          10         0.0
LA   B    Egg           4         0.0
          Beef          9         5.0
MIA  C    Egg           6         0.0
          Orange       11         5.0

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...