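If you are using `columns` to select a subset of `df.columns`, and `'id'` is not in `columns` (which seems to be the case), then a `KeyError` will occur. `df[columns]` returns a new DataFrame that contains only the columns listed in `columns`, so `groupby('id')` has no `'id'` column to group on.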
```python
import pandas as pd

# sample dataframe
data = {'id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4], 'b': [18, 18, 22, 22, 22, 15, 23, 19, 18, 17], 'c': [36, 32, 36, 38, 39, 36, 31, 36, 35, 37], 'd': [52, 51, 50, 54, 48, 53, 52, 54, 54, 45], 'e': [61, 64, 60, 69, 66, 65, 66, 69, 65, 63]}
df = pd.DataFrame(data)

columns = ['b', 'c', 'd']

# groupby on a column subset that does not include 'id'
df[columns].groupby('id').mean()
# raises KeyError: 'id'
```
Either of these alternatives works:
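```python
# 1: select the value columns, but group by the 'id' Series from the original df
df[['b', 'c', 'd']].groupby(df['id']).mean()

# 2: add 'id' back into the column selection, then group by it
df[columns + ['id']].groupby('id').mean()
```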
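Another option, not part of the answer above: group the full DataFrame first (so `'id'` is still available) and then select the value columns from the groupby object with `df.groupby('id')[columns]`. The sketch below reuses the same sample data and uses `pd.testing.assert_frame_equal` only to confirm that, for this data, all three approaches produce the same table.

```python
import pandas as pd

# same sample dataframe as above
data = {'id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4], 'b': [18, 18, 22, 22, 22, 15, 23, 19, 18, 17], 'c': [36, 32, 36, 38, 39, 36, 31, 36, 35, 37], 'd': [52, 51, 50, 54, 48, 53, 52, 54, 54, 45], 'e': [61, 64, 60, 69, 66, 65, 66, 69, 65, 63]}
df = pd.DataFrame(data)
columns = ['b', 'c', 'd']

# group the full frame (which still contains 'id'), then pick the value columns
result = df.groupby('id')[columns].mean()

# the two alternatives from the answer
alt1 = df[['b', 'c', 'd']].groupby(df['id']).mean()
alt2 = df[columns + ['id']].groupby('id').mean()

# all three give the same means per id, indexed by 'id'
pd.testing.assert_frame_equal(result, alt1)
pd.testing.assert_frame_equal(result, alt2)
print(result)
```

Grouping before selecting avoids having to carry `'id'` through the column subset at all, since the grouping key is resolved against the full DataFrame.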