Merging rows when some columns are the same using Pandas Python

Question

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

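I have a DataFrame and want to merge rows that share the same value in column A. For each group, the value kept in column B is determined by the order of the strings in a list L = ['xx','yy','zz']: the value that appears earliest in L wins.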

    A   B
0   a   xx
1   a   yy
2   b   zz
3   b   yy

For rows 0 and 1, the result is 'a' in column A and 'xx' in column B ('xx' comes before 'yy' in L).
For rows 2 and 3, the result is 'b' in column A and 'yy' in column B ('yy' comes before 'zz' in L).

Desired outcome:

    A   B
0   a   xx
1   b   yy


1 Answer

深蓝 · Answer 1 · 2021-10-06T18:53:55+0000

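# Rank each value in B by its position in L (xx -> 0, yy -> 1, zz -> 2)
df['C'] = df['B'].map(dict(zip(L,range(len(L)))))
# Within each group of A, keep the B value from the row with the smallest rank
df.groupby('A')[['B','C']].apply(lambda x: x.iloc[x["C"].argmin()]['B'])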
#A
#a    xx
#b    yy

You can get the same result using pandas.Categorical:

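# Make B an ordered categorical so that min() follows the order in L
df['B'] = pd.Categorical(df['B'], categories = L, ordered = True)
# Smallest category per group of A
df.groupby('A').min()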
#      B
#A
#a    xx
#b    yy
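
For completeness, a hedged end-to-end sketch of the Categorical approach that starts from the question's data and ends with A as a regular column. The names `ordered` and `result` are illustrative, and converting B back to plain strings at the end is an optional extra that the original snippet does not do:

```python
import pandas as pd

L = ['xx', 'yy', 'zz']
df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': ['xx', 'yy', 'zz', 'yy']})

# Work on a copy so the original column keeps its string dtype
ordered = df.assign(B=pd.Categorical(df['B'], categories=L, ordered=True))

# min() follows the category order given by L, not alphabetical order
result = ordered.groupby('A')['B'].min().reset_index()

# Optional: turn B back into plain strings
result['B'] = result['B'].astype(str)
print(result)
```

The categories must be declared with ordered=True; min() is not defined for an unordered categorical.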