I have two DataFrames, df1 and df2:
import pandas as pd
import numpy as np
num = {'description': ['apple', 'Red APPLE is good', np.nan, 'peach', 'melon', np.nan, 'green apple'],
       'code': ['AA', 'AA', 'BB', 'BB', 'CC', 'BB', 'BB']}
df1 = pd.DataFrame(num)
num2 = {'word': ['apple', 'peach'],
        'code': ['ZZ', 'YY']}
df2 = pd.DataFrame(num2)
If df1['description'] contains the word 'apple', then the corresponding code should be 'ZZ'. If it contains 'peach', then the corresponding code should be 'YY'.
Before comparing, I need to convert the description values to uppercase so that the match does not depend on case. How can I do that?
I tried to chain .str.upper().contains(...), but I got an error. This is the code I have now:
for i in range(df2.shape[0]):
    df1['code'] = np.where(df1['description'].str.contains(df2['word'][i].upper()), df2['code'][i], df1['code'])
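For reference, here is a minimal sketch of the chaining, assuming the error was an AttributeError: .str.upper() returns an ordinary Series, and a Series has no contains method of its own, so the .str accessor has to be repeated before contains. The name desc is only an illustrative stand-in for df1['description'].

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for df1['description']
desc = pd.Series(['apple', 'Red APPLE is good', np.nan, 'peach'])

# desc.str.upper().contains('APPLE') fails because .str.upper() returns a
# plain Series; the .str accessor must be used again before contains()
mask = desc.str.upper().str.contains('APPLE')

# Alternative that skips the uppercase step: a case-insensitive match
mask_ci = desc.str.contains('apple', case=False)
```

Both masks flag 'apple' and 'Red APPLE is good' and leave 'peach' unmatched.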
For each NaN in df1['description'], the code is changed to 'YY', and this is not good. How can I avoid this?
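A possible explanation, offered as an assumption rather than a confirmed diagnosis: on an object column, str.contains returns NaN for missing values, and np.where treats NaN as truthy, so those rows take the code from every pass of the loop and end up with the last one, 'YY'. The sketch below passes na=False so that missing descriptions count as "no match"; regex=False is an extra choice of mine that treats each word as a literal substring.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'description': ['apple', 'Red APPLE is good', np.nan, 'peach',
                    'melon', np.nan, 'green apple'],
    'code': ['AA', 'AA', 'BB', 'BB', 'CC', 'BB', 'BB'],
})
df2 = pd.DataFrame({'word': ['apple', 'peach'], 'code': ['ZZ', 'YY']})

# Uppercase the descriptions once, outside the loop
upper_desc = df1['description'].str.upper()

for word, code in zip(df2['word'], df2['code']):
    # na=False: rows with a missing description are treated as not matching
    # regex=False: match the word as a literal substring
    mask = upper_desc.str.contains(word.upper(), na=False, regex=False)
    df1['code'] = np.where(mask, code, df1['code'])

print(df1['code'].tolist())
# ['ZZ', 'ZZ', 'BB', 'YY', 'CC', 'BB', 'ZZ']
```

With this change the two NaN rows keep their original code 'BB'.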
Thank you.
question from:
https://stackoverflow.com/questions/66053183/chain-upper-with-contains-on-a-series