pandas - Chain upper() with contains() on a series

Question



asked Oct 6, 2021 in Technique by 深蓝 (71.8m points)


I have two DataFrames, df1 and df2:

import pandas as pd 
import numpy as np 
  
num = {'description': ['apple','Red APPLE is good',np.nan,'peach','melon',np.nan,'green apple'],
       'code': ['AA','AA','BB','BB','CC','BB','BB']} 

df1 = pd.DataFrame(num) 

num2 = {'word': ['apple','peach'],
       'code': ['ZZ','YY']} 

df2 = pd.DataFrame(num2)

If df1['description'] contains the word 'apple', the corresponding code should be 'ZZ'. If it contains 'peach', the corresponding code should be 'YY'.

Before comparing, I need to convert the description values to uppercase. How can I do that? I tried chaining .str.upper().contains(...), but I got an error.
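A likely cause of that error: .str.upper() returns an ordinary Series, and a Series has no .contains method, so the .str accessor has to be used a second time before .contains. A minimal sketch on the sample data from the question (the variable name has_apple is illustrative only):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'description': ['apple', 'Red APPLE is good', np.nan, 'peach',
                    'melon', np.nan, 'green apple'],
    'code': ['AA', 'AA', 'BB', 'BB', 'CC', 'BB', 'BB'],
})

# .str.upper() gives back a Series, so go through .str again before .contains().
# Missing descriptions are still NaN after upper(); na=False turns them into False.
has_apple = df1['description'].str.upper().str.contains('APPLE', na=False)
print(has_apple.tolist())
# [True, True, False, False, False, False, True]
```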

for i in range(df2.shape[0]):
    df1['code'] = np.where(df1['description'].str.contains(df2['word'][i].upper()), df2['code'][i], df1['code'])

For every NaN in df1['description'], the code is changed to 'YY', which is wrong. How can I avoid this?

Thank you.
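Why the NaN rows end up as 'YY': for a missing description, str.contains returns NaN rather than False, and np.where treats that NaN as true. Every pass of the loop therefore overwrites those rows, and the last word in df2 ('peach', code 'YY') wins. Below is a sketch of the same loop with na=False, using case=False instead of upper-casing the search word; it keeps the original code for missing descriptions. The loop variable names are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'description': ['apple', 'Red APPLE is good', np.nan, 'peach',
                    'melon', np.nan, 'green apple'],
    'code': ['AA', 'AA', 'BB', 'BB', 'CC', 'BB', 'BB'],
})
df2 = pd.DataFrame({'word': ['apple', 'peach'], 'code': ['ZZ', 'YY']})

for word, new_code in zip(df2['word'], df2['code']):
    # case=False matches 'apple' and 'APPLE' alike; na=False makes missing
    # descriptions False, so np.where keeps their existing code.
    mask = df1['description'].str.contains(word, case=False, na=False)
    df1['code'] = np.where(mask, new_code, df1['code'])

print(df1['code'].tolist())
# ['ZZ', 'ZZ', 'BB', 'YY', 'CC', 'BB', 'ZZ']
```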

Question from: https://stackoverflow.com/questions/66053183/chain-upper-with-contains-on-a-series


1 Answer

深蓝 · Answer 1 · 2021-10-06T03:10:13+0000

You can just pass case=False to str.contains instead of converting to uppercase:

df1.description.str.contains('apple', case=False)

Output:

0     True
1     True
2      NaN
3    False
4    False
5      NaN
6     True
Name: description, dtype: object
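The NaN entries are why this result has dtype object. If a plain boolean mask is needed (for row selection or np.where), na=False can be combined with case=False. A short sketch; the name mask is illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'description': ['apple', 'Red APPLE is good', np.nan, 'peach',
                    'melon', np.nan, 'green apple'],
})

# case=False ignores capitalisation; na=False reports missing values as False.
mask = df1.description.str.contains('apple', case=False, na=False)
print(mask.dtype)                  # bool
print(df1[mask].index.tolist())    # [0, 1, 6]
```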

And to look up the matching code from df2:

pattern = '|'.join(df2.word)

df1['new_code'] = (df1.description.str.lower()
                      .str.extract(rf'({pattern})', expand=False)
                      .map(df2.set_index('word')['code'])
                  )

Output:

         description code new_code
0              apple   AA       ZZ
1  Red APPLE is good   AA       ZZ
2                NaN   BB      NaN
3              peach   BB       YY
4              melon   CC      NaN
5                NaN   BB      NaN
6        green apple   BB       ZZ
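To write the result back into the existing code column, as the question asks, one option is to fill the unmatched rows (including the NaN descriptions) with their current code. The sketch below also passes each word through re.escape, which only matters if a word could contain regex metacharacters such as '.' or '+'. It assumes, as in the sample, that the entries in df2.word are already lowercase, since they are matched against the lowercased descriptions.

```python
import re

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'description': ['apple', 'Red APPLE is good', np.nan, 'peach',
                    'melon', np.nan, 'green apple'],
    'code': ['AA', 'AA', 'BB', 'BB', 'CC', 'BB', 'BB'],
})
df2 = pd.DataFrame({'word': ['apple', 'peach'], 'code': ['ZZ', 'YY']})

# Escape each word so it is matched literally, then build 'apple|peach'.
pattern = '|'.join(map(re.escape, df2.word))

new_code = (df1.description.str.lower()
              .str.extract(rf'({pattern})', expand=False)
              .map(df2.set_index('word')['code']))

# Rows with no match (or a NaN description) keep their original code.
df1['code'] = new_code.fillna(df1['code'])
print(df1['code'].tolist())
# ['ZZ', 'ZZ', 'BB', 'YY', 'CC', 'BB', 'ZZ']
```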
