Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Matching keywords (strings) with a pandas DataFrame

I have a DataFrame that I want to match against some keywords (I want to detect the rows that contain those keywords). I managed to get the job done this way, but I wonder if there's a better way to do it, knowing that I might have up to 10 or 20 different keywords.

df1 = df[df['column1'].str.contains("keyword1") | df['column1'].str.contains('keyword2')]

(I'm a beginner, so please keep it as simple as possible.)



1 Answer

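For OR logic, you can build a single pattern by joining the words with |. By default str.contains treats the pattern as a regular expression, where | means "match any of these alternatives". Store your 10-20 words in a list, then use '|'.join(that_list).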

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz', 'foobar', 'boo']})
words = ['foo', 'bar']

df['foo_OR_bar'] = df['col1'].str.contains('|'.join(words))

#     col1  foo_OR_bar
#0     foo        True
#1     bar        True
#2     baz       False
#3  foobar        True
#4     boo       False

# To slice the DataFrame by that Boolean Series
df1 = df.loc[df['col1'].str.contains('|'.join(words))]
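
One caveat with the joined pattern: because it is interpreted as a regular expression, a keyword containing characters such as + or . would be read as regex syntax rather than literal text. A minimal sketch of one way to guard against that, assuming you want literal, case-insensitive matching and that missing values should count as non-matches (the sample data and keywords here are made up for illustration):

```python
import re
import pandas as pd

# Hypothetical sample data with a missing value and regex-special characters
df = pd.DataFrame({'col1': ['C++ guide', 'python tips', None, 'Intro to c#', 'cxx notes']})
words = ['c++', 'c#']

# re.escape makes each keyword match literally; '|' then means "any of these"
pattern = '|'.join(re.escape(w) for w in words)

# case=False ignores case; na=False treats missing values as "no match"
mask = df['col1'].str.contains(pattern, case=False, na=False)

# Keep only the matching rows
df1 = df.loc[mask]
```

Without re.escape, the keyword 'c++' would not be a valid pattern at all, so escaping is probably worth the extra line whenever the keywords come from user input or a file.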

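If you need AND logic instead (a row must contain every word), you can use np.logical_and.reduce with a list comprehension to keep things compact. The comprehension builds one Boolean Series per word, and reduce combines them element-wise.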

df['foo_AND_bar'] = np.logical_and.reduce([df['col1'].str.contains(w) for w in words])

#     col1  foo_OR_bar  foo_AND_bar
#0     foo        True        False
#1     bar        True        False
#2     baz       False        False
#3  foobar        True         True
#4     boo       False        False
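
The AND mask can be used to slice the DataFrame in the same way as the OR one. As a rough sketch, both cases could be wrapped in a small helper; the name rows_matching and its require_all flag are hypothetical, not part of pandas, and the helper assumes a non-empty keyword list and literal (non-regex) matching:

```python
import numpy as np
import pandas as pd

def rows_matching(df, column, words, require_all=False):
    """Return the rows of df whose `column` contains any (or all) of `words`."""
    # One Boolean Series per keyword; regex=False matches the keyword literally,
    # na=False treats missing values as "no match"
    masks = [df[column].str.contains(w, regex=False, na=False) for w in words]
    # Combine the per-keyword masks element-wise with AND or OR
    combine = np.logical_and if require_all else np.logical_or
    return df.loc[combine.reduce(masks)]

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz', 'foobar', 'boo']})
words = ['foo', 'bar']

any_rows = rows_matching(df, 'col1', words)                    # rows with foo OR bar
all_rows = rows_matching(df, 'col1', words, require_all=True)  # rows with foo AND bar
```

With 10 or 20 keywords, only the words list needs to change; the helper itself stays the same.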
