python - Searching for words using str.contains and regex in dataframe is slow, is there a better way?

I have a DataFrame with over 2 million rows. I'm trying to find rows whose text contains both of two words, using a regex like:

df1 = df[df['my_column'].str.contains(r'(?=.*first_word)(?=.*second_word)')]

However, when I run this in a Jupyter notebook, it either takes over a minute to return the matching rows or it crashes the kernel and I have to try again.

Is there a more efficient way for me to return rows in a dataframe that contain both words?
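For reference, here is a minimal, self-contained version of what I'm running (the column name and words are placeholders for my real data):

import pandas as pd

df = pd.DataFrame({'my_column': ['first_word then second_word',
                                 'only first_word',
                                 'neither word']})

# Two lookaheads: a row matches only if both words appear somewhere in the string.
df1 = df[df['my_column'].str.contains(r'(?=.*first_word)(?=.*second_word)')]
print(df1)  # only the first row is returned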

question from: https://stackoverflow.com/questions/65920662/searching-for-words-using-str-contains-and-regex-in-dataframe-is-slow-is-there


1 Answer


Use

df['my_column'].apply(lambda x: all(w in x for w in ['first_word', 'second_word']))

This checks that every word in the list is present in my_column, without the awkward regex. Note that apply() returns a boolean Series, so use it as a mask to filter the DataFrame, as shown in the sketch below.
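A minimal sketch of how this fits together, using hypothetical sample data with the column and word names from the question; the last statement shows a vectorized alternative that combines two plain (non-regex) substring checks with &:

import pandas as pd

# Hypothetical sample data mirroring the question's placeholders.
df = pd.DataFrame({'my_column': ['first_word then second_word',
                                 'only first_word',
                                 'neither word']})

# apply() builds a boolean mask; indexing with it keeps only rows containing both words.
mask = df['my_column'].apply(lambda x: all(w in x for w in ['first_word', 'second_word']))
df1 = df[mask]

# Vectorized alternative: two literal substring checks (regex=False) combined with &,
# which avoids both apply() and the lookahead regex.
df1 = df[df['my_column'].str.contains('first_word', regex=False)
         & df['my_column'].str.contains('second_word', regex=False)]

Which of the two is faster on 2 million rows depends on the data, but both sidestep the (?=.*...) lookahead pattern from the question.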

