Python regex matching multiple words from a list

Question

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a list of words and a string and would like to get back a list of words from the original list which are found in the string.

Example:

import re

lof_terms = ['car', 'car manufacturer', 'popular']
str_content = 'This is a very popular car manufacturer.'

pattern = re.compile(r"(?=(" + r"|".join(map(re.escape, lof_terms)) + r"))")
found_terms = re.findall(pattern, str_content)

This returns only ['popular', 'car']; it fails to catch 'car manufacturer'. However, it does catch it if I change the source list of terms to lof_terms = ['car manufacturer', 'popular'].

The overlap between 'car' and 'car manufacturer' seems to be the source of this issue.

Any ideas on how to get around this?

Many thanks

question from: https://stackoverflow.com/questions/65884809/how-to-match-repeating-words-in-python-regex

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:20:35+0000

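Regex alternation is ordered: at each position the engine takes the first alternative that matches, so 'car' wins over 'car manufacturer' whenever it comes first in the pattern. The current code can be fixed by first sorting lof_terms by length in descending order, so that longer terms are tried before their shorter prefixes: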

rx = r"(?=({}))".format("|".join(map(re.escape, sorted(lof_terms, key=len, reverse=True))))
pattern = re.compile(rx)

Note that if you need whole-word matches, word boundaries only have to be used once on either end of the grouping, as in r"(?=\b({})\b)"; there is no need to repeat them around each alternative.
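
As a rough illustration of the word-boundary point, the sketch below compares the pattern with and without \b around the group. The helper name build_pattern and the sample sentence are made up for this example, and it assumes every term starts and ends with a word character (otherwise \b would not behave as intended).

```python
import re

def build_pattern(terms, whole_words=False):
    """Hypothetical helper: lookahead pattern with the longest terms first."""
    alternatives = "|".join(map(re.escape, sorted(terms, key=len, reverse=True)))
    if whole_words:
        # One \b on each side of the group covers every alternative.
        return re.compile(r"(?=\b({})\b)".format(alternatives))
    return re.compile(r"(?=({}))".format(alternatives))

lof_terms = ['car', 'car manufacturer', 'popular']
text = 'A scar on a popular car manufacturer.'  # made-up sample sentence

loose = build_pattern(lof_terms).findall(text)
strict = build_pattern(lof_terms, whole_words=True).findall(text)
print(loose)   # 'car' is also found inside 'scar'
print(strict)  # only whole-word occurrences
```

Without the boundaries, 'car' is picked up inside 'scar'; with them, only the standalone occurrences remain.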

See the Python demo:

import re

lof_terms = ['car', 'car manufacturer', 'popular']
str_content = 'This is a very popular car manufacturer.'

# Longest terms first, so 'car manufacturer' is tried before 'car'
rx = r"(?=({}))".format("|".join(map(re.escape, sorted(lof_terms, key=len, reverse=True))))
pattern = re.compile(rx)
# The lookahead consumes no text, so every position in the string is tested
found_terms = re.findall(pattern, str_content)
print(found_terms)
# => ['popular', 'car manufacturer']
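
Note that the result above contains only the longest term at each position, so 'car' itself is not reported even though it occurs in the string. If every term from the list should come back, one possible approach (not part of the original answer) is to test each term on its own. The function name terms_in_text is hypothetical, and the sketch assumes plain substring matching is acceptable.

```python
import re

def terms_in_text(terms, text):
    """Return every term that occurs in text, in the order of the input list."""
    # Each term gets its own search, so overlapping terms cannot hide each other.
    return [term for term in terms if re.search(re.escape(term), text)]

lof_terms = ['car', 'car manufacturer', 'popular']
str_content = 'This is a very popular car manufacturer.'

found_terms = terms_in_text(lof_terms, str_content)
print(found_terms)
# => ['car', 'car manufacturer', 'popular']
```

This runs one search per term, which is simple but may be slower than a single combined pattern for very long term lists.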
