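The current code can be fixed if you first sort lof_terms by length in descending order. Regex alternation tries alternatives left to right, so a longer term must come before any shorter term that is its prefix: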
rx = r"(?=\b({})\b)".format("|".join(map(re.escape, sorted(lof_terms, key=len, reverse=True))))
pattern = re.compile(rx)
Note that in this case the word boundaries are used only once, on either end of the group, so there is no need to repeat them around each alternative.
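As a quick check of that point, here is a minimal sketch comparing a pattern that repeats `\b` around every alternative with one that writes it once around the group. The sample text and the names `terms`, `repeated` and `grouped` are illustrative assumptions, not taken from the question.

```python
import re

terms = ['car manufacturer', 'popular', 'car']  # assumed to be sorted longest first
text = 'A popular carpet from a car manufacturer, not a car.'

# Word boundaries repeated around every alternative.
repeated = r"(?=({}))".format("|".join(r"\b{}\b".format(re.escape(t)) for t in terms))
# Word boundaries written once around the whole group.
grouped = r"(?=\b({})\b)".format("|".join(map(re.escape, terms)))

repeated_found = re.findall(repeated, text)
grouped_found = re.findall(grouped, text)

print(repeated_found == grouped_found)  # both forms find the same terms
print(grouped_found)                    # 'carpet' is skipped thanks to the trailing \b
```

For this input both patterns should return the same list, and the shorter `grouped` form is easier to read and maintain.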
See the Python demo:
import re
lof_terms = ['car', 'car manufacturer', 'popular']
str_content = 'This is a very popular car manufacturer.'
rx = r"(?=\b({})\b)".format("|".join(map(re.escape, sorted(lof_terms, key=len, reverse=True))))
pattern = re.compile(rx)
found_terms = re.findall(pattern, str_content)
print(found_terms)
# => ['popular', 'car manufacturer']
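To see why the sorting step matters, the following sketch runs the same search with and without it. `build_pattern` is a hypothetical helper introduced here for illustration only; it is not part of the original code.

```python
import re

def build_pattern(terms):
    # Hypothetical helper: longest terms first, word boundaries
    # applied once around the whole group.
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile(r"(?=\b({})\b)".format("|".join(map(re.escape, ordered))))

lof_terms = ['car', 'car manufacturer', 'popular']
str_content = 'This is a very popular car manufacturer.'

# Unsorted: 'car' is listed before 'car manufacturer', so it wins at that position.
unsorted_rx = re.compile(r"(?=\b({})\b)".format("|".join(map(re.escape, lof_terms))))
unsorted_found = unsorted_rx.findall(str_content)

# Sorted: the longer term is tried first.
sorted_found = build_pattern(lof_terms).findall(str_content)

print(unsorted_found)  # => ['popular', 'car']
print(sorted_found)    # => ['popular', 'car manufacturer']
```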