You want to replace all the code between the opening and closing tag:
(您要替换开始和结束标记之间的所有代码:)
import re
input = [
"Independence Day (Observed)",
"Another Christmas Eve, Christmas Day (Observed)",
"New Year's Eve <img>scr=[...]</img>, New Year's Day (Observed)"
"Martin Luther King, Jr. <img>scr=[...]</img> Day"
]
for holiday in input:
print(re.sub(r'<img>.*?</img>', '[image placeholder]', holiday))
Output:
(输出:)
Independence Day (Observed)
Another Christmas Eve, Christmas Day (Observed)
New Year's Eve [image placeholder], New Year's Day (Observed)
Martin Luther King, Jr. [image placeholder] Day
The regex matches text starting with <img>
and ending with </img>
with anything in between .*?
(正则表达式将匹配以<img>
开头和</img>
开头的文本,中间以.*?
之间的任何内容.*?
)
. (。) ?
makes it match non greedily, so that it will match until the first closing tag that follows the opening tag - that allows it to replace every pair of tags correctly if there are several in the same input text.
(使其非贪婪地匹配,以便匹配,直到在开始标记之后的第一个结束标记为止-如果在同一输入文本中有多个标记,则它可以正确替换每对标记。)
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…