python - Extract part of a regex match

Question


asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)


I want a regular expression to extract the title from an HTML page. Currently I have this:

title = re.search('<title>.*</title>', html, re.IGNORECASE).group()
if title:
    title = title.replace('<title>', '').replace('</title>', '')

Is there a regular expression to extract just the contents of <title> so I don't have to remove the tags?


1 Answer

深蓝 · Answer 1 · 2021-10-16T21:23:30+0000

Use parentheses ( ) in the regular expression to create a capturing group, then call group(1) in Python to retrieve only the captured string. re.search returns None when it finds no match, so check the result before calling group() on it:

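# Parentheses create capture group 1, which holds only the text between the tags.
title_search = re.search(r'<title>(.*)</title>', html, re.IGNORECASE)

# re.search returns None on no match, so test before calling group().
if title_search:
    title = title_search.group(1)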
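
For a version you can run as-is, here is a minimal sketch of the same idea wrapped in a function. The name extract_title is hypothetical, not something from the question. It also makes two changes that go beyond the answer above and are assumptions about what you would want: the group is non-greedy, (.*?), so it stops at the first closing tag, and re.DOTALL is set so a title that spans several lines still matches.

```python
import re

# Hypothetical helper name; not part of the original answer.
def extract_title(html):
    """Return the text inside the first <title> element, or None if absent."""
    # (.*?) is a non-greedy capturing group; re.DOTALL lets "." match newlines.
    match = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    # group(1) is the captured text only, without the surrounding tags.
    return match.group(1).strip()

print(extract_title('<html><head><TITLE>My page</TITLE></head></html>'))
print(extract_title('<p>no title here</p>'))
```

The pattern only matches a bare <title> tag with no attributes. For anything beyond a quick extraction, a real HTML parser such as the standard library's html.parser is likely the safer choice.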
