python 3.x - Extract title with BeautifulSoup

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have this code:

from urllib import request
url = "http://www.bbc.co.uk/news/election-us-2016-35791008"
html = request.urlopen(url).read().decode('utf8')
html[:60]  # '<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN'

from bs4 import BeautifulSoup
raw = BeautifulSoup(html, 'html.parser').get_text()
raw.find_all('title', limit=1)
print (raw.find_all("title"))

I want to extract the page title using BeautifulSoup, but I get this error:

Traceback (most recent call last):
  File "C:\Users\Passanova\AppData\Local\Programs\Python\Python35-32\test.py", line 8, in <module>
    raw.find_all('title', limit=1)
AttributeError: 'str' object has no attribute 'find_all'

Any suggestions?

1 Answer

深蓝 · Answer 1 · 2021-10-23T20:00:53+0000

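To navigate the soup, you need a BeautifulSoup object, not a string. get_text() returns a plain str containing only the page's text (the tags are gone), and str has no find_all() method, which is exactly what the AttributeError says. So remove the get_text() call and keep the soup itself.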
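You can see the difference offline with a minimal sketch like this one. It assumes bs4 is installed and uses a small made-up HTML string in place of the BBC page, so no network access is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page
html = "<html><head><title>Example page</title></head><body><p>Hello</p></body></html>"

soup = BeautifulSoup(html, "html.parser")
text = soup.get_text()  # plain str holding only the text nodes, no tags

print(type(soup).__name__)        # BeautifulSoup
print(type(text).__name__)        # str
print(hasattr(soup, "find_all"))  # True
print(hasattr(text, "find_all"))  # False, so text.find_all(...) raises AttributeError
```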

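Moreover, you can replace raw.find_all('title', limit=1) with soup.find('title'). It runs the same search, but returns the first matching tag directly (or None if there is no match) instead of a list with at most one element.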
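Here is a small sketch comparing the two return types, again on made-up HTML strings (the second one deliberately has no title):

```python
from bs4 import BeautifulSoup

# Hypothetical documents: one with a <title>, one without
with_title = BeautifulSoup("<head><title>Example page</title></head>", "html.parser")
without_title = BeautifulSoup("<p>No title here</p>", "html.parser")

# find_all(..., limit=1) returns a list-like ResultSet with at most one Tag
results = with_title.find_all("title", limit=1)
print(len(results), results[0].string)  # 1 Example page

# find() returns the first matching Tag itself
tag = with_title.find("title")
print(tag.string)  # Example page

# No match: find_all gives an empty ResultSet, find gives None
print(len(without_title.find_all("title", limit=1)))  # 0
print(without_title.find("title"))  # None
```

So after switching to find(), check for None rather than for an empty list.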

Try this:

from urllib import request

from bs4 import BeautifulSoup

url = "http://www.bbc.co.uk/news/election-us-2016-35791008"
html = request.urlopen(url).read().decode('utf8')

# Keep the BeautifulSoup object; calling get_text() here would turn it into a str
soup = BeautifulSoup(html, 'html.parser')
title = soup.find('title')  # first <title> tag, or None if the page has none

if title is not None:
    print(title)  # Prints the tag
    print(title.string)  # Prints the tag's string content
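If you need this in several places, you could wrap it in a small helper that also copes with pages that have no title. This is only a sketch: extract_title is a hypothetical name, and the sample HTML strings are made up. It uses soup.title, BeautifulSoup's shortcut for soup.find('title'), and get_text(strip=True), which trims surrounding whitespace:

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_title(html: str) -> Optional[str]:
    """Hypothetical helper: stripped text of the first <title> tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title  # same as soup.find("title")
    if title is None:
        return None
    # get_text() also works when the tag has several children, where .string would be None
    return title.get_text(strip=True)


print(extract_title("<html><head><title>  Example page \n</title></head></html>"))  # Example page
print(extract_title("<p>No title</p>"))  # None
```

For the question's page you would call extract_title(html) on the decoded string fetched with urlopen.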
