Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
489 views
in Technique[技术] by (71.8m points)

beautifulsoup - why is nothing getting parsed in my web scraping program?

I made this code to search all the top links in google search. But its returning none.

import webbrowser, requests
from bs4 import BeautifulSoup
string = 'selena+gomez'
website = f'http://google.com/search?q={string}'
req_web = requests.get(website).text
parser = BeautifulSoup(req_web, 'html.parser')
gotolink = parser.find('div', class_='r').a["href"]
print(gotolink)
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Google needs that you specify User-Agent http header to return correct page. Without the correct User-Agent specified, Google returns page that doesn't contain <div> tags with r class. You can see it when you do print(soup) with and without User-Agent.

For example:

import requests
from bs4 import BeautifulSoup

string = 'selena+gomez'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
website = f'http://google.com/search?hl=en&q={string}'

req_web = requests.get(website, headers=headers).text
parser = BeautifulSoup(req_web, 'html.parser')
gotolink = parser.find('div', class_='r').a["href"]
print(gotolink)

Prints:

https://www.instagram.com/selenagomez/?hl=en

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...