Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Python - 'ascii' codec error in BeautifulSoup

I am using BeautifulSoup to scrape data from an HTML page. Until yesterday everything was fine, but now I am getting this error:

'ascii' codec can't encode character u'\xa9' in position 86700: ordinal not in range(128)

I am using this code:

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)

This is giving me the error.


1 Answer

A wild guess: try specifying the encoding of the page when you build the soup:

soup = BeautifulSoup(page, fromEncoding=<encoding of the page>)
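To illustrate what an explicit page encoding buys you, here is a minimal sketch of decoding raw page bytes by hand. The sample bytes and the choice of UTF-8 are assumptions made for illustration; the real page may use a different encoding, and this is only roughly what the parser does once it is told the encoding.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for the bytes returned by urllib2.urlopen(url).read().
page = b"<html><body>\xc2\xa9 2012 Example</body></html>"

# Assumption: the page is UTF-8. Use whatever encoding the page declares.
encoding = "utf-8"

# Decoding with the correct codec turns the raw bytes into Unicode text,
# so the copyright sign becomes the single character U+00A9.
html = page.decode(encoding)

print(u"\xa9" in html)
```

If the wrong codec is named here, the text either fails to decode or comes out garbled, which is why guessing the encoding correctly matters.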

This can also be a problem with the Python installation. If you print non-ASCII characters without BeautifulSoup, do you see the same error? If so, you need to set the default encoding:

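import sys
# Python 2 only: site.py deletes setdefaultencoding at startup,
# so reload(sys) is needed to restore it before calling it.
reload(sys)
sys.setdefaultencoding("utf-8") # or whatever you want the default encoding to be.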
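The check of handling non-ASCII text without BeautifulSoup can be sketched as below. The sample string and the helper name `try_encode` are hypothetical; the point is only that the character from the error message, U+00A9, cannot be represented in ASCII but encodes cleanly as UTF-8. Encoding explicitly at the point of output is one possible alternative to changing the interpreter-wide default.

```python
# -*- coding: utf-8 -*-
# Sample text is an assumption; U+00A9 is the copyright sign from the error message.
text = u"Copyright \xa9 2012"


def try_encode(value, codec):
    """Return (encoded_bytes, None) on success or (None, error_message) on failure."""
    try:
        return value.encode(codec), None
    except UnicodeEncodeError as exc:
        return None, str(exc)


# ASCII only covers code points 0-127, so this attempt fails.
ascii_bytes, ascii_error = try_encode(text, "ascii")

# UTF-8 can represent every Unicode character, so this attempt succeeds.
utf8_bytes, utf8_error = try_encode(text, "utf-8")

print(ascii_error)
print(repr(utf8_bytes))
```

If the ASCII attempt fails in the same way on your machine, the problem lies in how the text is encoded on output rather than in BeautifulSoup itself.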
