Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
332 views
in Technique[技术] by (71.8m points)

Can't open Unicode URL with Python

Using Python 2.5.2 and Linux Debian, I'm trying to get the content from a Spanish URL that contains a Spanish char 'í':

import urllib
url = u'http://mydomain.es/índice.html'
content = urllib.urlopen(url).read()

I'm getting this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'xe1' in position 8: ordinal not in range(128)

I've tried using before passing the url to urllib this:

url = urllib.quote(url)

and this:

url = url.encode('UTF-8')

but they didn't work.

Can you tell me what I am doing wrong ?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

This works for me:

#!/usr/bin/env python
# define source file encoding, see: http://www.python.org/dev/peps/pep-0263/
# -*- coding: utf-8 -*-

import urllib
url = u'http://example.com/índice.html'
content = urllib.urlopen(url.encode("UTF-8")).read()

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...