Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
581 views
in Technique[技术] by (71.8m points)

python - Max retries exceeded with URL in requests

I'm trying to get the content of App Store > Business:

import requests
from lxml import html

page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)

flist = []
plist = []
for i in range(0, 100):
    app = tree.xpath("//div[@class='column first']/ul/li/a/@href")
    ap = app[0]
    page1 = requests.get(ap)

When I try the range with (0,2) it works, but when I put the range in 100s it shows this error:

Traceback (most recent call last):
  File "/home/preetham/Desktop/eg.py", line 17, in <module>
    page1 = requests.get(ap)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)
Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

What happened here is that itunes server refuses your connection (you're sending too many requests from same ip address in short period of time)

Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8

error trace is misleading it should be something like "No connection could be made because the target machine actively refused it".

There is an issue at about python.requests lib at Github, check it out here

To overcome this issue (not so much an issue as it is misleading debug trace) you should catch connection related exceptions like so:

try:
    page1 = requests.get(ap)
except requests.exceptions.ConnectionError:
    r.status_code = "Connection refused"

Another way to overcome this problem is if you use enough time gap to send requests to server this can be achieved by sleep(timeinsec) function in python (don't forget to import sleep)

from time import sleep

All in all requests is awesome python lib, hope that solves your problem.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...