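Hi everyone,
I'm a beginner trying to crawl a website with PhantomJS + Scrapy, but PhantomJS's memory usage keeps growing as more pages are crawled, until memory runs out. I searched around and found nothing useful. My simple idea for now is to quit PhantomJS after every page it crawls, but doing it directly like this doesn't seem to work: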
self.browser.get(response.url)
sel = self.browser.find_element_by_xpath("//pre").text
self.browser.quit()
It fails immediately with the error below. Any help would be much appreciated.
Traceback (most recent call last):
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/app/Project/scrapy/new_stock/new_stock/spiders/newstock.py", line 86, in parse_items
    self.browser.get(response.url)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 250, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 415, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/pythonbrew/venvs/Python-2.7.10/flask/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 489, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/local/pythonbrew/pythons/Python-2.7.10/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 111] Connection refused>
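
One likely reading of the traceback above: the error is raised from self.browser.get(), not from quit(). Once quit() has run, the PhantomJS process behind self.browser is gone, so the next page's get() talks to a port nobody is listening on, hence "Connection refused". If that is the cause, the driver has to be recreated after each quit rather than reused. Below is a minimal sketch of that idea, not a confirmed fix. The class name RecyclingBrowser, the make_driver factory argument, and the max_pages threshold are all hypothetical names introduced here for illustration; make_driver is assumed to be something like webdriver.PhantomJS.

```python
class RecyclingBrowser(object):
    """Wraps a WebDriver-like object and replaces it with a fresh one
    after every `max_pages` page loads (hypothetical helper)."""

    def __init__(self, make_driver, max_pages=50):
        # make_driver: zero-argument callable returning an object with
        # get(url) and quit(), assumed to be e.g. webdriver.PhantomJS
        self._make_driver = make_driver
        self._max_pages = max_pages
        self._driver = None
        self._pages = 0

    def get(self, url):
        # Quit the old process once it has served its quota of pages.
        if self._driver is not None and self._pages >= self._max_pages:
            self.close()
        # Start a new process lazily, so get() never hits a dead driver.
        if self._driver is None:
            self._driver = self._make_driver()
            self._pages = 0
        self._driver.get(url)
        self._pages += 1
        return self._driver

    def close(self):
        # Safe to call repeatedly; does nothing if no driver is running.
        if self._driver is not None:
            self._driver.quit()
            self._driver = None
```

In the spider, this would presumably be created once as self.browser = RecyclingBrowser(webdriver.PhantomJS, max_pages=1), and parse_items would call driver = self.browser.get(response.url) and then driver.find_element_by_xpath("//pre").text on the returned driver. With max_pages=1 PhantomJS is restarted for every page, which matches the original idea; a larger value trades some memory growth for fewer process startups.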