Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Scrapy multiple requests and fill single item

I need to make two requests to different URLs and put the information from both into the same item. I tried the method below, but the results are written to different rows. Each callback returns the item. I have tried many approaches, but none seems to work.

def parse_companies(self, response):
    data = json.loads(response.body)
    if data:
        item = ThalamusItem()
        for company in data:
            comp_id = company["id"]

            url = self.request_details_URL + str(comp_id) + ".json"
            request = Request(url, callback=self.parse_company_details)
            request.meta['item'] = item
            yield request

            url2 = self.request_contacts + str(comp_id)
            yield Request(url2, callback=self.parse_company_contacts, meta={'item': item})


1 Answer


Since Scrapy is asynchronous, you need to chain your requests manually. To transfer data between requests, you can use the Request's meta attribute:

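from scrapy import Request

def parse(self, response):
    item = dict()
    item['name'] = 'foobar'
    # Attach the partially filled item to the next request.
    yield Request('http://someurl.com', self.parse2,
                  meta={'item': item})

def parse2(self, response):
    # The same item arrives here via response.meta.
    print(response.meta['item'])
    # {'name': 'foobar'}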

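In your case, you yield both requests from the same callback, so you end up with a split chain in which each callback returns its own partially filled item. What you want is one continuous chain: the details callback issues the contacts request, and only the last callback yields the item.
Your code should look something like this: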

def parse_companies(self, response):
    data = json.loads(response.body)
    if not data:
        return
    for company in data:
        item = ThalamusItem()
        comp_id = company["id"]
        url = self.request_details_URL + str(comp_id) + ".json"
        url2 = self.request_contacts + str(comp_id)
        request = Request(url, callback=self.parse_details,
                          meta={'url2': url2, 'item': item})
        yield request

def parse_details(self, response):
    item = response.meta['item']
    url2 = response.meta['url2']
    item['details'] = ''  # add details
    yield Request(url2, callback=self.parse_contacts, meta={'item': item})

def parse_contacts(self, response):
    item = response.meta['item']
    item['contacts'] = ''  # add contacts
    yield item
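
To see why the continuous chain produces one complete item per company, here is a minimal sketch that imitates the flow without Scrapy installed. Everything in it is an assumption made for illustration: FakeRequest, FakeResponse, the crawl() driver, and the canned PAGES data are hypothetical stand-ins, not Scrapy's API, and the real scheduler does not guarantee this ordering.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a Scrapy request: URL, callback, meta."""
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a Scrapy response."""
    url: str
    body: str
    meta: dict


# Canned "network" data, made up for this illustration.
PAGES = {
    "details/1": "Acme details",
    "details/2": "Globex details",
    "contacts/1": "alice@acme.example",
    "contacts/2": "bob@globex.example",
}


def crawl(start_requests):
    """Run callbacks until no requests remain; collect every yielded item."""
    queue = deque(start_requests)
    items = []
    while queue:
        req = queue.popleft()
        resp = FakeResponse(req.url, PAGES[req.url], req.meta)
        for result in req.callback(resp):
            if isinstance(result, FakeRequest):
                queue.append(result)   # follow-up request: keep the chain going
            else:
                items.append(result)   # finished item: one output row
    return items


def start(company_ids):
    for comp_id in company_ids:
        item = {"id": comp_id}  # a fresh item per company, not one shared item
        yield FakeRequest(f"details/{comp_id}", parse_details,
                          meta={"item": item, "url2": f"contacts/{comp_id}"})


def parse_details(response):
    item = response.meta["item"]
    item["details"] = response.body
    # Do not yield the item yet; pass it on to the contacts request.
    yield FakeRequest(response.meta["url2"], parse_contacts,
                      meta={"item": item})


def parse_contacts(response):
    item = response.meta["item"]
    item["contacts"] = response.body
    yield item  # only the last callback in the chain emits the item


items = crawl(start([1, 2]))
for row in items:
    print(row)
```

Each company produces exactly one row holding both the details and the contacts, because the item is only yielded at the end of the chain. If parse_details also yielded the item, every company would show up twice, which matches the "different rows" symptom from the question.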
