Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python 3.x - Page with anti-scraping protection in the code?

I am trying to extract information from a web page. In XPath Helper (a Chrome extension) the content shows up perfectly, but in Scrapy the same selector returns "None" or an empty result. Web: https://cutt.ly/bjj3ohW The numbers marked --NN are the forms I tested.

I have tried XPath (//*[@id="da_price"], //*[@id="da_price"]/text()) with .get(''), .extract(), and .get('').strip(), and CSS (#da_price, #da_price::text). I also used BeautifulSoup and scrapy-splash, and they also return None or an empty result. I would rather not use Selenium yet because the number of links is quite large.

Question from: https://stackoverflow.com/questions/65623157/page-with-anti-scraping-protection-in-the-code

1 Answer

The element you're targeting might be dynamically rendered. I tried the spider below and got it to work by targeting the price lower down on the page instead.

import scrapy


class TestSpider(scrapy.Spider):
    name = 'testspider'

    def start_requests(self):
        # Single request to the product page from the question.
        return [scrapy.Request(
            url='https://cutt.ly/bjj3ohW',
        )]

    def parse(self, response):
        # Select the price lower down on the page rather than #da_price.
        price = response.css('.price-final > strong::text').get()
        print(price)

A good way to test whether an element is dynamically rendered is to open the DevTools panel in Chrome (F12) and go to the Network tab. Reload the page and look at the first response, which should be the .html document. Click that entry and then Response. There you can see the HTML that Scrapy receives and can parse. Press Ctrl+F and search for the id or class name used in your selector (for example, da_price).
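The same check can be scripted instead of done by hand in DevTools. The following is only a minimal sketch using the standard library: the names IdTextFinder and text_in_raw_html are made up for this illustration, and the sample HTML strings are invented, not taken from the page in the question. It reports whether an element with a given id exists in the raw HTML and whether it already contains text before any JavaScript runs.

```python
from html.parser import HTMLParser


class IdTextFinder(HTMLParser):
    """Collect the text inside the first element with a given id (hypothetical helper)."""

    # Void elements never get a closing tag, so they must not affect nesting depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # > 0 while the parser is inside the target element
        self.found = False    # True once the target element has been seen
        self.chunks = []      # text fragments collected inside the target

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_in_raw_html(html, target_id):
    """Return the element's stripped text, '' if it is empty, or None if it is absent."""
    finder = IdTextFinder(target_id)
    finder.feed(html)
    finder.close()
    if not finder.found:
        return None
    return "".join(finder.chunks).strip()


# Invented sample documents, for illustration only.
static_html = '<div><span id="da_price">19.99</span></div>'
dynamic_html = '<div><span id="da_price"></span><script>render()</script></div>'

print(text_in_raw_html(static_html, "da_price"))
print(text_in_raw_html(dynamic_html, "da_price"))
```

Inside a spider you could pass response.text to text_in_raw_html. An empty string or None suggests the value is filled in later by JavaScript, so a selector that works in the browser will not match what Scrapy downloads.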

