Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

in Technique by (71.8m points)

web scraping - Noticing a warning to limit scraped results with BeautifulSoup in Python

I am trying to scrape sales data for recently sold items from eBay with BeautifulSoup in Python. The following code works very well: it finds all prices and all dates of the sold items.

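# `soup` is the BeautifulSoup object parsed from the search results page.
price = []

# find_all() returns an empty list when nothing matches, so no try/except is needed.
p = soup.find_all('span', class_='POSITIVE')

for x in p:
    # Break the tag's HTML into tokens on spaces and double quotes.
    x = str(x).replace(' ', '"').split('"')

    # Skip spans that contain the token '>Sold' (the sold-date spans); keep the rest.
    if '>Sold' in x:
        continue
    price.append(x)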

Now I am running into a problem, though. As the screenshot for this URL shows (https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2334524.m570.l1313&_nkw=babe+ruth+1933+goudey+149+psa+%281.5%29&_sacat=0&LH_TitleDesc=0&_osacat=0&_odkw=babe+ruth+1933+goudey+149+psa+1.5&LH_Complete=1&rt=nc&LH_Sold=1), eBay sometimes suggests other search results when a specific search query does not return enough matches.

As a result, my code finds not only the correct prices but also those of the suggested results below the warning. I tried to find out where the warning message is located so that I could discard every listing found after it, but I cannot figure it out. I also thought about searching for the prices one by one, but even then I cannot figure out how to detect where the warning appears.

Is there any other way you can think of to solve this?

I am aware that this is really specific.
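The idea described above, keeping only what comes before the warning, could be sketched roughly as follows. This is a minimal illustration on made-up markup: the `item` and `notice` class names, the sample HTML, and the helper `prices_before_notice` are assumptions for the example, not eBay's real page structure, so the selectors would have to be adapted after inspecting the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page uses different tags and classes.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="POSITIVE">Sold  Jan 5, 2021</span><span class="POSITIVE">$1,250.00</span></li>
  <li class="item"><span class="POSITIVE">Sold  Jan 2, 2021</span><span class="POSITIVE">$999.99</span></li>
  <li class="notice">Results matching fewer words</li>
  <li class="item"><span class="POSITIVE">Sold  Dec 30, 2020</span><span class="POSITIVE">$15.00</span></li>
</ul>
"""


def prices_before_notice(soup):
    """Collect price texts in page order, stopping at the (assumed) notice element."""
    prices = []
    for li in soup.find_all("li"):
        # The warning is assumed to be an <li> carrying the class "notice".
        if "notice" in li.get("class", []):
            break
        for span in li.find_all("span", class_="POSITIVE"):
            text = span.get_text(strip=True)
            # The sold-date spans share the class, so filter them out by their text.
            if not text.startswith("Sold"):
                prices.append(text)
    return prices


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(prices_before_notice(soup))
```

If no notice element is present, the loop never breaks and every listing is kept, which matches the case where the search returned enough results of its own.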

Question from: https://stackoverflow.com/questions/65946105/noticing-a-warning-to-limit-scraped-results-with-beautifulsoup-in-python

Fight the dragon too long and you become the dragon yourself; gaze into the abyss too long and the abyss will gaze back…

1 Answer

0 votes
by (71.8m points)

You can scrape the number of results that the page reports (shown in the picture) and loop over only that many listings, so the suggested results after the warning are never processed.

screenshot

The code will be something like:

results = soup.find...
# The scraped value must become an int, so strip everything that is not part of the number first
results = int(results)

# Assumes `price` holds the matched elements in page order; only the first `results` entries are processed
for i in range(results):
    price[i] = str(price[i])
    price[i] = price[i].replace(' ', '"')
    price[i] = price[i].split('"')

    if '>Sold' in price[i]:
        continue
    else:
        ...
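A self-contained sketch of this count-based approach might look like the following. It runs on made-up markup: the `result-count` and `item` class names, the sample HTML, and the helpers `parse_result_count` and `first_n_prices` are all assumptions for illustration, not eBay's actual structure, and the sketch assumes the real matches are listed before any suggested ones.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page uses different tags and classes.
SAMPLE_HTML = """
<h1 class="result-count">2 results for babe ruth 1933 goudey</h1>
<ul>
  <li class="item"><span class="POSITIVE">Sold  Jan 5, 2021</span><span class="POSITIVE">$1,250.00</span></li>
  <li class="item"><span class="POSITIVE">Sold  Jan 2, 2021</span><span class="POSITIVE">$999.99</span></li>
  <li class="notice">Results matching fewer words</li>
  <li class="item"><span class="POSITIVE">Sold  Dec 30, 2020</span><span class="POSITIVE">$15.00</span></li>
</ul>
"""


def parse_result_count(text):
    """Return the first integer found in text such as '1,234 results for ...'."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else 0


def first_n_prices(soup, count):
    """Collect price texts from the first `count` listings only."""
    prices = []
    for item in soup.find_all("li", class_="item")[:count]:
        for span in item.find_all("span", class_="POSITIVE"):
            text = span.get_text(strip=True)
            # The sold-date spans share the class, so filter them out by their text.
            if not text.startswith("Sold"):
                prices.append(text)
    return prices


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
count = parse_result_count(soup.find("h1", class_="result-count").get_text())
prices = first_n_prices(soup, count)
print(count, prices)
```

Slicing the list of listings, rather than indexing into the flat list of `POSITIVE` spans, keeps each date and price paired with its own listing, so the count refers to listings and not to individual spans.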
      
