What happens?
You try to find_all() the header info by passing multiple class names in one string, which won't work this way:
find_all("div", {"class": "z-padtop2 xgray"})
How to fix that?
Use a CSS selector instead; it is much easier to chain the classes this way:
soup.select('div.z-padtop2.xgray')
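A quick standalone sketch of the chained-class selector (the HTML snippet here is made up for illustration, not taken from the target page):

from bs4 import BeautifulSoup

# toy markup with the same two classes as the profile header divs
html = '<div class="z-padtop2 xgray">Words: 1,000 - Follows: 42 - Updated: 1/1/21</div>'
demo = BeautifulSoup(html, "lxml")

# select() takes a CSS selector, so classes are simply chained with dots
print(demo.select('div.z-padtop2.xgray'))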
How to get the number of follows?
If you do not want to use a regex, just split() the text and pick the value right after 'Follows:' via .index('Follows:') + 1:
follows = items.get_text().split()
list_header.append(follows[follows.index('Follows:')+1])
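To make the indexing step concrete, here is the same idea on an invented header string (in the scraper the text comes from items.get_text()):

# example header text; the real string comes from items.get_text()
text = "Rated: T - Chapters: 12 - Words: 34,567 - Follows: 1,174 - Updated: 1/1/21"
tokens = text.split()                           # split on whitespace into tokens
follows = tokens[tokens.index('Follows:') + 1]  # token right after 'Follows:'
print(follows)                                  # -> 1,174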
Example
from bs4 import BeautifulSoup
from selenium import webdriver
from time import sleep

driver = webdriver.Chrome(executable_path=r'C:\Program Files\ChromeDriver\chromedriver.exe')

url = "https://www.fictionpress.com/u/541077/Imperfect-Princess"
driver.get(url)
sleep(5)  # give the page time to render before grabbing the source

soup = BeautifulSoup(driver.page_source, "lxml")

list_header = []
header = soup.select('div.z-padtop2.xgray')

for items in header:
    try:
        follows = items.get_text().split()
        list_header.append(follows[follows.index('Follows:')+1])
    except ValueError:
        # 'Follows:' is not present in this header
        list_header.append('no follows')

driver.quit()
list_header  # displayed in a notebook cell; use print(list_header) in a script
Output
['1,174',
'364',
'1,965',
'34',
'61',
'215',
'18',
'859',
'320',
'1,483',
'224',
'196',
'68',
'57',
'12',
'208',
'31',
'33',
'25',
'7',
'no follows',
'20',
'26',
'6',
'31']
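If you would rather use a regex after all, a sketch of the same extraction could look like this (the pattern assumes the header text contains "Follows: 1,174"-style fragments, which is an assumption about the page, not something guaranteed by it):

import re

# hypothetical helper; assumes the header text looks like "... Follows: 1,174 ..."
def extract_follows(text):
    match = re.search(r'Follows:\s*([\d,]+)', text)
    return match.group(1) if match else 'no follows'

print(extract_follows("Words: 34,567 - Follows: 1,174 - Updated: 1/1/21"))  # -> 1,174
print(extract_follows("Words: 34,567 - Updated: 1/1/21"))                   # -> no follows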