Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

python - Struggling to scrape a table using selenium

I am trying to scrape the table that appears in this link.

To scrape it, I decided to use Selenium.

My first attempt was:

from bs4 import BeautifulSoup
import pandas as pd
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)
html_source = driver.page_source
driver.quit()
soup = BeautifulSoup(html_source, "html5lib")
table = soup.find('table', {'class': 'heavy-table ncpulse-fav-table ncpulse-sortable compressed-table'})
df = pd.read_html(str(table), flavor='html5lib', header=0, thousands='.', decimal=',')

However, this raised the error:

'no tables found'

Then I tried the expected_conditions module, because from what I found on SO, perhaps the "Page Source was pulled out even before the child elements have completely rendered".

So I tried something like this:

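from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait

driver.get(route)
element_present = expected_conditions.presence_of_element_located(
    (By.CLASS_NAME, 'heavy-table ncpulse-fav-table ncpulse-sortable compressed-table'))
WebDriverWait(driver, 20).until(element_present)
html_source = driver.page_source
driver.quit()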

However, this time it raises:

selenium.common.exceptions.TimeoutException: Message

So my questions are: How can I obtain the desired output? What am I doing wrong in my use of expected_conditions? And what front-end technology or behavior makes this table so hard to scrape?

1 Answer

By.CLASS_NAME does not handle compound class names (several classes separated by spaces), but you can locate the element with a CSS selector or an XPath expression instead. CSS selectors are generally considered faster than XPath.

element_present = expected_conditions.presence_of_element_located(
    (By.CSS_SELECTOR, "table[class='heavy-table ncpulse-fav-table ncpulse-sortable compressed-table']"))
# or by XPath
element_present = expected_conditions.presence_of_element_located(
    (By.XPATH, "//table[@class='heavy-table ncpulse-fav-table ncpulse-sortable compressed-table']"))
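
As a variation on the selectors above, the compound class can also be written in dot notation (table.a.b.c). Unlike the exact-match attribute selector, this matches even if the classes appear in a different order or the element carries extra classes. The sketch below is only an illustration: compound_class_to_css and wait_for_table_html are hypothetical helper names, not part of Selenium, and it assumes the class names need no CSS escaping (plain letters, digits and hyphens).

```python
def compound_class_to_css(tag: str, class_attr: str) -> str:
    """Turn a space-separated class attribute into a CSS selector.

    'a b c' on a <table> becomes 'table.a.b.c', which matches an element
    carrying all of those classes regardless of their order.
    Assumes the class names need no CSS escaping.
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("class_attr must contain at least one class name")
    return tag + "".join("." + name for name in classes)


def wait_for_table_html(driver, class_attr: str, timeout: float = 20) -> str:
    """Wait until the table is present in the DOM, then return its outer HTML.

    Hypothetical helper: 'driver' is an already-created Selenium WebDriver.
    """
    # Imported lazily so the selector helper above works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions
    from selenium.webdriver.support.ui import WebDriverWait

    selector = compound_class_to_css("table", class_attr)
    table = WebDriverWait(driver, timeout).until(
        expected_conditions.presence_of_element_located((By.CSS_SELECTOR, selector))
    )
    return table.get_attribute("outerHTML")


TABLE_CLASSES = "heavy-table ncpulse-fav-table ncpulse-sortable compressed-table"
print(compound_class_to_css("table", TABLE_CLASSES))
# prints: table.heavy-table.ncpulse-fav-table.ncpulse-sortable.compressed-table
```

Calling wait_for_table_html(driver, TABLE_CLASSES) after driver.get(...) and before driver.quit() would return the table markup once it is present, which could then be handed to pd.read_html as in the question.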
