Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
371 views
in Technique[技术] by (71.8m points)

How to get innerHTML of whole page in selenium driver?

I'm using selenium to click to the web page I want, and then parse the web page using Beautiful Soup.

Somebody has shown how to get inner HTML of an element in a Selenium WebDriver. Is there a way to get HTML of the whole page? Thanks

The sample code in Python (Based on the post above, the language seems to not matter too much):

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup


url = 'http://www.google.com'
driver = webdriver.Firefox()
driver.get(url)

the_html = driver---somehow----.get_attribute('innerHTML')
bs = BeautifulSoup(the_html, 'html.parser')
question from:https://stackoverflow.com/questions/35905517/how-to-get-innerhtml-of-whole-page-in-selenium-driver

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

To get the HTML for the whole page:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://stackoverflow.com")

html = driver.page_source

To get the outer HTML (tag included):

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.outerHTML;")

# HTML from `<body>`
html = driver.execute_script("return document.body.outerHTML;")

# HTML from element with some JavaScript
element = driver.find_element_by_css_selector("#hireme")
html = driver.execute_script("return arguments[0].outerHTML;", element)

# HTML from element with `get_attribute`
element = driver.find_element_by_css_selector("#hireme")
html = element.get_attribute('outerHTML')

To get the inner HTML (tag excluded):

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.innerHTML;")

# HTML from `<body>`
html = driver.execute_script("return document.body.innerHTML;")

# HTML from element with some JavaScript
element = driver.find_element_by_css_selector("#hireme")
html = driver.execute_script("return arguments[0].innerHTML;", element)

# HTML from element with `get_attribute`
element = driver.find_element_by_css_selector("#hireme")
html = element.get_attribute('innerHTML')

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...