Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

python - Selenium Firefox headless returns different results

When I scrape a page that contains products using the headless option, I get different results.
For the same query, one time I get results that are not sorted, and another time they come back in the proper sorted order.

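Selenium Firefox browser setup:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

firefox_options = Options()
firefox_options.headless = True
# firefox_driver holds the path to the geckodriver executable
browser = webdriver.Firefox(options=firefox_options, executable_path=firefox_driver)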

According to this post:
"firefox does not send different headers when using the headless option".

How can I use the headless option and still get consistent results from scraping?

Update:

It turns out that an ads popup window was hiding the price sort menu. Setting a constant window size, as posted by DebanjanB, solved the problem.

Thanks for any suggestions

1 Answer


Ideally, running with or without firefox_options.headless = True shouldn't have any major effect on which elements get rendered in the DOM tree, but it can make a significant difference to the viewport.

As an example, when GeckoDriver/Firefox is initialized with the --headless option, the default window size is width = 1366px, height = 768px, whereas when GeckoDriver/Firefox is initialized without the --headless option, the default window size in this example is width = 1382px, height = 744px.

  • Example Code:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    # Selenium 3-era API (options.headless, executable_path); newer Selenium 4 releases
    # use options.add_argument("-headless") and a Service object instead
    options = webdriver.FirefoxOptions()
    options.headless = True
    driver = webdriver.Firefox(options=options, executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get("https://www.google.com/")
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, "q")))
    print("Headless Firefox Initialized")
    size = driver.get_window_size()
    print("Window size: width = {}px, height = {}px".format(size["width"], size["height"]))
    driver.quit()
    # Same measurement with a regular (headed) Firefox for comparison
    driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get("https://www.google.com/")
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, "q")))
    print("Firefox Initialized")
    size = driver.get_window_size()
    print("Window size: width = {}px, height = {}px".format(size["width"], size["height"]))
    driver.quit()
    
  • Console Output:

    Headless Firefox Initialized
    Window size: width = 1366px, height = 768px
    Firefox Initialized
    Window size: width = 1382px, height = 744px
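Note that get_window_size() reports the outer browser window, which may include browser chrome such as toolbars, rather than the viewport the page is actually laid out in. A minimal sketch for reading both, assuming an already-started Selenium WebDriver session (the helper name describe_viewport is hypothetical), queries window.innerWidth and window.innerHeight through execute_script():

```python
def describe_viewport(driver):
    """Return the outer window size and the inner viewport size of a WebDriver session.

    `driver` is assumed to be a Selenium WebDriver (e.g. webdriver.Firefox()).
    describe_viewport is a hypothetical helper, not part of Selenium.
    """
    # Outer window dimensions, the same values the example above prints
    window = driver.get_window_size()
    # Layout viewport the page renders into; Selenium returns JS arrays as Python lists
    inner_width, inner_height = driver.execute_script(
        "return [window.innerWidth, window.innerHeight];"
    )
    return {
        "window": (window["width"], window["height"]),
        "viewport": (inner_width, inner_height),
    }
```

With a running driver, print(describe_viewport(driver)) shows both pairs side by side, which makes headless and headed runs easier to compare.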
    

Conclusion

From the above observation, it can be inferred that with the --headless option GeckoDriver/Firefox opens the browsing context with a different viewport (here narrower, 1366px versus 1382px wide), so the page layout can change and fewer elements may be identified, visible, or clickable.
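Because a different viewport can also change where overlays land (in the question, an ads popup ended up covering the price sort menu), it can help to check whether an element is actually on top before clicking it. A minimal sketch, using a hypothetical helper is_obscured(), asks the browser through document.elementFromPoint() which element sits at the target's center:

```python
# JavaScript run in the page: compares the element found at the target's
# center point with the target itself (or one of its descendants).
_OBSCURED_JS = """
const el = arguments[0];
const r = el.getBoundingClientRect();
const hit = document.elementFromPoint(r.left + r.width / 2, r.top + r.height / 2);
return hit !== null && hit !== el && !el.contains(hit);
"""


def is_obscured(driver, element):
    """Return True if something else (e.g. an ad popup) covers the center of `element`.

    `driver` is assumed to be a Selenium WebDriver and `element` a WebElement from it;
    Selenium passes the element to the script as arguments[0]. An element scrolled
    out of the viewport makes elementFromPoint return null, so it is reported as
    not obscured; scroll it into view first. is_obscured is a hypothetical helper.
    """
    return bool(driver.execute_script(_OBSCURED_JS, element))
```

For example, if is_obscured(driver, sort_menu) (sort_menu being a hypothetical WebElement) is True in headless mode but False in a headed run, the difference lies in the layout or an overlay rather than in the scraping logic.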


Solution

When using GeckoDriver/Firefox to initiate a browsing context, always open it in maximized mode or set a fixed size through set_window_size(), as follows:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.FirefoxOptions()
options.headless = True
# "start-maximized" and "window-size=W,H" are Chrome switches that Firefox does not recognize,
# so set the size through WebDriver once the browser has started
driver = webdriver.Firefox(options=options, executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.google.com/")
driver.set_window_size(1920, 1080)
# For a headed browser, driver.maximize_window() is the WebDriver way to maximize
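In newer Selenium 4 releases, where options.headless and executable_path are no longer accepted, the same idea can be wrapped in a small helper. This is a sketch under assumptions: the function names pin_window_size and make_headless_firefox are hypothetical, and geckodriver is assumed to be found on PATH or by Selenium Manager.

```python
def pin_window_size(driver, width=1920, height=1080):
    """Set a fixed window size and return the size the browser actually applied."""
    driver.set_window_size(width, height)
    size = driver.get_window_size()
    # The applied size can differ from the request (e.g. clamped by the screen),
    # so report it instead of assuming the request was honored.
    return size["width"], size["height"]


def make_headless_firefox(width=1920, height=1080):
    """Start headless Firefox with a fixed window size (Selenium 4 API assumed)."""
    # Imported here so pin_window_size() can be used without Selenium installed
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    pin_window_size(driver, width, height)
    return driver
```

Using the same pinned size for headless and headed sessions keeps the page layout identical between them, which is what resolved the sort-menu issue in the question.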

tl;dr

You can find a couple of relevant discussions on window size in:



...