Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Python parallel execution with selenium

I'm confused about parallel execution in Python using Selenium. There seem to be a few ways to go about it, but some look out of date.

  1. There's a Python module called python-wd-parallel that seems to support this, but it dates from 2013. Is it still useful now? I also found this example.

  2. There's concurrent.futures, which seems a lot newer but is not so easy to implement. Does anyone have a working example of parallel execution with Selenium?

  3. There's also the option of using plain threads and executors, but I suspect this will be slower, because it doesn't use all the cores and still runs serially.

What is the current way to do parallel execution using Selenium?



1 Answer


Use joblib's Parallel class for this; it's a great library for parallel execution.

Let's say we have a list of URLs named urls and we want to take a screenshot of each one in parallel.

First, let's import the necessary libraries:

from selenium import webdriver
from joblib import Parallel, delayed

Now let's define a function that takes a screenshot and returns it as base64:

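def take_screenshot(url):
    # Note: webdriver.PhantomJS requires Selenium 3.x; it was removed in Selenium 4.
    phantom = webdriver.PhantomJS('/path/to/phantomjs')
    try:
        phantom.get(url)
        return phantom.get_screenshot_as_base64()
    finally:
        # quit() ends the browser process; close() only closes the current window.
        phantom.quit()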

Now, to execute that in parallel:

screenshots = Parallel(n_jobs=-1)(delayed(take_screenshot)(url) for url in urls)

When this line finishes executing, screenshots holds the results from all of the worker processes, in the same order as urls.
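Since get_screenshot_as_base64() returns base64-encoded PNG data, the collected results can be decoded and written to disk afterwards. The following is a minimal sketch of that step; the helper name save_screenshots, the out_dir parameter, and the screenshot_<i>.png naming are illustrative assumptions, not part of the original answer.

```python
import base64
from pathlib import Path


def save_screenshots(screenshots, out_dir):
    """Decode base64 strings and write them as numbered .png files.

    `out_dir` and the file naming scheme are illustrative choices.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, encoded in enumerate(screenshots):
        target = out_path / f"screenshot_{index}.png"
        # b64decode turns the base64 text back into the raw image bytes.
        target.write_bytes(base64.b64decode(encoded))
        saved.append(target)
    return saved
```

Because the results come back in input order, file number i corresponds to urls[i].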

Explanation about Parallel

  • Parallel(n_jobs=-1) means use all available CPU cores
  • delayed(function)(input) is joblib's way of capturing a function call and its arguments without executing it, so that Parallel can dispatch the call to a worker
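To make the delayed idea concrete, here is a toy sketch using only plain Python. toy_delayed and run_tasks are hypothetical stand-ins written for illustration; they are not joblib's code, and run_tasks executes the calls one after another rather than in parallel.

```python
def toy_delayed(function):
    """Hypothetical stand-in for joblib.delayed: record a call, do not run it."""
    def capture(*args, **kwargs):
        return function, args, kwargs
    return capture


def run_tasks(tasks):
    """Serial stand-in for Parallel(...): unpack and execute each recorded call."""
    return [function(*args, **kwargs) for function, args, kwargs in tasks]


def square(x):
    return x * x


tasks = [toy_delayed(square)(n) for n in range(4)]  # nothing has run yet
results = run_tasks(tasks)  # the calls happen here, in input order
```

The point is that the generator passed to Parallel yields descriptions of calls, and the executor decides where and when each one runs.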

More information can be found in the joblib docs.
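For comparison, the concurrent.futures route mentioned in the question could look roughly like the sketch below. This is a swapped-in alternative, not the joblib approach above. It assumes headless Chrome with a matching driver is installed (substituted here for PhantomJS), and map_in_threads and max_workers=4 are illustrative choices. Threads are a reasonable fit because each worker spends most of its time waiting on its own browser process, so the calls can overlap even though Python threads share one interpreter.

```python
from concurrent.futures import ThreadPoolExecutor


def take_screenshot(url):
    """Open a browser, load `url`, and return a base64 PNG screenshot."""
    # Imported here so the helper below is usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    # Assumption: Chrome and a matching driver are available on this machine.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.get_screenshot_as_base64()
    finally:
        driver.quit()  # always end the browser process


def map_in_threads(function, items, max_workers=4):
    """Run function(item) for each item on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(function, items))


# Usage (not run here): screenshots = map_in_threads(take_screenshot, urls)
```

Each call starts its own browser, so max_workers also caps how many browsers are open at once.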

