Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Python multiprocessing: provide one specific argument to each worker

I'm using a Pool of workers and want each of them to be initialized with a specific object. More precisely, the initialization cannot be parallelized, so I plan to prepare the objects in the main process before creating the workers, and then give each worker one of these objects.

Here is my attempt:

import multiprocessing
import random
import time

class Foo:
    def __init__(self, param):
        # NO WAY TO PARALLELIZE THIS !!
        print(f"Creating Foo with {param}")
        self.param = param

    def __call__(self, x):
        time.sleep(1)
        print("Do the computation", self)
        return self.param + str(x)

def initializer():
    global myfoo

    param = random.choice(["a", "b", "c", "d", "e"])
    myfoo = Foo(param)

def compute(x):
    return myfoo(x)

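if __name__ == "__main__":
    # The guard is required when workers are started with "spawn"
    # (the default on Windows and macOS); it is harmless with "fork".
    multiple_results = []
    with multiprocessing.Pool(2, initializer, ()) as pool:
        for i in range(1, 10):
            work = pool.apply_async(compute, (i,))
            multiple_results.append(work)

        print([res.get(timeout=2) for res in multiple_results])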

Here is a possible output:

Creating Foo with b
Creating Foo with a
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
['b1', 'a2', 'b3', 'a4', 'b5', 'a6', 'b7', 'a8', 'b9']

What puzzles me is that the address of the Foo object is always the same, even though the Foo objects are actually different, as the output shows: "b1", "a2".

My problem is that the two calls to initializer run in parallel, whereas I do not want the construction of Foo to be parallelized.
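If the only requirement is that the constructions of Foo never overlap, one option is to keep the initializer but serialize it with a `multiprocessing.Lock` passed through `initargs`. This is a sketch under assumptions, not the asker's code: `Foo` is simplified, and `build_param()` is a hypothetical helper standing in for the real, non-parallelizable setup. Each worker still builds its own object, but only one worker at a time can be inside the locked section.

```python
import multiprocessing
import os
import time


class Foo:
    """Simplified stand-in for the question's Foo."""

    def __init__(self, param):
        # Pretend this construction must never run concurrently.
        time.sleep(0.2)
        self.param = param

    def __call__(self, x):
        return self.param + str(x)


def build_param():
    # Hypothetical helper: derive a per-worker parameter from the process id.
    return f"pid{os.getpid()}-"


def initializer(lock):
    global myfoo
    # Only one worker can hold the lock at a time, so the
    # Foo constructions happen one after the other.
    with lock:
        myfoo = Foo(build_param())


def compute(x):
    return myfoo(x)


def run_demo():
    lock = multiprocessing.Lock()
    with multiprocessing.Pool(2, initializer, (lock,)) as pool:
        return pool.map(compute, range(1, 10))


if __name__ == "__main__":
    print(run_demo())
```

The lock is handed over when the worker processes are created, which is the usual way to share synchronization primitives with a Pool.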

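I would like some magical add_worker method that lets me do something like this:

pool = multiprocessing.Pool()
for i in range(2):
    foo = Foo(...)  # built sequentially in the main process (parameters omitted)
    pool.add_worker(initializer, (foo,))  # hypothetical: Pool has no such method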

Any ideas?
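A pattern that comes close to the hypothetical add_worker with only the standard library is to build the objects one by one in the main process, put them on a `multiprocessing.Queue` passed through `initargs`, and let each worker's initializer take exactly one object from that queue. This is a sketch under stated assumptions: `Foo` must be picklable (it is sent to the worker), the number of objects must match the number of processes, and workers must not be replaced (the default `maxtasksperchild=None` keeps them alive; a replacement worker would block on the empty queue).

```python
import multiprocessing
import os


class Foo:
    """Stand-in for the question's Foo; must be picklable to cross processes."""

    def __init__(self, param):
        # Runs in the main process only, one object after the other.
        self.param = param

    def __call__(self, x):
        return self.param + str(x)


def initializer(foo_queue):
    global myfoo
    # Each worker takes exactly one pre-built Foo from the queue.
    myfoo = foo_queue.get()


def compute(x):
    # Return the worker's pid as well, to show which object handled the task.
    return os.getpid(), myfoo(x)


def run_demo(params=("a", "b")):
    foo_queue = multiprocessing.Queue()
    for param in params:
        foo_queue.put(Foo(param))  # sequential construction in the parent
    with multiprocessing.Pool(len(params), initializer, (foo_queue,)) as pool:
        return pool.map(compute, range(1, 10))


if __name__ == "__main__":
    for pid, value in run_demo():
        print(pid, value)
```

Each worker runs the initializer once when it starts, so every worker ends up holding one of the objects built beforehand; which worker receives which object is not specified.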

EDIT: I solved my real-life problem by importing Keras's VGGNet inside the process instead of at the top of the file (see this answer). Out of curiosity, I am still interested in an answer.
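The workaround in the EDIT amounts to deferring a heavy import until the worker process is running, typically in the initializer, rather than importing it at module level in the parent. A minimal sketch of that pattern, using the standard-library `decimal` module purely as a stand-in for the Keras import (the actual VGG model loading is not shown, and `model_module` is a hypothetical name):

```python
import multiprocessing


def initializer():
    global model_module
    # Deferred import: executed in each worker, not at the top of the file.
    # `decimal` is only a lightweight stand-in for a heavy library such as Keras.
    import decimal
    model_module = decimal


def compute(x):
    return str(model_module.Decimal(x) / 4)


def run_demo():
    with multiprocessing.Pool(2, initializer) as pool:
        return pool.map(compute, [1, 2, 3])


if __name__ == "__main__":
    print(run_demo())  # ['0.25', '0.5', '0.75']
```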


Whoever fights dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss will gaze back…

1 Answer

As you noticed, the Foo instances really are different in each worker, even though they appear to have the same address.

As explained in the answers to this question and this question, you should not rely on addresses to distinguish between instances in different processes: each process has its own virtual address space, so distinct objects living in different processes can show the same address.
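One way to make the distinction visible is to report the process id alongside `id()`: the pair `(os.getpid(), id(obj))` identifies an object unambiguously, whereas `id()` alone is only meaningful within a single process. A minimal sketch with a simplified `Foo` whose parameter is, as an assumption for the demo, the worker's own pid; depending on the start method, the printed `id` values may or may not coincide across workers.

```python
import multiprocessing
import os


class Foo:
    def __init__(self, param):
        self.param = param


def initializer():
    global myfoo
    myfoo = Foo(str(os.getpid()))


def describe(_):
    # id() is only meaningful inside one process; pair it with the pid.
    return os.getpid(), id(myfoo), myfoo.param


def run_demo():
    with multiprocessing.Pool(2, initializer) as pool:
        return pool.map(describe, range(4))


if __name__ == "__main__":
    for pid, address, param in run_demo():
        print(f"pid={pid} id={address:#x} param={param}")
```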

...