1) Why did you implement time.sleep in several places in your 2nd method?
In __main__
, time.sleep(0.1)
, to give the started process
a timeslice to startup.
In f2(fq, q)
, to give the queue
a timeslice to flushed all buffered data to the pipe and
as q.get_nowait()
are used.
In w(q)
, are only for testing simulating long run of writer.to_excel(...)
,
i removed this one.
2) What is the difference between pool.map and pool = [mp.Process( . )]?
Using pool.map
needs no Queue
, no parameter passed, shorter code.
The worker_process
have to return immediately the result
and terminates.
pool.map
starts a new process as long as all iteration
are done.
The results
have to be processed after that.
Using pool = [mp.Process( . )]
, starts n processes
.
A process
terminates on queue.Empty
Can you think of a situation where you would prefer one method over the other?
Methode 1: Quick setup, serialized, only interested in the result to continue.
Methode 2: If you want to do all workload parallel.
You could't use global writer
in processes.
The writer
instance has to belong to one process
.
Usage of mp.Pool
, for instance:
def f1(k):
# *** DO SOME STUFF HERE***
results = pd.DataFrame(df_)
return results
if __name__ == '__main__':
pool = mp.Pool()
results = pool.map(f1, range(len(list_of_days)))
writer = pd.ExcelWriter('../test/myfile.xlsx', engine='xlsxwriter')
for k, result in enumerate(results):
result.to_excel(writer, sheet_name=list_of_days[k])
writer.save()
pool.close()
This leads to .to_excel(...)
are called in sequence in the __main__
process.
If you want parallel .to_excel(...)
you have to use mp.Queue()
.
For instance:
The worker
process:
# mp.Queue exeptions have to load from
try:
# Python3
import queue
except:
# Python 2
import Queue as queue
def f2(fq, q):
while True:
try:
k = fq.get_nowait()
except queue.Empty:
exit(0)
# *** DO SOME STUFF HERE***
results = pd.DataFrame(df_)
q.put( (list_of_days[k], results) )
time.sleep(0.1)
The writer
process:
def w(q):
writer = pd.ExcelWriter('myfile.xlsx', engine='xlsxwriter')
while True:
try:
titel, result = q.get()
except ValueError:
writer.save()
exit(0)
result.to_excel(writer, sheet_name=titel)
The __main__
process:
if __name__ == '__main__':
w_q = mp.Queue()
w_p = mp.Process(target=w, args=(w_q,))
w_p.start()
time.sleep(0.1)
f_q = mp.Queue()
for i in range(len(list_of_days)):
f_q.put(i)
pool = [mp.Process(target=f2, args=(f_q, w_q,)) for p in range(os.cpu_count())]
for p in pool:
p.start()
time.sleep(0.1)
for p in pool:
p.join()
w_q.put('STOP')
w_p.join()
Tested with Python:3.4.2 - pandas:0.19.2 - xlsxwriter:0.9.6