It depends on your workload.
By default, Dask creates a single process with as many threads as you have logical cores on your machine (as determined by multiprocessing.cpu_count()):
dask-worker ... --nprocs 1 --nthreads 8 # assuming you have eight cores
dask-worker ... # this is actually the default setting
Using few processes and many threads per process is good if you are doing mostly numeric workloads, such as are common in NumPy, pandas, and scikit-learn code. Much of that work runs in compiled routines that release Python's Global Interpreter Lock (GIL), so threads can run in parallel.
However, if you are spending most of your compute time manipulating pure Python objects like strings or dictionaries, then you may want to avoid GIL contention by having more processes with fewer threads each:
dask-worker ... --nprocs 8 --nthreads 1
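If you want to see the GIL effect on your own machine before touching the worker flags, here is a rough sketch using only the standard library. It does not use Dask at all; `count_vowels` and `timed_map` are made-up names for illustration, and the task sizes are arbitrary. On a standard CPython build I would expect the thread pool to gain little from extra workers on this kind of string work, but measure rather than trust that.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_vowels(n):
    """Pure-Python string work (hypothetical example task)."""
    text = "dask distributed worker " * n
    return sum(1 for ch in text if ch in "aeiou")


def timed_map(executor_cls, workers, tasks):
    """Run count_vowels over tasks; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(count_vowels, tasks))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    tasks = [20_000] * 8  # arbitrary sizes, adjust to taste
    _, thread_time = timed_map(ThreadPoolExecutor, 4, tasks)
    _, process_time = timed_map(ProcessPoolExecutor, 4, tasks)
    print(f"threads:   {thread_time:.2f}s")
    print(f"processes: {process_time:.2f}s")
```

If the two timings are close for your real workload, the GIL is probably not your bottleneck and the default thread-heavy setup is fine.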
Based on benchmarking, you may find that a more balanced split is better:
dask-worker ... --nprocs 4 --nthreads 2
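To pick candidates for that benchmarking, one option is to list every process/thread split that uses all your cores exactly once. This is only a small sketch: `candidate_splits` is a hypothetical helper, not part of Dask, and it prints just the two flags shown above so you can paste them into your own dask-worker command.

```python
import multiprocessing


def candidate_splits(cores=None):
    """Return (nprocs, nthreads) pairs whose product equals the core count."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    return [(p, cores // p) for p in range(1, cores + 1) if cores % p == 0]


# Assuming eight cores, as in the examples above.
for nprocs, nthreads in candidate_splits(8):
    print(f"--nprocs {nprocs} --nthreads {nthreads}")
```

Call `candidate_splits()` with no argument to use whatever multiprocessing.cpu_count() reports on your machine.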
Using more processes avoids GIL issues, but adds costs due to inter-process communication. Avoid many processes if your computations require a lot of inter-worker communication.
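To get a feel for that communication cost, you can time how long it takes to serialize and deserialize a typical piece of your data, since data moving between processes has to be converted to bytes and back. The sketch below uses the standard pickle module as a stand-in; Dask's own serialization may behave differently, and `roundtrip_cost` and the sample data are invented for illustration.

```python
import pickle
import time


def roundtrip_cost(obj):
    """Serialize and deserialize obj; return (copy, payload bytes, seconds)."""
    start = time.perf_counter()
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    copy = pickle.loads(payload)
    return copy, len(payload), time.perf_counter() - start


# Hypothetical sample: many small pure-Python objects.
data = [{"id": i, "name": f"row-{i}"} for i in range(100_000)]
copy, size, seconds = roundtrip_cost(data)
print(f"{size / 1e6:.1f} MB round-tripped in {seconds:.3f}s")
```

If that round trip is large relative to the time each task spends computing, extra processes are likely to cost more than they save.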