Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

java - Choosing optimal number of Threads for parallel processing of data

Let's say I have a task that involves processing 1 million sentences.

For each sentence, I need to do something with it, and it doesn't matter in what order the sentences are processed.

In my Java program, the main block of work is partitioned into chunks of sentences. Each chunk is handled by a Callable that defines the unit of work, and I collect the resulting Futures. I'm looking for a way to optimize the number of threads I allocate to work through the big block of sentences before I recombine the results from each thread.

What would be the maximum number of threads I could use that would give me optimal performance in terms of speed before I saw diminishing returns?

Also, what causes the logic that the more threads allocated, i.e. more being able to be done at once, to be incorrect?

Fight dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back…

1 Answer

In practice, the optimal number of threads is hard to pin down, and it will likely vary from one run of the program to the next. In theory, assuming the work is CPU-bound, the optimal number of threads is the number of cores on your machine. If your cores are "hyper-threaded" (as Intel calls it), each core can run 2 threads at once, and the theoretical optimum becomes double the number of cores. The two threads share that core's execution resources, though, so the gain from the second thread is usually well short of a full doubling.
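As a starting point, the core count can be read at run time rather than hard-coded. The sketch below uses the standard Runtime.availableProcessors() call, which reports the processors available to the JVM (on a hyper-threaded machine this is typically the logical count, not the physical one). The class name PoolSize and the helper suggestedThreads are made up for illustration.

```java
class PoolSize {
    // Starting thread count for CPU-bound work: one thread per processor
    // the JVM reports. This is an assumption to be tuned by measurement.
    static int suggestedThreads() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        System.out.println("Suggested threads: " + suggestedThreads());
    }
}
```

Treat the result as a first guess and measure from there, since the best value depends on the workload.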

"Also, what causes the logic that the more threads allocated, i.e. more being able to be done at once, to be incorrect?"

The assumption that allocating more threads leads to more work being done concurrently is false because only 1 thread (or 2, if the cores are "hyper-threaded") can run on each core at any given time.

So assume I have a quad-core machine that is not hyper-threaded. In that case, I can run up to 4 threads concurrently, so my maximum throughput should be achieved with 4 threads. Now say I try to run 8 threads on the same setup. The kernel would schedule these threads back and forth (by way of context switches), pausing one thread in order to let another run. So, at most, the work of 4 threads can be done at any one time, and each context switch adds a little overhead of its own.
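One way to check this on your own machine is to run the same job with different pool sizes and compare the timings. The following is a minimal sketch, not the asker's actual code: SentenceProcessor, processSentence (a stand-in that counts words), the generated sentences, and the chunk size of 10,000 are all assumptions made for illustration. It partitions the sentences into chunks, submits one Callable per chunk to a fixed thread pool, and recombines the results from the Futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SentenceProcessor {
    // Hypothetical unit of work: count the words in one sentence.
    static int processSentence(String sentence) {
        String trimmed = sentence.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Splits the sentences into chunks, runs one task per chunk on a
    // fixed-size pool, and sums the per-chunk results.
    static long processAll(List<String> sentences, int threads, int chunkSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int start = 0; start < sentences.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, sentences.size());
                List<String> chunk = sentences.subList(start, end);
                tasks.add(() -> {
                    long sum = 0;
                    for (String s : chunk) {
                        sum += processSentence(s);
                    }
                    return sum;
                });
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> sentences = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            sentences.add("the quick brown fox " + i);
        }
        int cores = Runtime.getRuntime().availableProcessors();
        // Compare one thread, one per processor, and an oversubscribed pool.
        for (int threads : new int[] {1, cores, 2 * cores}) {
            long startNanos = System.nanoTime();
            long total = processAll(sentences, threads, 10_000);
            long millis = (System.nanoTime() - startNanos) / 1_000_000;
            System.out.println(threads + " threads: " + total + " words in " + millis + " ms");
        }
    }
}
```

Timings from a single run like this are noisy (JIT warm-up and garbage collection both interfere), so repeat the measurement several times before drawing conclusions.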

For more information on this, look up how the Linux kernel performs a "context switch". That material covers the scheduling behavior described above in much more depth.

Also, note that there is a difference between "user-level threads" and "kernel-level threads". This is an important distinction if you research this topic further, but it is outside the scope of this question.



...