When I execute your code with MSVC2015 on a i7, I observe:
- in debug mode, multithread is 14s compared to 26s in monothread. So it's almost twice as fast. The results are as expected.
- in release mode, multithread is 0.3 compared to 0.2 in monothread, so it's slower, as you've reported.
This suggest that your issue is related to the fact that the optimized fill()
is too short compared to the overhead of creating a thread.
Note also that even when there is enought work to do in fill()
(e.g. the unoptimized version), the multithread will not multiply the time by two. Multithreading will increase overall throughput per second on a multicore processor, but each thread taken separately might run a little bit slower than usual.
Edit: additional information
The multithreading performance depends on a lot of factors, among others, for example the number of cores on your processor, the cores used by other processes running during the test, and as remarked by doug in his comment, the profile of the multithreaded task (i.e. memory vs. computing).
To illustrate this, here the results of an informal benchmark that shows that decrease of individual thread throughput is much faster for memory intensive than for floating point intensive computations, and global throughput grows much slower (if at all):
Using the following functions for each thread :
// computation intensive
void mytask(unsigned long long loops)
{
volatile double x;
for (unsigned long long i = 0; i < loops; i++) {
x = sin(sqrt(i) / i*3.14159);
}
}
//memory intensive
void mytask2(vector<unsigned long long>& v, unsigned long long loops)
{
for (unsigned long long i = 0; i < loops; i++) {
v.push_back(i*3+10);
}
}
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…