I've read some other questions on this topic.
However, they didn't solve my problem anyway.
I wrote the code as following and I got pthread
version and omp
version both slower than the serial version. I'm very confused.
Compiled under environment:
Ubuntu 12.04 64bit 3.2.0-60-generic
g++ (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Vendor ID: AuthenticAMD
CPU family: 18
Model: 1
Stepping: 0
CPU MHz: 800.000
BogoMIPS: 3593.36
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
NUMA node0 CPU(s): 0,1
Compile command:
g++ -std=c++11 ./eg001.cpp -fopenmp
#include <cmath>
#include <cstdio>
#include <ctime>
#include <omp.h>
#include <pthread.h>
#define NUM_THREADS 5
const int sizen = 256000000;
struct Data {
double * pSinTable;
long tid;
};
void * compute(void * p) {
Data * pDt = (Data *)p;
const int start = sizen * pDt->tid/NUM_THREADS;
const int end = sizen * (pDt->tid + 1)/NUM_THREADS;
for(int n = start; n < end; ++n) {
pDt->pSinTable[n] = std::sin(2 * M_PI * n / sizen);
}
pthread_exit(nullptr);
}
int main()
{
double * sinTable = new double[sizen];
pthread_t threads[NUM_THREADS];
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
clock_t start, finish;
start = clock();
int rc;
Data dt[NUM_THREADS];
for(int i = 0; i < NUM_THREADS; ++i) {
dt[i].pSinTable = sinTable;
dt[i].tid = i;
rc = pthread_create(&threads[i], &attr, compute, &dt[i]);
}//for
pthread_attr_destroy(&attr);
for(int i = 0; i < NUM_THREADS; ++i) {
rc = pthread_join(threads[i], nullptr);
}//for
finish = clock();
printf("from pthread: %lf
", (double)(finish - start)/CLOCKS_PER_SEC);
delete sinTable;
sinTable = new double[sizen];
start = clock();
# pragma omp parallel for
for(int n = 0; n < sizen; ++n)
sinTable[n] = std::sin(2 * M_PI * n / sizen);
finish = clock();
printf("from omp: %lf
", (double)(finish - start)/CLOCKS_PER_SEC);
delete sinTable;
sinTable = new double[sizen];
start = clock();
for(int n = 0; n < sizen; ++n)
sinTable[n] = std::sin(2 * M_PI * n / sizen);
finish = clock();
printf("from serial: %lf
", (double)(finish - start)/CLOCKS_PER_SEC);
delete sinTable;
pthread_exit(nullptr);
return 0;
}
Output:
from pthread: 21.150000
from omp: 20.940000
from serial: 20.800000
I wonder whether it was my code's problem so I used pthread to do the same thing.
However, I'm totally wrong, and I wonder whether it might be Ubuntu's problem on OpenMP/pthread.
I have a friend who has AMD CPU and Ubuntu 12.04 as well, and got the same problem there, so I might have some reason to believe that the problem is not limited to only me.
If anyone has the same problem as me, or has some clue on the problem, thanks in advance.
If the code is not good enough, I ran a benchmark and I pasted the result here:
http://pastebin.com/RquLPREc
The benchmark url: http://www.cs.kent.edu/~farrell/mc08/lectures/progs/openmp/microBenchmarks/src/download.html
New infomation:
I ran the code on windows (without pthread version) with VS2012.
I used 1/10 of sizen because windows does not allow me to allocate that great trunk of memory where the results are:
from omp: 1.004
from serial: 1.420
from FreeNickName: 735 (this one is the suggestion improvement by @FreeNickName)
Does this indicate that it could be a problem of Ubuntu OS
??
Problem is solved by using omp_get_wtime
function that is portable among Operating Systems. See the answer by Hristo Iliev
.
Some tests about the controversial topic by FreeNickName
.
(Sorry I need to test it on Ubuntu cause the windows was one of my friends'.)
--1-- Change from delete
to delete []
: (but without memset)(-std=c++11 -fopenmp)
from pthread: 13.491405
from omp: 13.023099
from serial: 20.665132
from FreeNickName: 12.022501
--2-- With memset immediately after new: (-std=c++11 -fopenmp)
from pthread: 13.996505
from omp: 13.192444
from serial: 19.882127
from FreeNickName: 12.541723
--3-- With memset immediately after new: (-std=c++11 -fopenmp -march=native -O2)
from pthread: 11.886978
from omp: 11.351801
from serial: 17.002865
from FreeNickName: 11.198779
--4-- With memset immediately after new, and put FreeNickName's version before OMP for version: (-std=c++11 -fopenmp -march=native -O2)
from pthread: 11.831127
from FreeNickName: 11.571595
from omp: 11.932814
from serial: 16.976979
--5-- With memset immediately after new, and put FreeNickName's version before OMP for version, and set NUM_THREADS
to 5 instead of 2 (I'm dual core).
from pthread: 9.451775
from FreeNickName: 9.385366
from omp: 11.854656
from serial: 16.960101
See Question&Answers more detail:
os