I have a loop that has been parallelized by OpenMP, but due to the nature of the task, there are 4 critical
clauses.
What would be the best way to profile the speed up and find out which of the critical clauses (or maybe non-critical(!) ) take up the most time inside the loop?
I use Ubuntu 10.04 with g++ 4.4.3
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…