c++ - Parallelizing a Breadth-First Search

Question

Welcome To Ask or Share your Answers For Others

c++ - Parallelizing a Breadth-First Search

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

c++ - Parallelizing a Breadth-First Search

I just taught myself some OpenMP and this could be stupid. Basically I'm trying to parallelize a breadth first search program in c++, with each node taking a long time to process. Here's an example code:

queue<node*> q;
q.push(head);
while (!q.empty()) {
  qSize = q.size();
  for (int i = 0; i < qSize; i++) {
    node* currNode = q.front();
    q.pop();
    doStuff(currNode);
    q.push(currNode);
  }
}

The processing function doStuff() is quite expensive and I want to parallelize it. However if I parallelize the for loop by putting #pragma omp parallel for right before the for line, all kinds of weird error pop up at runtime. I'm guessing the reason is that this way q.front() and q.push() will also get parallelized, and multiple threads will possibly get the same node through q.front() (because they all got processed before any q.push has been processed).

How can I get around this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:23:23+0000

The solution is to protect access to the queue with a critical section.

queue<node*> q;
q.push(head);
while (!q.empty()) {
  qSize = q.size();
  #pragma omp parallel for
  for (int i = 0; i < qSize; i++) {
    node* currNode;
    #pragma omp critical
    {
      currNode = q.front();
      q.pop();
    }
    doStuff(currNode);
    #pragma omp critical
    q.push(currNode);
  }
}

This is similar to having a common mutex and locking it.

There are some limits in efficiency with this version: At the end of the for loop, some threads may idle, despite work being in the queue. Making a version where threads continuously work whenever there is something in the queue is a bit tricky in terms of handling the situations where the queue is empty but some threads are still computing.

Depending of the data size involved in a node, you may also have significant performance impact of cache-effects and false sharing. But that can't be discussed with a specific example. The simple version will probably be sufficiently efficient in many cases, but getting optimal performance may become arbitrarily complex.

In any case, you have to ensure that doStuff does not do modify any global or shared state.

Categories

c++ - Parallelizing a Breadth-First Search

c++ - Parallelizing a Breadth-First Search

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags