Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


C++ - Multithreading nested for loop with std::thread

I am quite new to C++ and would really appreciate some advice on multithreading with std::thread. I have the following piece of code, which splits a for loop of N = 8^L iterations (up to 8^14) across threads:

void Lanczos::Hamil_vector_multiply(vec& initial_vec, vec& result_vec) {
    result_vec.zeros();
    // Note: result_threaded is allocated here but never passed to the kernel below;
    // as far as the code shown goes, each thread writes into its own slice
    // [start, stop) of result_vec.
    std::vector<arma::vec> result_threaded(num_of_threads);
    std::vector<std::thread> threads;
    threads.reserve(num_of_threads);
    for (int t = 0; t < num_of_threads; t++) {
        // Split [0, N) into num_of_threads contiguous ranges; the last range ends at N.
        u64 start = t * N / num_of_threads;
        u64 stop = ((t + 1) == num_of_threads ? N : N * (t + 1) / num_of_threads);
        result_threaded[t] = arma::vec(stop - start, fill::zeros);
        threads.emplace_back(&Lanczos::Hamil_vector_multiply_kernel, this, start, stop, ref(initial_vec), ref(result_vec));
    }
    for (auto& t : threads) t.join();
}

where Lanczos is my general class (actually it is not necessary to know what it contains), while the member function Hamil_vector_multiply_kernel is of the form:

void Lanczos::Hamil_vector_multiply_kernel(u64 start, u64 stop, vec& initial_vec, vec& result_vec_threaded) {
    // some declarations
    for (u64 k = start; k < stop; k++) {
        // some preliminary work
        for (int j = 0; j <= L - 1; j++) {
            // a bunch of if-else statements, where result_vec_threaded(k) += something
        }
    }
}

(The code is quite long, so I didn't paste the whole thing here.) My problem is that I call the function Hamil_vector_multiply 100-150 times in another function, so each call creates a new vector of threads, which is then destroyed. My questions:

  1. Is it better to create threads in the function which calls Hamil_vector_multiply and then pass a vector of threads to Hamil_vector_multiply, in order to avoid creating new threads each time?

  2. Would it be better to attack the loop asynchronously (for instance, the first thread to finish an iteration starts the next available one)? If yes, can you point to any literature describing asynchronous threads?

  3. Are there maybe better ways of multithreading such a loop? (Without multithreading I have a loop from k=0 to k=N=8^14, which takes up a lot of time.)

  4. I found several attempts to create a thread pool and job queue. Would it be useful to use, for instance, a work pool like this one: https://codereview.stackexchange.com/questions/221617/thread-pool-c-implementation

My code works as it is supposed to (it gives the correct result), and it speeds up the program by something like 10 times with 16 cores. But if you have other helpful comments not regarding multithreading, I would be grateful for every piece of advice.

Thank you very much in advance!

PS: The function which calls Hamil_vector_multiply 100-150 times is of the form:

void Lanczos::Build_Lanczos_Hamil(vec& initial_vec) {
    vec tmp(N);
    Hamil_vector_multiply(initial_vec, tmp);
    // some calculations
    for (int j = 0; j < 100; j++) {
        // something
        vec tmp2 = ...
        Hamil_vector_multiply(tmp2, tmp);
        // do something else -- not related
    }
}

Question from: https://stackoverflow.com/questions/65889893/multithreading-nested-foor-loop-with-stdthread


1 Answer


Is it better to create threads in the function which calls Hamil_vector_multiply and then pass a vector of threads to Hamil_vector_multiply in order to avoid creating each time new threads?

If you're worried about performance, yes, it would help. What you're doing right now is essentially allocating a new heap block on every function call (I'm talking about the vector). Doing that once beforehand will gain you some performance. The current approach is not wrong, just slightly slower than it needs to be.
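
Note that a std::thread that has been joined cannot be started again, so reusing threads across calls means keeping workers alive and handing them new work each time. A minimal sketch of that idea, assuming a hypothetical WorkerTeam class (the name and interface are made up for illustration and are not part of the question's code); it does no exception handling, and run() is meant to be called from one thread at a time:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: a fixed set of worker threads that is created once
// and reused for every parallel region.
class WorkerTeam {
public:
    explicit WorkerTeam(std::size_t num_threads) {
        assert(num_threads > 0);
        workers_.reserve(num_threads);
        for (std::size_t t = 0; t < num_threads; ++t)
            workers_.emplace_back([this, t] { worker_loop(t); });
    }

    ~WorkerTeam() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_workers_.notify_all();
        for (auto& w : workers_) w.join();
    }

    WorkerTeam(const WorkerTeam&) = delete;
    WorkerTeam& operator=(const WorkerTeam&) = delete;

    std::size_t size() const { return workers_.size(); }

    // Runs job(t) on worker t for every t in [0, size()) and blocks until all are done.
    void run(const std::function<void(std::size_t)>& job) {
        std::unique_lock<std::mutex> lock(mutex_);
        job_ = &job;
        remaining_ = workers_.size();
        ++generation_;  // a new generation number tells each worker there is fresh work
        wake_workers_.notify_all();
        all_done_.wait(lock, [this] { return remaining_ == 0; });
        job_ = nullptr;
    }

private:
    void worker_loop(std::size_t t) {
        std::size_t seen = 0;  // last generation this worker has executed
        for (;;) {
            const std::function<void(std::size_t)>* job = nullptr;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_workers_.wait(lock, [&] { return stopping_ || generation_ != seen; });
                if (stopping_) return;
                seen = generation_;
                job = job_;
            }
            (*job)(t);  // run outside the lock so workers execute concurrently
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (--remaining_ == 0) all_done_.notify_one();
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_workers_;
    std::condition_variable all_done_;
    const std::function<void(std::size_t)>* job_ = nullptr;
    std::size_t remaining_ = 0;
    std::size_t generation_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

With something like this, the calling function would construct one WorkerTeam before its loop and each multiply would call team.run(...) with a lambda that computes start and stop from the worker index t, instead of building a new std::vector<std::thread> every time.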

Would it be better to attack the loop asynchronously (for instance, the first thread to finish an iteration starts the next available one)? If yes, can you point to any literature describing asynchronous threads?

This might not be a good idea. If the threads share the same mutable data, you will have to lock those resources using mutexes. In that case you can end up with roughly the same performance as a single thread, because the other thread(s) have to wait until the resource is unlocked and ready to be used.
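
For what it's worth, when every iteration k writes only to its own element (as the kernel in the question appears to do with result_vec_threaded(k)), work can be handed out dynamically without a mutex on the data, using a shared atomic counter instead. A minimal sketch, assuming a hypothetical parallel_for_dynamic helper and a caller-chosen chunk size (neither is from the question), and assuming n plus num_threads * chunk fits in 64 bits:

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: calls body(k) exactly once for every k in [0, n).
// Threads claim chunks of `chunk` consecutive indices from a shared atomic
// counter, so a thread that finishes early simply claims the next chunk.
void parallel_for_dynamic(std::uint64_t n, unsigned num_threads, std::uint64_t chunk,
                          const std::function<void(std::uint64_t)>& body) {
    assert(num_threads > 0 && chunk > 0);
    std::atomic<std::uint64_t> next{0};

    auto worker = [&] {
        for (;;) {
            // fetch_add returns the previous value, so each chunk is claimed by one thread only.
            const std::uint64_t begin = next.fetch_add(chunk);
            if (begin >= n) return;
            const std::uint64_t end = std::min(begin + chunk, n);
            for (std::uint64_t k = begin; k < end; ++k) body(k);
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```

The chunk size is a trade-off: larger chunks mean fewer atomic operations, smaller chunks balance uneven work better. If body(k) touched data shared with other indices, locking (or a per-thread accumulator) would still be needed.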

Are there maybe better ways of multithreading such a loop? (Without multithreading I have a loop from k=0 to k=N=8^14, which takes up a lot of time.)

If your goal is to improve performance, the work can be split across multiple threads, and, most importantly, multithreading actually helps, then there is no reason not to do it. From what I can see, your implementation looks pretty neat. Keep in mind, though, that starting a thread has a small cost (negligible compared to your performance gain), and load balancing can improve performance even further.
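
If you want to check how small the thread start-up cost is on your machine, you can measure it directly. A rough sketch, assuming a hypothetical average_spawn_join_us helper (not from the question); the number it returns depends heavily on the OS and hardware, so treat it as an order-of-magnitude estimate only:

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper: average time in microseconds to create and join
// `num_threads` threads that do no work, measured over `repeats` rounds.
double average_spawn_join_us(unsigned num_threads, unsigned repeats) {
    assert(num_threads > 0 && repeats > 0);
    using clock = std::chrono::steady_clock;

    const auto t0 = clock::now();
    for (unsigned r = 0; r < repeats; ++r) {
        std::vector<std::thread> threads;
        threads.reserve(num_threads);
        for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back([] {});
        for (auto& th : threads) th.join();
    }
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - t0;
    return elapsed.count() / repeats;
}
```

Comparing average_spawn_join_us(16, 100) against the time of one Hamil_vector_multiply call tells you whether reusing threads is worth the extra complexity.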

But if you have other helpful comments not regarding multithreading, I would be grateful for every piece of advice.

If the load per thread might vary, it is a good investment to think about load balancing. Other than that, I don't see an issue. The main place to improve would be your logic itself. Threads can only do so much if the logic itself takes a lot of time.

Optional:
You can use std::async and std::future to implement the same thing, with the added bonus that a future returned by std::async waits for its task when it is destroyed, meaning that when the vector of futures goes out of scope, the tasks are joined automatically. Note that this still launches new tasks on every call, so it does not help with your first question.
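
A minimal sketch of the std::future variant, assuming a hypothetical run_ranges_async helper that takes the per-range kernel as a callable (the helper and its signature are illustrative, not the question's code). It uses std::launch::async so each range runs on its own thread, and calls get() explicitly so that an exception thrown in a task is rethrown in the caller:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

// Hypothetical helper: splits [0, n) into num_tasks contiguous ranges and
// runs kernel(start, stop) for each range via std::async.
void run_ranges_async(std::uint64_t n, unsigned num_tasks,
                      const std::function<void(std::uint64_t, std::uint64_t)>& kernel) {
    assert(num_tasks > 0);
    std::vector<std::future<void>> futures;
    futures.reserve(num_tasks);

    for (unsigned t = 0; t < num_tasks; ++t) {
        // Same partitioning as in the question: the last range ends exactly at n.
        const std::uint64_t start = t * n / num_tasks;
        const std::uint64_t stop = (t + 1 == num_tasks) ? n : (t + 1) * n / num_tasks;
        futures.push_back(std::async(std::launch::async,
                                     [&kernel, start, stop] { kernel(start, stop); }));
    }

    // get() blocks until the task has finished and rethrows any exception it raised.
    for (auto& f : futures) f.get();
}
```

In the question's setting, the kernel argument would be a lambda that forwards to Hamil_vector_multiply_kernel with the two vectors captured by reference.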

