I am quite new to C++ and would really appreciate some advice on multithreading with std::thread.
I have the following piece of code, which splits a for loop of N = 8^L iterations (up to 8^14) across several threads:
void Lanczos::Hamil_vector_multiply(vec& initial_vec, vec& result_vec) {
    result_vec.zeros();
    std::vector<arma::vec> result_threaded(num_of_threads);
    std::vector<std::thread> threads;
    threads.reserve(num_of_threads);
    for (int t = 0; t < num_of_threads; t++) {
        u64 start = t * N / num_of_threads;
        u64 stop = ((t + 1) == num_of_threads ? N : N * (t + 1) / num_of_threads);
        result_threaded[t] = arma::vec(stop - start, fill::zeros);
        threads.emplace_back(&Lanczos::Hamil_vector_multiply_kernel, this, start, stop, ref(initial_vec), ref(result_vec));
    }
    for (auto& t : threads) t.join();
}
where Lanczos is my general class (it is not necessary to know what it contains), and the member function Hamil_vector_multiply_kernel has the form:
void Lanczos::Hamil_vector_multiply_kernel(u64 start, u64 stop, vec& initial_vec, vec& result_vec_threaded) {
    // some declarations
    for (u64 k = start; k < stop; k++) {
        // some preliminary work
        for (int j = 0; j <= L - 1; j++) {
            // a bunch of if-else statements, where result_vec_threaded(k) += something
        }
    }
}
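A minimal sketch of the pattern above, assuming (as the comment in the kernel suggests) that iteration k only ever writes to element k of the result. Under that assumption the ranges owned by different threads never overlap, so every thread can write straight into the one shared result vector; note that the snippet above allocates result_threaded but hands result_vec to the kernel. Here std::vector<double> stands in for arma::vec, and toy_kernel and toy_multiply are hypothetical names with a placeholder update, not the real Hamiltonian code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

using u64 = std::uint64_t;

// Hypothetical stand-in for Hamil_vector_multiply_kernel: iteration k
// touches only result[k], so different ranges never write the same element.
void toy_kernel(u64 start, u64 stop, const std::vector<double>& initial,
                std::vector<double>& result) {
    for (u64 k = start; k < stop; ++k) {
        result[k] += 2.0 * initial[k];  // placeholder for the real update
    }
}

// Splits [0, N) into num_threads contiguous ranges, one thread per range.
void toy_multiply(const std::vector<double>& initial,
                  std::vector<double>& result, unsigned num_threads) {
    assert(num_threads > 0);
    const u64 N = initial.size();
    result.assign(N, 0.0);  // same role as result_vec.zeros()

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        const u64 start = t * N / num_threads;
        const u64 stop = (t + 1) * N / num_threads;  // equals N for the last t
        threads.emplace_back(toy_kernel, start, stop, std::cref(initial),
                             std::ref(result));
    }
    for (auto& th : threads) th.join();
}
```

If the real kernel ever writes to an index outside its own [start, stop) range, this no longer holds and the writes would need per-thread buffers or synchronization.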
(The code is quite long, so I didn't paste the whole thing here.) My problem is that I call the function Hamil_vector_multiply 100-150 times from another function, so each call creates a new vector of threads, which is destroyed again when the call returns.
My questions:
1) Is it better to create the threads in the function which calls Hamil_vector_multiply and then pass a vector of threads to Hamil_vector_multiply, in order to avoid creating new threads each time?
2) Would it be better to attack the loop asynchronously (for instance, the first thread to finish an iteration starts the next available one)? If yes, can you point to any literature describing asynchronous use of threads?
3) Are there maybe better ways of multithreading such a loop? (Without multithreading I have a loop from k=0 to k=N=8^14, which takes a lot of time.)
- I found several attempts at creating a thread pool and job queue. Would it be useful to use, for instance, a work pool like this one: https://codereview.stackexchange.com/questions/221617/thread-pool-c-implementation
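To make question 2 concrete, here is a minimal sketch of one common way to hand out work dynamically: the threads repeatedly claim the next unprocessed chunk of iterations from a shared std::atomic counter, so a thread that finishes early simply takes more chunks. It is a sketch under assumptions, not the actual code: std::vector<double> stands in for arma::vec, the update is a placeholder, and dynamic_multiply and the chunk parameter are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

using u64 = std::uint64_t;

// Each thread claims `chunk` consecutive iterations at a time from a shared
// counter until the whole range [0, N) has been handed out.
void dynamic_multiply(const std::vector<double>& initial,
                      std::vector<double>& result, unsigned num_threads,
                      u64 chunk) {
    assert(num_threads > 0 && chunk > 0);
    const u64 N = initial.size();
    result.assign(N, 0.0);
    std::atomic<u64> next{0};  // first iteration that nobody has claimed yet

    auto worker = [&] {
        for (;;) {
            const u64 start = next.fetch_add(chunk);  // claim a chunk
            if (start >= N) break;                    // nothing left to do
            const u64 stop = std::min(start + chunk, N);
            for (u64 k = start; k < stop; ++k) {
                result[k] += 2.0 * initial[k];  // placeholder update
            }
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```

Dynamic chunks mainly help when iterations take very different amounts of time; if every k costs about the same, the static split is usually just as good. The chunk size is a tuning knob: too small and the threads contend on the counter, too large and the load balancing is lost.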
My code works as it is supposed to (it gives the correct result), and it speeds up the program by roughly a factor of 10 on 16 cores. But if you have other helpful comments not regarding multithreading, I would be grateful for every piece of advice.
Thank you very much in advance!
PS: The function which calls Hamil_vector_multiply 100-150 times is of the form:
void Lanczos::Build_Lanczos_Hamil(vec& initial_vec) {
    vec tmp(N);
    Hamil_vector_multiply(initial_vec, tmp);
    // some calculations
    for (int j = 0; j < 100; j++) {
        // something
        vec tmp2 = ...
        Hamil_vector_multiply(tmp2, tmp);
        // do something else (not related)
    }
}
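To illustrate what question 1 would look like in practice, here is a minimal sketch of a fixed set of worker threads that is created once and reused for every call, instead of spawning new threads 100-150 times. LoopWorkers and parallel_for are hypothetical names invented for this sketch, and it assumes only one thread ever calls parallel_for at a time.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: threads are started in the constructor, reused by
// every parallel_for call, and joined in the destructor.
class LoopWorkers {
public:
    using Body = std::function<void(std::uint64_t, std::uint64_t)>;

    explicit LoopWorkers(unsigned count) {
        assert(count > 0);
        for (unsigned t = 0; t < count; ++t)
            workers_.emplace_back([this, t, count] { run(t, count); });
    }

    ~LoopWorkers() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Calls body(start, stop) once per worker on its slice of [0, n) and
    // blocks until every slice has finished.
    void parallel_for(std::uint64_t n, Body body) {
        std::unique_lock<std::mutex> lock(m_);
        n_ = n;
        body_ = std::move(body);
        pending_ = workers_.size();
        ++generation_;  // signals "new work" to the workers
        wake_.notify_all();
        done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run(unsigned t, unsigned count) {
        std::uint64_t seen = 0;  // last generation this worker processed
        for (;;) {
            std::unique_lock<std::mutex> lock(m_);
            wake_.wait(lock, [&] { return stopping_ || generation_ != seen; });
            if (stopping_) return;
            seen = generation_;
            const std::uint64_t n = n_;
            lock.unlock();
            // Same static split as in Hamil_vector_multiply.
            body_(t * n / count, (t + 1) * n / count);
            lock.lock();
            if (--pending_ == 0) done_.notify_one();
        }
    }

    std::mutex m_;
    std::condition_variable wake_, done_;
    Body body_;
    std::uint64_t n_ = 0;
    std::uint64_t generation_ = 0;
    std::size_t pending_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Such an object could, for example, be created once in Build_Lanczos_Hamil (or kept as a class member), with each multiplication going through parallel_for. Whether this is measurably faster than recreating the threads depends on how long a single call takes, so it is worth timing both versions before committing to the extra complexity.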
question from:
https://stackoverflow.com/questions/65889893/multithreading-nested-foor-loop-with-stdthread