Hi all,
I am trying to parallelise a for loop in C++ with OpenMP. In each iteration of the outer loop, a vector is populated with values from various calculations.
std::vector<double> data(max_len);
#pragma omp parallel for
for (size_t i = 0; i < max_len; ++i) {
    for (size_t j = 0; j < max_len; ++j) {
        // heavy calculation -> result
        data[j] = result; // error!!
    }
}
Now the problem is that if I define the vector outside the loop, it is shared between the threads, and that does not work, as each thread needs the whole vector for itself.
I can define the vector inside the for loop, making it private:
#pragma omp parallel for
for (size_t i = 0; i < max_len; ++i) {
    std::vector<double> data(max_len);
    for (size_t j = 0; j < max_len; ++j) {
        // heavy calculation -> result
        data[j] = result; // works but slow
    }
}
But this means the vector is created and destroyed in every iteration of the outer loop, which is quite expensive.
What I want is that when each thread starts, it gets its own copy of the vector<double> data, and that copy stays with it until the thread ends. That way there is no repeated creation and destruction, only assignment into the same vector.
I have heard about OpenMP threadprivate, but I am not sure if and how it would work in this case.
How do I make sure that each thread gets its own vector, and that it lasts for the lifetime of that thread? (Any other advice for optimization is also appreciated!!)
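For reference, here is a minimal sketch of the behaviour I am after, written by splitting the combined "parallel for" into a separate "parallel" region and an "omp for" loop. The vector is declared inside the region but outside the loop, so as far as I understand it should be constructed once per thread and reused across that thread's iterations. The names heavy_calculation, run and row_sums are placeholders I made up to keep the example self-contained and checkable; they are not part of my real code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "heavy calculation" in the post.
double heavy_calculation(std::size_t i, std::size_t j) {
    return 0.5 * static_cast<double>(i) + static_cast<double>(j);
}

// Returns one value per outer iteration (the sum of that iteration's
// scratch vector), only so that the result can be verified.
std::vector<double> run(std::size_t max_len) {
    std::vector<double> row_sums(max_len, 0.0);

    #pragma omp parallel
    {
        // Declared inside the parallel region, so it is private to the
        // thread and constructed once per thread, not once per iteration.
        std::vector<double> data(max_len);

        // Work-sharing loop: the iterations over i are divided among the
        // threads that already exist in the enclosing region.
        #pragma omp for
        for (std::size_t i = 0; i < max_len; ++i) {
            for (std::size_t j = 0; j < max_len; ++j) {
                data[j] = heavy_calculation(i, j);  // reuse the same storage
            }
            double sum = 0.0;
            for (double v : data) {
                sum += v;
            }
            row_sums[i] = sum;  // each index i is written by exactly one thread
        }
    }
    return row_sums;
}
```

This assumes a compiler with OpenMP 3.0 or later, since an unsigned loop variable such as std::size_t is not accepted by OpenMP 2.0 implementations. Without the OpenMP flag the pragmas are ignored and the code simply runs serially.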