OpenMP custom reduction variable

I have been asked to implement the idea of a reduction variable without using the reduction clause. I set up this basic code to test it.

    int i = 0;
    int n = 100000000;
    double sum = 0.0;
    double val = 0.0;
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
    sum += val;

So at the end, sum == n.

Each thread should keep val as a private variable, and the addition to sum should then be a critical section where the threads converge, e.g.

    int i = 0;
    int n = 100000000;
    double sum = 0.0;
    double val = 0.0;
    #pragma omp parallel for private(i, val) shared(n) num_threads(nthreads)
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
    #pragma omp critical
    {
        sum += val;
    }

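I can't figure out how to keep the private instances of val alive for the critical section. I tried surrounding the whole thing with a larger pragma, e.g.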

    int i = 0;
    int n = 100000000;
    double sum = 0.0;
    double val = 0.0;
    #pragma omp parallel private(val) shared(sum)
    {
        #pragma omp parallel for private(i) shared(n) num_threads(nthreads)
        for (int i = 0; i < n; ++i)
        {
            val += 1;
        }
        #pragma omp critical
        {
            sum += val;
        }
    }

But I don't get the correct answer. How should I set up the pragmas and clauses to do this?

Your programs have several flaws. Let's look at each one (the flaws are written as comments).

Program 1

    int i = 0;
    int n = 100000000;
    double sum = 0.0;
    double val = 0.0;
    #pragma omp parallel for private(i, val) shared(n) num_threads(nthreads)
    for (int i = 0; i < n; ++i)
    {
        val += 1;
    }
    // The parallel region ends here, together with the for loop.
    // "#pragma omp parallel" creates the threads, and their scope only
    // extends to the end of that loop, so the threads (and their private
    // copies of val) are gone at this point.
    // Only one thread (the main thread) enters the critical section below.
    #pragma omp critical
    {
        sum += val;
    }

Program 2

    int i = 0;
    int n = 100000000;
    double sum = 0.0;
    double val = 0.0;
    // "#pragma omp parallel" creates the threads.
    #pragma omp parallel private(val) shared(sum)
    {
        // There is no need to create another set of threads here.
        // Every "#pragma omp parallel" starts a new parallel region, so this
        // is a nested parallel region, which is wrong: the whole loop is
        // executed once per outer thread instead of being divided among them.
        #pragma omp parallel for private(i) shared(n) num_threads(nthreads)
        for (int i = 0; i < n; ++i)
        {
            val += 1;
        }
        #pragma omp critical
        {
            sum += val;
        }
    }

The best solution is:

    int n = 100000000;
    double sum = 0.0;
    int nThreads = 5;
    // Create the OpenMP threads, and declare the shared and private variables here.
    // num_threads(nThreads) does not guarantee that nThreads threads are
    // created. It only sets the maximum number of threads that can be created.
    #pragma omp parallel shared(sum, n) num_threads(nThreads)
    {
        double val = 0.0; // declared inside the region, so it is local to each thread
        // Only "omp for" here: the threads already exist, so no "omp parallel".
        // nowait: a thread does not wait for the others at the end of the
        // loop and can go straight on to the critical section.
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
        {
            val += 1;
        }
        #pragma omp critical
        {
            sum += val;
        }
    }
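One way to check a hand-written reduction like the one above is to compare it against the built-in reduction clause. The following is a minimal sketch, not a definitive implementation: the function names `manual_sum` and `reduction_sum` are made up for illustration, and the code is meant to be compiled with an OpenMP flag such as `-fopenmp` (without it the pragmas are ignored and both functions simply run serially).

```c
/* Hand-written reduction (hypothetical helper name): each thread
   accumulates into its own partial sum, then adds it to the shared
   total inside a critical section. */
double manual_sum(int n)
{
    double sum = 0.0;
    #pragma omp parallel
    {
        double partial = 0.0;   /* declared inside the region: one per thread */
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
        {
            partial += 1.0;
        }
        #pragma omp critical
        {
            sum += partial;     /* one protected update per thread */
        }
    }
    return sum;
}

/* Reference version (hypothetical helper name) using the reduction clause. */
double reduction_sum(int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
    {
        sum += 1.0;
    }
    return sum;
}
```

Calling both with the same `n` should give the same value. Here every partial sum is a whole number that a double represents exactly, so the result does not depend on how the iterations are split across threads; with general floating-point data the two versions may differ in the last bits because the order of additions changes.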

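You don't need to specify shared variables explicitly in OpenMP, because variables from the outer scope are shared by default (unless a default(none) clause is specified). Since private variables have undefined initial values, you should zero the private copy before the accumulation loop. Loop counters are recognised automatically and made private, so there is no need to declare them explicitly. Also, since you are only updating a single value, you should use the atomic construct, because it is more lightweight than a full critical section.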

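    int i = 0;
    int n = 100000000;
    double sum = 0.0;
    double val = 0.0;
    // num_threads belongs on the parallel construct; it is not a valid clause on "omp for"
    #pragma omp parallel private(val) num_threads(nthreads)
    {
        val = 0.0; // private copies start with undefined values
        #pragma omp for
        for (int i = 0; i < n; ++i)
        {
            val += 1;
        }
        #pragma omp atomic update
        sum += val;
    }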

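The update clause was added to the atomic construct in OpenMP 3.1, so if your compiler conforms to an earlier OpenMP version (for example, if you use MSVC++, which only supports OpenMP 2.0, even in VS2012), you have to remove the update clause. Since val is not used outside the parallel region, it can be declared in the inner scope, as in veda's answer, and it then becomes a private variable automatically.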
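Putting those two remarks together, a version that avoids the update clause and declares val inside the region might look like the sketch below. The wrapper function `atomic_sum` is a name invented here for illustration, and the code has not been tested against MSVC; it only sticks to a plain `#pragma omp atomic`, which predates OpenMP 3.1.

```c
/* Sketch of the atomic variant (hypothetical helper name). */
double atomic_sum(int n)
{
    double sum = 0.0;
    #pragma omp parallel
    {
        double val = 0.0;       /* inner-scope declaration: private and initialised */
        #pragma omp for
        for (int i = 0; i < n; ++i)
        {
            val += 1.0;
        }
        /* Plain atomic, no update clause: protects this single update of sum. */
        #pragma omp atomic
        sum += val;
    }
    return sum;
}
```

Because val is declared and initialised in one statement inside the region, there is no separate zeroing step to forget.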

Note that parallel for is a shortcut for nesting two OpenMP constructs, parallel and for:

    #pragma omp parallel for sharing_clauses scheduling_clauses
    for (...)
    {
    }

is equivalent to:

    #pragma omp parallel sharing_clauses
    #pragma omp for scheduling_clauses
    for (...)
    {
    }

The same holds for the other two combined constructs: parallel sections and parallel workshare (Fortran only).
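As a concrete illustration of the combined and split forms, here is a small sketch that fills an array both ways. The function names `fill_combined` and `fill_split` and the choice of `schedule(static)` are assumptions made for the example, not something taken from the answers above.

```c
/* Combined construct: parallel and for in one directive. */
void fill_combined(int *out, int n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
    {
        out[i] = i * i;
    }
}

/* Split form: a parallel region containing a separate for construct. */
void fill_split(int *out, int n)
{
    #pragma omp parallel
    {
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i)
        {
            out[i] = i * i;
        }
    }
}
```

Both functions should produce identical arrays. The split form becomes useful as soon as the threads have more to do inside the region than the loop itself, such as the per-thread accumulation and the final protected update discussed above.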