Nested loops in OpenMP

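I need to run a short outer loop and a long inner loop. I want to parallelize the latter, not the former, because an array is updated after each run of the inner loop. The code I am using is: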

    #pragma omp parallel{
    for(j=0;j<3;j++){
        s=0;
        #pragma omp for reduction(+:s)
        for(i=0;i<10000;i++)
            s+=1;
        A[j]=s;
    }
    }

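This actually hangs. The following works fine, but I would rather avoid the overhead of starting a new parallel region, since this code is already preceded by another one.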

    for(j=0;j<3;j++){
        s=0;
        #pragma omp parallel for reduction(+:s)
        for(i=0;i<10000;i++)
            s+=1;
        A[j]=s;
    }

What is the correct (and fastest) way to do this?

The following example should work as expected:

    #include <iostream>
    using namespace std;

    int main(){
        int s;
        int A[3];

        #pragma omp parallel
        {   // Note that I moved the curly bracket
            for(int j = 0; j < 3; j++) {
                #pragma omp single
                s = 0;
                #pragma omp for reduction(+:s)
                for(int i=0;i<10000;i++) {
                    s+=1;
                } // Implicit barrier here
                #pragma omp single
                A[j]=s; // This statement needs synchronization
            } // End of the outer for loop
        } // End of the parallel region

        for (int jj = 0; jj < 3; jj++)
            cout << A[jj] << endl;
        return 0;
    }

An example of compiling and running it:

    > g++ --version
    g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
    Copyright (C) 2011 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    > g++ -fopenmp -Wall main.cpp
    > export OMP_NUM_THREADS=169
    > ./a.out
    10000
    10000
    10000
