OpenMP slower with more than one thread, can't figure out why

I have a problem: my code below runs slower when I use OpenMP:

    chunk = nx/nthreads;
    int i, j;
    for(int t = 0; t < n; t++){
        #pragma omp parallel for default(shared) private(i, j) schedule(static,chunk)
        for(i = 1; i < nx/2+1; i++){
            for(j = 1; j < nx-1; j++){
                T_c[i][j] =0.25*(T_p[i-1][j] +T_p[i+1][j]+T_p[i][j-1]+T_p[i][j+1]);
                T_c[nx-i+1][j] = T_c[i][j];
            }
        }
        copyT(T_p, T_c, nx);
    }
    print2file(T_c, nx, file);

The problem is that when I run with more than one thread, the computation takes longer.

I see at least three issues that could be hurting performance in the snippet you posted:

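1. The chunk size is too small to show any gain when the work is divided among threads.
2. Opening and closing the parallel region inside the loop can hurt performance.
3. The two innermost loops look independent, yet only one of them is parallelized, so you lose the chance to exploit a wider iteration space.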

Below are some modifications I would make to the code:

    // Hoisting omp parallel out of the time loop opens and closes the
    // parallel region only once, not n times
    #pragma omp parallel default(shared)
    for(int t = 0; t < n; t++){
        // With collapse you parallelize over an iteration space that is
        // composed of (nx/2+1)*(nx-1) elements, not only (nx/2+1).
        // As the iteration space is very small and the work done at each
        // iteration is not much, static schedule will likely be the best
        // option, as it is the one that adds the least scheduling overhead
        #pragma omp for collapse(2) schedule(static)
        for(int i = 1; i < nx/2+1; i++){
            for(int j = 1; j < nx-1; j++){
                T_c[i][j] =0.25*(T_p[i-1][j] +T_p[i+1][j]+T_p[i][j-1]+T_p[i][j+1]);
                T_c[nx-i+1][j] = T_c[i][j];
            }
        }
        // copyT is now called inside the parallel region, so let a single
        // thread run it; the implicit barrier at the end of single keeps
        // the other threads from reading T_p before the copy is done
        #pragma omp single
        copyT(T_p, T_c, nx);
    }
    print2file(T_c, nx, file);

First, the parallel region is restarted on every iteration of the outer loop, which adds a huge overhead.

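Second, half of the threads just sit there doing nothing, because your chunk size is twice as large as it should be: it is nx/nthreads, while the parallel loop has only nx/2 iterations, so there are (nx/2)/(nx/nthreads) = nthreads/2 chunks in total. Besides, what you are trying to achieve is simply to replicate the behaviour of schedule(static).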

    #pragma omp parallel
    for (int t = 0; t < n; t++) {
        #pragma omp for schedule(static)
        for (int i = 1; i < nx/2+1; i++) {
            for (int j = 1; j < nx-1; j++) {
                T_c[i][j] = 0.25*(T_p[i-1][j]+T_p[i+1][j]+T_p[i][j-1]+T_p[i][j+1]);
                T_c[nx-i-1][j] = T_c[i][j];
            }
        }
        #pragma omp single
        copyT(T_p, T_c, nx);
    }
    print2file(T_c, nx, file);

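If you change copyT so that it also uses a worksharing loop (an orphaned omp for, since it is called from inside the parallel region), remove the single construct. You do not need default(shared), because that is the default. You do not need to declare the loop variable of a parallel loop private: even if the variable comes from an outer scope (and is therefore implicitly shared in the region), OpenMP makes it private automatically. Simply declare all loop variables in the loop controls and everything works with the default sharing rules applied.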

Second and a half, there is (possibly) an error in your inner loop. The second assignment statement should read:

 T_c[nx-i-1][j] = T_c[i][j];

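(or T_c[nx-i][j] if you do not keep a halo on the lower side); otherwise, when i equals 1, you would access T_c[nx][...], which is past the last row if T_c has nx rows.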

Third, a general hint: instead of copying one array into the other, use pointers to those arrays and swap the two pointers at the end of each iteration.
