MPI_Gather() the central elements into a global matrix

This is a follow-up question to "MPI_Gather 2D arrays". Here is the situation:

    id = 0 has this submatrix
    |16.000000| |11.000000| |12.000000| |15.000000|
    |6.000000| |1.000000| |2.000000| |5.000000|
    |8.000000| |3.000000| |4.000000| |7.000000|
    |14.000000| |9.000000| |10.000000| |13.000000|
    -----------------------
    id = 1 has this submatrix
    |12.000000| |15.000000| |16.000000| |11.000000|
    |2.000000| |5.000000| |6.000000| |1.000000|
    |4.000000| |7.000000| |8.000000| |3.000000|
    |10.000000| |13.000000| |14.000000| |9.000000|
    -----------------------
    id = 2 has this submatrix
    |8.000000| |3.000000| |4.000000| |7.000000|
    |14.000000| |9.000000| |10.000000| |13.000000|
    |16.000000| |11.000000| |12.000000| |15.000000|
    |6.000000| |1.000000| |2.000000| |5.000000|
    -----------------------
    id = 3 has this submatrix
    |4.000000| |7.000000| |8.000000| |3.000000|
    |10.000000| |13.000000| |14.000000| |9.000000|
    |12.000000| |15.000000| |16.000000| |11.000000|
    |2.000000| |5.000000| |6.000000| |1.000000|
    -----------------------
    The global matrix:
    |1.000000| |2.000000| |5.000000| |6.000000|
    |3.000000| |4.000000| |7.000000| |8.000000|
    |11.000000| |12.000000| |15.000000| |16.000000|
    |-3.000000| |-3.000000| |-3.000000| |-3.000000|

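What I want to do is gather only the central elements of each submatrix (the ones not on the border) into the global grid, so the global grid should look like this:

    |1.000000| |2.000000| |5.000000| |6.000000|
    |3.000000| |4.000000| |7.000000| |8.000000|
    |9.000000| |10.000000| |13.000000| |14.000000|
    |11.000000| |12.000000| |15.000000| |16.000000|

and not like the one I am getting. Here is my code: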

    float **gridPtr;
    float **global_grid;

    // N is the dimension of the global grid and pSqrt the square root of the number of processes
    lengthSubN = N/pSqrt;

    MPI_Type_contiguous(lengthSubN, MPI_FLOAT, &rowType);
    MPI_Type_commit(&rowType);

    if(id == 0) {
        MPI_Gather(&gridPtr[1][1], 1, rowType, global_grid[0], 1, rowType, 0, MPI_COMM_WORLD);
        MPI_Gather(&gridPtr[2][1], 1, rowType, global_grid[1], 1, rowType, 0, MPI_COMM_WORLD);
    } else {
        MPI_Gather(&gridPtr[1][1], 1, rowType, NULL, 0, rowType, 0, MPI_COMM_WORLD);
        MPI_Gather(&gridPtr[2][1], 1, rowType, NULL, 0, rowType, 0, MPI_COMM_WORLD);
    }
    ...

    float** allocate2D(float** A, const int N, const int M) {
        int i;
        float *t0;

        A = malloc(M * sizeof (float*)); /* Allocating pointers */
        if(A == NULL)
            printf("MALLOC FAILED in A\n");

        t0 = malloc(N * M * sizeof (float)); /* Allocating data */
        if(t0 == NULL)
            printf("MALLOC FAILED in t0\n");

        for (i = 0; i < M; i++)
            A[i] = t0 + i * (N);

        return A;
    }

EDIT:

Here is my attempt without MPI_Gather(), using a subarray type instead:

    MPI_Datatype mysubarray;

    int starts[2] = {1, 1};
    int subsizes[2] = {lengthSubN, lengthSubN};
    int bigsizes[2] = {N_glob, M_glob};
    MPI_Type_create_subarray(2, bigsizes, subsizes, starts,
                             MPI_ORDER_C, MPI_FLOAT, &mysubarray);
    MPI_Type_commit(&mysubarray);

    MPI_Isend(&(gridPtr[0][0]), 1, mysubarray, 0, 3, MPI_COMM_WORLD, &req[0]);
    MPI_Type_free(&mysubarray);
    MPI_Barrier(MPI_COMM_WORLD);

    if(id == 0) {
        for(i = 0; i < p; ++i) {
            MPI_Irecv(&(global_grid[i][0]), lengthSubN * lengthSubN, MPI_FLOAT, i, 3, MPI_COMM_WORLD, &req[0]);
        }
    }

    if(id == 0)
        print(global_grid, N_glob, N_glob);

But the result is:

    |1.000000| |2.000000| |3.000000| |4.000000|
    |5.000000| |6.000000| |7.000000| |8.000000|
    |9.000000| |10.000000| |11.000000| |12.000000|
    |13.000000| |14.000000| |15.000000| |16.000000|

which is not exactly what I want. I have to find a way to tell the receive that it should place the data differently. So, if I do:

    MPI_Irecv(&(global_grid[0][0]), 1, mysubarray, 0, 3, MPI_COMM_WORLD, &req[0]);

then I get:

    |-3.000000| |-3.000000| |-3.000000| |-3.000000|
    |-3.000000| |1.000000| |2.000000| |-3.000000|
    |-3.000000| |3.000000| |4.000000| |-3.000000|
    |-3.000000| |-3.000000| |-3.000000| |-3.000000|

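I cannot offer a complete solution, but I will explain why your original example using MPI_Gather does not work as expected.

With lengthSubN=2, this line defines a new datatype consisting of 2 floats stored next to each other in memory:

    MPI_Type_contiguous(lengthSubN, MPI_FLOAT, &rowType);

Now, let's look at the first MPI_Gather call, which is:

    if(id == 0) {
        MPI_Gather(&gridPtr[1][1], 1, rowType, global_grid[0], 1, rowType, 0, MPI_COMM_WORLD);
    } else {
        MPI_Gather(&gridPtr[1][1], 1, rowType, NULL, 0, rowType, 0, MPI_COMM_WORLD);
    }

It takes 1 rowType from each rank, that is, 2 adjacent floats starting at element gridPtr[1][1]. These are the values: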

    id 0:  1.0  2.0
    id 1:  5.0  6.0
    id 2:  9.0 10.0
    id 3: 13.0 14.0

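and places them next to each other, in rank order, in the receive buffer pointed to by global_grid[0]. This pointer actually points to the beginning of the first row, so the memory is filled with:

    1.0 2.0 5.0 6.0 9.0 10.0 13.0 14.0

However, global_grid has only 4 columns per row, so the last 4 values wrap around into the second row, the one pointed to by global_grid[1] (*). This might even be undefined behavior. So after this MPI_Gather, the content of global_grid is: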

     1.0  2.0  5.0  6.0
     9.0 10.0 13.0 14.0
    -3.0 -3.0 -3.0 -3.0
    -3.0 -3.0 -3.0 -3.0

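The second MPI_Gather works the same way and starts writing at the second row of global_grid:

    3.0 4.0 7.0 8.0 11.0 12.0 15.0 16.0

It therefore overwrites some of the values written above, and the result is what you observed: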

     1.0  2.0  5.0  6.0
     3.0  4.0  7.0  8.0
    11.0 12.0 15.0 16.0
    -3.0 -3.0 -3.0 -3.0
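To make the overwrite easier to see, here is a minimal sketch in plain C, without MPI, that replays the two gathers described above. The names `fake_gather` and `replay_two_gathers` are hypothetical helpers, not MPI calls. They only imitate what the root does with the data: concatenate each rank's contribution in rank order, starting at the receive pointer. The halo values are set to zero because only the interior matters here.

```c
#include <assert.h>
#include <string.h>

#define N   4   /* dimension of the global grid, as in the question      */
#define P   4   /* number of ranks                                       */
#define SUB 2   /* lengthSubN: each rank owns a SUB x SUB interior block */

/* Imitates the root side of MPI_Gather: takes `count` floats from every
 * rank, starting at local[rank][row][1], and stores them back to back in
 * rank order starting at recvbuf. */
static void fake_gather(float local[P][N][N], int row, float *recvbuf, int count)
{
    assert(row >= 0 && row < N);
    for (int rank = 0; rank < P; ++rank)
        memcpy(recvbuf + rank * count, &local[rank][row][1],
               count * sizeof(float));
}

/* Replays the two MPI_Gather calls from the question on a contiguous
 * N x N buffer that is pre-filled with -3, like global_grid. */
static void replay_two_gathers(float global[N * N])
{
    float local[P][N][N] = {0};   /* halo cells stay 0, they are never sent */

    for (int rank = 0; rank < P; ++rank) {
        local[rank][1][1] = 4.0f * rank + 1;   /* interior values 1..16 */
        local[rank][1][2] = 4.0f * rank + 2;
        local[rank][2][1] = 4.0f * rank + 3;
        local[rank][2][2] = 4.0f * rank + 4;
    }

    for (int i = 0; i < N * N; ++i)
        global[i] = -3.0f;

    /* First gather: recvbuf is global_grid[0], i.e. offset 0.
     * It writes 8 floats and so spills into row 1. */
    fake_gather(local, 1, global + 0 * N, SUB);

    /* Second gather: recvbuf is global_grid[1], i.e. offset N.
     * It overwrites the spilled values and spills into row 2. */
    fake_gather(local, 2, global + 1 * N, SUB);
}
```

Calling `replay_two_gathers` on a 16-element buffer and printing it four values per row gives the same matrix as above, with row 3 still holding the initial -3 values.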

(*) allocate2D actually allocates contiguous memory for the 2-dimensional data buffer.
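This is not a full MPI solution, but the following plain C sketch shows the index arithmetic the desired layout implies: where each row of a rank's interior block has to land in the contiguous global buffer. It assumes the ranks are arranged row by row on a pSqrt x pSqrt process grid, which is what the desired output in the question suggests. `block_row_offset` and `place_block` are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>

/* Offset, in floats, inside a row-major n x n global grid at which row
 * `row` of the interior block owned by `rank` starts.
 * Assumption: ranks are laid out row by row on a p_sqrt x p_sqrt grid. */
static int block_row_offset(int rank, int row, int n, int p_sqrt)
{
    int sub     = n / p_sqrt;      /* lengthSubN in the question */
    int block_r = rank / p_sqrt;   /* block row of this rank     */
    int block_c = rank % p_sqrt;   /* block column of this rank  */

    return (block_r * sub + row) * n + block_c * sub;
}

/* Copies one rank's packed sub x sub interior (row-major) into its place
 * in the global grid, one row at a time. */
static void place_block(float *global, const float *interior,
                        int rank, int n, int p_sqrt)
{
    int sub = n / p_sqrt;

    assert(n % p_sqrt == 0);
    for (int row = 0; row < sub; ++row)
        memcpy(global + block_row_offset(rank, row, n, p_sqrt),
               interior + row * sub,
               sub * sizeof(float));
}
```

With n = 4 and p_sqrt = 2, rank 1's two rows start at offsets 2 and 6, so its values 5, 6 and 7, 8 end up to the right of rank 0's 1, 2 and 3, 4. In an MPI program the root could apply the same offsets after receiving each block, or they could serve as displacements for a gather that uses a derived receive type; both routes are suggestions here rather than something taken from the answer.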
