How to use MPI_Gatherv to collect strings of different lengths from different processors, including the master node?

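I am trying to gather strings of different lengths from all processors (including the master node) into a single string (char array) on the master node. This is the prototype of MPI_Gatherv: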

    int MPI_Gatherv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                    void *recvbuf, const int *recvcounts, const int *displs,
                    MPI_Datatype recvtype, int root, MPI_Comm comm);

I can't work out how to define some of the parameters, such as recvbuf, recvcounts and displs. Can anyone provide a source code example in C?

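As has already been pointed out, there are many examples of using MPI_Gatherv, including on Stack Overflow; an answer that starts by describing how scatter and gather work, and then how the scatterv/gatherv variants extend them, can be found here.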

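Crucially, for the simpler Gather operation, where every chunk has the same size, the MPI library can easily precompute where each chunk should go in the final assembled array. In the more general Gatherv operation, where this is less clear, you have the option (in fact, the requirement) to spell out exactly where each item should start.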
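To make the contrast concrete, here is a minimal sketch, assuming the MPI standard's view of MPI_Gather as the special case of MPI_Gatherv in which every rank contributes the same count: the layout the library can work out on its own for a plain gather amounts to the two arrays below. The helper name `uniform_gather_layout` is hypothetical and only illustrative.

```c
/* Hypothetical helper: fills in the recvcounts/displs arrays that a plain
 * MPI_Gather implies when every rank sends `count` elements.
 * Chunk i simply starts at offset i * count in the receive buffer. */
void uniform_gather_layout(int nranks, int count, int *recvcounts, int *displs)
{
    for (int i = 0; i < nranks; i++) {
        recvcounts[i] = count;   /* every chunk is the same size */
        displs[i] = i * count;   /* chunks packed back to back, in rank order */
    }
}
```

With variable-length strings no single `count` fits every rank, which is why MPI_Gatherv makes you supply both arrays yourself.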

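The only complication here is that you're dealing with strings, so you probably don't want everything jammed together; you'll want extra padding for spaces between the words, and of course a null terminator at the end.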

So let's say you have five processes that want to send strings:

    Rank 0: "Hello"   (len=5)
    Rank 1: "world!"  (len=6)
    Rank 2: "Bonjour" (len=7)
    Rank 3: "le"      (len=2)
    Rank 4: "monde!"  (len=6)

You want to combine these into the global string:

    Hello world! Bonjour le monde!\0
              111111111122222222223
    0123456789012345678901234567890

    recvcounts = {5,6,7,2,6};     /* just the lengths */
    displs     = {0,6,13,21,24};  /* cumulative sum of len+1 for padding */

You can see that displacement 0 is 0, and displacement i is the sum of (recvcounts[j]+1) for j = 0..i-1:

      i   count[i]   count[i]+1   displ[i]   displ[i]-displ[i-1]
    ------------------------------------------------------------
      0       5          6            0
      1       6          7            6              6
      2       7          8           13              7
      3       2          3           21              8
      4       6          7           24              3
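The displacement rule in the table is an exclusive prefix sum over `recvcounts[j] + 1`. The sketch below computes it on the root, together with the total buffer length including the final `'\0'`; the function name `compute_displs` is hypothetical.

```c
/* Hypothetical helper: given each rank's string length in recvcounts[],
 * fill displs[] so that string i starts one slot after string i-1 ends
 * (leaving room for a separating space), and return the total buffer
 * length, which includes the trailing '\0'. */
int compute_displs(const int *recvcounts, int nranks, int *displs)
{
    int totlen = 0;
    for (int i = 0; i < nranks; i++) {
        displs[i] = totlen;           /* sum of (recvcounts[j] + 1) for j < i */
        totlen += recvcounts[i] + 1;  /* +1 for the space or the final '\0' */
    }
    return totlen;
}
```

For the five example strings this yields displs = {0,6,13,21,24} and a total length of 31, matching the ruler above.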

Here is a straightforward implementation:

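    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "mpi.h"

    #define nstrings 5
    const char *const strings[nstrings] = {"Hello","world!","Bonjour","le","monde!"};

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Everyone gets a string */
        int myStringNum = rank % nstrings;
        char *mystring = (char *)strings[myStringNum];
        int mylen = strlen(mystring);
        printf("Rank %d: %s\n", rank, mystring);

        /*
         * Now, we Gather the string lengths to the root process,
         * so we can create the buffer into which we'll receive the strings
         */
        const int root = 0;
        int *recvcounts = NULL;

        /* Only root has the received data */
        if (rank == root)
            recvcounts = malloc(size * sizeof(int));

        MPI_Gather(&mylen, 1, MPI_INT,
                   recvcounts, 1, MPI_INT,
                   root, MPI_COMM_WORLD);

        /*
         * Figure out the total length of string,
         * and displacements for each rank
         */
        int totlen = 0;
        int *displs = NULL;
        char *totalstring = NULL;

        if (rank == root) {
            displs = malloc(size * sizeof(int));

            displs[0] = 0;
            totlen += recvcounts[0] + 1;

            for (int i = 1; i < size; i++) {
                totlen += recvcounts[i] + 1;   /* plus one for space or \0 after words */
                displs[i] = displs[i-1] + recvcounts[i-1] + 1;
            }

            /* allocate string, pre-fill with spaces and null terminator */
            totalstring = malloc(totlen * sizeof(char));
            for (int i = 0; i < totlen - 1; i++)
                totalstring[i] = ' ';
            totalstring[totlen-1] = '\0';
        }

        /*
         * Now we have the receive buffer, counts, and displacements,
         * and can gather the strings
         */
        MPI_Gatherv(mystring, mylen, MPI_CHAR,
                    totalstring, recvcounts, displs, MPI_CHAR,
                    root, MPI_COMM_WORLD);

        if (rank == root) {
            printf("%d: <%s>\n", rank, totalstring);
            free(totalstring);
            free(displs);
            free(recvcounts);
        }

        MPI_Finalize();
        return 0;
    }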

Running it gives:

    $ mpicc -o gatherstring gatherstring.c -Wall -std=c99
    $ mpirun -np 5 ./gatherstring
    Rank 0: Hello
    Rank 3: le
    Rank 4: monde!
    Rank 1: world!
    Rank 2: Bonjour
    0: <Hello world! Bonjour le monde!>
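To see what MPI_Gatherv does with these arrays on the root without needing an MPI installation, the sketch below models the placement serially: it pre-fills the buffer with spaces and a terminator, then copies each rank's `recvcounts[i]` characters to offset `displs[i]`. It is only a model of the root-side result under those assumptions, not how an MPI library implements the collective; `emulate_gatherv_strings` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical serial model of the root-side result of the MPI_Gatherv
 * call above. Assumes nranks >= 1. Returns a malloc'd string the caller
 * must free(), or NULL if an allocation fails. */
char *emulate_gatherv_strings(const char *const *strs, int nranks)
{
    int *counts = malloc(nranks * sizeof *counts);
    int *displs = malloc(nranks * sizeof *displs);
    if (counts == NULL || displs == NULL) {
        free(counts);
        free(displs);
        return NULL;
    }

    /* Same bookkeeping the root does after gathering the lengths. */
    int totlen = 0;
    for (int i = 0; i < nranks; i++) {
        counts[i] = (int)strlen(strs[i]);
        displs[i] = totlen;
        totlen += counts[i] + 1;
    }

    char *total = malloc(totlen);
    if (total != NULL) {
        /* Pre-fill with spaces and terminate, as the example does. */
        memset(total, ' ', totlen - 1);
        total[totlen - 1] = '\0';
        /* Each "rank" contributes counts[i] chars at offset displs[i]. */
        for (int i = 0; i < nranks; i++)
            memcpy(total + displs[i], strs[i], counts[i]);
    }

    free(counts);
    free(displs);
    return total;
}
```

Feeding it the five example strings reproduces the `Hello world! Bonjour le monde!` line printed by the root.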

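MPI_Gather+MPI_Gatherv requires computing displacements for all ranks, which I find unnecessary when your strings have roughly similar lengths. Instead, you can use MPI_Allreduce+MPI_Gather with a padded string receive buffer. The padding is based on the longest string, which is found with MPI_Allreduce. Here is the code: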

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(NULL, NULL);

        int rank;
        int nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        srand(time(NULL) + rank);
        int my_len = (rand() % 10) + 1;   // my_len \in [1, 10]
        int my_char = (rand() % 26) + 65; // my_char \in [65, 90] = [A, Z]

        char my_str[my_len + 1];
        memset(my_str, my_char, my_len);
        my_str[my_len] = '\0';
        printf("rank %d of %d has string=%s with size=%zu\n",
               rank, nranks, my_str, strlen(my_str));

        int max_len = 0;
        MPI_Allreduce(&my_len, &max_len, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);

        // + 1 to leave room for the terminating '\0'
        char my_str_padded[max_len + 1];
        memset(my_str_padded, '\0', max_len + 1);
        memcpy(my_str_padded, my_str, my_len);

        char *all_str = NULL;
        if (!rank) {
            int all_len = (max_len + 1) * nranks;
            all_str = malloc(all_len * sizeof(char));
            memset(all_str, '\0', all_len);
        }

        MPI_Gather(my_str_padded, max_len + 1, MPI_CHAR,
                   all_str, max_len + 1, MPI_CHAR, 0, MPI_COMM_WORLD);

        if (!rank) {
            // rank i's string starts at offset i * (max_len + 1)
            for (int rank_idx = 0; rank_idx < nranks; rank_idx++) {
                char *str_idx = all_str + rank_idx * (max_len + 1);
                printf("rank %d sent string=%s with size=%zu\n",
                       rank_idx, str_idx, strlen(str_idx));
            }
            free(all_str);
        }

        MPI_Finalize();
        return 0;
    }
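On the root, the gathered buffer in this approach is just `nranks` fixed-size slots of `max_len + 1` bytes, slot i belonging to rank i. The sketch below models that layout serially and reads it back by rank index instead of scanning until an empty string, which avoids reading past the end of the buffer. `pack_padded` and `padded_slot` are hypothetical names, and the model assumes every string is shorter than the stride.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical serial model of the padded MPI_Gather result: rank i's
 * string, zero-padded to `stride` = max_len + 1 bytes, lands at i * stride.
 * Assumes strlen(strs[i]) < stride. Caller must free() the result. */
char *pack_padded(const char *const *strs, int nranks, int stride)
{
    char *all = calloc((size_t)nranks * stride, 1);  /* zero bytes = '\0' padding */
    if (all == NULL)
        return NULL;
    for (int i = 0; i < nranks; i++)
        memcpy(all + (size_t)i * stride, strs[i], strlen(strs[i]));
    return all;
}

/* Rank `rank`'s string is the NUL-terminated slot starting at rank * stride. */
const char *padded_slot(const char *all, int stride, int rank)
{
    return all + (size_t)rank * stride;
}
```

Looping `rank` over `0..nranks-1` visits every slot exactly once, whatever the string contents.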

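Keep in mind that there is a trade-off between MPI_Gather+MPI_Gatherv, which uses displacements, and MPI_Allreduce+MPI_Gather, which uses padding: the former spends extra time computing the displacements, while the latter needs more storage for the padded receive buffer.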
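The storage side of this trade-off is easy to quantify: the padded buffer always occupies `nranks * (max_len + 1)` bytes, while the Gatherv buffer needs only the sum of `len + 1` over all ranks. The hypothetical helpers below compute both sizes from a list of lengths.

```c
/* Hypothetical helpers comparing root-side receive-buffer sizes for the
 * two approaches, given every rank's string length. */
int padded_bytes(const int *lens, int nranks)
{
    int max_len = 0;
    for (int i = 0; i < nranks; i++)
        if (lens[i] > max_len)
            max_len = lens[i];
    return nranks * (max_len + 1);   /* one fixed-size slot per rank */
}

int gatherv_bytes(const int *lens, int nranks)
{
    int total = 0;
    for (int i = 0; i < nranks; i++)
        total += lens[i] + 1;        /* string plus space or final '\0' */
    return total;
}
```

For the lengths {5,6,7,2,6} used in the first answer this gives 40 bytes for the padded layout versus 31 for Gatherv. The gap grows when one rank's string is much longer than the rest, which is why padding is most attractive when the lengths are similar.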

I also benchmarked both approaches with large string buffers and did not find any significant runtime difference.
