MPI Cartesian topology: MPI_Neighbor_alltoall receives wrong data

I have an MPI Cartesian topology and want each node to send its rank to its neighbors via MPI_Neighbor_alltoall. I can't figure out where the error is; I also implemented my own MPI_Neighbor_alltoall, and that doesn't work either. I've minimized my code down to a (hopefully) easy-to-understand snippet.

alltoall.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <mpi.h>

    int main(int argc, char** argv) {
        // MPI_Cart variables
        MPI_Comm cart_comm;
        MPI_Comm mycomm;
        int ndims=2;
        int periods[2]={0,0};
        int coord[2];
        int dims[2]={3,3};
        int xdim = dims[0];
        int ydim = dims[1];
        int comm_size = xdim*ydim;

        // Initialize the MPI environment and pass the arguments to all the processes.
        MPI_Init(&argc, &argv);

        // Get the rank and number of the process
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // output: dimensions
        if(rank==0){
            printf("dims: [%i] [%i]\n", xdim, ydim);
        }

        // enough nodes
        if(comm_size<=size){
            // Communicator count has to match nodecount in dims
            // so we create a new Communicator with matching nodecount
            int color;
            int graphnode;
            if(rank<comm_size){
                //printf("%d<%d\n",rank,comm_size);
                color=1;
                graphnode=1;
            } else {
                //printf("%d>=%d\n",rank,comm_size);
                // not used nodes
                color=2;
                graphnode=0;
            }
            MPI_Comm_split(MPI_COMM_WORLD, color, rank, &mycomm);
            MPI_Comm_rank(mycomm, &rank);
            MPI_Comm_size(mycomm, &size);

            // ***GRAPHNODE-SECTION***
            if(graphnode){
                // Create Dimensions
                MPI_Dims_create(size, ndims, dims);
                // Create Cartesian
                MPI_Cart_create(mycomm, ndims, dims, periods, 1, &cart_comm);

                // Get the name of the processor
                char processor_name[MPI_MAX_PROCESSOR_NAME];
                int len;
                MPI_Get_processor_name(processor_name, &len);

                // Get coordinates
                MPI_Cart_coords(cart_comm, rank, ndims, coord);

                // sending
                int *sendrank = &rank;
                int recvrank[4];
                MPI_Neighbor_alltoall(sendrank, 1, MPI_INT, recvrank, 1, MPI_INT, cart_comm);

                printf("my rank: %i, received ranks: %i %i %i %i\n", rank, recvrank[0], recvrank[1], recvrank[2], recvrank[3]);
            } else {
                // *** SPARE NODES SECTION ***
            }
        } else {
            // not enough nodes reserved
            if(rank==0)
                printf("not enough nodes\n");
        }

        // Finalize the MPI environment.
        MPI_Finalize();
    }

So this code creates a 3×3 Cartesian topology, finalizes if there are not enough nodes, and leaves the spare nodes idle when there are too many. Even though this should be easy, I'm still doing something wrong, because the output is missing some data.

Output

    $ mpicc alltoall.c
    $ mpirun -np 9 a.out
    dims: [3] [3]
    my rank: 2, received ranks: -813779952 5 0 32621
    my rank: 1, received ranks: 1415889936 4 0 21
    my rank: 5, received ranks: 9 8 0 32590
    my rank: 3, received ranks: 9 6 -266534912 21
    my rank: 7, received ranks: 9 32652 0 21
    my rank: 8, received ranks: 9 32635 0 32635
    my rank: 6, received ranks: 9 32520 1372057600 21
    my rank: 0, received ranks: -1815116784 3 -1803923456 21
    my rank: 4, received ranks: 9 7 0 21

As you can see in the output, nobody has nodes 1 or 2 as a neighbor, and where does the 21 come from? Rank 4 should be the only node with four neighbors, and those should be {1,3,5,7}, right? I really don't see where my mistake is.

The coordinates should look like this:

    [0,0] [1,0] [2,0]
    [0,1] [1,1] [2,1]
    [0,2] [1,2] [2,2]

and the ranks like this:

    0 3 6
    1 4 7
    2 5 8
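
If I understand the standard correctly, MPI_Neighbor_alltoall on a Cartesian communicator orders the neighbors the way MPI_Cart_shift does: for each dimension, first the neighbor in the negative direction, then the one in the positive direction. So a minimal standalone probe like the following (just a sketch, assuming the same 3×3 non-periodic grid and run with mpirun -np 9) should list exactly the neighbors I expect:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Comm cart_comm;
        int ndims=2, dims[2]={3,3}, periods[2]={0,0};
        int rank, d, src, dst;

        MPI_Init(&argc, &argv);
        MPI_Cart_create(MPI_COMM_WORLD, ndims, dims, periods, 1, &cart_comm);
        // extra processes beyond the 3x3 grid get MPI_COMM_NULL
        if (cart_comm != MPI_COMM_NULL) {
            MPI_Comm_rank(cart_comm, &rank);
            for (d=0; d<ndims; d++) {
                // (src, dst) are the ranks one step in the negative/positive
                // direction of dimension d; MPI_PROC_NULL marks a missing
                // neighbor at a non-periodic border
                MPI_Cart_shift(cart_comm, d, 1, &src, &dst);
                printf("rank %i, dim %i: source %i, dest %i\n", rank, d, src, dst);
            }
            MPI_Comm_free(&cart_comm);
        }
        MPI_Finalize();
    }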

You are accessing a lot of uninitialized data (sendrank and recvrank included).
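
Concretely: with ndims=2 each process has 2*ndims = 4 neighbors, so MPI_Neighbor_alltoall with a sendcount of 1 expects four ints in the send buffer, but sendrank points at a single int, so three of the values sent are whatever happens to sit past it on the stack. Likewise, the recvrank slots belonging to MPI_PROC_NULL neighbors at the non-periodic borders are never written, so they must be initialized before being printed. A minimal sketch of just that fix, keeping your variable names, to drop into the graphnode section in place of the `// sending` block:

    // sending: one element per neighbor, receive buffer pre-filled
    int sendrank[4];   // one int per neighbor instead of a pointer to a single int
    int recvrank[4];
    int i;
    for (i=0; i<4; i++) {
        sendrank[i] = rank;   // send our rank to every neighbor
        recvrank[i] = -1;     // sentinel: stays -1 where the neighbor is MPI_PROC_NULL
    }
    MPI_Neighbor_alltoall(sendrank, 1, MPI_INT, recvrank, 1, MPI_INT, cart_comm);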

Here is a rewritten version of your test program that works for me:

    #define _GNU_SOURCE   // for asprintf
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <mpi.h>

    int main(int argc, char** argv) {
        // MPI_Cart variables
        MPI_Comm cart_comm;
        MPI_Comm mycomm;
        int ndims=2;
        int periods[2]={0,0};
        int coord[2];
        int dims[2]={3,3};
        int xdim = dims[0];
        int ydim = dims[1];
        int comm_size = xdim*ydim;

        // Initialize the MPI environment and pass the arguments to all the processes.
        MPI_Init(&argc, &argv);

        // Get the rank and number of the process
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // output: dimensions
        if(rank==0){
            printf("dims: [%i] [%i]\n", xdim, ydim);
        }

        // enough nodes
        if(comm_size<=size){
            // Communicator count has to match nodecount in dims
            // so we create a new Communicator with matching nodecount
            int color;
            int graphnode;
            if(rank<comm_size){
                color=1;
                graphnode=1;
            } else {
                //printf("%d>=%d\n",rank,comm_size);
                // not used nodes
                color=2;
                graphnode=0;
            }
            MPI_Comm_split(MPI_COMM_WORLD, color, rank, &mycomm);
            MPI_Comm_rank(mycomm, &rank);
            MPI_Comm_size(mycomm, &size);

            // ***GRAPHNODE-SECTION***
            if(graphnode){
                // Create Dimensions
                MPI_Dims_create(size, ndims, dims);
                // Create Cartesian
                MPI_Cart_create(mycomm, ndims, dims, periods, 1, &cart_comm);

                // Get the name of the processor
                char processor_name[MPI_MAX_PROCESSOR_NAME];
                int len;
                MPI_Get_processor_name(processor_name, &len);

                // Get coordinates
                MPI_Cart_coords(cart_comm, rank, ndims, coord);

                // sending: one element per neighbor, receive buffer pre-filled
                int sendrank[4];
                int recvrank[4];
                int i;
                char * neighbors[4];
                for (i=0; i<4; i++) {
                    sendrank[i] = rank;
                    recvrank[i] = -1;
                }
                MPI_Neighbor_alltoall(sendrank, 1, MPI_INT, recvrank, 1, MPI_INT, cart_comm);
                // format only the slots that were actually filled;
                // -1 means the neighbor was MPI_PROC_NULL
                for (i=0; i<4; i++) {
                    if (-1 != recvrank[i]) {
                        asprintf(&neighbors[i], "%d ", recvrank[i]);
                    } else {
                        neighbors[i] = "";
                    }
                }
                printf("my rank: %i, received ranks: %s%s%s%s\n", rank, neighbors[0], neighbors[1], neighbors[2], neighbors[3]);
            } else {
                // *** SPARE NODES SECTION ***
            }
        } else {
            // not enough nodes reserved
            if(rank==0)
                printf("not enough nodes\n");
        }

        // Finalize the MPI environment.
        MPI_Finalize();
    }
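
Note that asprintf is a GNU/BSD extension rather than standard C, which is why the _GNU_SOURCE define is needed on glibc systems (and the small strings it allocates are deliberately left unfreed, since the program exits right after printing). Built and run the same way as before (mpicc alltoall.c && mpirun -np 9 a.out), missing neighbors should now simply be omitted from each line instead of showing up as garbage values.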