Simple CUDA kernel not returning values as expected

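I was getting very frustrated with CUDA, so I decided to write the simplest piece of code I could, just to get my bearings. But something still seems to be going over my head. In my code, I simply add two arrays and store the result in a third array, like this: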

    #include <stdio.h>
    #include <cuda_runtime.h>

    // One block per element: block i computes answers[i].
    __global__ void add(int* these, int* those, int* answers)
    {
        int tid = blockIdx.x;
        answers[tid] = these[tid] + those[tid];
    }

    int main()
    {
        int these[50];
        int those[50];
        int answers[50];

        int *devthese;
        int *devthose;
        int *devanswers;

        cudaMalloc((void**)&devthese, 50 * sizeof(int));
        cudaMalloc((void**)&devthose, 50 * sizeof(int));
        cudaMalloc((void**)&devanswers, 50 * sizeof(int));

        int i;
        for(i = 0; i < 50; i++)
        {
            these[i] = i;
            those[i] = 2 * i;
        }

        cudaMemcpy(devthese, these, 50 * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(devthose, those, 50 * sizeof(int), cudaMemcpyHostToDevice);

        // 50 blocks of 1 thread each, matching the blockIdx.x indexing in the kernel.
        add<<<50, 1>>>(devthese, devthose, devanswers);

        cudaMemcpy(answers, devanswers, 50 * sizeof(int), cudaMemcpyDeviceToHost);

        for(i = 0; i < 50; i++)
        {
            fprintf(stderr, "%i\n", answers[i]);
        }

        return 0;
    }

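However, the int values being printed do not follow the sequence of multiples of 3 that I was expecting (0, 3, 6, ...). Can anyone explain what is going wrong?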

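Judging from the comments, the problem was evidently caused by compiling for an incorrect target architecture, which produced an executable that could not run on the OP's GPU. In that situation the kernel launch fails, devanswers is never written, and the host prints whatever happened to be in that memory.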
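A failure like this is easy to miss because the code in the question never inspects the status returned by the CUDA runtime. The following is a minimal sketch of the same program with error checking added, not the OP's actual fix. `CUDA_CHECK`, `count_mismatches`, and `N` are hypothetical names introduced here for illustration; the runtime calls (`cudaGetDeviceProperties`, `cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`) are standard CUDA runtime API functions.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 50

// Hypothetical helper (not from the original post): print the CUDA error
// string and leave main() as soon as any runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            return 1;                                              \
        }                                                          \
    } while (0)

__global__ void add(const int* these, const int* those, int* answers)
{
    int tid = blockIdx.x;
    if (tid < N)
        answers[tid] = these[tid] + those[tid];
}

// Host-only check: counts entries that differ from the expected 3 * i.
int count_mismatches(const int* answers, int n)
{
    int bad = 0;
    for (int i = 0; i < n; i++)
        if (answers[i] != 3 * i)
            bad++;
    return bad;
}

int main()
{
    // Report the compute capability so it can be compared with the
    // architecture the executable was compiled for.
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, 0));
    fprintf(stderr, "device 0: %s, compute capability %d.%d\n",
            prop.name, prop.major, prop.minor);

    int these[N], those[N], answers[N];
    for (int i = 0; i < N; i++) {
        these[i] = i;
        those[i] = 2 * i;
    }

    int *devthese, *devthose, *devanswers;
    CUDA_CHECK(cudaMalloc((void**)&devthese, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&devthose, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&devanswers, N * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(devthese, these, N * sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(devthose, those, N * sizeof(int), cudaMemcpyHostToDevice));

    add<<<N, 1>>>(devthese, devthose, devanswers);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(answers, devanswers, N * sizeof(int), cudaMemcpyDeviceToHost));
    fprintf(stderr, "%d mismatches\n", count_mismatches(answers, N));

    cudaFree(devthese);
    cudaFree(devthose);
    cudaFree(devanswers);
    return 0;
}
```

If the binary contains no code the GPU can run, the check after the kernel launch should report an error instead of letting the program print garbage. The usual remedy is to pass nvcc an `-arch` value that matches the compute capability printed at startup; the exact value depends on the OP's GPU, which the original post does not state.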

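This community wiki answer was added to get the question off the unanswered queue. It can be deleted if/when the OP returns and provides a more complete answer.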
