Tag: gpgpu

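OpenCL: storing a pointer to global memory in local memory?: Is there any solution for this? Is it even possible? __global float *abc; // pointer to global memory stored in private memory I want abc to be stored in local memory instead of private memory.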
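A minimal sketch of the same idea, written in CUDA rather than OpenCL so that it matches the other examples on this page: CUDA's `__shared__` memory plays the role of OpenCL's local memory. The kernel name `scale_segments` and the host helper `run_scale_segments` are hypothetical. In OpenCL C the analogous declaration would presumably be `__global float * __local abc;` inside the kernel body, following the usual C rule that a qualifier after the `*` applies to the pointer itself; that form is an assumption and was not checked against a compiler here.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each block works on one segment of a global buffer. Thread 0 stores the
// segment's address in shared memory so the whole block reads one copy of it.
__global__ void scale_segments(float *data, int seg_len, float factor)
{
    __shared__ float *seg;   // the pointer lives in shared memory,
                             // the floats it points to live in global memory
    if (threadIdx.x == 0)
        seg = data + blockIdx.x * seg_len;
    __syncthreads();         // make the pointer visible to every thread in the block

    for (int i = threadIdx.x; i < seg_len; i += blockDim.x)
        seg[i] *= factor;
}

// Hypothetical host helper: fills a buffer with 1.0f, doubles it on the GPU,
// and returns true if every element came back as 2.0f.
bool run_scale_segments()
{
    const int blocks = 4, seg_len = 256, n = blocks * seg_len;
    std::vector<float> host(n, 1.0f);

    float *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    scale_segments<<<blocks, 64>>>(dev, seg_len, 2.0f);
    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (float v : host)
        if (v != 2.0f) return false;
    return true;
}
```

Whether this saves anything over keeping the pointer in a register depends on the hardware; the sketch only shows that the declaration is possible and that a barrier is needed before other threads use the pointer.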

Inter-block barrier on CUDA: I want to implement an inter-block barrier on CUDA, but I have run into a serious problem. I cannot figure out why it does not work. #include #include #include #define SIZE 10000000 #define BLOCKS 100 using namespace std; struct Barrier { int *count; __device__ void wait() { atomicSub(count, 1); while(*count) ; } Barrier() { int blocks = BLOCKS; cudaMalloc((void**) &count, sizeof(int)); cudaMemcpy(count, &blocks, sizeof(int), cudaMemcpyHostToDevice); } ~Barrier() { cudaFree(count); } }; __global__ void sum(int* vec, int* cache, int *sum, […]
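Two likely culprits in the excerpt are that every thread calls `atomicSub` (so a counter initialised to the number of blocks overshoots zero) and that the plain read `*count` may be kept in a register and never refreshed. Below is a minimal sketch of a single-use barrier that avoids both. It assumes all blocks are resident on the GPU at the same time (otherwise the spin loop deadlocks), and the names `barrier_wait`, `sum_after_barrier`, and `run_barrier_demo` are hypothetical. The supported route for grid-wide synchronisation is cooperative groups (`grid.sync()` in a kernel launched with `cudaLaunchCooperativeKernel`), which this sketch does not use.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Single-use barrier: *count must be initialised to gridDim.x before launch.
__device__ void barrier_wait(int *count)
{
    __syncthreads();                         // the whole block has arrived
    if (threadIdx.x == 0) {
        __threadfence();                     // publish this block's earlier global writes
        atomicSub(count, 1);                 // exactly one decrement per block
        while (atomicAdd(count, 0) != 0) { } // atomic read, never served from a register
    }
    __syncthreads();                         // release the rest of the block
}

// Phase 1: each block writes one value. Phase 2: each block sums all values.
__global__ void sum_after_barrier(volatile int *partial, int *count, int *out)
{
    if (threadIdx.x == 0)
        partial[blockIdx.x] = blockIdx.x + 1;

    barrier_wait(count);

    if (threadIdx.x == 0) {
        int total = 0;
        for (unsigned b = 0; b < gridDim.x; ++b)
            total += partial[b];             // volatile: read from memory, not a cache
        out[blockIdx.x] = total;
    }
}

// Hypothetical host helper: returns true if every block saw the full sum.
bool run_barrier_demo()
{
    const int blocks = 4;                    // small enough to be co-resident in practice
    int *partial = nullptr, *count = nullptr, *out = nullptr;
    cudaMalloc(&partial, blocks * sizeof(int));
    cudaMalloc(&count, sizeof(int));
    cudaMalloc(&out, blocks * sizeof(int));
    cudaMemcpy(count, &blocks, sizeof(int), cudaMemcpyHostToDevice);

    sum_after_barrier<<<blocks, 32>>>(partial, count, out);

    std::vector<int> host(blocks);
    cudaMemcpy(host.data(), out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(partial); cudaFree(count); cudaFree(out);

    for (int v : host)
        if (v != blocks * (blocks + 1) / 2) return false;
    return true;
}
```

With the 100 blocks of the original question, co-residency is not guaranteed on every device, so the block count would need to be checked against the occupancy limit before relying on a spin barrier like this.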

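Generating random numbers within a varying range in a CUDA kernel: I am trying to generate random numbers in a CUDA kernel. I want the numbers to be integers drawn from a uniform distribution, starting at 1 and going up to 8. The random numbers should be different for each thread. The range from which a number can be drawn also varies from thread to thread: the maximum of the range may be as low as 2 in one thread and as high as 8 in another, but never higher than that. Below is an example of how the numbers should be generated: In thread#1 -> maximum of the range is 2 and so the random number should be between 1 and 2 In thread#2 -> maximum of the range is 6 and so the random number should be between 1 and 6 In thread#3 -> maximum of the range is 5 and […]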
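One way to approach this is the cuRAND device API: give every thread its own subsequence of the same seed, draw a float in (0, 1], and scale it by that thread's maximum. The sketch below assumes the per-thread maxima arrive in an array `max_val`; the kernel name `draw_in_range` and the host helper `run_range_demo` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>
#include <vector>

// Each thread draws one integer in [1, max_val[tid]].
__global__ void draw_in_range(const int *max_val, int *out, int n,
                              unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    curandState state;
    curand_init(seed, tid, 0, &state);   // same seed, distinct subsequence per thread

    float u = curand_uniform(&state);    // uniform in (0, 1]: excludes 0, includes 1
    out[tid] = (int)ceilf(u * max_val[tid]);   // (0, max] rounds up to 1..max
}

// Hypothetical host helper: maxima cycle through 2..8; returns true if every
// draw falls inside its own thread's range.
bool run_range_demo()
{
    const int n = 1024;
    std::vector<int> maxima(n), result(n);
    for (int i = 0; i < n; ++i) maxima[i] = 2 + i % 7;

    int *d_max = nullptr, *d_out = nullptr;
    cudaMalloc(&d_max, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_max, maxima.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    draw_in_range<<<(n + 255) / 256, 256>>>(d_max, d_out, n, 1234ULL);

    cudaMemcpy(result.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_max); cudaFree(d_out);

    for (int i = 0; i < n; ++i)
        if (result[i] < 1 || result[i] > maxima[i]) return false;
    return true;
}
```

Because `curand_uniform` excludes 0 and includes 1, `ceilf` maps it onto 1..max without an off-by-one at either end. An integer alternative is `curand(&state) % max + 1`, at the cost of a small modulo bias for maxima that are not powers of two.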

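Large integer addition with CUDA: I have been developing a cryptographic algorithm on the GPU and am currently stuck on an algorithm for large integer addition. Large integers are represented in the usual way as a bunch of 32-bit words. For example, we can use one thread to add two 32-bit words. For simplicity, assume that the numbers to be added have the same length and that the number of threads per block == number of words. Then: __global__ void add_kernel(int *C, const int *A, const int *B) { int x = A[threadIdx.x]; int y = B[threadIdx.x]; int z = x + y; int carry = (z < x); /** do carry propagation in parallel somehow ? */ ………… z = z + newcarry; // update the resulting […]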
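One standard way to fill in the "carry propagation in parallel" step is a carry-lookahead scan: each word either generates a carry (its sum overflowed) or propagates an incoming one (its sum is all ones), and these flags can be combined with a parallel prefix scan. The sketch below is one possible approach, not the method from the original thread. It assumes a single block, little-endian word order (word 0 is least significant), and `unsigned` words, since the `z < x` overflow test is only reliable for unsigned arithmetic. The extra `carry_out` parameter and the host helper `big_add_matches_reference` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per 32-bit word; launch with 2 * blockDim.x bytes of shared memory.
__global__ void add_kernel(unsigned *C, const unsigned *A, const unsigned *B,
                           unsigned *carry_out)
{
    extern __shared__ unsigned char flags[];
    unsigned char *g = flags;               // g[i]: words up to i generate a carry
    unsigned char *p = flags + blockDim.x;  // p[i]: words up to i propagate a carry
    const int i = threadIdx.x, n = blockDim.x;

    const unsigned s = A[i] + B[i];
    g[i] = (s < A[i]);                      // unsigned overflow
    p[i] = (s == 0xFFFFFFFFu);              // +1 would overflow
    __syncthreads();

    // Hillis-Steele inclusive scan with (g, p) o (g', p') = (g | (p & g'), p & p')
    for (int d = 1; d < n; d <<= 1) {
        unsigned char gi = g[i], pi = p[i];
        if (i >= d) {
            gi = g[i] | (p[i] & g[i - d]);
            pi = p[i] & p[i - d];
        }
        __syncthreads();                    // all reads done before any write
        g[i] = gi;
        p[i] = pi;
        __syncthreads();
    }

    const unsigned carry_in = (i > 0) ? g[i - 1] : 0u;
    C[i] = s + carry_in;
    if (i == n - 1) *carry_out = g[i];      // carry out of the top word
}

// Hypothetical host helper: runs the kernel and compares the result with a
// sequential 64-bit reference. Assumes a.size() == b.size() <= 1024.
bool big_add_matches_reference(const std::vector<unsigned> &a,
                               const std::vector<unsigned> &b)
{
    const int n = (int)a.size();
    const size_t bytes = n * sizeof(unsigned);
    unsigned *dA, *dB, *dC, *dCarry, hCarry = 0;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes); cudaMalloc(&dCarry, sizeof(unsigned));
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    add_kernel<<<1, n, 2 * n>>>(dC, dA, dB, dCarry);

    std::vector<unsigned> c(n);
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(&hCarry, dCarry, sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC); cudaFree(dCarry);

    unsigned long long carry = 0;           // sequential reference addition
    for (int k = 0; k < n; ++k) {
        unsigned long long t = (unsigned long long)a[k] + b[k] + carry;
        if ((unsigned)t != c[k]) return false;
        carry = t >> 32;
    }
    return carry == hCarry;
}
```

The scan takes about log2(n) steps instead of the n steps of a ripple carry. Adding 1 to a number whose words are all 0xFFFFFFFF is the worst case for a ripple and a useful check here, because the carry has to travel through every word.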
