Generating random numbers in a varying range in a CUDA kernel

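I am trying to generate random numbers in a CUDA kernel. I want random integers drawn from a uniform distribution, starting at 1 and going up to 8. The random numbers should be different for each thread. The range from which the numbers are drawn also varies from thread to thread: the maximum of the range may be as low as 2 in one thread and as high as 8 in another, but never higher than that. Here is an example of how the numbers should be generated: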

    In thread #1 --> maximum of the range is 2, so the random number should be between 1 and 2
    In thread #2 --> maximum of the range is 6, so the random number should be between 1 and 6
    In thread #3 --> maximum of the range is 5, so the random number should be between 1 and 5

and so on…

Any help would be greatly appreciated. Thanks.

Edit: I have edited my answer to address some of the shortcomings pointed out in the other answer (@tudorturcu) and in the comments.

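1. Use CURAND to generate a uniformly distributed float between 0.0 and 1.0 (curand_uniform excludes 0.0 and includes 1.0).
2. Multiply it by the desired range (max - min + 0.999999).
3. Add the offset (+ min).
4. Truncate to an integer.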

Something like this in your device code:

    int idx = threadIdx.x+blockDim.x*blockIdx.x;
    // assume we have already set up curand and generated state for each thread...
    // assume ranges vary by thread index
    float myrandf = curand_uniform(&(my_curandstate[idx]));
    myrandf *= (max_rand_int[idx] - min_rand_int[idx] + 0.999999);
    myrandf += min_rand_int[idx];
    int myrand = (int)truncf(myrandf);

You should also:

    #include <math.h>

for truncf.
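To see why the padding is 0.999999 rather than 1.0, it helps to try the arithmetic at the endpoints, since curand_uniform can return exactly 1.0. The sketch below mirrors the device-side steps on the host in plain C++, so no GPU is needed. The function name scale_to_range and its pad parameter are made up for illustration and are not part of CURAND.

```cpp
#include <math.h>

// Host-side model of the device arithmetic shown above.
// `u` stands in for the value returned by curand_uniform, a float in (0.0, 1.0].
// scale_to_range and `pad` are illustrative names, not CURAND API.
int scale_to_range(float u, unsigned min_val, unsigned max_val, double pad) {
    float r = u;
    r *= (max_val - min_val + pad);  // step 2: scale by the width of the range
    r += min_val;                    // step 3: shift by the minimum
    return (int)truncf(r);           // step 4: truncate to an integer
}
```

With min 2 and max 7, scale_to_range(1.0f, 2, 7, 0.999999) gives 7, while a pad of 1.0 gives 8, which falls outside the range. Note that the 0.999999 trick relies on float precision being fine enough for the range in question; it is comfortable for the small ranges discussed here (up to 8).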

Here is a complete example:

    $ cat t527.cu
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <assert.h>
    #include <curand_kernel.h>

    #define MIN 2
    #define MAX 7
    #define ITER 10000000

    // one generator state per thread, all with the same seed but a different sequence
    __global__ void setup_kernel(curandState *state){

      int idx = threadIdx.x+blockDim.x*blockIdx.x;
      curand_init(1234, idx, 0, &state[idx]);
    }

    // draw n integers in [min_rand_int[idx], max_rand_int[idx]] and count them per bin
    __global__ void generate_kernel(curandState *my_curandstate, const unsigned int n, const unsigned *max_rand_int, const unsigned *min_rand_int, unsigned int *result){

      int idx = threadIdx.x + blockDim.x*blockIdx.x;

      unsigned int count = 0;
      while (count < n){
        float myrandf = curand_uniform(my_curandstate+idx);
        myrandf *= (max_rand_int[idx] - min_rand_int[idx]+0.999999);
        myrandf += min_rand_int[idx];
        int myrand = (int)truncf(myrandf);

        assert(myrand <= max_rand_int[idx]);
        assert(myrand >= min_rand_int[idx]);
        result[myrand-min_rand_int[idx]]++;
        count++;}
    }

    int main(){

      curandState *d_state;
      cudaMalloc(&d_state, sizeof(curandState));
      unsigned *d_result, *h_result;
      unsigned *d_max_rand_int, *h_max_rand_int, *d_min_rand_int, *h_min_rand_int;
      cudaMalloc(&d_result, (MAX-MIN+1) * sizeof(unsigned));
      h_result = (unsigned *)malloc((MAX-MIN+1)*sizeof(unsigned));
      cudaMalloc(&d_max_rand_int, sizeof(unsigned));
      h_max_rand_int = (unsigned *)malloc(sizeof(unsigned));
      cudaMalloc(&d_min_rand_int, sizeof(unsigned));
      h_min_rand_int = (unsigned *)malloc(sizeof(unsigned));
      cudaMemset(d_result, 0, (MAX-MIN+1)*sizeof(unsigned));
      // a single thread is enough for this test, so only one state and one range are needed
      setup_kernel<<<1,1>>>(d_state);

      *h_max_rand_int = MAX;
      *h_min_rand_int = MIN;
      cudaMemcpy(d_max_rand_int, h_max_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice);
      cudaMemcpy(d_min_rand_int, h_min_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice);
      generate_kernel<<<1,1>>>(d_state, ITER, d_max_rand_int, d_min_rand_int, d_result);
      cudaMemcpy(h_result, d_result, (MAX-MIN+1) * sizeof(unsigned), cudaMemcpyDeviceToHost);
      printf("Bin: Count: \n");
      for (int i = MIN; i <= MAX; i++)
        printf("%d %u\n", i, h_result[i-MIN]);

      return 0;
    }

    $ nvcc -arch=sm_20 -o t527 t527.cu -lcurand
    $ cuda-memcheck ./t527
    ========= CUDA-MEMCHECK
    Bin: Count:
    2 1665496
    3 1668130
    4 1667644
    5 1667435
    6 1665026
    7 1666269
    ========= ERROR SUMMARY: 0 errors
    $
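As a rough sanity check on the counts printed above, one could compute a Pearson chi-square statistic against a uniform expectation. This is only a sketch of one possible check, written as host-side C++; the helper name chi_square_uniform is made up for illustration and does not appear in the original program.

```cpp
#include <cstddef>

// Pearson chi-square statistic for observed bin counts against a uniform
// expectation (every bin expected to hold total / bins samples).
// chi_square_uniform is an illustrative helper, not part of the program above.
double chi_square_uniform(const unsigned *counts, std::size_t bins) {
    double total = 0.0;
    for (std::size_t i = 0; i < bins; ++i) total += counts[i];
    const double expected = total / bins;
    double chi2 = 0.0;
    for (std::size_t i = 0; i < bins; ++i) {
        const double d = counts[i] - expected;  // deviation from the expected count
        chi2 += d * d / expected;
    }
    return chi2;
}
```

Fed the six counts from the run above, this gives a statistic of about 4.74. That is below 11.07, the 5% critical value for 5 degrees of freedom, so this particular run is consistent with a uniform distribution over 2 to 7.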

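@Robert's example does not produce a perfectly uniform distribution (although every number in the range is generated and every generated number is within the range). The smallest and largest values are each only half as likely to be chosen as the other numbers in the range.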

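In step 2, you should multiply by the number of values in the range: (max - min + 0.999999). *

In step 3, the offset should be (+ min) rather than (+ min + 0.5).

Steps 1 and 4 stay the same.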

*As @Kamil Czerski pointed out, 1.0 is included in the distribution. Adding 1.0 instead of 0.999999 would sometimes produce a number outside the desired range.
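The effect of the two corrections can be sketched on the host by sweeping u over an evenly spaced grid in (0.0, 1.0] and counting which integer each value maps to. This assumes the earlier mapping was trunc(u * (max - min) + min + 0.5), which is inferred from the corrections listed above rather than quoted from the earlier code. The helper bin_counts and its parameters are hypothetical names used only for this illustration.

```cpp
#include <math.h>
#include <vector>

// Counts how often each integer in [min_val, max_val] is produced by
// trunc(u * scale + offset) while u sweeps an even grid over (0.0, 1.0].
// bin_counts is an illustrative helper; values outside the range are skipped.
std::vector<int> bin_counts(unsigned min_val, unsigned max_val,
                            double scale, double offset, int samples) {
    std::vector<int> bins(max_val - min_val + 1, 0);
    for (int k = 1; k <= samples; ++k) {
        float r = (float)k / samples;  // u in (0.0, 1.0]
        r *= scale;                    // step 2
        r += offset;                   // step 3
        int v = (int)truncf(r);        // step 4
        if (v >= (int)min_val && v <= (int)max_val) bins[v - min_val]++;
    }
    return bins;
}
```

For min 2 and max 7, calling bin_counts(2, 7, 5.0, 2.5, 60000) models the assumed earlier mapping and leaves the bins for 2 and 7 with roughly half the hits of the interior bins. Calling bin_counts(2, 7, 5.999999, 2.0, 60000) models the corrected mapping and gives roughly equal counts in all six bins.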
