将工作分成更multithreading需要更多时间，为什么？

我有一个小的C程序，它使用monte-carlo模拟计算pi ，它基本上只测试一个随机点[x，y]，如果它在圆圈内部或外部。

为了近似pi，我必须使用大量的样本n ，其具有O（n）的直接比例复杂度。因此，尝试计算大量样本n，我实现了POSIX线程 api以平衡计算能力。

我的代码如下所示：

pthread_t worker[nthreads]; /* creates workers for each thread */ struct param aparam[nthreads]; /* struct param{ long* hits; long rounds; }; */ long nrounds = nsamples / nthreads; /* divide samples to subsets of equal rounds per thread */ for (int i = 0; i < nthreads; ++i) { /* loop to create threads */ aparam[i].hits = 0; aparam[i].rounds = nrounds; pthread_create(&worker[i], NULL, calc_pi, &aparam[i]); /* calls calc_pi(void* vparam){} */ } long nhits = 0; for (int j = 0; j < nthreads; ++j) { /* collects results */ pthread_join(worker[j], NULL); nhits += (long)aparam[j].hits; /* counts hits inside the cicrle */ }

这就是每个线程正在做的事情：

 void* calc_pi(void* vparam) { /* counts hits inside a circle */ struct param *iparam; iparam = (struct param *) vparam; long hits = 0; float x, y, z; for (long i = 0; i rounds; ++i) { x = (float)rand()/RAND_MAX; y = (float)rand()/RAND_MAX; z = x * x + y * y; if (z hits = (long*)hits; return NULL; }

现在我有一个奇怪的观察。使用相同的样本集n和越来越多的线程，我的程序需要更多的时间而不是更少的时间 。

以下是一些平均运行时间（可重复）：

 ------------------------------------------------- | Threads[1] | Samples[1] | Rounds[1] | Time[s] | ------------------------------------------------- | 32 | 268435456 | 8388608 | 118 | | 16 | 268435456 | 16777216 | 106 | | 8 | 268435456 | 33554432 | 125 | | 4 | 268435456 | 67108864 | 152 | | 2 | 268435456 | 134217728 | 36 | | 1 | 268435456 | 268435456 | 15 | -------------------------------------------------

例如，为什么两个线程执行相同的工作所花费的时间比单个线程多两倍？我的假设是划分工作的两个线程应该将时间减少至少50％。

使用GCC 4.9.1和以下标志编译：

 gcc -O2 -std=gnu11 -pthread pipa.c -lpthread -o pipa

我的硬件是双Intel Xeon E5520（2个处理器，每个4核）@ 2.26 GHz，禁用超线程，运行2.6.18内核的科学linux。

有任何想法吗？

线程执行的最昂贵的操作是调用rand() 。 rand()是一个天真的，简单的，通常是非MT可伸缩的函数（因为它保证相同的种子产生相同的随机数序列）。我认为rand()的锁是序列化所有线程。（*）

确认是否存在问题的一个简单技巧是在调试器下启动程序，然后多次：暂停它，捕获线程的堆栈跟踪，继续。无论堆栈中最常出现的是什么，很可能是瓶颈。

（*）使得它更慢的原因是锁争用会导致额外的性能损失。此外，许multithreading增加了进程调度和上下文切换的额外开销。

将工作分成更multithreading需要更多时间，为什么？

lwip stack netconn api keep connection“keep-alive”

嵌套宏和##

Fortran派生类型构造函数通过C函数定义

如何编写ANSI C控制台屏幕缓冲区？

寻找一个好的C / C ++小波库进行信号处理

C – 生成X字符单词的所有可能性

snprintf错误。 sizeof的参数与destination相同

输入缓冲区刷新

C：我用pow（10,2）和pow（10，j）获得了不同的结果，j = 2;

错误C2143：语法错误：缺少’;’ 在’类型’之前