Is GNU gprof buggy?

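I have a C program that calls the function pi_calcPiItem() 600000000 times through the function pi_calcPiBlock(). To analyze how much time is spent in these functions, I used GNU gprof. The result seems to be wrong, because all calls are attributed to main(). Also, the call graph makes no sense: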

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  Ts/call  Ts/call  name
     61.29      9.28     9.28                             pi_calcPiItem
     15.85     11.68     2.40                             pi_calcPiBlock
     11.96     13.49     1.81                             _mcount_private
      9.45     14.92     1.43                             __fentry__
      1.45     15.14     0.22                             pow
      0.00     15.14     0.00 600000000     0.00     0.00  main

    Call graph

    granularity: each sample hit covers 4 byte(s) for 0.07% of 15.14 seconds

    index % time    self  children    called     name
                                                     <spontaneous>
    [1]     61.3    9.28    0.00                 pi_calcPiItem [1]
    -----------------------------------------------
                                                     <spontaneous>
    [2]     15.9    2.40    0.00                 pi_calcPiBlock [2]
                    0.00    0.00 600000000/600000000     main [6]
    -----------------------------------------------
                                                     <spontaneous>
    [3]     12.0    1.81    0.00                 _mcount_private [3]
    -----------------------------------------------
                                                     <spontaneous>
    [4]      9.4    1.43    0.00                 __fentry__ [4]
    -----------------------------------------------
                                                     <spontaneous>
    [5]      1.5    0.22    0.00                 pow [5]
    -----------------------------------------------
                                         6             main [6]
                    0.00    0.00 600000000/600000000     pi_calcPiBlock [2]
    [6]      0.0    0.00    0.00 600000000+6       main [6]
                                         6             main [6]
    -----------------------------------------------

Is this a bug, or do I have to configure the program somehow?

What does <spontaneous> mean?

EDIT (more insight for you)

The code computes pi:

    #define PI_BLOCKSIZE  (100000000)
    #define PI_BLOCKCOUNT (6)
    #define PI_THRESHOLD  (PI_BLOCKSIZE * PI_BLOCKCOUNT)

    int32_t main(int32_t argc, char* argv[])
    {
        double result;
        for ( int32_t i = 0; i < PI_THRESHOLD; i += PI_BLOCKSIZE )
        {
            pi_calcPiBlock(&result, i, i + PI_BLOCKSIZE);
        }
        printf("pi = %f\n",result);
        return 0;
    }

    static void pi_calcPiBlock(double* result, int32_t start, int32_t end)
    {
        double piItem;
        for ( int32_t i = start; i < end; ++i )
        {
            pi_calcPiItem(&piItem, i);
            *result += piItem;
        }
    }

    static void pi_calcPiItem(double* piItem, int32_t index)
    {
        *piItem = 4.0 * (pow(-1.0,index) / (2.0 * index + 1.0));
    }

This is how I got the results (run on Windows with the help of Cygwin):

    > gcc -std=c99 -o pi *.c -pg -fno-inline-small-functions
    > ./pi.exe
    > gprof.exe pi.exe

Try:

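- using the noinline, noclone function attributes instead of -fno-inline-small-functions
  - by disassembling, I can see that -fno-inline-small-functions does not stop the inlining
- linking the program statically (-static)
- you should also initialize result to 0.0 in main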

This works on Linux, x86-64:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PI_BLOCKSIZE  (100000000)
    #define PI_BLOCKCOUNT (6)
    #define PI_THRESHOLD  (PI_BLOCKSIZE * PI_BLOCKCOUNT)

    static void pi_calcPiItem(double* piItem, int32_t index);
    static void pi_calcPiBlock(double* result, int32_t start, int32_t end);

    int32_t main(int32_t argc, char* argv[])
    {
        double result;
        result = 0.0;
        for ( int32_t i = 0; i < PI_THRESHOLD; i += PI_BLOCKSIZE )
        {
            pi_calcPiBlock(&result, i, i + PI_BLOCKSIZE);
        }
        printf("pi = %f\n",result);
        return 0;
    }

    __attribute__((noinline, noclone))
    static void pi_calcPiBlock(double* result, int32_t start, int32_t end)
    {
        double piItem;
        for ( int32_t i = start; i < end; ++i )
        {
            pi_calcPiItem(&piItem, i);
            *result += piItem;
        }
    }

    __attribute__((noinline, noclone))
    static void pi_calcPiItem(double* piItem, int32_t index)
    {
        *piItem = 4.0 * (pow(-1.0,index) / (2.0 * index + 1.0));
    }

Build the code:

    $ cc pi.c -o pi -Os -Wall -g3 -I. -std=c99 -pg -static -lm

Output:

    $ ./pi && gprof ./pi
    pi = 3.141593
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ns/call  ns/call  name
     85.61     22.55    22.55                             __ieee754_pow_sse2
      4.75     23.80     1.25                             pow
      4.14     24.89     1.09 600000000     1.82     1.82  pi_calcPiItem
      2.54     25.56     0.67                             __exp1
      0.91     25.80     0.24                             pi_calcPiBlock
      0.53     25.94     0.14                             matherr
      0.47     26.07     0.13                             __lseek_nocancel
      0.38     26.17     0.10                             frame_dummy
      0.34     26.26     0.09                             __ieee754_exp_sse2
      0.32     26.34     0.09                             __profile_frequency
      0.00     26.34     0.00        1     0.00     0.00  main

                 Call graph (explanation follows)

    granularity: each sample hit covers 2 byte(s) for 0.04% of 26.34 seconds

    index % time    self  children    called     name
                                                     <spontaneous>
    [1]     85.6   22.55    0.00                 __ieee754_pow_sse2 [1]
    -----------------------------------------------
                                                     <spontaneous>
    [2]      5.0    0.24    1.09                 pi_calcPiBlock [2]
                    1.09    0.00 600000000/600000000     pi_calcPiItem [4]
    -----------------------------------------------
                                                     <spontaneous>
    [3]      4.7    1.25    0.00                 pow [3]
    -----------------------------------------------
                    1.09    0.00 600000000/600000000     pi_calcPiBlock [2]
    [4]      4.1    1.09    0.00 600000000         pi_calcPiItem [4]
    -----------------------------------------------
                                                     <spontaneous>
    [5]      2.5    0.67    0.00                 __exp1 [5]
    -----------------------------------------------
                                                     <spontaneous>
    [6]      0.5    0.14    0.00                 matherr [6]
    -----------------------------------------------
                                                     <spontaneous>
    [7]      0.5    0.13    0.00                 __lseek_nocancel [7]
    -----------------------------------------------
                                                     <spontaneous>
    [8]      0.4    0.10    0.00                 frame_dummy [8]
    -----------------------------------------------
                                                     <spontaneous>
    [9]      0.3    0.09    0.00                 __ieee754_exp_sse2 [9]
    -----------------------------------------------
                                                     <spontaneous>
    [10]     0.3    0.09    0.00                 __profile_frequency [10]
    -----------------------------------------------
                    0.00    0.00       1/1           __libc_start_main [827]
    [11]     0.0    0.00    0.00       1         main [11]
    -----------------------------------------------
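To see how the flat-profile columns relate to each other, a few of the numbers in this output can be re-derived by hand. This only repeats gprof's own arithmetic on the values it printed; it is not a new measurement:

```latex
\begin{aligned}
\text{samples}(\texttt{\_\_ieee754\_pow\_sse2}) &= \frac{22.55\ \text{s}}{0.01\ \text{s/sample}} = 2255 \\
\%\,\text{time}(\texttt{\_\_ieee754\_pow\_sse2}) &= \frac{22.55\ \text{s}}{26.34\ \text{s}} \times 100 \approx 85.61 \\
\text{self ns/call}(\texttt{pi\_calcPiItem}) &= \frac{1.09\ \text{s}}{600\,000\,000\ \text{calls}} \approx 1.82\ \text{ns}
\end{aligned}
```

Note that pi_calcPiItem's children column is 0.00 even though it calls pow(): the libm routines were not compiled with -pg, so no call arc into them is recorded and their time cannot be propagated back to the caller.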

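As expected, pow() is the bottleneck. While pi is running, perf top (a sampling-based system profiler) also shows __ieee754_pow_sse2 taking more than 60% of the CPU. Changing pow(-1.0,index) to ((index & 1) ? -1.0 : 1.0), as @Mike Dunlavey suggested, makes the code about 4 times faster.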

Here is how the gprof man page explains "spontaneous":

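Parents that are not themselves profiled will have the time of their profiled children propagated to them, but they will appear to be spontaneously invoked in the call graph listing, and will not have their time propagated further. Similarly, signal catchers, even though profiled, will appear to be spontaneous (although for more obscure reasons). Any profiled children of signal catchers should have their times propagated properly, unless the signal catcher was invoked during the execution of the profiling routine, in which case all is lost.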
