推荐的方法来跟踪C程序中数组越界访问/写入

考虑在C中编写一些不那么明显的算法的实现。例如,让我们在KN King的“C编程:现代方法,第2版”一书中找到它的递归快速排序,它可以从这里获得 。 最有趣的部分包括以下两个定义:

void quicksort(int a[], int low, int high) { int middle; if (low >= high) return; middle = split(a, low, high); quicksort(a, low, middle - 1); quicksort(a, middle + 1, high); } int split(int a[], int low, int high) { int part_element = a[low]; for (;;) { while (low < high && part_element = high) break; a[low++] = a[high]; while (low < high && a[low] = high) break; a[high--] = a[low]; } a[high] = part_element; return high; } 

可以通过删除low < high测试来优化while循环:

 for (;;) { while (part_element = high) break; a[low++] = a[high]; a[high] = part_element; while (a[low] = high) break; a[high--] = a[low]; a[low] = part_element; } 

建议的方法是确保每次访问或写入数组(在堆栈上分配)实际上是否有效(即不引发未定义的行为)? 我已经尝试过的是:

  • 在一些实际数据上使用gdb手动调试
  • 将源代码传递给静态分析工具,如splitcppcheck
  • valgrind--tool=exp-sgcheck开关

例如,有五个元素数组{8, 1, 2, 3, 4}

 #define N 5 int main(void) { int a[N] = {8, 1, 2, 3, 4}, i; quicksort(a, 0, N - 1); printf("After sort:"); for (i = 0; i < N; i++) printf(" %d", a[i]); putchar('\n'); return 0; } 

结果是(最肯定的是它的实现依赖):

 After sort: 1 1 2 4 8 

1. GDB

 (gdb) p low $1 = 3 (gdb) p high $2 = 4 (gdb) pa[low] $3 = 1 (gdb) p part_element $4 = 8 (gdb) s 47 low++; (gdb) s 46 while (a[low] <= part_element) (gdb) s 47 low++; (gdb) s 46 while (a[low] <= part_element) (gdb) p low $5 = 5 (gdb) p high $6 = 4 (gdb) bt full #0 split (a=0x7fffffffe140, low=5, high=4) at qsort.c:46 part_element = 8 #1 0x00000000004005df in quicksort (a=0x7fffffffe140, low=0, high=4) at qsort.c:30 middle =  #2 0x0000000000400656 in main () at qsort.c:14 a = {4, 1, 2, 1, 8} i =  

如您所见, low变量超出了边界:

 (gdb) p low $5 = 5 

2.静态分析工具

 $ splint -retvalint -exportlocal qsort.c Splint 3.1.2 --- 07 Feb 2011 Finished checking --- no warnings $ cppcheck qsort.c Checking qsort.c... 

3. Valgrind与--tool=exp-sgcheck

 $ valgrind --tool=exp-sgcheck ./a.out ==5480== exp-sgcheck, a stack and global array overrun detector ==5480== NOTE: This is an Experimental-Class Valgrind Tool ==5480== Copyright (C) 2003-2012, and GNU GPL'd, by OpenWorks Ltd et al. ==5480== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info ==5480== Command: ./a.out ==5480== ==5480== Invalid read of size 4 ==5480== at 0x4005A0: split (qsort.c:46) ==5480== by 0x4005DE: quicksort (qsort.c:30) ==5480== by 0x400655: main (qsort.c:14) ==5480== Address 0x7ff000114 expected vs actual: ==5480== Expected: stack array "a" of size 20 in frame 2 back from here ==5480== Actual: unknown ==5480== Actual: is 0 after Expected ==5480== After sort: 1 1 2 4 8 ==5480== ==5480== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) 

at 0x4005A0: split (qsort.c:46)的位置与我手动找到的gdb相同。

建议的方法是确保每次访问或写入数组(在堆栈上分配)实际上是否有效(即不引发未定义的行为)?

如果在Linux上使用clang并使用选项-fsanitize=address-fsanitize=undefined会怎么样? 它也可以在gcc : http : //gcc.gnu.org/gcc-4.8/changes.html 。


使用选项-fsanitize=undefined clang

这是一个例子:

 #include  #define N 5 int main(int argc, char *argv[]) { int a[N] = {8, 1, 2, 3, 4}, i; int r =0; int end = atoi(argv[1]); for (int i = 0; i != end; ++i) r += a[i]; return r; } 

然后

clang -fno-omit-frame-pointer -fsanitize=undefined -g out_boundary.c -o out_boundary_clang

 $ ./out_boundary_clang 5 $ ./out_boundary_clang 6 out_boundary.c:12:10: runtime error: index 5 out of bounds for type 'int [5]' Illegal instruction (core dumped) 

然后分析一个核心文件

 Program terminated with signal 4, Illegal instruction. #0 main (argc=2, argv=0x7fff3a1c28c8) at out_boundary.c:12 12 r += a[i]; (gdb) pi $1 = 5 

使用-fsanitize=address选项进行-fsanitize=address

这是一个引用:

 The tool can detect the following types of bugs: * Out-of-bounds accesses to heap, stack and globals * Use-after-free * Use-after-return (to some extent) * Double-free, invalid free * Memory leaks (experimental) 

clang -fno-omit-frame-pointer -fsanitize=address -g out_boundary.c -o out_boundary_clang

然后:

 $ ./out_boundary_clang 6 2>&1 | asan_symbolize.py ================================================================= ==9634==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff91bb2ad4 at pc 0x459c67 bp 0x7fff91bb2910 sp 0x7fff91bb2908 READ of size 4 at 0x7fff91bb2ad4 thread T0 #0 0x459c66 in main out_boundary.c:12 #1 0x3a1d81ed1c in __libc_start_main ??:0 #2 0x4594ac in _start ??:0 Address 0x7fff91bb2ad4 is located in stack of thread T0 at offset 244 in frame #0 0x45957f in main out_boundary.c:6 This frame has 8 object(s): [32, 36) '' [96, 100) '' [160, 168) '' [224, 244) 'a' [288, 292) 'i' [352, 356) 'r' [416, 420) 'end' [480, 484) 'i1' HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions *are* supported) Shadow bytes around the buggy address: 0x10007236e500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007236e510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007236e520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007236e530: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 0x10007236e540: 04 f4 f4 f4 f2 f2 f2 f2 04 f4 f4 f4 f2 f2 f2 f2 =>0x10007236e550: 00 f4 f4 f4 f2 f2 f2 f2 00 00[04]f4 f2 f2 f2 f2 0x10007236e560: 04 f4 f4 f4 f2 f2 f2 f2 04 f4 f4 f4 f2 f2 f2 f2 0x10007236e570: 04 f4 f4 f4 f2 f2 f2 f2 04 f4 f4 f4 f3 f3 f3 f3 0x10007236e580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007236e590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007236e5a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Heap right redzone: fb Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack partial redzone: f4 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 ASan internal: fe ==9634==ABORTING 

或者您可以使用这两个选项。 有用的链接: