推荐的方法来跟踪C程序中数组越界访问/写入
考虑在C中编写一些不那么明显的算法的实现。例如,让我们在KN King的“C编程:现代方法,第2版”一书中找到它的递归快速排序,它可以从这里获得 。 最有趣的部分包括以下两个定义:
void quicksort(int a[], int low, int high) { int middle; if (low >= high) return; middle = split(a, low, high); quicksort(a, low, middle - 1); quicksort(a, middle + 1, high); } int split(int a[], int low, int high) { int part_element = a[low]; for (;;) { while (low < high && part_element = high) break; a[low++] = a[high]; while (low < high && a[low] = high) break; a[high--] = a[low]; } a[high] = part_element; return high; }
可以通过删除low < high
测试来优化while
循环:
for (;;) { while (part_element = high) break; a[low++] = a[high]; a[high] = part_element; while (a[low] = high) break; a[high--] = a[low]; a[low] = part_element; }
建议的方法是确保每次访问或写入数组(在堆栈上分配)实际上是否有效(即不引发未定义的行为)? 我已经尝试过的是:
- 在一些实际数据上使用
gdb
手动调试 - 将源代码传递给静态分析工具,如
split
或cppcheck
-
valgrind
与--tool=exp-sgcheck
开关
例如,有五个元素数组{8, 1, 2, 3, 4}
:
#define N 5 int main(void) { int a[N] = {8, 1, 2, 3, 4}, i; quicksort(a, 0, N - 1); printf("After sort:"); for (i = 0; i < N; i++) printf(" %d", a[i]); putchar('\n'); return 0; }
结果是(最肯定的是它的实现依赖):
After sort: 1 1 2 4 8
1. GDB
(gdb) p low $1 = 3 (gdb) p high $2 = 4 (gdb) pa[low] $3 = 1 (gdb) p part_element $4 = 8 (gdb) s 47 low++; (gdb) s 46 while (a[low] <= part_element) (gdb) s 47 low++; (gdb) s 46 while (a[low] <= part_element) (gdb) p low $5 = 5 (gdb) p high $6 = 4 (gdb) bt full #0 split (a=0x7fffffffe140, low=5, high=4) at qsort.c:46 part_element = 8 #1 0x00000000004005df in quicksort (a=0x7fffffffe140, low=0, high=4) at qsort.c:30 middle = #2 0x0000000000400656 in main () at qsort.c:14 a = {4, 1, 2, 1, 8} i =
如您所见, low
变量超出了边界:
(gdb) p low $5 = 5
2.静态分析工具
$ splint -retvalint -exportlocal qsort.c Splint 3.1.2 --- 07 Feb 2011 Finished checking --- no warnings $ cppcheck qsort.c Checking qsort.c...
3. Valgrind与--tool=exp-sgcheck
$ valgrind --tool=exp-sgcheck ./a.out ==5480== exp-sgcheck, a stack and global array overrun detector ==5480== NOTE: This is an Experimental-Class Valgrind Tool ==5480== Copyright (C) 2003-2012, and GNU GPL'd, by OpenWorks Ltd et al. ==5480== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info ==5480== Command: ./a.out ==5480== ==5480== Invalid read of size 4 ==5480== at 0x4005A0: split (qsort.c:46) ==5480== by 0x4005DE: quicksort (qsort.c:30) ==5480== by 0x400655: main (qsort.c:14) ==5480== Address 0x7ff000114 expected vs actual: ==5480== Expected: stack array "a" of size 20 in frame 2 back from here ==5480== Actual: unknown ==5480== Actual: is 0 after Expected ==5480== After sort: 1 1 2 4 8 ==5480== ==5480== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
at 0x4005A0: split (qsort.c:46)
的位置与我手动找到的gdb
相同。
建议的方法是确保每次访问或写入数组(在堆栈上分配)实际上是否有效(即不引发未定义的行为)?
如果在Linux上使用clang
并使用选项-fsanitize=address
和-fsanitize=undefined
会怎么样? 它也可以在gcc
: http : //gcc.gnu.org/gcc-4.8/changes.html 。
使用选项-fsanitize=undefined
clang
这是一个例子:
#include #define N 5 int main(int argc, char *argv[]) { int a[N] = {8, 1, 2, 3, 4}, i; int r =0; int end = atoi(argv[1]); for (int i = 0; i != end; ++i) r += a[i]; return r; }
然后
clang -fno-omit-frame-pointer -fsanitize=undefined -g out_boundary.c -o out_boundary_clang
$ ./out_boundary_clang 5 $ ./out_boundary_clang 6 out_boundary.c:12:10: runtime error: index 5 out of bounds for type 'int [5]' Illegal instruction (core dumped)
然后分析一个核心文件
Program terminated with signal 4, Illegal instruction. #0 main (argc=2, argv=0x7fff3a1c28c8) at out_boundary.c:12 12 r += a[i]; (gdb) pi $1 = 5
使用-fsanitize=address
选项进行-fsanitize=address
这是一个引用:
The tool can detect the following types of bugs: * Out-of-bounds accesses to heap, stack and globals * Use-after-free * Use-after-return (to some extent) * Double-free, invalid free * Memory leaks (experimental)
clang -fno-omit-frame-pointer -fsanitize=address -g out_boundary.c -o out_boundary_clang
然后:
$ ./out_boundary_clang 6 2>&1 | asan_symbolize.py ================================================================= ==9634==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff91bb2ad4 at pc 0x459c67 bp 0x7fff91bb2910 sp 0x7fff91bb2908 READ of size 4 at 0x7fff91bb2ad4 thread T0 #0 0x459c66 in main out_boundary.c:12 #1 0x3a1d81ed1c in __libc_start_main ??:0 #2 0x4594ac in _start ??:0 Address 0x7fff91bb2ad4 is located in stack of thread T0 at offset 244 in frame #0 0x45957f in main out_boundary.c:6 This frame has 8 object(s): [32, 36) '' [96, 100) '' [160, 168) '' [224, 244) 'a' [288, 292) 'i' [352, 356) 'r' [416, 420) 'end' [480, 484) 'i1' HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions *are* supported) Shadow bytes around the buggy address: 0x10007236e500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007236e510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007236e520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007236e530: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 0x10007236e540: 04 f4 f4 f4 f2 f2 f2 f2 04 f4 f4 f4 f2 f2 f2 f2 =>0x10007236e550: 00 f4 f4 f4 f2 f2 f2 f2 00 00[04]f4 f2 f2 f2 f2 0x10007236e560: 04 f4 f4 f4 f2 f2 f2 f2 04 f4 f4 f4 f2 f2 f2 f2 0x10007236e570: 04 f4 f4 f4 f2 f2 f2 f2 04 f4 f4 f4 f3 f3 f3 f3 0x10007236e580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007236e590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007236e5a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Heap right redzone: fb Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack partial redzone: f4 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 ASan internal: fe ==9634==ABORTING
或者您可以使用这两个选项。 有用的链接: