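How do I read a binary executable by instructions?

Is there a way to programmatically read a given number of instructions from a binary executable on the x86 architecture?

Say I have the binary of a simple C program, hello.c: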

    #include <stdio.h>

    int main(){
        printf("Hello world\n");
        return 0;
    }

After compiling with gcc, the disassembled function main looks like this:

    000000000000063a <main>:
     63a: 55                      push %rbp
     63b: 48 89 e5                mov %rsp,%rbp
     63e: 48 8d 3d 9f 00 00 00    lea 0x9f(%rip),%rdi # 6e4
     645: e8 c6 fe ff ff          callq 510
     64a: b8 00 00 00 00          mov $0x0,%eax
     64f: 5d                      pop %rbp
     650: c3                      retq
     651: 66 2e 0f 1f 84 00 00    nopw %cs:0x0(%rax,%rax,1)
     658: 00 00 00
     65b: 0f 1f 44 00 00          nopl 0x0(%rax,%rax,1)

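Is there an easy way in C to read, for example, the first three instructions of main (meaning the bytes 55, 48, 89, e5, 48, 8d, 3d, 9f, 00, 00, 00)? There is no guarantee that the function looks like this; the first instructions may have entirely different opcodes and sizes.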

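The snippet in this answer prints the first 10 bytes of the main function: it takes the address of the function, converts it to a pointer to unsigned char, and prints each byte in hex.

This small snippet does not count instructions. For that you would need an instruction size table (not very difficult, just tedious, unless you find one already made; see "What is the size of each asm instruction?") to be able to predict the size of each instruction from its leading bytes. On x86 the first byte alone is not always enough, because prefixes, the ModRM and SIB bytes, and the displacement and immediate sizes all affect the length.

(Unless, of course, the processor you are targeting has a fixed instruction size, which makes the problem trivial to solve.)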
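To give an idea of what such a size table involves, here is a minimal sketch of a length decoder for x86-64. It is an assumption-laden toy, not a real decoder: it only knows the handful of opcodes that appear in the main shown in the question (push/pop, mov, lea, call, ret), and the names `modrm_len`, `insn_len` and `first_n_insn_bytes` are made up for this example. Anything else returns 0, where a real tool would use a full disassembler library.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of the ModRM byte plus any SIB byte and displacement (64-bit mode). */
static size_t modrm_len(const uint8_t *p, size_t avail)
{
    if (avail < 1) return 0;
    uint8_t mod = p[0] >> 6, rm = p[0] & 7;
    size_t len = 1;                                /* the ModRM byte itself */
    if (mod == 3) return len;                      /* register operand */
    if (rm == 4) {                                 /* a SIB byte follows */
        if (avail < 2) return 0;
        len += 1;
        if (mod == 0 && (p[1] & 7) == 5) len += 4; /* SIB with disp32, no base */
    }
    if (mod == 0 && rm == 5) len += 4;             /* RIP-relative disp32 */
    if (mod == 1) len += 1;                        /* disp8 */
    if (mod == 2) len += 4;                        /* disp32 */
    return len;
}

/* Instruction length in bytes, or 0 if the opcode is not in this toy table. */
size_t insn_len(const uint8_t *code, size_t avail)
{
    size_t i = 0;
    int rex_w = 0;
    if (i < avail && (code[i] & 0xf0) == 0x40) {   /* REX prefix */
        rex_w = (code[i] & 0x08) != 0;
        i++;
    }
    if (i >= avail) return 0;
    uint8_t op = code[i++];
    if ((op >= 0x50 && op <= 0x5f) || op == 0xc3)  /* push/pop reg, ret */
        return i;
    if (op == 0x89 || op == 0x8b || op == 0x8d) {  /* mov and lea with ModRM */
        size_t m = modrm_len(code + i, avail - i);
        return m ? i + m : 0;
    }
    if (op == 0xe8) return i + 4;                  /* call rel32 */
    if (op >= 0xb8 && op <= 0xbf)                  /* mov reg, imm32 (imm64 with REX.W) */
        return i + (rex_w ? 8 : 4);
    return 0;
}

/* Total size of the first n instructions, or 0 on an unknown opcode. */
size_t first_n_insn_bytes(const uint8_t *code, size_t avail, unsigned n)
{
    size_t off = 0;
    while (n--) {
        size_t len = insn_len(code + off, avail - off);
        if (len == 0 || len > avail - off) return 0;
        off += len;
    }
    return off;
}
```

Fed the bytes of main from the question, `first_n_insn_bytes(bytes, size, 3)` should come out as 11, which is the 55 / 48 89 e5 / 48 8d 3d 9f 00 00 00 sequence the question asks for.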

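Debuggers have to decode operands as well, but in some cases, such as step or trace, I suspect they have a handy table for computing the next breakpoint address.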

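    #include <stdio.h>

    int main(){
        printf("Hello world\n");
        /* Treat the address of main as a pointer to raw bytes
           (a function-to-object pointer cast; gcc accepts it). */
        const unsigned char *start = (const unsigned char *)&main;
        int i;
        for (i=0;i<10;i++)
        {
            printf("%x\n",start[i]);
        }
        return 0;
    }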

Output:

    Hello world
    55
    89
    e5
    83
    e4
    f0
    83
    ec
    20
    e8

Seems to match the disassembly :)

    00401630 <_main>:
      401630: 55                push %ebp
      401631: 89 e5             mov %esp,%ebp
      401633: 83 e4 f0          and $0xfffffff0,%esp
      401636: 83 ec 20          sub $0x20,%esp
      401639: e8 a2 01 00 00    call 4017e0 <___main>

    .globl _start
    _start:
        bl main
        b .

    .globl main
    main:
        add r1,#1
        add r2,#1
        add r3,#1
        add r4,#1
        b main

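Intentionally the wrong architecture; the architecture doesn't matter. I built it into the very popular ELF file format. It is just a file format, and that is how I understood your question: reading a file, not modifying the binary to read the program from memory at run time.

It is very popular, and there are tools that help you see how it is laid out.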

    Disassembly of section .text:

    00001000 <_start>:
        1000: eb000000 bl 1008
        1004: eafffffe b 1004 <_start+0x4>

    00001008 <main>:
        1008: e2811001 add r1, r1, #1
        100c: e2822001 add r2, r2, #1
        1010: e2833001 add r3, r3, #1
        1014: e2844001 add r4, r4, #1
        1018: eafffffa b 1008

If I hexdump the file:

    00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
    00000010 02 00 28 00 01 00 00 00 00 10 00 00 34 00 00 00 |..(.........4...|
    00000020 c0 11 00 00 00 02 00 05 34 00 20 00 01 00 28 00 |........4. ...(.|
    00000030 06 00 05 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
    00000040 00 00 00 00 1c 10 00 00 1c 10 00 00 05 00 00 00 |................|
    00000050 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    *
    00001000 00 00 00 eb fe ff ff ea 01 10 81 e2 01 20 82 e2 |............. ..|
    00001010 01 30 83 e2 01 40 84 e2 fa ff ff ea 41 11 00 00 |.0...@......A...|
    00001020 00 61 65 61 62 69 00 01 07 00 00 00 08 01 00 00 |.aeabi..........|
    00001030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    00001040 00 00 00 00 00 10 00 00 00 00 00 00 03 00 01 00 |................|
    00001050 00 00 00 00 00 00 00 00 00 00 00 00 03 00 02 00 |................|
    00001060 01 00 00 00 00 00 00 00 00 00 00 00 04 00 f1 ff |................|
    00001070 06 00 00 00 00 10 00 00 00 00 00 00 00 00 01 00 |................|
    00001080 18 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
    00001090 09 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
    000010a0 17 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
    000010b0 55 00 00 00 00 10 00 00 00 00 00 00 10 00 01 00 |U...............|
    000010c0 23 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |#...............|
    000010d0 2f 00 00 00 08 10 00 00 00 00 00 00 10 00 01 00 |/...............|
    000010e0 34 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |4...............|
    000010f0 3c 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |<...............|
    00001100 43 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |C...............|
    00001110 48 00 00 00 00 00 08 00 00 00 00 00 10 00 01 00 |H...............|
    00001120 4f 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |O...............|
    00001130 00 73 6f 2e 6f 00 24 61 00 5f 5f 62 73 73 5f 73 |.so.o.$a.__bss_s|
    00001140 74 61 72 74 5f 5f 00 5f 5f 62 73 73 5f 65 6e 64 |tart__.__bss_end|
    00001150 5f 5f 00 5f 5f 62 73 73 5f 73 74 61 72 74 00 6d |__.__bss_start.m|
    00001160 61 69 6e 00 5f 5f 65 6e 64 5f 5f 00 5f 65 64 61 |ain.__end__._eda|
    00001170 74 61 00 5f 65 6e 64 00 5f 73 74 61 63 6b 00 5f |ta._end._stack._|
    00001180 5f 64 61 74 61 5f 73 74 61 72 74 00 00 2e 73 79 |_data_start...sy|
    00001190 6d 74 61 62 00 2e 73 74 72 74 61 62 00 2e 73 68 |mtab..strtab..sh|
    000011a0 73 74 72 74 61 62 00 2e 74 65 78 74 00 2e 41 52 |strtab..text..AR|
    000011b0 4d 2e 61 74 74 72 69 62 75 74 65 73 00 00 00 00 |M.attributes....|
    000011c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    *
    000011e0 00 00 00 00 00 00 00 00 1b 00 00 00 01 00 00 00 |................|
    000011f0 06 00 00 00 00 10 00 00 00 10 00 00 1c 00 00 00 |................|
    00001200 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 |................|
    00001210 21 00 00 00 03 00 00 70 00 00 00 00 00 00 00 00 |!......p........|
    00001220 1c 10 00 00 12 00 00 00 00 00 00 00 00 00 00 00 |................|
    00001230 01 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 |................|
    00001240 00 00 00 00 00 00 00 00 30 10 00 00 00 01 00 00 |........0.......|
    00001250 04 00 00 00 05 00 00 00 04 00 00 00 10 00 00 00 |................|
    00001260 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
    00001270 30 11 00 00 5c 00 00 00 00 00 00 00 00 00 00 00 |0...\...........|
    00001280 01 00 00 00 00 00 00 00 11 00 00 00 03 00 00 00 |................|
    00001290 00 00 00 00 00 00 00 00 8c 11 00 00 31 00 00 00 |............1...|
    000012a0 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
    000012b0

You can google the file format; Wikipedia has a ton of information, and one of its links has a little more.

Useful header information:

    00 10 00 00  entry
    34 00 00 00  phoff
    c0 11 00 00  shoff
    00 02 00 05  flags
    34 00        ehsize
    20 00        phentsize
    01 00        phnum
    28 00        shentsize
    06 00        shnum
    05 00        shstrndx
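Pulling those fields out by hand can be sketched in C like this. It is only an illustration under some assumptions: the file is a little-endian ELF32 (as this one is), the whole header is already in a buffer, and the names `elf32_hdr_fields`, `rd16`, `rd32` and `parse_elf32_header` are invented for the example. On Linux you would more likely use the `Elf32_Ehdr` type from `<elf.h>`.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers; the sample file is a little-endian ELF32. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical struct holding only the header fields listed above. */
struct elf32_hdr_fields {
    uint32_t entry, phoff, shoff, flags;
    uint16_t ehsize, phentsize, phnum, shentsize, shnum, shstrndx;
};

/* Returns 0 on success, -1 if the buffer is too short
   or is not a little-endian ELF32 file. */
int parse_elf32_header(const uint8_t *buf, size_t len, struct elf32_hdr_fields *out)
{
    if (len < 0x34) return -1;                    /* ELF32 header is 0x34 bytes */
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;                                /* magic number */
    if (buf[4] != 1 || buf[5] != 1) return -1;    /* 32-bit class, little-endian data */
    out->entry     = rd32(buf + 0x18);
    out->phoff     = rd32(buf + 0x1c);
    out->shoff     = rd32(buf + 0x20);
    out->flags     = rd32(buf + 0x24);
    out->ehsize    = rd16(buf + 0x28);
    out->phentsize = rd16(buf + 0x2a);
    out->phnum     = rd16(buf + 0x2c);
    out->shentsize = rd16(buf + 0x2e);
    out->shnum     = rd16(buf + 0x30);
    out->shstrndx  = rd16(buf + 0x32);
    return 0;
}
```

With the first 0x34 bytes of the hexdump as input, this should give shoff = 0x11c0, shentsize = 0x28 and shnum = 6, which is all you need to locate the section header table.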

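So if I look at where the section headers begin (shoff, 0x11C0) and how many of them there are (shnum, 6 entries of shentsize 0x28 bytes each), these are the first 20 bytes of each entry: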

    0x11C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x11E8 1b 00 00 00 01 00 00 00 06 00 00 00 00 10 00 00 00 10 00 00
    0x1210 21 00 00 00 03 00 00 70 00 00 00 00 00 00 00 00 1c 10 00 00
    0x1238 01 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 30 10 00 00
    0x1260 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 30 11 00 00
    0x1288 11 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 8c 11 00 00
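Each of those rows is the start of one 0x28-byte section header. A minimal sketch of decoding one entry follows, again assuming a little-endian ELF32 and using made-up names (`elf32_shdr_fields`, `parse_elf32_shdr`); the field order (name, type, flags, addr, offset, size) is the standard ELF32 layout.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical struct with the first six fields of an ELF32 section header. */
struct elf32_shdr_fields {
    uint32_t name;    /* offset of the section name in the section-name strtab */
    uint32_t type;    /* 1 = PROGBITS, 2 = SYMTAB, 3 = STRTAB */
    uint32_t flags;
    uint32_t addr;    /* address in the target's memory */
    uint32_t offset;  /* offset in the file */
    uint32_t size;    /* size in bytes */
};

/* Decodes entry `index` of a section header table whose entries are
   `shentsize` bytes each. Returns 0 on success, -1 if out of range. */
int parse_elf32_shdr(const uint8_t *table, size_t table_len, size_t index,
                     size_t shentsize, struct elf32_shdr_fields *out)
{
    if (shentsize < 24) return -1;                /* need the six fields above */
    if (index >= table_len / shentsize) return -1;
    const uint8_t *e = table + index * shentsize;
    out->name   = rd32(e + 0);
    out->type   = rd32(e + 4);
    out->flags  = rd32(e + 8);
    out->addr   = rd32(e + 12);
    out->offset = rd32(e + 16);
    out->size   = rd32(e + 20);
    return 0;
}
```

For the entry at 0x11E8 this should report type 1, address 0x1000, file offset 0x1000 and size 0x1c.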

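The entry at 0x1260 is of type strtab (type 3) at file offset 0x1130. It gets broken into null-terminated strings until you hit a double null (strictly speaking, the end is given by the section's size field, 0x5c here):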

    [0] 00
    [1] 73 6f 2e 6f 00                             so.o
    [2] 24 61 00                                   $a
    [3] 5f 5f 62 73 73 5f 73 74 61 72 74 5f 5f 00  __bss_start__
    [4] 5f 5f 62 73 73 5f 65 6e 64 5f 5f 00        __bss_end__
    [5] 5f 5f 62 73 73 5f 73 74 61 72 74 00        __bss_start
    [6] 6d 61 69 6e 00                             main
    ...

main is at address 0x115F in the file, which is offset 0x2F into the strtab.
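Finding that 0x2F offset programmatically is just a walk over the null-terminated strings. Here is a rough sketch; `strtab_find` is a hypothetical helper name, and it only matches whole strings. A real linker may let one name share the tail of another, in which case a symbol's name offset can point into the middle of a string, so treat this as a simplification.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the string `name` inside the string table,
   or -1 if no whole string in the table equals it. */
long strtab_find(const char *tab, size_t size, const char *name)
{
    size_t want = strlen(name);
    size_t i = 0;
    while (i < size) {
        const char *end = memchr(tab + i, '\0', size - i);
        if (end == NULL) break;                 /* unterminated tail, stop */
        size_t len = (size_t)(end - (tab + i));
        if (len == want && memcmp(tab + i, name, len) == 0)
            return (long)i;
        i += len + 1;                           /* skip the string and its terminator */
    }
    return -1;
}
```

Called on the bytes starting at file offset 0x1130 with the name "main", it should return 0x2f.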

The entry at 0x1238 is the symtab (type 2), which starts at 0x1030 with 0x10, or 16, bytes per entry:

    00001030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    00001040 00 00 00 00 00 10 00 00 00 00 00 00 03 00 01 00 |................|
    00001050 00 00 00 00 00 00 00 00 00 00 00 00 03 00 02 00 |................|
    00001060 01 00 00 00 00 00 00 00 00 00 00 00 04 00 f1 ff |................|
    00001070 06 00 00 00 00 10 00 00 00 00 00 00 00 00 01 00 |................|
    00001080 18 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
    00001090 09 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
    000010a0 17 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |................|
    000010b0 55 00 00 00 00 10 00 00 00 00 00 00 10 00 01 00 |U...............|
    000010c0 23 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |#...............|
    000010d0 2f 00 00 00 08 10 00 00 00 00 00 00 10 00 01 00 |/...............|
    000010e0 34 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |4...............|
    000010f0 3c 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |<...............|
    00001100 43 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |C...............|
    00001110 48 00 00 00 00 00 08 00 00 00 00 00 10 00 01 00 |H...............|
    00001120 4f 00 00 00 1c 10 01 00 00 00 00 00 10 00 01 00 |O...............|

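The entry at 000010d0 starts with 2f 00 00 00, the 0x2f offset into the string table, so this is main. From this entry, its address in the processor's memory is 08 10 00 00, or 0x1008. Unfortunately, because of the addresses I chose, that value happens to be the same as the file offset, so don't let that confuse you.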
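That lookup can be sketched as a linear scan over the 16-byte entries. The layout used here (name offset, value, size, info, other, section index) is the standard ELF32 symbol entry; the names `elf32_sym_fields` and `symtab_find` are hypothetical, and the code assumes little-endian data as in this file.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical struct mirroring one 16-byte ELF32 symbol table entry. */
struct elf32_sym_fields {
    uint32_t name;    /* offset into the string table */
    uint32_t value;   /* for a function, its address in target memory */
    uint32_t size;
    uint8_t  info, other;
    uint16_t shndx;   /* index of the section the symbol lives in */
};

/* Finds the first entry whose name offset equals `name_off`.
   Returns 0 on success, -1 if there is no such entry. */
int symtab_find(const uint8_t *symtab, size_t symtab_len, uint32_t name_off,
                struct elf32_sym_fields *out)
{
    for (size_t off = 0; off + 16 <= symtab_len; off += 16) {
        const uint8_t *e = symtab + off;
        if (rd32(e) != name_off) continue;
        out->name  = name_off;
        out->value = rd32(e + 4);
        out->size  = rd32(e + 8);
        out->info  = e[12];
        out->other = e[13];
        out->shndx = rd16(e + 14);
        return 0;
    }
    return -1;
}
```

Searching the table for name offset 0x2f should give value 0x1008 in section index 1, which is the .text section header at 0x11E8.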

This section is type 00000001, PROGBITS:

    0x11E8 1b 00 00 00 01 00 00 00 06 00 00 00 00 10 00 00 00 10 00 00
    offset 0x1000 in the file, 0x1C bytes

This is the program, the machine code.

    00001000 00 00 00 eb fe ff ff ea 01 10 81 e2 01 20 82 e2
    00001010 01 30 83 e2 01 40 84 e2 fa ff ff ea 41 11

So starting at memory address 0x1008, which is 8 bytes past the entry point (unfortunately I chose a bad address to use), we need to go 0x8 bytes into this data:

    01 10 81 e2 01 20 82 e2

    00001008 <main>:
        1008: e2811001 add r1, r1, #1
        100c: e2822001 add r2, r2, #1
        1010: e2833001 add r3, r3, #1
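Because this is ARM with fixed 32-bit instructions, "the first N instructions" is simply N little-endian words from that offset. A small sketch, with `read_arm_words` as a made-up helper name and the section's bytes assumed to be already loaded into a buffer:

```c
#include <stddef.h>
#include <stdint.h>

/* Copies up to n 32-bit little-endian instruction words, starting at byte
   offset `off` into the section data. Returns the number of words copied,
   which is smaller than n if the data runs out first. */
size_t read_arm_words(const uint8_t *text, size_t text_len, size_t off,
                      uint32_t *out, size_t n)
{
    size_t count = 0;
    while (count < n && off <= text_len && text_len - off >= 4) {
        out[count++] = (uint32_t)text[off] |
                       ((uint32_t)text[off + 1] << 8) |
                       ((uint32_t)text[off + 2] << 16) |
                       ((uint32_t)text[off + 3] << 24);
        off += 4;                               /* every instruction is 4 bytes */
    }
    return count;
}
```

Reading three words at offset 8 of the 0x1C bytes above should yield e2811001, e2822001 and e2833001, matching the disassembly.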

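All of this is specific to the file format. The CPU could not care less about labels; main means something only to humans, not to the CPU.

If I convert the ELF into other formats that are just as executable:

Motorola S-records: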

    S00A0000736F2E7372656338
    S1131000000000EBFEFFFFEA011081E2012082E212
    S10F1010013083E2014084E2FAFFFFEAB1
    S9031000EC
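Reading those records back is straightforward: a type digit, a byte count, an address whose width depends on the type, the data, and a one's-complement checksum. The following is a minimal sketch with invented names (`struct srec`, `srec_parse`); it handles one record per call and ignores line endings and anything after the checksum.

```c
#include <stddef.h>
#include <stdint.h>

struct srec {                 /* hypothetical container for one record */
    int      type;            /* the digit after 'S' */
    uint32_t addr;
    uint8_t  data[255];
    size_t   data_len;
};

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

static int hexbyte(const char *s)
{
    int hi = hexval(s[0]);
    if (hi < 0) return -1;    /* also stops at the end of the string */
    int lo = hexval(s[1]);
    if (lo < 0) return -1;
    return hi * 16 + lo;
}

/* Returns 0 on success, -1 on malformed input or a checksum mismatch. */
int srec_parse(const char *line, struct srec *out)
{
    /* address width in bytes for S0..S9; S4 is reserved */
    static const int addr_bytes[10] = {2, 2, 3, 4, -1, 2, 3, 4, 3, 2};
    if (line[0] != 'S' || line[1] < '0' || line[1] > '9') return -1;
    int type = line[1] - '0';
    int alen = addr_bytes[type];
    if (alen < 0) return -1;
    int count = hexbyte(line + 2);      /* bytes that follow: address + data + checksum */
    if (count < alen + 1) return -1;
    unsigned sum = (unsigned)count;
    uint32_t addr = 0;
    const char *p = line + 4;
    for (int i = 0; i < count; i++, p += 2) {
        int b = hexbyte(p);
        if (b < 0) return -1;
        if (i < alen) {
            addr = (addr << 8) | (uint32_t)b;
            sum += (unsigned)b;
        } else if (i < count - 1) {
            out->data[i - alen] = (uint8_t)b;
            sum += (unsigned)b;
        } else if (((~sum) & 0xffu) != (unsigned)b) {
            return -1;                  /* checksum is the one's complement of the sum */
        }
    }
    out->type = type;
    out->addr = addr;
    out->data_len = (size_t)(count - alen - 1);
    return 0;
}
```

For the second line above it should report an S1 record at address 0x1000 carrying 16 data bytes, the same bytes that start the .text section in the ELF.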

Raw binary image:

    hexdump -C so.bin
    00000000 00 00 00 eb fe ff ff ea 01 10 81 e2 01 20 82 e2 |............. ..|
    00000010 01 30 83 e2 01 40 84 e2 fa ff ff ea |.0...@......|
    0000001c

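Of course the instruction bytes of interest are there, but the symbol information is not. Whether you can 1) find "main" and then 2) print out the first few bytes at that address depends on the file format you are interested in.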

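Hmm, a little disturbing: if you link for 0x2000, gnu ld burns some disk space and puts the offset at 0x2000, but choose 0x20000000 and it burns more disk space, though nowhere near that much:

    000100d0 2f 00 00 00 08 00 00 20 00 00 00 00 10 00 01 00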

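The symbol entry gives 0x20000008 as the address of main in the target's address space, while in the file the code sits around offset 0x010010. The row shown here, at file offset 0x010010, holds the instructions from target address 0x20000010 onward, so main itself is 8 bytes earlier, at file offset 0x010008: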

    00010010 01 30 83 e2 01 40 84 e2 fa ff ff ea 41 11 00 00
    00010020 00 61 65 61 62 69 00 01 07 00 00 00 08 01

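This is just to demonstrate and reinforce that the file offset and the address in the target's memory space are two different things.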
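The conversion between the two is one subtraction and one addition using the section header's address and offset fields. A sketch, with `addr_to_file_offset` as a hypothetical name; note that the 0x10000 section offset used for the relinked file in the usage note is inferred from the hexdump row above, not read from a header shown here.

```c
#include <stdint.h>

/* Maps an address in target memory to an offset in the file, given the
   addr/offset/size fields of the section that contains it.
   Returns 0 on success, -1 if the address is outside the section. */
int addr_to_file_offset(uint32_t addr, uint32_t sh_addr, uint32_t sh_offset,
                        uint32_t sh_size, uint32_t *file_off)
{
    if (addr < sh_addr || addr - sh_addr >= sh_size)
        return -1;
    *file_off = sh_offset + (addr - sh_addr);   /* same distance from the section start */
    return 0;
}
```

With the first link (section at address 0x1000, file offset 0x1000) main at 0x1008 maps to file offset 0x1008; with the second link (address 0x20000000, file offset assumed to be 0x10000) main at 0x20000008 maps to 0x10008.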

For what you want to do, this is a really nice format:

    arm-none-eabi-objcopy -O symbolsrec so.elf so.srec
    cat so.srec
    $$ so.srec
    $a $20000000
    _bss_end__ $2001001c
    __bss_start__ $2001001c
    __bss_end__ $2001001c
    _start $20000000
    __bss_start $2001001c
    main $20000008
    __end__ $2001001c
    _edata $2001001c
    _end $2001001c
    _stack $80000
    __data_start $2001001c
    $$
    S0090000736F2E686578A1
    S31520000000000000EBFEFFFFEA011081E2012082E200
    S31120000010013083E2014084E2FAFFFFEA9F
    S70520000000DA
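What makes this format convenient is that the symbols are plain text. A rough sketch of pulling one symbol's address out of a line, assuming each symbol line looks like `name $hexaddr` as in the output above (`symbolsrec_match` is a made-up name, and names longer than 63 characters are not handled):

```c
#include <stdio.h>
#include <string.h>

/* Checks whether `line` is a "name $hexaddr" symbol line for the symbol `want`.
   Returns 1 and stores the address on a match, 0 otherwise. */
int symbolsrec_match(const char *line, const char *want, unsigned long *addr)
{
    char name[64];
    unsigned long value;
    /* " %63s" skips leading blanks and reads the name, "$%lx" reads the hex address */
    if (sscanf(line, " %63s $%lx", name, &value) != 2)
        return 0;                       /* "$$ so.srec" and S-record lines end up here */
    if (strcmp(name, want) != 0)
        return 0;
    *addr = value;
    return 1;
}
```

Run over the lines above looking for "main", it should pick out 0x20000008, after which the S3 records give you the bytes at that address.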