1. Introduction

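Normally, so that the operating system can load a program, the toolchain places a file header in front of the compiled code. The header carries the layout information the operating system needs to put the code segment and the data segment at the correct memory locations when it loads an EXE (in this article, "EXE" means an ELF executable on Linux). Some compilers also emit debugging information such as a symbol table. A .o file is usually called a relocatable file: it has not been linked, still needs relocation, and cannot be executed. An EXE file is called an executable file: it has been linked by the linker and can be run directly, and the virtual addresses in the file are final. The operating system may choose the base address of a segment, which means it can place the whole EXE anywhere, but it must load each segment as the EXE describes and keep the relative distances unchanged, otherwise the code will not run correctly. An EXE file with a header depends on a loader, such as the one behind the execve() system call. In the earliest stage of an operating system there is no loader, so we can only jump straight to an instruction and start executing. That requires a raw binary file, whose entry point is the first instruction in the file. The objcopy tool converts an EXE file into a raw binary. Here we study the format of a 64-bit executable, and use objdump to disassemble the machine code back into assembly, in order to learn what an EXE contains.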
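
As a rough illustration of the distinction between relocatable files, executables, and raw binaries, the following Python sketch looks only at the first 18 bytes of a file. The function name `classify` and the sample byte strings are made up for this example; the magic number and the `e_type` values come from the ELF specification.

```python
import struct

# e_type values defined by the ELF specification
ET_NAMES = {
    1: "relocatable (ET_REL)",
    2: "executable (ET_EXEC)",
    3: "shared object or PIE (ET_DYN)",
}

def classify(header: bytes) -> str:
    """Classify a file from its first 18 bytes (hypothetical helper)."""
    if header[:4] != b"\x7fELF":
        return "not ELF (possibly a raw binary)"
    # Byte 5 (EI_DATA): 1 = little endian, 2 = big endian
    endian = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ET_NAMES.get(e_type, f"other (e_type={e_type})")

# Start of a 64-bit little-endian executable: e_ident followed by e_type = 2
sample = bytes.fromhex("7f454c46020101000000000000000000" "0200")
# A raw binary simply starts with machine code, with no header at all
raw = bytes.fromhex("bf00000000678b04bddd006000")

print(classify(sample))
print(classify(raw))
```

A raw binary has no header to inspect, which is why whoever jumps into it must already know where its first instruction is.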

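2. GNU assembly code for finding the maximum value: max.s

# Lines starting with # are comments, here and below
# Data section
 .section .data
 data_items:
        .long 'H','E','L','L','O','_','W','O','R','L','D','!','!',0  # .long (4 bytes per item) makes the byte order visible in a dump
# Code section
 .section .text
# Declare the entry point as globally visible; symbols are local by default
 .globl  _start
 _start:
        # In GNU (AT&T) syntax the source operand is on the left and the destination on the right, the opposite of Intel syntax
        # Constants take a $ prefix, a symbol without $ is treated as an address, and registers take a % prefix
        movl $0, %edi
        movl data_items(,%edi,4), %eax  # (data_items + 4*edi) → eax
        # Put the first item of data_items into ebx; ebx holds the maximum
        movl %eax, %ebx  # eax → ebx
 start_loop:
        # A value of 0 marks the end of the data
        cmpl $0, %eax
        je loop_exit
        incl %edi
        movl data_items(,%edi,4), %eax  # (data_items + 4*edi) → eax
        cmpl %ebx, %eax
        jle start_loop  # eax <= ebx, keep the current maximum
        movl %eax, %ebx  # eax > ebx, so eax → ebx
        jmp start_loop
 loop_exit:
        movl $1, %eax  # system call 1: exit(ebx), terminates the process
        int $0x80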
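
For readers less used to assembly, the same loop can be mirrored in Python. This is only a sketch of the control flow; the function name `max_until_zero` is invented here, and the comments map each step back to the registers used in max.s.

```python
def max_until_zero(items):
    """Mirror of max.s: scan until the 0 terminator, keep the largest value."""
    i = 0                  # %edi, the index
    current = items[i]     # %eax, the item being examined
    largest = current      # %ebx, the maximum so far
    while current != 0:    # cmpl $0, %eax / je loop_exit
        i += 1             # incl %edi
        current = items[i]
        if current > largest:   # jle start_loop skips the update
            largest = current
    return largest

data_items = [ord(c) for c in "HELLO_WORLD!!"] + [0]
print(max_until_zero(data_items))
```

The largest character code in the list is that of '_', which is 95.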

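3. Building and running

Environment: Ubuntu 15.04

Assemble: gcc -c -o max.o max.s

Link: ld -o max max.o

Run: ./max

After the program runs, echo $? shows its exit status. That status is the maximum value, 95 (the character code of '_').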
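
One caveat when returning a result through the exit status: on Linux the shell only sees the low 8 bits of the value passed to exit. The sketch below models that truncation; `visible_exit_status` is a hypothetical helper, not part of any tool used here.

```python
def visible_exit_status(value: int) -> int:
    """Model of what `echo $?` reports: only the low 8 bits survive."""
    return value & 0xFF

print(visible_exit_status(95))    # the maximum found by max.s fits in 8 bits
print(visible_exit_status(300))   # a larger maximum would wrap around
```

All the character codes in data_items are below 256, so the reported status equals the real maximum.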

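gcc accepts the -m32 option to build 32-bit code. In that case the code and data segments are no longer aligned to 0x200000, so the distance between them becomes much smaller. ld then needs the matching option -m elf_i386 to select the 32-bit target.

ld accepts the -Ttext option to set the load address of the code segment. For example, -Ttext 0 sets the load address to 0.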

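4. Format of the EXE file

4.1 Viewing the ELF layout information of max

Command:

readelf -a max
 -a prints all of the ELF information
 It produces the following output:
 
 ELF Header:
# Magic number of the EXE file
   Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
   Class:                             ELF64
   Data:                              2's complement, little endian
   Version:                           1 (current)
   OS/ABI:                            UNIX - System V
   ABI Version:                       0
# The file is an EXE file
   Type:                              EXEC (Executable file)
   Machine:                           Advanced Micro Devices X86-64
   Version:                           0x1
# Program entry point, a virtual address
   Entry point address:               0x4000b0
# Offset of the program headers in the file
   Start of program headers:          64 (bytes into file)
# Offset of the section headers in the file
   Start of section headers:          656 (bytes into file)
   Flags:                             0x0
# Size of the ELF header
   Size of this header:               64 (bytes)
# Size of one program header
   Size of program headers:           56 (bytes)
# Number of program headers
   Number of program headers:         2
# Size of one section header
   Size of section headers:           64 (bytes)
# Number of section headers
   Number of section headers:         6
   Section header string table index: 3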
  
 Section Headers:
   [Nr] Name              Type             Address           Offset
        Size              EntSize          Flags  Link  Info  Align
   [ 0]                   NULL             0000000000000000  00000000
        0000000000000000  0000000000000000           0     0     0
# .text section: virtual address 0x4000b0, file offset 0xb0, size 0x2d
   [ 1] .text             PROGBITS         00000000004000b0  000000b0
        000000000000002d  0000000000000000  AX       0     0     1
# .data section: virtual address 0x6000dd, file offset 0xdd, size 0x38
   [ 2] .data             PROGBITS         00000000006000dd  000000dd
        0000000000000038  0000000000000000  WA       0     0     1
# Section name string table: address 0x0 (not loaded into memory), file offset 0x115, size 0x27
   [ 3] .shstrtab         STRTAB           0000000000000000  00000115
        0000000000000027  0000000000000000           0     0     1
# Symbol table: address 0x0, file offset 0x140, size 0x108
   [ 4] .symtab           SYMTAB           0000000000000000  00000140
        0000000000000108  0000000000000018           5     7     8
# String table: address 0x0, file offset 0x248, size 0x46
   [ 5] .strtab           STRTAB           0000000000000000  00000248
        0000000000000046  0000000000000000           0     0     1
 Key to Flags:
   W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
   I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
   O (extra OS processing required) o (OS specific), p (processor specific)
  
 There are no section groups in this file.
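# The program headers describe where each segment is loaded
 Program Headers:
   Type           Offset             VirtAddr           PhysAddr
                  FileSiz            MemSiz              Flags  Align
# Code segment: readable and executable, virtual address 0x400000, physical address 0x400000,
# file offset 0, length 0xdd, alignment 0x200000
# It covers the ELF header, the program headers, and the .text section
   LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                  0x00000000000000dd 0x00000000000000dd  R E    200000
# Data segment: readable and writable, virtual address 0x6000dd, physical address 0x6000dd,
# file offset 0xdd, length 0x38, alignment 0x200000
   LOAD           0x00000000000000dd 0x00000000006000dd 0x00000000006000dd
                  0x0000000000000038 0x0000000000000038  RW     200000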
  
  Section to Segment mapping:
   Segment Sections...
    00     .text
    01     .data
  
 There is no dynamic section in this file.
  
 There are no relocations in this file.
  
 The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
# Symbol table: the symbols in the program and their addresses
 Symbol table '.symtab' contains 11 entries:
    Num:    Value          Size Type    Bind   Vis      Ndx Name
      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
      1: 00000000004000b0     0 SECTION LOCAL  DEFAULT    1
      2: 00000000006000dd     0 SECTION LOCAL  DEFAULT    2
      3: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS max.o
      4: 00000000006000dd     0 NOTYPE  LOCAL  DEFAULT    2 data_items
      5: 00000000004000bf     0 NOTYPE  LOCAL  DEFAULT    1 start_loop
      6: 00000000004000d6     0 NOTYPE  LOCAL  DEFAULT    1 loop_exit
      7: 00000000004000b0     0 NOTYPE  GLOBAL DEFAULT    1 _start
      8: 0000000000600115     0 NOTYPE  GLOBAL DEFAULT    2 __bss_start
      9: 0000000000600115     0 NOTYPE  GLOBAL DEFAULT    2 _edata
     10: 0000000000600118     0 NOTYPE  GLOBAL DEFAULT    2 _end
  
 No version information found in this file.
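
The program headers tie virtual addresses to file offsets. As a small sketch using the two LOAD entries from the readelf output, the code below checks the ELF rule that a segment's file offset and virtual address must agree modulo its alignment, and then maps a few symbol addresses back to file offsets. The names `LOAD_SEGMENTS`, `congruent`, and `vaddr_to_offset` are chosen for this example only.

```python
# (file offset, virtual address, file size, alignment) of the two LOAD segments
LOAD_SEGMENTS = [
    (0x00, 0x400000, 0xdd, 0x200000),   # code segment, R E
    (0xdd, 0x6000dd, 0x38, 0x200000),   # data segment, RW
]

def congruent(offset, vaddr, align):
    """ELF requires p_offset and p_vaddr to be equal modulo p_align."""
    return offset % align == vaddr % align

def vaddr_to_offset(vaddr):
    """Translate a virtual address into an offset within the file."""
    for offset, base, size, _align in LOAD_SEGMENTS:
        if base <= vaddr < base + size:
            return offset + (vaddr - base)
    raise ValueError(f"{vaddr:#x} is not backed by file contents")

for offset, base, _size, align in LOAD_SEGMENTS:
    print(hex(base), congruent(offset, base, align))

for name, vaddr in [("_start", 0x4000b0), ("loop_exit", 0x4000d6), ("data_items", 0x6000dd)]:
    print(name, hex(vaddr_to_offset(vaddr)))
```

This congruence is why the data segment sits at 0x6000dd rather than at a round address: its file offset is 0xdd, and the next address above the code segment with the same low bits under a 0x200000 alignment is 0x6000dd.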

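4.2 Disassembly

Command: objdump -d max
 -d means disassemble
 Output:
 
 max:     file format elf64-x86-64
 Disassembly of section .text:
# According to the program headers, the code ends up loaded at 0x4000b0
 00000000004000b0 <_start>:
   4000b0: bf 00 00 00 00              mov    $0x0,%edi
# data_items has been replaced with 0x6000dd, the start address of the data segment
   4000b5: 67 8b 04 bd dd 00 60        mov    0x6000dd(,%edi,4),%eax
   4000bc: 00
   4000bd: 89 c3                       mov    %eax,%ebx
# The labels start_loop and loop_exit have been replaced with addresses as well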
 00000000004000bf <start_loop>:
   4000bf: 83 f8 00                           cmp    $0x0,%eax
   4000c2: 74 12                       je     4000d6 <loop_exit>
   4000c4: ff c7                              inc    %edi
   4000c6: 67 8b 04 bd dd 00 60        mov    0x6000dd(,%edi,4),%eax
   4000cd: 00
   4000ce: 39 d8                       cmp    %ebx,%eax
   4000d0: 7e ed                       jle    4000bf <start_loop>
   4000d2: 89 c3                       mov    %eax,%ebx
   4000d4: eb e9                       jmp    4000bf <start_loop>
 00000000004000d6 <loop_exit>:
   4000d6: b8 01 00 00 00                     mov    $0x1,%eax
   4000db: cd 80                       int    $0x80
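
The jump instructions in the listing do not store absolute addresses. Each two-byte short jump holds an opcode and a signed 8-bit displacement that is added to the address of the next instruction. The following sketch recomputes the three targets from the bytes shown; `short_jump_target` is a name invented for this illustration.

```python
def short_jump_target(address: int, encoding: bytes) -> int:
    """Target of a 2-byte short jump: opcode byte plus signed 8-bit displacement."""
    displacement = int.from_bytes(encoding[1:2], "little", signed=True)
    # The displacement is relative to the address of the following instruction
    return address + len(encoding) + displacement

jumps = [
    (0x4000c2, "74 12"),   # je  loop_exit
    (0x4000d0, "7e ed"),   # jle start_loop
    (0x4000d4, "eb e9"),   # jmp start_loop
]
for address, raw in jumps:
    print(hex(address), "->", hex(short_jump_target(address, bytes.fromhex(raw))))
```

Because these jumps are relative, they stay valid wherever the code is loaded. The two mov instructions that embed 0x6000dd are what tie this program to its fixed load address.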

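4.3 Binary contents of the max file and how they map to the structures

Command: xxd -g 1 max

This dumps the whole file, starting from offset 0 by default.

Output: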

# ELF header, offset 0, length 64 bytes
0000000: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00  .ELF............
0000010: 02 00 3e 00 01 00 00 00 b0 00 40 00 00 00 00 00  ..>.......@.....
0000020: 40 00 00 00 00 00 00 00 90 02 00 00 00 00 00 00  @...............
0000030: 00 00 00 00 40 00 38 00 02 00 40 00 06 00 03 00  ....@.8...@.....
 
# program headers, offset 0x40, length 56 bytes x 2
0000040: 01 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00  ................
0000050: 00 00 40 00 00 00 00 00 00 00 40 00 00 00 00 00  ..@.......@.....
0000060: dd 00 00 00 00 00 00 00 dd 00 00 00 00 00 00 00  ................
0000070: 00 00 20 00 00 00 00 00 01 00 00 00 06 00 00 00  .. .............
0000080: dd 00 00 00 00 00 00 00 dd 00 60 00 00 00 00 00  ..........`.....
0000090: dd 00 60 00 00 00 00 00 38 00 00 00 00 00 00 00  ..`.....8.......
00000a0: 38 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00  8......... .....
 
# code segment (.text), offset 0xb0, length 0x2d bytes
00000b0: bf 00 00 00 00 67 8b 04 bd dd 00 60 00 89 c3 83  .....g.....`....
00000c0: f8 00 74 12 ff c7 67 8b 04 bd dd 00 60 00 39 d8  ..t...g.....`.9.
00000d0: 7e ed 89 c3 eb e9 b8 01 00 00 00 cd 80 48 00 00  ~............H..  # data segment (.data) starts here
00000e0: 00 45 00 00 00 4c 00 00 00 4c 00 00 00 4f 00 00  .E...L...L...O..  # offset 0xdd
00000f0: 00 5f 00 00 00 57 00 00 00 4f 00 00 00 52 00 00  ._...W...O...R..  # length 0x38 bytes
0000100: 00 4c 00 00 00 44 00 00 00 21 00 00 00 21 00 00  .L...D...!...!..
0000110: 00 00 00 00 00 00 2e 73 79 6d 74 61 62 00 2e 73  .......symtab..s  # section name table .shstrtab
0000120: 74 72 74 61 62 00 2e 73 68 73 74 72 74 61 62 00  trtab..shstrtab.  # offset 0x115
0000130: 2e 74 65 78 74 00 2e 64 61 74 61 00 00 00 00 00  .text..data.....  # length 0x27
 
# symbol table .symtab, offset 0x140, length 0x108: 11 entries x 24 bytes
0000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000150: 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00  ................
0000160: b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............  # each entry holds the address of its symbol
0000170: 00 00 00 00 03 00 02 00 dd 00 60 00 00 00 00 00  ..........`.....
0000180: 00 00 00 00 00 00 00 00 01 00 00 00 04 00 f1 ff  ................
0000190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00001a0: 07 00 00 00 00 00 02 00 dd 00 60 00 00 00 00 00  ..........`.....  # data_items
00001b0: 00 00 00 00 00 00 00 00 12 00 00 00 00 00 01 00  ................  # start_loop
00001c0: bf 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............
00001d0: 1d 00 00 00 00 00 01 00 d6 00 40 00 00 00 00 00  ..........@.....  # loop_exit
00001e0: 00 00 00 00 00 00 00 00 27 00 00 00 10 00 01 00  ........'.......  # _start
00001f0: b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............
0000200: 2e 00 00 00 10 00 02 00 15 01 60 00 00 00 00 00  ..........`.....  # __bss_start
0000210: 00 00 00 00 00 00 00 00 3a 00 00 00 10 00 02 00  ........:.......
0000220: 15 01 60 00 00 00 00 00 00 00 00 00 00 00 00 00  ..`.............  # _edata
0000230: 41 00 00 00 10 00 02 00 18 01 60 00 00 00 00 00  A.........`.....  # _end
0000240: 00 00 00 00 00 00 00 00 00 6d 61 78 2e 6f 00 64  .........max.o.d  # string table .strtab
0000250: 61 74 61 5f 69 74 65 6d 73 00 73 74 61 72 74 5f  ata_items.start_  # offset 0x248
0000260: 6c 6f 6f 70 00 6c 6f 6f 70 5f 65 78 69 74 00 5f  loop.loop_exit._  # length 0x46
0000270: 73 74 61 72 74 00 5f 5f 62 73 73 5f 73 74 61 72  start.__bss_star
0000280: 74 00 5f 65 64 61 74 61 00 5f 65 6e 64 00 00 00  t._edata._end...
 
# section headers, offset 0x290, length 64 bytes x 6
0000290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00002a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00002b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00002c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................  # entry 0 is empty
00002d0: 1b 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00  ................
00002e0: b0 00 40 00 00 00 00 00 b0 00 00 00 00 00 00 00  ..@.............  # .text
00002f0: 2d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  -...............
0000300: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000310: 21 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00  !...............
0000320: dd 00 60 00 00 00 00 00 dd 00 00 00 00 00 00 00  ..`.............
0000330: 38 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  8...............  # .data
0000340: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000350: 11 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................
0000360: 00 00 00 00 00 00 00 00 15 01 00 00 00 00 00 00  ................  # .shstrtab
0000370: 27 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  '...............
0000380: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000390: 01 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00  ................
00003a0: 00 00 00 00 00 00 00 00 40 01 00 00 00 00 00 00  ........@.......
00003b0: 08 01 00 00 00 00 00 00 05 00 00 00 07 00 00 00  ................  # .symtab
00003c0: 08 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00  ................
00003d0: 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  ................
00003e0: 00 00 00 00 00 00 00 00 48 02 00 00 00 00 00 00  ........H.......  # .strtab
00003f0: 46 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  F...............
0000400: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
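
To connect the dump with the readelf fields, the first 64 bytes can be decoded by hand. The sketch below unpacks them with Python's struct module according to the Elf64_Ehdr layout from the ELF specification. The names `HEADER_HEX`, `FIELDS`, and `parse_elf64_header` are made up for this example.

```python
import struct

# The first 64 bytes of max, copied from the start of the xxd dump
HEADER_HEX = (
    "7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 "
    "02 00 3e 00 01 00 00 00 b0 00 40 00 00 00 00 00 "
    "40 00 00 00 00 00 00 00 90 02 00 00 00 00 00 00 "
    "00 00 00 00 40 00 38 00 02 00 40 00 06 00 03 00 "
)

# Elf64_Ehdr field names, in file order
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)

def parse_elf64_header(raw: bytes) -> dict:
    """Decode a little-endian ELF64 header into a dict of its fields."""
    values = struct.unpack("<16sHHIQQQIHHHHHH", raw[:64])
    return dict(zip(FIELDS, values))

header = parse_elf64_header(bytes.fromhex(HEADER_HEX))
for name in FIELDS[1:]:
    print(f"{name:12} {header[name]:#x}")
```

The decoded values match the readelf output: entry point 0x4000b0, program headers at 0x40, section headers at 0x290, two program headers of 0x38 bytes, and six section headers of 0x40 bytes.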



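5. Relationship diagrams

5.1 Relationships within the EXE file

[Figure: relationships among the parts of the EXE file]

Note: the arrows do not necessarily indicate ordering.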

5.2 Structure of the file

ELF header : 64B

program headers : 56B x 2

.text : 45B

.data : 56B

.shstrtab : 39B

.symtab : 24B x 11

.strtab : 70B

section headers : 64B x 6
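
Adding up these sizes reproduces the file offsets seen earlier, as long as alignment padding is taken into account. The sketch below is a rough model: the 8-byte alignment of .symtab comes from the readelf output, while the 8-byte alignment of the section header table is an assumption made here to explain the two padding bytes before offset 0x290. The names `PARTS` and `layout` are invented for this example.

```python
PARTS = [
    # (name, size in bytes, alignment)
    ("ELF header", 64, 1),
    ("program headers", 56 * 2, 1),
    (".text", 45, 1),
    (".data", 56, 1),
    (".shstrtab", 39, 1),
    (".symtab", 24 * 11, 8),
    (".strtab", 70, 1),
    ("section headers", 64 * 6, 8),   # alignment assumed, see the note above
]

def layout(parts):
    """Return ({name: offset}, total size), padding each part to its alignment."""
    offsets = {}
    position = 0
    for name, size, align in parts:
        position = (position + align - 1) // align * align   # round up
        offsets[name] = position
        position += size
    return offsets, position

offsets, total = layout(PARTS)
for name, offset in offsets.items():
    print(f"{name:16} {offset:#06x}")
print("file size", hex(total))
```

Under these assumptions the model gives 0xb0 for .text, 0xdd for .data, 0x115 for .shstrtab, 0x140 for .symtab, 0x248 for .strtab, and 0x290 for the section headers, for a total of 0x410 bytes.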

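6. Converting an EXE file to a BIN file

6.1 Extracting the code and data segments


objcopy -O binary -R .note -R .comment max max_copy

This writes max out as a raw binary file named max_copy, removing the .note and .comment sections.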
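
Conceptually, the binary output is a memory image: each loadable section is placed at its load address minus the lowest load address, and the space in between is filled with zeros. The sketch below imitates that behaviour for the two sections of max. It is a simplified model rather than objcopy's actual implementation, and the names `flatten`, `TEXT`, and `DATA` are invented for this example.

```python
import struct

# .text bytes as shown in the dumps, loaded at 0x4000b0
TEXT = bytes.fromhex(
    "bf 00 00 00 00 67 8b 04 bd dd 00 60 00 89 c3 83 "
    "f8 00 74 12 ff c7 67 8b 04 bd dd 00 60 00 39 d8 "
    "7e ed 89 c3 eb e9 b8 01 00 00 00 cd 80"
)
# .data: fourteen little-endian 32-bit values, loaded at 0x6000dd
DATA = struct.pack("<14I", *[ord(c) for c in "HELLO_WORLD!!"], 0)

def flatten(sections):
    """Build a raw image from (load address, bytes) pairs, zero-filling gaps."""
    base = min(address for address, _ in sections)
    end = max(address + len(data) for address, data in sections)
    image = bytearray(end - base)          # bytearray starts out all zeros
    for address, data in sections:
        start = address - base
        image[start:start + len(data)] = data
    return bytes(image)

image = flatten([(0x4000b0, TEXT), (0x6000dd, DATA)])
print(hex(len(image)))                     # size of the raw image
print(image[:5].hex(" "))                  # first instruction, at offset 0
print(hex(image.index(DATA)))              # where the data lands
```

In this model the data ends up about 2 MiB into the image because of the 0x200000 segment alignment, which explains why such a small program produces such a large raw file.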

6.2 Viewing the code segment

Command: xxd -g 1 -l 256 max_copy

This shows the first 256 bytes.

The code segment appears at the start:


0000000: bf 00 00 00 00 67 8b 04 bd dd 00 60 00 89 c3 83  .....g.....`....
 0000010: f8 00 74 12 ff c7 67 8b 04 bd dd 00 60 00 39 d8  ..t...g.....`.9.
 0000020: 7e ed 89 c3 eb e9 b8 01 00 00 00 cd 80 00 00 00  ~...............
 0000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 0000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 0000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 0000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 0000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 0000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 0000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 00000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 00000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 00000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 00000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 00000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
 00000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

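6.3 Viewing the data segment

Command: xxd -g 1 -l 256 -s 0x20002d max_copy

-s sets the starting offset, so the dump begins at 0x20002d (= data segment load address 0x6000dd - code segment load address 0x4000b0). -g 1 groups the hexadecimal output one byte at a time, and -l 256 limits the dump to 256 bytes.

The data segment appears: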

020002d: 48 00 00 00 45 00 00 00 4c 00 00 00 4c 00 00 00  H...E...L...L...
 020003d: 4f 00 00 00 5f 00 00 00 57 00 00 00 4f 00 00 00  O..._...W...O...
 020004d: 52 00 00 00 4c 00 00 00 44 00 00 00 21 00 00 00  R...L...D...!...
 020005d: 21 00 00 00 00 00 00 00                          !.......
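
The reason for storing each character as a .long becomes visible when these bytes are decoded. The following sketch reads the 56 bytes as little-endian 32-bit values and, for contrast, shows how the first value would look if read as big-endian. The names `DATA_HEX` and `values` are illustrative only.

```python
import struct

# The 56 bytes of the data segment, copied from the dump
DATA_HEX = (
    "48 00 00 00 45 00 00 00 4c 00 00 00 4c 00 00 00 "
    "4f 00 00 00 5f 00 00 00 57 00 00 00 4f 00 00 00 "
    "52 00 00 00 4c 00 00 00 44 00 00 00 21 00 00 00 "
    "21 00 00 00 00 00 00 00 "
)
data = bytes.fromhex(DATA_HEX)

# Offset of the data inside max_copy: difference of the two load addresses
print(hex(0x6000dd - 0x4000b0))

# Little endian: the least significant byte of each .long comes first
values = struct.unpack("<14I", data)
print("".join(chr(v) for v in values if v))

# Read as big endian, the same four bytes give a very different number
(big,) = struct.unpack(">I", data[:4])
print(hex(values[0]), hex(big))
```

Each character code sits in the first byte of its 4-byte slot, which is exactly what a little-endian machine produces.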

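As the dumps show, max_copy contains only the code segment and the data segment, with the gap between their load addresses filled with zeros, and the code segment sits at the very start of the file.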