程序员的自我修养 (A Programmer's Self-Cultivation): Reading Notes on Object File Formats

  • How an executable is produced:

    preprocess (.i) -> compile (.s) -> assemble (.o) -> link (a.out)

  • Building the executable step by step (using hello.c as the example):

gcc -E hello.c -o hello.i    # preprocess
gcc -S hello.i -o hello.s    # compile
gcc -c hello.s -o hello.o    # assemble; as hello.s -o hello.o also works
ld -static /usr/lib/crt1.o /usr/lib/crti.o ... -L/lib hello.o    # link

  • Linking mainly consists of address and space allocation, symbol resolution, and relocation

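  • Object files:

    Compiling source code produces an object file (.o). Its format is closely related to the platform's executable format: PE/COFF on Windows, ELF on Linux, Mach-O on macOS.
    Dynamic libraries (.dll / .so), static libraries (.lib / .a), executables (.exe / a.out) and object files (.obj / .o) all share a similar format.
    An object file is organized as a header followed by sections.

  • Use the file command to check an object file's format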

$ file hello.o    # linux-arm
hello.o: ELF 32-bit LSB relocatable, ARM, EABI5 version 1 (SYSV), not stripped

$ file hello.o # macOS-x64
hello.o: Mach-O 64-bit object x86_64
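
What file reports here can be approximated from the first few bytes of the file. The sketch below is only an illustration, not how file itself is implemented: identify_format, elf_summarize and the enum are hypothetical names, and only the magic numbers and the ELF header offsets (e_ident in bytes 0 to 15, e_type at byte 16, e_machine at byte 18) come from the format definitions.

```c
#include <stddef.h>
#include <string.h>

/* Container formats this sketch can tell apart (hypothetical enum). */
enum obj_format { FMT_UNKNOWN, FMT_ELF, FMT_MACHO64, FMT_MZ };

/* Guess the container format from the leading magic bytes. */
enum obj_format identify_format(const unsigned char *buf, size_t len)
{
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
    /* MH_MAGIC_64 (0xfeedfacf) as written by a little-endian machine. */
    static const unsigned char macho64_magic[4] = { 0xcf, 0xfa, 0xed, 0xfe };

    if (len >= 4 && memcmp(buf, elf_magic, 4) == 0)
        return FMT_ELF;
    if (len >= 4 && memcmp(buf, macho64_magic, 4) == 0)
        return FMT_MACHO64;
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return FMT_MZ;          /* DOS header that starts a PE executable */
    return FMT_UNKNOWN;
}

struct elf_summary {
    int bits;                   /* 32 or 64, from e_ident[4] */
    int little_endian;          /* 1 for LSB, 0 for MSB, from e_ident[5] */
    unsigned type;              /* e_type: 1 = relocatable, 2 = executable, 3 = shared */
    unsigned machine;           /* e_machine: 40 = ARM, 62 = x86-64 */
};

/* Read a 16-bit field in the byte order announced by the ELF header. */
static unsigned read_u16(const unsigned char *p, int little_endian)
{
    return little_endian ? (unsigned)(p[0] | (p[1] << 8))
                         : (unsigned)((p[0] << 8) | p[1]);
}

/* Fill 'out' from the first 20 bytes of an ELF file.
   Returns 0 on success, -1 if the buffer is not a recognizable ELF header. */
int elf_summarize(const unsigned char *buf, size_t len, struct elf_summary *out)
{
    if (len < 20 || identify_format(buf, len) != FMT_ELF)
        return -1;
    if ((buf[4] != 1 && buf[4] != 2) || (buf[5] != 1 && buf[5] != 2))
        return -1;              /* unknown class or data encoding */
    out->bits = (buf[4] == 1) ? 32 : 64;
    out->little_endian = (buf[5] == 1);
    out->type = read_u16(buf + 16, out->little_endian);
    out->machine = read_u16(buf + 18, out->little_endian);
    return 0;
}
```

For the linux-arm hello.o this would give 32 bits, little-endian, type 1 and machine 40, which corresponds to the "ELF 32-bit LSB relocatable, ARM" line printed by file.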

Viewing the object file headers

header output
$ objdump -x hello.o    # linux-arm
hello.o: file format elf32-littlearm
hello.o
architecture: arm, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000
private flags = 5000000: [Version5 EABI]

Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000064 00000000 00000000 00000034 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000008 00000000 00000000 00000098 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 00000000 00000000 000000a0 2**0
ALLOC
3 .rodata 00000004 00000000 00000000 000000a0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .comment 00000024 00000000 00000000 000000a4 2**0
CONTENTS, READONLY
5 .note.GNU-stack 00000000 00000000 00000000 000000c8 2**0
CONTENTS, READONLY
6 .ARM.attributes 0000002f 00000000 00000000 000000c8 2**0
CONTENTS, READONLY
SYMBOL TABLE:
00000000 l df *ABS* 00000000 hello.c
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 l d .rodata 00000000 .rodata
00000004 l O .data 00000004 static_int.5961
00000000 l d .note.GNU-stack 00000000 .note.GNU-stack
00000000 l d .comment 00000000 .comment
00000000 l d .ARM.attributes 00000000 .ARM.attributes
00000000 g O .data 00000004 global_int
00000000 g F .text 0000002c foo
00000000 *UND* 00000000 printf
0000002c g F .text 00000038 main


RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
00000018 R_ARM_CALL printf
00000028 R_ARM_ABS32 .rodata
00000050 R_ARM_CALL foo


$ objdump -x -macho hello.o # macOS-x64
hello.o:
Relocation information (__TEXT,__text) 3 entries
address pcrel length extern type scattered symbolnum/value
0000004f True long True BRANCH False _foo
00000018 True long True BRANCH False _printf
00000011 True long False SIGNED False 3 (__TEXT,__cstring)
Relocation information (__LD,__compact_unwind) 2 entries
address pcrel length extern type scattered symbolnum/value
00000020 False quad False UNSIGND False 1 (__TEXT,__text)
00000000 False quad False UNSIGND False 1 (__TEXT,__text)
Sections:
Idx Name Size VMA Type
0 __text 0000005c 0000000000000000 TEXT
1 __data 00000008 000000000000005c DATA
2 __cstring 00000004 0000000000000064 DATA
3 __compact_unwind 00000040 0000000000000068 DATA
4 __eh_frame 00000068 00000000000000a8 DATA

SYMBOL TABLE:
0000000000000060 l O __DATA,__data _main.static_int
0000000000000000 g F __TEXT,__text _foo
000000000000005c g O __DATA,__data _global_int
0000000000000030 g F __TEXT,__text _main
0000000000000000 *UND* _printf
Mach header
magic cputype cpusubtype caps filetype ncmds sizeofcmds flags
MH_MAGIC_64 X86_64 ALL 0x00 OBJECT 4 600 SUBSECTIONS_VIA_SYMBOLS
Load command 0
cmd LC_SEGMENT_64
cmdsize 472
segname
vmaddr 0x0000000000000000
vmsize 0x0000000000000110
fileoff 632
filesize 272
maxprot rwx
initprot rwx
nsects 5
flags (none)
Section
sectname __text
segname __TEXT
addr 0x0000000000000000
size 0x000000000000005c
offset 632
align 2^4 (16)
reloff 904
nreloc 3
type S_REGULAR
attributes PURE_INSTRUCTIONS SOME_INSTRUCTIONS
reserved1 0
reserved2 0
Section
sectname __data
segname __DATA
addr 0x000000000000005c
size 0x0000000000000008
offset 724
align 2^2 (4)
reloff 0
nreloc 0
type S_REGULAR
attributes (none)
reserved1 0
reserved2 0
Section
sectname __cstring
segname __TEXT
addr 0x0000000000000064
size 0x0000000000000004
offset 732
align 2^0 (1)
reloff 0
nreloc 0
type S_CSTRING_LITERALS
attributes (none)
reserved1 0
reserved2 0
Section
sectname __compact_unwind
segname __LD
addr 0x0000000000000068
size 0x0000000000000040
offset 736
align 2^3 (8)
reloff 928
nreloc 2
type S_REGULAR
attributes DEBUG
reserved1 0
reserved2 0
Section
sectname __eh_frame
segname __TEXT
addr 0x00000000000000a8
size 0x0000000000000068
offset 800
align 2^3 (8)
reloff 0
nreloc 0
type S_COALESCED
attributes NO_TOC STRIP_STATIC_SYMS LIVE_SUPPORT
reserved1 0
reserved2 0
Load command 1
cmd LC_BUILD_VERSION
cmdsize 24
platform macos
sdk 10.15.6
minos 10.15
ntools 0
Load command 2
cmd LC_SYMTAB
cmdsize 24
symoff 944
nsyms 5
stroff 1024
strsize 52
Load command 3
cmd LC_DYSYMTAB
cmdsize 80
ilocalsym 0
nlocalsym 1
iextdefsym 1
nextdefsym 3
iundefsym 4
nundefsym 1
tocoff 0
ntoc 0
modtaboff 0
nmodtab 0
extrefsymoff 0
nextrefsyms 0
indirectsymoff 0
nindirectsyms 0
extreloff 0
nextrel 0
locreloff 0
nlocrel 0
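
The *UND* entry for printf in both symbol tables is what the linker's symbol resolution step has to settle. Below is a toy model of that step, assuming a hypothetical struct symbol and a made-up library table; a real linker works on the on-disk symbol tables and handles binding, visibility and much more.

```c
#include <stddef.h>
#include <string.h>

/* Simplified symbol table entry (hypothetical layout). */
struct symbol {
    const char *name;
    int defined;            /* 0 models the *UND* entries in the listing */
    unsigned value;         /* offset inside its section when defined */
};

/* Return the entry that defines 'name', or NULL if no entry defines it. */
const struct symbol *find_definition(const struct symbol *syms, size_t n,
                                     const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (syms[i].defined && strcmp(syms[i].name, name) == 0)
            return &syms[i];
    return NULL;
}

/* Count undefined symbols of 'obj' that neither 'obj' nor 'lib' defines. */
size_t count_unresolved(const struct symbol *obj, size_t n_obj,
                        const struct symbol *lib, size_t n_lib)
{
    size_t unresolved = 0;

    for (size_t i = 0; i < n_obj; i++) {
        if (obj[i].defined)
            continue;
        if (find_definition(obj, n_obj, obj[i].name) == NULL &&
            find_definition(lib, n_lib, obj[i].name) == NULL)
            unresolved++;
    }
    return unresolved;
}
```

Fed the four global entries of the ARM listing (global_int, foo, printf, main), count_unresolved reports one unresolved symbol until a table that defines printf is supplied, which is the role the C library plays at link time.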

Viewing the object file size

size output
$ size hello.o    # linux-arm
text data bss dec hex filename
104 8 0 112 70 hello.o

$ size hello.o # macOS-x64
__TEXT __DATA __OBJC others dec hex
200 8 0 64 272 110
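
These totals can be reproduced by hand from the section sizes. On linux-arm, text 104 = .text (0x64 = 100) + .rodata (4), data 8 = .data, and dec 112 = 0x70. On macOS, __TEXT 200 = __text (0x5c) + __cstring (4) + __eh_frame (0x68), and others 64 is __compact_unwind (0x40). The sketch below encodes the classification rule that, as far as I understand it, GNU size applies in its default Berkeley format; the F_* flag values and the struct names are invented for the example.

```c
#include <stddef.h>

/* Flag bits named after the words objdump prints (values are made up). */
enum { F_ALLOC = 1, F_CODE = 2, F_READONLY = 4, F_CONTENTS = 8 };

struct section {
    const char *name;
    unsigned size;
    unsigned flags;
};

struct totals {
    unsigned text, data, bss;
};

/* Sum section sizes into the three Berkeley-style columns. */
struct totals berkeley_totals(const struct section *secs, size_t n)
{
    struct totals t = { 0, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        if (!(secs[i].flags & F_ALLOC))
            continue;                       /* .comment and friends are never loaded */
        if (secs[i].flags & (F_CODE | F_READONLY))
            t.text += secs[i].size;         /* code and read-only data */
        else if (secs[i].flags & F_CONTENTS)
            t.data += secs[i].size;         /* writable data stored in the file */
        else
            t.bss += secs[i].size;          /* allocated at load time, no file bytes */
    }
    return t;
}
```

This is why .rodata is counted under text rather than data in the linux-arm output.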

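Viewing the object file structure

Object file layout differs between platforms, and so do the sections that get generated, but the basic ones are common to all of them: the code section (.text), the data section (.data) and the read-only section (.rodata on ELF, __cstring on Mach-O). The .bss section holds uninitialized global and static variables; they all default to 0, so .bss takes up no space in the object file.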
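
hello.c itself has no uninitialized globals, so its .bss is empty. A small sketch of where different kinds of variables typically end up; the variable and function names are hypothetical, and the exact section can vary with compiler, flags and platform.

```c
/* Initialized and writable: typically stored in .data (__data on Mach-O). */
int initialized_global = 0xa5;

/* No initializer: typically placed in .bss (or emitted as a common symbol,
   depending on compiler flags). Only its size is recorded in the object file. */
int uninitialized_global;

/* Read-only data: typically .rodata on ELF. */
const char fmt_string[] = "%d\n";

/* A function-local static without an initializer is also zero-filled
   and usually lives in .bss. */
int counter_next(void)
{
    static int calls;
    return ++calls;
}
```

Because the loader zero-fills .bss, uninitialized_global reads as 0 and the first call to counter_next returns 1 without any initializer bytes being present in the file.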

objdump -h output
$ objdump -h hello.o   # linux-arm
hello.o: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000064 00000000 00000000 00000034 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000008 00000000 00000000 00000098 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 00000000 00000000 000000a0 2**0
ALLOC
3 .rodata 00000004 00000000 00000000 000000a0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .comment 00000024 00000000 00000000 000000a4 2**0
CONTENTS, READONLY
5 .note.GNU-stack 00000000 00000000 00000000 000000c8 2**0
CONTENTS, READONLY
6 .ARM.attributes 0000002f 00000000 00000000 000000c8 2**0
CONTENTS, READONLY

$ objdump -h hello.o # macOS-x64
hello.o: file format Mach-O 64-bit x86-64
Sections:
Idx Name Size VMA Type
0 __text 0000005c 0000000000000000 TEXT
1 __data 00000008 000000000000005c DATA
2 __cstring 00000004 0000000000000064 DATA
3 __compact_unwind 00000040 0000000000000068 DATA
4 __eh_frame 00000068 00000000000000a8 DATA
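
The File off column of the linux-arm listing can be recomputed from the sizes and alignments alone. The sketch below assumes sections are packed one after another starting right behind the 52-byte (0x34) ELF32 header, with sections that have no contents (such as .bss) consuming no file bytes. An assembler is not obliged to lay a file out this way; the model simply happens to reproduce this listing. The struct and function names are hypothetical.

```c
#include <stddef.h>

struct file_section {
    const char *name;
    unsigned size;
    unsigned align;         /* power of two, e.g. 4 for 2**2 */
    int has_contents;       /* 0 for sections like .bss that occupy no file bytes */
};

/* Round 'off' up to the next multiple of 'align' (a power of two). */
unsigned align_up(unsigned off, unsigned align)
{
    return (off + align - 1) & ~(align - 1);
}

/* Assign a file offset to every section, starting at 'start'.
   Offsets are written to offsets[]; the end of the last section is returned. */
unsigned layout(const struct file_section *secs, size_t n,
                unsigned start, unsigned *offsets)
{
    unsigned off = start;

    for (size_t i = 0; i < n; i++) {
        off = align_up(off, secs[i].align);
        offsets[i] = off;
        if (secs[i].has_contents)
            off += secs[i].size;    /* only real bytes advance the file offset */
    }
    return off;
}
```

With the seven ARM sections and start = 0x34 this yields 0x34, 0x98, 0xa0, 0xa0, 0xa4, 0xc8, 0xc8, matching the listing. Giving .bss a non-zero size would leave .rodata at 0xa0, which is the sense in which .bss takes no space in the file.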

Dumping the object file contents and disassembling the code

objdump -s -d output
$ objdump -s -d hello.o   # linux-arm
hello.o: file format elf32-littlearm

Contents of section .text:
0000 00482de9 04b08de2 08d04de2 08000be5 .H-.......M.....
0010 08101be5 0c009fe5 feffffeb 0000a0e1 ................
0020 04d04be2 0088bde8 00000000 00482de9 ..K..........H-.
0030 04b08de2 08d04de2 0130a0e3 08300be5 ......M..0...0..
0040 08201be5 0c301be5 033082e0 0300a0e1 . ...0...0......
0050 feffffeb 0c301be5 0300a0e1 04d04be2 .....0........K.
0060 0088bde8 ....
Contents of section .data:
0000 a5000000 5a000000 ....Z...
Contents of section .rodata:
0000 256c640a 00 %ld..
Contents of section .comment:
0000 00474343 3a202852 61737062 69616e20 .GCC: (Raspbian
0010 382e332e 302d362b 72706931 2920382e 8.3.0-6+rpi1) 8.
0020 332e3000 3.0.
Contents of section .ARM.attributes:
0000 412e0000 00616561 62690001 24000000 A....aeabi..$...
0010 05360006 06080109 010a0212 04140115 .6..............
0020 01170318 0119011a 021c011e 062201 .............".

Disassembly of section .text:

00000000 <foo>:
0: e92d4800 push {fp, lr}
4: e28db004 add fp, sp, #4
8: e24dd008 sub sp, sp, #8
c: e50b0008 str r0, [fp, #-8]
10: e51b1008 ldr r1, [fp, #-8]
14: e59f000c ldr r0, [pc, #12] ; 28 <foo+0x28>
18: ebfffffe bl 0 <printf>
1c: e1a00000 nop ; (mov r0, r0)
20: e24bd004 sub sp, fp, #4
24: e8bd8800 pop {fp, pc}
28: 00000000 .word 0x00000000

0000002c <main>:
2c: e92d4800 push {fp, lr}
30: e28db004 add fp, sp, #4
34: e24dd008 sub sp, sp, #8
38: e3a03001 mov r3, #1
3c: e50b3008 str r3, [fp, #-8]
40: e51b2008 ldr r2, [fp, #-8]
44: e51b300c ldr r3, [fp, #-12]
48: e0823003 add r3, r2, r3
4c: e1a00003 mov r0, r3
50: ebfffffe bl 0 <foo>
54: e51b300c ldr r3, [fp, #-12]
58: e1a00003 mov r0, r3
5c: e24bd004 sub sp, fp, #4
60: e8bd8800 pop {fp, pc}

$ objdump -s -d hello.o # macOS-x64
hello.o: file format Mach-O 64-bit x86-64

Contents of section __text:
0000 554889e5 4883ec10 897dfc8b 75fc488d UH..H....}..u.H.
0010 3d4f0000 00b000e8 00000000 4883c410 =O..........H...
0020 5dc3662e 0f1f8400 00000000 0f1f4000 ].f...........@.
0030 554889e5 4883ec10 c745fc00 000000c7 UH..H....E......
0040 45f80100 00008b45 f80345f4 89c7e800 E......E..E.....
0050 0000008b 45f44883 c4105dc3 ....E.H...].
Contents of section __data:
005c a5000000 5a000000 ....Z...
Contents of section __cstring:
0064 25640a00 %d..
Contents of section __compact_unwind:
0068 00000000 00000000 22000000 00000001 ........".......
0078 00000000 00000000 00000000 00000000 ................
0088 30000000 00000000 2c000000 00000001 0.......,.......
0098 00000000 00000000 00000000 00000000 ................
Contents of section __eh_frame:
00a8 14000000 00000000 017a5200 01781001 .........zR..x..
00b8 100c0708 90010000 24000000 1c000000 ........$.......
00c8 38ffffff ffffffff 22000000 00000000 8.......".......
00d8 00410e10 8602430d 06000000 00000000 .A....C.........
00e8 24000000 44000000 40ffffff ffffffff $...D...@.......
00f8 2c000000 00000000 00410e10 8602430d ,........A....C.
0108 06000000 00000000 ........

Disassembly of section __TEXT,__text:

0000000000000000 _foo:
0: 55 pushq %rbp
1: 48 89 e5 movq %rsp, %rbp
4: 48 83 ec 10 subq $16, %rsp
8: 89 7d fc movl %edi, -4(%rbp)
b: 8b 75 fc movl -4(%rbp), %esi
e: 48 8d 3d 4f 00 00 00 leaq 79(%rip), %rdi
15: b0 00 movb $0, %al
17: e8 00 00 00 00 callq 0 <_foo+0x1c>
1c: 48 83 c4 10 addq $16, %rsp
20: 5d popq %rbp
21: c3 retq
22: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
2c: 0f 1f 40 00 nopl (%rax)

0000000000000030 _main:
30: 55 pushq %rbp
31: 48 89 e5 movq %rsp, %rbp
34: 48 83 ec 10 subq $16, %rsp
38: c7 45 fc 00 00 00 00 movl $0, -4(%rbp)
3f: c7 45 f8 01 00 00 00 movl $1, -8(%rbp)
46: 8b 45 f8 movl -8(%rbp), %eax
49: 03 45 f4 addl -12(%rbp), %eax
4c: 89 c7 movl %eax, %edi
4e: e8 00 00 00 00 callq 0 <_main+0x23>
53: 8b 45 f4 movl -12(%rbp), %eax
56: 48 83 c4 10 addq $16, %rsp
5a: 5d popq %rbp
5b: c3 retq
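
The zeroed call operands (e8 00 00 00 00) and the ARM ebfffffe words are placeholders that the relocation step fills in later. The arithmetic involved can be sketched as follows; the function names are hypothetical, while the encodings themselves (x86-64 rel32 measured from the end of the 4-byte field, ARM BL measured from the instruction address plus 8) are standard.

```c
#include <stdint.h>

/* x86-64 call/jmp rel32: the displacement is relative to the address
   immediately after the 4-byte field being patched. */
int32_t rel32_for(uint64_t field_addr, uint64_t target)
{
    int64_t delta = (int64_t)target - (int64_t)(field_addr + 4);
    return (int32_t)delta;
}

/* RIP-relative operand: effective address = next instruction + disp32. */
uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp)
{
    return next_insn_addr + (uint64_t)(int64_t)disp;
}

/* Store a 32-bit value little-endian, the way a patched field is written. */
void patch_le32(unsigned char *field, int32_t value)
{
    uint32_t v = (uint32_t)value;

    for (int i = 0; i < 4; i++)
        field[i] = (unsigned char)(v >> (8 * i));
}

/* ARM BL: target = instruction address + 8 + (sign-extended imm24 << 2). */
uint32_t arm_bl_target(uint32_t insn_addr, uint32_t insn)
{
    int32_t imm24 = (int32_t)(insn & 0x00ffffff);

    if (imm24 & 0x00800000)
        imm24 -= 0x01000000;        /* sign-extend the 24-bit field */
    return insn_addr + 8 + (uint32_t)(imm24 * 4);
}
```

Two checks against the dumps: the leaq at 0xe ends at 0x15, and 0x15 + 79 = 0x64, the address of __cstring, so that operand already carries its addend. The BRANCH relocation at 0x4f for _foo (at address 0) would be patched with 0 - 0x53 = -0x53 if both stay at these relative positions. For ARM, ebfffffe encodes imm24 = -2, so before relocation the branch at 0x18 resolves to 0x18 itself; the -8 merely cancels the PC+8 offset.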

hello.c source code

hello.c
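#include <stdio.h>

int global_int = 0xa5;              /* initialized global: stored in .data / __data */

void foo(int i)
{
    printf("%d\n", i);              /* format string: .rodata / __cstring */
}

int main(void)
{
    static int static_int = 0x5a;   /* initialized static: also in .data / __data */

    int b = 1;
    int c;                          /* left uninitialized, as in the code the dumps were made from */

    foo(b + c);                     /* reads c before it is set (undefined behavior) */

    return c;
}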
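
The two initializers can be matched against the "Contents of section .data" / "__data" dumps: the bytes a5000000 5a000000 are 0xa5 and 0x5a stored little-endian, located at the addresses the symbol tables give for global_int and static_int. A minimal sketch of that lookup, with hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value from four bytes. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read the 4-byte object that a symbol at 'sym_addr' names, given the bytes
   of its section and the address 'sec_addr' at which the section starts.
   Returns 0 on success, -1 if the symbol does not fit inside the section. */
int read_symbol_u32(const unsigned char *sec, size_t sec_size,
                    uint64_t sec_addr, uint64_t sym_addr, uint32_t *out)
{
    uint64_t off;

    if (sym_addr < sec_addr)
        return -1;
    off = sym_addr - sec_addr;
    if (off > sec_size || sec_size - off < 4)
        return -1;
    *out = read_le32(sec + off);
    return 0;
}
```

On linux-arm the section starts at 0 with global_int at 0 and static_int.5961 at 4; on macOS __data starts at 0x5c with _global_int at 0x5c and _main.static_int at 0x60. Both lookups return 0xa5 and 0x5a respectively.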