Compilation Pipeline
The traditional compilation stages are: preprocessor (cpp) → compiler (cc1) → assembler (as) → linker (ld). In modern GCC the preprocessor is integrated into cc1, so no separate cpp process runs by default. Passing -v to gcc prints each command the driver invokes, which lets you observe every step.
Static Linking
Static linking combines a set of relocatable object files and libraries into one fully linked executable. The linker performs two main tasks:
- Symbol resolution: Associates each symbol reference (function or variable) with exactly one symbol definition.
- Relocation: Compilers and assemblers generate code and data sections that start at address 0. The linker assigns a final memory address to each symbol definition and modifies every reference to that symbol accordingly.
Object File Types
- Relocatable object files (.o) – contain binary code and data in a form that can be combined with other relocatable object files at link time.
- Executable object files (default a.out) – ready to be loaded into memory and executed.
- Shared object files (.so under Linux) – special relocatable files that can be loaded and linked at runtime or load time.
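The three file types can be told apart from the ELF header alone. The sketch below is a minimal illustration, not a full parser: it checks the four magic bytes and reads the 16-bit e_type field at byte offset 16, assuming a little-endian file. The names classify_elf and ObjKind are hypothetical and introduced only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    OBJ_NOT_ELF, OBJ_RELOCATABLE, OBJ_EXECUTABLE, OBJ_SHARED, OBJ_OTHER
} ObjKind;

/* Classify an in-memory file image by its ELF header (little-endian assumed). */
ObjKind classify_elf(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < 18 || memcmp(buf, magic, sizeof magic) != 0)
        return OBJ_NOT_ELF;

    /* e_type is a 16-bit field stored right after the 16-byte e_ident array. */
    uint16_t e_type = (uint16_t)(buf[16] | (buf[17] << 8));

    switch (e_type) {
    case 1:  return OBJ_RELOCATABLE;  /* ET_REL  */
    case 2:  return OBJ_EXECUTABLE;   /* ET_EXEC */
    case 3:  return OBJ_SHARED;       /* ET_DYN  */
    default: return OBJ_OTHER;
    }
}
```

One caveat: position-independent executables are also marked ET_DYN, so this check alone cannot distinguish a PIE from a shared library.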
ELF Relocatable Object File Layout
On Linux, object files follow the Executable and Linkable Format. The ELF header is followed by sections described in the section header table. Key sections include:
- .text – compiled machine code
- .rodata – read-only data (e.g., format strings)
- .data – initialized global and static variables
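- .bss – uninitialized global and static variables, plus those initialized to zero (occupies no space in the file)
- .symtab – symbol table for functions and global variables
- .rela.text – relocation entries for the code section
- .rela.data – relocation entries for initialized data that refers to other symbols, such as a global pointer initialized with the address of another global
- .debug – debugging information (present only with -g; modern toolchains split it into DWARF sections such as .debug_info and .debug_line)
- .strtab – string table holding symbol names; section names are stored in a separate .shstrtab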
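To make the section list concrete, here is a small module annotated with where each object typically ends up. The placement comments describe common GCC behavior on Linux and are an assumption, since the exact sections depend on the compiler and its flags; all identifiers are made up for illustration.

```c
int initialized_global = 42;      /* .data: initialized, occupies file space      */
int zero_global;                  /* .bss (or COMMON with -fcommon): no file space */
static int file_static = 7;       /* .data, recorded as a local symbol             */
const char banner[] = "linking";  /* .rodata: read-only data                       */

int sum_globals(void)             /* .text: compiled machine code                  */
{
    static int calls;             /* .bss: uninitialized local static              */
    calls++;
    return initialized_global + zero_global + file_static + calls;
}
```

Compiling this with gcc -c and inspecting the result with readelf -S or objdump -t is a quick way to compare the actual layout against the comments.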
Symbol Resolution
Local and Global Symbols
Each relocatable module has a symbol table. Symbols fall into three categories:
- Global symbols defined in the module and exported
- Global symbols referenced but defined elsewhere
- Local symbols (static) visible only inside the module
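The linker resolves a reference by finding the corresponding definition in one of its input modules. When the compiler encounters a global symbol that is not defined in the current module, it assumes the symbol is defined elsewhere, emits an undefined symbol table entry, and leaves the rest to the linker. If the linker cannot find a definition in any input, it reports an error.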
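The three categories can be seen in a single small module. This is an illustrative sketch with made-up names: total_length and the two functions are global symbols defined here, call_count is a local symbol, and strlen is referenced here but defined in the C library (unless the compiler inlines the call).

```c
#include <string.h>   /* declares strlen; its definition lives in libc */

int total_length = 0;        /* global symbol, defined and exported here  */
static int call_count = 0;   /* local symbol, invisible to other modules  */

/* Global symbol defined here; references the external symbol strlen. */
int add_length(const char *s)
{
    call_count++;
    total_length += (int)strlen(s);
    return total_length;
}

/* Accessor, since other modules cannot name call_count directly. */
int calls_so_far(void)
{
    return call_count;
}
```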
Handling Multiple Definitions
During compilation, symbols are classified by strength:
- Strong symbols: functions and initialized global variables
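- Weak symbols: uninitialized global variables, when they are emitted as COMMON symbols (-fcommon). GCC 10 and later default to -fno-common, which places them in .bss as ordinary definitions, so duplicates become link errors.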
The linker follows these rules:
- Multiple strong symbols with the same name are forbidden.
- If a strong symbol and one or more weak symbols share a name, the strong one wins.
- If multiple weak symbols share a name, an arbitrary one is chosen.
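The three rules can be expressed as a short decision function. The following is a toy model rather than anything a real linker exposes; Definition, Strength, and choose_definition are hypothetical names, and "an arbitrary one" is modeled here as the first weak definition seen.

```c
#include <stddef.h>

typedef enum { SYM_WEAK, SYM_STRONG } Strength;

typedef struct {
    const char *module;   /* hypothetical module name, e.g. "foo.o" */
    Strength strength;
} Definition;

/* Returns the index of the chosen definition, or -1 if linking must fail
 * (two strong definitions, or no definition at all). */
int choose_definition(const Definition *defs, size_t n)
{
    int chosen = -1;
    size_t strong_seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (defs[i].strength == SYM_STRONG) {
            if (++strong_seen > 1)
                return -1;        /* rule 1: duplicate strong symbols */
            chosen = (int)i;      /* rule 2: strong overrides weak    */
        } else if (chosen < 0) {
            chosen = (int)i;      /* rule 3: any weak one will do     */
        }
    }
    return chosen;
}
```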
Static Libraries
Static libraries (.a) are archives of relocatable object files. The linker scans input files sequentially (both .o and .a), maintaining three sets:
- E – set of object files to be merged into the executable
- U – unresolved symbols
- D – symbols already defined
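For each object file, the linker adds it to E and updates U and D with the symbols it references and defines. For each archive, the linker examines the members to resolve entries in U; any member that defines a needed symbol is added to E, and U and D are updated. Members that resolve nothing are discarded. If U is not empty after all inputs have been processed, linking fails. Because inputs are scanned in order, a library must appear after the object files that reference it, and you may need to rearrange or repeat libraries on the command line.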
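The scan can be modeled in a few dozen lines. The sketch below is a simplification under stated assumptions: each symbol is one bit in an unsigned mask (so at most 32 symbols and 32 archive members), and the types ObjFile, Input, and LinkState are invented for this example.

```c
#include <stddef.h>

/* Symbols are bits; defs/refs are the symbols a file defines/references. */
typedef struct { unsigned defs, refs; } ObjFile;

/* An input is either one .o (is_archive == 0) or an archive of members. */
typedef struct { const ObjFile *files; size_t count; int is_archive; } Input;

typedef struct { unsigned U, D; size_t files_in_E; } LinkState;

static void add_to_E(LinkState *st, const ObjFile *f)
{
    st->files_in_E++;
    st->D |= f->defs;
    st->U = (st->U | f->refs) & ~st->D;
}

/* Scans inputs left to right; returns the unresolved set U (0 = success). */
unsigned scan_inputs(const Input *inputs, size_t n, LinkState *st)
{
    st->U = st->D = 0;
    st->files_in_E = 0;

    for (size_t i = 0; i < n; i++) {
        const Input *in = &inputs[i];
        if (!in->is_archive) {
            add_to_E(st, &in->files[0]);
            continue;
        }
        unsigned taken = 0;    /* members already pulled from this archive */
        int changed = 1;
        while (changed) {      /* repeat until U and D stop changing */
            changed = 0;
            for (size_t m = 0; m < in->count; m++) {
                if (!(taken & (1u << m)) && (in->files[m].defs & st->U)) {
                    add_to_E(st, &in->files[m]);
                    taken |= 1u << m;
                    changed = 1;
                }
            }
        }
    }
    return st->U;
}
```

Feeding the model an object file followed by the archive that defines its missing symbol returns 0, while the reverse order leaves the symbol in U, which is the ordering problem described above.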
Relocation
Relocation assigns runtime addresses to symbols and modifies references accordingly. The process has two steps:
- Merge sections of the same type into larger aggregate sections and assign virtual addresses.
- Patch every symbol reference in code and data with the correct runtime address.
Relocation Types (Modern x86-64)
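- R_X86_64_PC32 – 32-bit PC-relative reference
- R_X86_64_PLT32 – 32-bit PC-relative reference to the symbol's PLT entry; when the symbol is defined in the same executable, the linker can bind the call directly to the function
Absolute types such as R_X86_64_32 appear mainly in code compiled without position independence (for example, with -fno-pie). Toolchains configured to build position-independent executables by default emit PC-relative relocations instead.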
The relocation entry structure is defined as:
typedef struct {
    long offset;     // offset within the section
    long type:32;    // relocation type
    long symbol:32;  // symbol index
    long addend;     // constant adjustment
} Elf64_Rela;
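In the on-disk format the type and symbol index share one 64-bit field, with the symbol index in the upper 32 bits and the type in the lower 32 bits. The sketch below shows that packing with fixed-width types; Rela64 and the rela_* helpers are local names for this example, and the two enum values are the numeric codes of R_X86_64_PC32 and R_X86_64_PLT32.

```c
#include <stdint.h>

/* Same layout as a 64-bit relocation entry with addend (24 bytes). */
typedef struct {
    uint64_t offset;   /* where in the section to patch              */
    uint64_t info;     /* symbol index (high 32) | type (low 32)     */
    int64_t  addend;   /* constant adjustment                        */
} Rela64;

enum { RELOC_PC32 = 2, RELOC_PLT32 = 4 };  /* R_X86_64_PC32, R_X86_64_PLT32 */

/* Pack a symbol index and a relocation type into the info field. */
uint64_t rela_info(uint32_t sym, uint32_t type)
{
    return ((uint64_t)sym << 32) | type;
}

uint32_t rela_sym(uint64_t info)  { return (uint32_t)(info >> 32); }
uint32_t rela_type(uint64_t info) { return (uint32_t)(info & 0xffffffffu); }
```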
A simplified relocation algorithm (for R_X86_64_PC32 and R_X86_64_PLT32):
foreach section s {
    foreach relocation entry r in s.rela {
        refptr  = s + r.offset;           // location of the reference in the output file
        refaddr = ADDR(s) + r.offset;     // runtime address of the reference
        if (r.type == R_X86_64_PC32) {
            *refptr = (ADDR(r.symbol) + r.addend - refaddr);
        }
        else if (r.type == R_X86_64_PLT32) {
            // same arithmetic, but the target is the symbol's PLT stub
            *refptr = (ADDR_PLT(r.symbol) + r.addend - refaddr);
        }
    }
}
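The PC-relative case of the pseudocode translates into C fairly directly. This is a minimal sketch, assuming the caller has already looked up the symbol's runtime address and that the result fits in 32 bits; apply_pc32 is a hypothetical helper, not part of any linker API.

```c
#include <stdint.h>

/* Patch one 32-bit PC-relative reference inside a section image.
 *   section      : bytes of the section in the output file
 *   section_addr : runtime address assigned to the section, ADDR(s)
 *   offset       : r.offset, position of the 4-byte field to patch
 *   sym_addr     : runtime address of the target, ADDR(r.symbol)
 *   addend       : r.addend (commonly -4 on x86-64)
 * Returns the value written. */
int32_t apply_pc32(uint8_t *section, uint64_t section_addr,
                   uint64_t offset, uint64_t sym_addr, int64_t addend)
{
    uint64_t refaddr = section_addr + offset;   /* ADDR(s) + r.offset */
    int64_t  delta   = (int64_t)(sym_addr - refaddr) + addend;
    int32_t  value   = (int32_t)delta;          /* assumed to fit in 32 bits */

    /* x86-64 stores the field in little-endian byte order. */
    for (int i = 0; i < 4; i++)
        section[offset + i] = (uint8_t)((uint32_t)value >> (8 * i));

    return value;
}
```

A real linker would also verify that the displacement fits in a signed 32-bit field and report an overflow otherwise; that check is omitted here.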
Example: Relocation in Action
Consider two source files:
/* main.c */
int sum(int *a, int n);
int array[2] = {1, 2};
int main() {
    return sum(array, 2);
}
/* sum.c */
int sum(int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}
Compile with gcc -c main.c sum.c and examine main.o:
$ objdump -dx main.o
...
0000000000000000 <main>:
0: 48 83 ec 08 sub $0x8,%rsp
4: be 02 00 00 00 mov $0x2,%esi
9: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 10 <main+0x10>
c: R_X86_64_PC32 array-0x4
10: e8 00 00 00 00 callq 15 <main+0x15>
11: R_X86_64_PLT32 sum-0x4
15: 48 83 c4 08 add $0x8,%rsp
19: c3 retq
After linking (executable a.out):
$ objdump -dx a.out
...
0000000000001125 <main>:
1125: 48 83 ec 08 sub $0x8,%rsp
1129: be 02 00 00 00 mov $0x2,%esi
112e: 48 8d 3d f3 2e 00 00 lea 0x2ef3(%rip),%rdi # 4028 <array>
1135: e8 05 00 00 00 callq 113f <sum>
113a: 48 83 c4 08 add $0x8,%rsp
113e: c3 retq
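The lea instruction at 0x112e uses PC-relative addressing: the address of the next instruction plus the displacement, 0x1135 + 0x2ef3 = 0x4028, is the runtime address of array. The linker computed that displacement as ADDR(array) + addend - refaddr = 0x4028 - 0x4 - 0x1131 = 0x2ef3, where refaddr = 0x1125 + 0xc. Because sum is defined in the same executable, the R_X86_64_PLT32 relocation is bound directly to the function without going through a PLT stub: 0x113a + 0x5 = 0x113f.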
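The addresses in the linked disassembly can be checked mechanically. The helper below is a small sketch (rip_target is a made-up name) that decodes a little-endian 32-bit displacement and adds it to the address of the following instruction, which is how the processor forms a PC-relative target.

```c
#include <stdint.h>

/* Target of a PC-relative operand: next instruction address + disp32.
 *   insn_addr : address of the instruction itself
 *   insn_len  : its length in bytes
 *   disp      : the four displacement bytes as they appear in the file */
uint64_t rip_target(uint64_t insn_addr, unsigned insn_len, const uint8_t disp[4])
{
    int32_t d = (int32_t)((uint32_t)disp[0]       |
                          (uint32_t)disp[1] << 8  |
                          (uint32_t)disp[2] << 16 |
                          (uint32_t)disp[3] << 24);

    return insn_addr + insn_len + (uint64_t)(int64_t)d;
}
```

With the bytes from the listing, the 7-byte lea at 0x112e with displacement f3 2e 00 00 yields 0x4028, and the 5-byte callq at 0x1135 with displacement 05 00 00 00 yields 0x113f.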
Executable Object Files
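An ELF executable resembles a relocatable object file, with a few differences. The ELF header records the program's entry point, which is the address of _start. A program header table describes how sections map to memory segments. An .init section defines a small function, _init, that the startup code calls. Because references in the code and data have already been resolved, the file carries no .rela.text or .rela.data sections. Once the program is loaded, control transfers to _start.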
Dynamic Libraries & Position-Independent Code (PIC)
Static libraries have two drawbacks:
- Each process holds its own copy in memory.
- Updating a library requires relinking all programs that use it.
Shared libraries solve this by allowing multiple processes to share a single copy of the library's code. To be shareable, the code must be position-independent: it can be loaded at any address without the linker having to modify the code itself.
PIC Data References
For global variables referenced by a shared library, the compiler routes each access through a Global Offset Table (GOT), which resides in the data segment. Because the distance between any instruction in the code segment and any location in the data segment is a constant fixed at link time, a PC-relative load from the GOT yields the address of the variable. Example from a shared library libvector.so:
/* addvec.c */
int addcnt = 0;
void addvec(int *x, int *y, int *z, int n) {
    addcnt++;
    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}
$ objdump -dx libvector.so
...
00000000000010f5 <addvec>:
10f5: 4c 8b 05 e4 2e 00 00 mov 0x2ee4(%rip),%r8 # 3fe0 <addcnt>
...
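Here 0x2ee4 is the constant distance from the next instruction (0x10fc) to the GOT entry for addcnt at 0x3fe0. The dynamic linker fills that entry with the address of addcnt when the library is loaded, so the mov leaves the variable's address, not its value, in %r8.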
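The indirection can be mimicked in plain C to show what the generated code does conceptually. This is only a model under obvious assumptions: got is an ordinary array standing in for the real table, fill_got plays the role of the dynamic linker, and none of these names exist in an actual toolchain.

```c
#include <stddef.h>

static int addcnt_storage;   /* stands in for the variable's runtime location */
static int *got[1];          /* one GOT slot, empty until "load time"         */

/* What the dynamic linker does conceptually: store the variable's address. */
void fill_got(void)
{
    got[0] = &addcnt_storage;
}

/* PIC-style access: fetch the address from the GOT, then go through it. */
void addvec(const int *x, const int *y, int *z, int n)
{
    int *addcnt_ptr = got[0];   /* corresponds to the PC-relative mov */
    (*addcnt_ptr)++;
    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

int current_addcnt(void)
{
    return *got[0];
}
```

The code never embeds the variable's address; it only knows where the table slot is, which is why the same code pages can be shared between processes that map the library at different addresses.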
PIC Function Calls (Lazy Binding via PLT)
To avoid resolving every function at load time, modern systems use lazy binding with a Procedure Linkage Table (PLT) and the GOT. The PLT is an array of 16-byte stubs in the code section; the GOT holds pointers used to jump to the actual function.
$ objdump -dx a.out
...
Disassembly of section .plt:
0000000000001020 <.plt>:
1020: ff 35 e2 2f 00 00 pushq 0x2fe2(%rip) # 4008 <GOT+8>
1026: ff 25 e4 2f 00 00 jmpq *0x2fe4(%rip) # 4010 <GOT+16>
102c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000001030 <printf@plt>:
1030: ff 25 e2 2f 00 00 jmpq *0x2fe2(%rip) # 4018 <printf@GLIBC>
1036: 68 00 00 00 00 pushq $0x0
103b: e9 e0 ff ff ff jmpq 1020 <.plt>
0000000000001040 <addvec@plt>:
1040: ff 25 da 2f 00 00 jmpq *0x2fda(%rip) # 4020 <addvec>
1046: 68 01 00 00 00 pushq $0x1
104b: e9 d0 ff ff ff jmpq 1020 <.plt>
...
How it works for the first call to addvec:
- Code calls addvec@plt.
- The first instruction in addvec@plt jumps through GOT[addvec]. Initially that GOT entry points back to the next instruction (0x1046).
- The PLT stub pushes the function ID (0x1) and jumps to PLT[0].
- PLT[0] pushes an argument that identifies this module to the dynamic linker (read from GOT[1]) and jumps into the dynamic linker's resolver (through GOT[2]).
- The dynamic linker resolves addvec and overwrites the corresponding GOT entry with the actual address. Then it transfers control to addvec.
Subsequent calls skip the resolution: the GOT entry already contains the real address, so the first jmpq goes directly to the function.
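Lazy binding can be imitated with a function pointer that initially targets a resolver. The sketch below is an analogy, not how the dynamic linker is implemented: got_entry stands in for the GOT slot, resolver_stub for the PLT[0] path into the dynamic linker, and real_add for the library function. All names are hypothetical.

```c
typedef int (*Fn)(int, int);

/* Stands in for the library function being bound. */
int real_add(int a, int b)
{
    return a + b;
}

int resolver_stub(int a, int b);   /* forward declaration */

Fn  got_entry = resolver_stub;     /* initially points back into the "PLT" */
int resolve_count = 0;             /* how many times resolution ran        */

/* First-call path: resolve, patch the GOT slot, then run the function. */
int resolver_stub(int a, int b)
{
    resolve_count++;
    got_entry = real_add;          /* overwrite the slot with the real address */
    return real_add(a, b);
}

/* Every call site does the same thing: an indirect jump through the slot. */
int add_via_plt(int a, int b)
{
    return got_entry(a, b);
}
```

Calling add_via_plt twice runs the resolver only once; the second call goes straight to real_add, mirroring the skipped resolution described above.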
Final Thoughts
Understanding the ELF format and the linking process is essential for debugging, performance analysis, and security. Toolchain defaults, such as building position-independent executables, change the details you see in objdump output, but the core concepts (symbol resolution, relocation, PIC) remain unchanged.
References
- Computer Systems: A Programmer's Perspective, Chapter 7 – Linking