Understanding Linkers in Computer Systems: A Practical Deep Dive

Compilation Pipeline

The traditional compilation stages are: preprocessor (cpp) → compiler (cc1) → assembler (as) → linker (ld). In modern GCC (version 8 and later, for example), preprocessing is performed inside cc1 rather than by a separate cpp process. Passing -v to gcc prints the command it runs for each stage.

Static Linking

Static linking combines a set of relocatable object files and libraries into one fully linked executable. The linker performs two main tasks:

Symbol resolution: Associates each symbol reference (function or variable) with exactly one symbol definition.
Relocation: Compilers and assemblers generate code and data sections that start at address 0. The linker assigns a final memory address to each symbol definition and modifies every reference to point to that address.

Object File Types

Relocatable object files (.o) – contain binary code and data that can be merged with other relocatable files at link time.
Executable object files (default a.out) – ready to be loaded into memory and executed.
Shared object files (.so under Linux) – special relocatable files that can be loaded and linked at runtime or load time.

ELF Relocatable Object File Layout

On Linux, object files follow the Executable and Linkable Format. The ELF header is followed by sections described in the section header table. Key sections include:

.text – compiled machine code
.rodata – read-only data (e.g., format strings)
.data – initialized global and static variables
.bss – uninitialized global/static variables (takes no space in file)
.symtab – symbol table for functions and global variables
.rela.text – relocation entries for the code section (.rela.data plays the same role for initialized data that holds addresses, such as a global pointer initialized to the address of another global)
.debug – debugging symbols (present only with -g)
.strtab – string table holding the names referenced by .symtab (GNU toolchains store section names separately in .shstrtab)
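
As a rough illustration of the section list above, the sketch below annotates where each definition typically lands when compiled with GCC on x86-64 Linux. All names are hypothetical, and placement can vary with compiler, flags, and optimization level, so the comments are expectations to check with readelf -S or objdump -t rather than guarantees.

```c
#include <assert.h>
#include <stdio.h>

int hit_limit = 3;               /* initialized global      -> .data   */
int hit_count;                   /* uninitialized global    -> .bss (COMMON under -fcommon) */
static int last_reported;        /* uninitialized static    -> .bss    */
const char banner[] = "hits: ";  /* read-only data          -> .rodata */

/* The machine code for both functions goes to .text. */
int record_hit(void)
{
    if (hit_count < hit_limit)
        hit_count++;
    assert(hit_count <= hit_limit);   /* the counter never passes the limit */
    return hit_count;
}

void report_hits(void)
{
    last_reported = hit_count;
    /* The format string literal is also read-only data (.rodata). */
    printf("%s%d\n", banner, last_reported);
}
```

Compiling this file with gcc -c and listing its symbols should show hit_limit, hit_count, and banner as global entries and last_reported as a local one.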

Symbol Resolution

Local and Global Symbols

Each relocatable module has a symbol table. Symbols fall into three categories:

Global symbols defined in the module and exported
Global symbols referenced but defined elsewhere
Local symbols (static) visible only inside the module
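
A small module can show all three categories at once. This is a minimal sketch with hypothetical names; it assumes strlen comes from the C library and is not inlined by the compiler, in which case nm would typically list it as undefined (U) in the object file.

```c
#include <assert.h>
#include <string.h>   /* strlen: referenced here, defined elsewhere (libc) */

/* Global symbol defined in this module and exported to other modules. */
int total_length = 0;

/* Local symbol: static keeps it invisible outside this module. */
static int clamp_to_limit(int n, int limit)
{
    return n > limit ? limit : n;
}

/* Global symbol defined here. Its call to strlen is a reference to a
   global symbol defined in another module, left for the linker to resolve. */
int add_word(const char *word)
{
    assert(word != NULL);
    total_length += clamp_to_limit((int)strlen(word), 16);
    return total_length;
}
```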

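The linker resolves a reference by finding the corresponding definition in some input module. When the compiler meets a global symbol that is not defined in the current module, it assumes the symbol is defined elsewhere, generates a symbol table entry for it, and leaves the reference for the linker. If the linker finds no definition in any input module, it reports an undefined reference error.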

Handling Multiple Definitions

During compilation, symbols are classified by strength:

Strong symbols: functions and initialized global variables
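Weak symbols: uninitialized global variables (assigned to the COMMON pseudo-section when compiled with -fcommon; GCC 10 and later default to -fno-common, which places them in .bss and makes duplicate definitions a link error)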

The linker follows these rules:

Multiple strong symbols with the same name are forbidden.
If a strong symbol and one or more weak symbols share a name, the strong one wins.
If multiple weak symbols share a name, an arbitrary one is chosen.
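
The three rules above can be sketched as a small selection function. This is an illustrative model with hypothetical types, not how ld is implemented; where a real linker picks an arbitrary weak definition, the sketch simply takes the first one.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { WEAK, STRONG } Strength;

typedef struct {
    const char *module;    /* hypothetical module name, e.g. "foo.o" */
    Strength    strength;
} Definition;

/* Returns the index of the winning definition, or -1 when two strong
   definitions collide (or when there are no definitions at all). */
int pick_definition(const Definition *defs, size_t n)
{
    assert(defs != NULL || n == 0);
    int winner = -1;
    for (size_t i = 0; i < n; i++) {
        if (defs[i].strength == STRONG) {
            if (winner >= 0 && defs[winner].strength == STRONG)
                return -1;        /* multiple strong symbols are forbidden */
            winner = (int)i;      /* a strong symbol beats any weak one */
        } else if (winner < 0) {
            winner = (int)i;      /* only weak so far: keep the first */
        }
    }
    return winner;
}
```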

Static Libraries

Static libraries (.a) are archives of relocatable object files. The linker scans input files sequentially (both .o and .a), maintaining three sets:

E – set of object files to be merged into the executable
U – unresolved symbols
D – symbols already defined

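For each object file, the linker adds it to E and updates U and D with the symbols it references and defines. For each archive, the linker examines its members to resolve entries in U. Any member that provides a needed symbol gets added to E and updates U and D; members that resolve nothing are discarded. If U is not empty after all inputs are processed, linking fails. Because the scan runs left to right, order matters: a library must appear after the objects that reference its symbols, so you may need to rearrange or repeat libraries on the command line.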
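
The scan can be modeled in a few lines to show why order matters. This is a simplified sketch under stated assumptions: symbols are bits in a mask, E is tracked only as a count, archives hold at most 32 members, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned SymSet;                  /* one bit per symbol name */

typedef struct { SymSet defines, references; } Object;

typedef struct {
    const Object *members;    /* a plain .o has exactly one member */
    size_t        count;
    int           is_archive;
} Input;

typedef struct { SymSet U, D; size_t merged; } LinkState;   /* merged = |E| */

static void add_object(LinkState *st, const Object *o)
{
    st->D |= o->defines;
    st->U = (st->U | o->references) & ~st->D;
    st->merged++;
}

/* Scans inputs left to right; returns 1 when U ends up empty. */
int link_inputs(const Input *inputs, size_t n, LinkState *st)
{
    assert(inputs != NULL && st != NULL);
    st->U = st->D = 0;
    st->merged = 0;
    for (size_t i = 0; i < n; i++) {
        const Input *in = &inputs[i];
        if (!in->is_archive) {
            add_object(st, &in->members[0]);
            continue;
        }
        /* Archive: keep pulling members until none resolves anything in U. */
        unsigned taken = 0;               /* members already added to E */
        int progress = 1;
        while (progress) {
            progress = 0;
            for (size_t m = 0; m < in->count; m++) {
                if ((taken >> m) & 1u)
                    continue;
                if (in->members[m].defines & st->U) {
                    add_object(st, &in->members[m]);
                    taken |= 1u << m;
                    progress = 1;
                }
            }
        }
    }
    return st->U == 0;
}
```

With one object that references a symbol and one archive that defines it, the order object-then-archive links, while archive-then-object leaves the symbol in U, because the archive is scanned while U is still empty.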

Relocation

Relocation assigns runtime addresses to symbols and modifies references accordingly. The process has two steps:

Merge sections of the same type into larger aggregate sections and assign virtual addresses.
Patch every symbol reference in code and data with the correct runtime address.

Relocation Types (Modern x86-64)

R_X86_64_PC32 – PC-relative 32‑bit offset
R_X86_64_PLT32 – PLT-based 32‑bit offset (for lazy binding)

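Absolute types such as R_X86_64_32 are rarely seen with toolchains that build position-independent executables by default, because the generated code then addresses data PC-relatively and calls functions through PLT32 relocations.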

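A simplified view of the relocation entry structure (the real Elf64_Rela in <elf.h> packs the type and symbol index into a single 64-bit r_info field):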

typedef struct {
    long offset;    // offset within the section
    long type:32;   // relocation type
    long symbol:32; // symbol index
    long addend;    // constant adjustment
} Elf64_Rela;
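
To make the packing of type and symbol concrete, here is a minimal sketch that mirrors what the ELF64_R_SYM and ELF64_R_TYPE macros in <elf.h> do, without depending on that header. The struct and helper names are hypothetical stand-ins; the two type codes are the values the x86-64 ABI assigns to R_X86_64_PC32 and R_X86_64_PLT32.

```c
#include <assert.h>
#include <stdint.h>

/* Type codes from the x86-64 psABI (R_X86_64_PC32, R_X86_64_PLT32). */
enum { RELOC_PC32 = 2, RELOC_PLT32 = 4 };

/* Hypothetical stand-in with the same layout as the real Elf64_Rela. */
typedef struct {
    uint64_t offset;   /* r_offset: where in the section to patch           */
    uint64_t info;     /* r_info: symbol index (high 32) | type (low 32)    */
    int64_t  addend;   /* r_addend: constant adjustment                     */
} RelaEntry;

static_assert(sizeof(RelaEntry) == 24, "three 8-byte fields");

static inline uint64_t rela_info(uint32_t symbol, uint32_t type)
{
    return ((uint64_t)symbol << 32) | type;
}

static inline uint32_t rela_symbol(uint64_t info)
{
    return (uint32_t)(info >> 32);
}

static inline uint32_t rela_type(uint64_t info)
{
    return (uint32_t)(info & 0xffffffffu);
}
```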

A simplified relocation algorithm (for R_X86_64_PC32):

foreach section s {
    foreach entry r in s.rela {
        refptr  = s + r.offset;            // location of the reference in the section's bytes
        refaddr = ADDR(s) + r.offset;      // runtime address of the reference
        if (r.type == R_X86_64_PC32) {
            *refptr = (unsigned) (ADDR(r.symbol) + r.addend - refaddr);
        }
        else if (r.type == R_X86_64_PLT32) {
            // same formula, but against the PLT stub address when the symbol needs one
            *refptr = (unsigned) (ADDR_PLT(r.symbol) + r.addend - refaddr);
        }
    }
}

Example: Relocation in Action

Consider two source files:

/* main.c */
int sum(int *a, int n);
int array[2] = {1, 2};
int main() {
    return sum(array, 2);
}

/* sum.c */
int sum(int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}

Compile with gcc -c main.c sum.c and examine main.o:

$ objdump -dx main.o
...
0000000000000000 <main>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   be 02 00 00 00          mov    $0x2,%esi
   9:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 10 <main+0x10>
                        c: R_X86_64_PC32        array-0x4
  10:   e8 00 00 00 00          callq  15 <main+0x15>
                        11: R_X86_64_PLT32      sum-0x4
  15:   48 83 c4 08             add    $0x8,%rsp
  19:   c3                      retq

After linking (executable a.out):

$ objdump -dx a.out
...
0000000000001125 <main>:
    1125:       48 83 ec 08             sub    $0x8,%rsp
    1129:       be 02 00 00 00          mov    $0x2,%esi
    112e:       48 8d 3d f3 2e 00 00    lea    0x2ef3(%rip),%rdi        # 4028 <array>
    1135:       e8 05 00 00 00          callq  113f <sum>
    113a:       48 83 c4 08             add    $0x8,%rsp
    113e:       c3                      retq

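The lea instruction at 0x112e uses PC-relative addressing: the address of the next instruction plus the displacement, 0x1135 + 0x2ef3 = 0x4028, is the runtime address of array. Because sum is defined in the same executable, the linker resolves the R_X86_64_PLT32 reference directly: the callq displacement 0x5 is added to the next instruction's address 0x113a, giving 0x113f, the address of sum. No PLT stub is involved.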
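
The displacement bytes in the linked listing can be reproduced from the two relocation entries in main.o. The sketch below applies the PC-relative formula (symbol address plus addend minus the address being patched) with the addresses taken from the listings above; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value stored for a PC-relative relocation: S + A - P, where S is the
   symbol's runtime address, A the addend, and P the address being patched. */
int32_t pc32_value(uint64_t sym_addr, int64_t addend, uint64_t ref_addr)
{
    int64_t v = (int64_t)sym_addr + addend - (int64_t)ref_addr;
    assert(v >= INT32_MIN && v <= INT32_MAX);   /* must fit in 32 bits */
    return (int32_t)v;
}

/* Write the value into the code bytes in little-endian order. */
void patch_le32(uint8_t *ref, int32_t value)
{
    uint32_t u = (uint32_t)value;
    for (int i = 0; i < 4; i++)
        ref[i] = (uint8_t)(u >> (8 * i));
}
```

For the array reference, main is placed at 0x1125 and the relocation offset is 0xc, so the patched address is 0x1131 and the stored value is 0x4028 - 4 - 0x1131 = 0x2ef3, which matches the bytes f3 2e 00 00. For the call, 0x113f - 4 - 0x1136 = 0x5, matching 05 00 00 00.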

Executable Object Files

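An ELF executable resembles a relocatable object file, but it also carries a program header table that maps its sections onto memory segments, and it has no .rela sections for statically resolved references because it is fully linked. The .init section defines a small function, _init, that the program's initialization code calls. The ELF header records the entry point, normally _start (which lives in .text), and the loader jumps to that address when the process begins.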

Dynamic Libraries & Position-Independent Code (PIC)

Static libraries have two drawbacks:

Each process holds its own copy in memory.
Updating a library requires relinking all programs that use it.

Shared libraries solve this by allowing multiple processes to share the same code. To be shareable, the code must be position-independent: it can be placed anywhere in memory without needing relocation at load time.

PIC Data References

To reference global variables from a shared library, the toolchain places a Global Offset Table (GOT) in the data segment, with one 8-byte entry per referenced global object. Because the distance between the code and data segments is constant at runtime, a PC-relative load from the code reads the GOT entry, which holds the absolute address of the variable. Example from a shared library libvector.so:

/* addvec.c */
int addcnt = 0;
void addvec(int *x, int *y, int *z, int n) {
    addcnt++;
    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

$ objdump -dx libvector.so
...
00000000000010f5 <addvec>:
    10f5:   4c 8b 05 e4 2e 00 00    mov    0x2ee4(%rip),%r8        # 3fe0 <addcnt>
...

Here 0x2ee4 is the constant offset from the next instruction (0x10fc) to the GOT entry for addcnt at 0x3fe0. The dynamic linker fills that entry with the address of addcnt when the library is loaded.

PIC Function Calls (Lazy Binding via PLT)

To avoid resolving every function at load time, modern systems use lazy binding with a Procedure Linkage Table (PLT) and the GOT. The PLT is an array of 16-byte stubs in the code section; the GOT holds pointers used to jump to the actual function.

$ objdump -dx a.out
...
Disassembly of section .plt:

0000000000001020 <.plt>:
    1020:   ff 35 e2 2f 00 00       pushq  0x2fe2(%rip)        # 4008 <GOT+8>
    1026:   ff 25 e4 2f 00 00       jmpq   *0x2fe4(%rip)        # 4010 <GOT+16>
    102c:   0f 1f 40 00             nopl   0x0(%rax)

0000000000001030 <printf@plt>:
    1030:   ff 25 e2 2f 00 00       jmpq   *0x2fe2(%rip)        # 4018 <printf@GLIBC>
    1036:   68 00 00 00 00          pushq  $0x0
    103b:   e9 e0 ff ff ff          jmpq   1020 <.plt>

0000000000001040 <addvec@plt>:
    1040:   ff 25 da 2f 00 00       jmpq   *0x2fda(%rip)        # 4020 <addvec>
    1046:   68 01 00 00 00          pushq  $0x1
    104b:   e9 d0 ff ff ff          jmpq   1020 <.plt>
...

How it works for the first call to addvec:

Code calls addvec@plt.
The first instruction in addvec@plt jumps through GOT[addvec]. Initially that GOT entry points back to the next instruction (0x1046).
The PLT stub pushes the relocation index for addvec (0x1) and jumps to PLT[0].
PLT[0] pushes GOT[1], which identifies this module to the dynamic linker, and jumps to the dynamic linker's resolver through GOT[2].
The dynamic linker resolves addvec and overwrites the corresponding GOT entry with the actual address. Then it transfers control to addvec.

Subsequent calls skip the resolution: the GOT entry already contains the real address, so the first jmpq goes directly to the function.
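
The control flow of lazy binding can be imitated in plain C with a function pointer standing in for the GOT entry. This is only an analogy under stated assumptions: all names are hypothetical, and the real mechanism works on raw addresses in the GOT and looks the symbol up by name rather than being hard-wired to one function.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*BinaryFn)(int, int);

/* The "real" function that would live in the shared library. */
static int add_impl(int a, int b) { return a + b; }

static int resolve_count = 0;
static int resolve_then_call(int a, int b);   /* plays PLT[0] + dynamic linker */

/* One-slot "GOT": initially points at the resolver path. */
static BinaryFn got_add = resolve_then_call;

static int resolve_then_call(int a, int b)
{
    resolve_count++;
    got_add = add_impl;       /* overwrite the GOT entry with the real address */
    return got_add(a, b);     /* then transfer control to the function */
}

/* Plays the role of the PLT stub: always an indirect jump through the GOT. */
int add_via_plt(int a, int b)
{
    assert(got_add != NULL);
    return got_add(a, b);
}

int resolver_runs(void) { return resolve_count; }
```

Calling add_via_plt twice runs the resolver only once: the first call rewrites got_add, and the second call goes straight to add_impl.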

Final Thoughts

Understanding the ELF format and the linking process is essential for debugging, performance analysis, and security. The modern toolchain (GCC 8+, ld, GOT+PLT) hides many of these details, but the core concepts (symbol resolution, relocation, PIC) remain unchanged.

References

Computer Systems: A Programmer's Perspective, Chapter 7 – Linking

Tags: ELF linker GOT PLT PIC

Posted on Sun, 28 Jun 2026 17:02:53 +0000 by Pilli
