x86-64 Machine-Level Program Representation: Registers, Instructions, and Control Flow

x86-64 Register Set and Usage

The x86-64 architecture includes 16 general-purpose registers that store 64-bit values. While these registers are general-purpose, they follow conventional usage patterns. The registers %r8 through %r15 were added by the x86-64 extension; they did not exist in the 32-bit 80386.

%rax: Return value register
%rbx: Callee-saved register
%rcx: 4th function parameter
%rdx: 3rd function parameter
%rsi: 2nd function parameter
%rdi: 1st function parameter
%rbp: Callee-saved register (base pointer)
%rsp: Stack pointer (points to stack top)
%r8: 5th function parameter
%r9: 6th function parameter
%r10: Caller-saved register
%r11: Caller-saved register
%r12: Callee-saved register
%r13: Callee-saved register
%r14: Callee-saved register
%r15: Callee-saved register

Instruction Operands and Addressing Modes

Most instructions operate on one or more operands that specify source values and destination locations. Operands can be categorized into three types:

Immediate values: Represent constant values
Registers: Access register contents
Memory references: Access memory locations via computed addresses

Addressing Modes

In AT&T syntax, an immediate value is written as $ followed by an integer in standard C notation (for example, $-577 or $0x1F). We'll use r_a to represent any register a, and R[r_a] to denote its value, treating registers as an array indexed by register identifiers.

Memory is conceptualized as a large byte array, where M_b[Addr] references b bytes starting at address Addr. For simplicity, we'll omit the byte count subscript.

Type	Format	Operand Value	Name
Immediate	$Imm	Imm	Immediate addressing
Register	r_a	R[r_a]	Register addressing
Memory	Imm	M[Imm]	Absolute addressing
Memory	(r_a)	M[R[r_a]]	Indirect addressing
Memory	Imm(r_b)	M[Imm + R[r_b]]	Base + offset addressing
Memory	(r_b, r_i)	M[R[r_b] + R[r_i]]	Indexed addressing
Memory	Imm(r_b, r_i)	M[Imm + R[r_b] + R[r_i]]	Indexed addressing
Memory	(, r_i, s)	M[R[r_i] * s]	Scaled indexed addressing
Memory	Imm(, r_i, s)	M[Imm + R[r_i] * s]	Scaled indexed addressing
Memory	(r_b, r_i, s)	M[R[r_b] + R[r_i] * s]	Scaled indexed addressing
Memory	Imm(r_b, r_i, s)	M[Imm + R[r_b] + R[r_i] * s]	Scaled indexed addressing
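Every memory form in the table is a special case of the last row, with missing parts treated as zero (or as a scale of 1). The following is a minimal sketch of that computation in C; effective_address is a hypothetical helper named here for illustration only. It also assumes the caller passes a legal scale factor, which on x86-64 must be 1, 2, 4, or 8.

```c
#include <stdint.h>

/* Effective address for the general form Imm(r_b, r_i, s):
 *     Imm + R[r_b] + R[r_i] * s
 * Pass 0 for a missing base or index and 1 for a missing scale.
 * Hypothetical helper for illustration; the scale is assumed to be
 * one of 1, 2, 4, 8 and is not validated here. */
uint64_t effective_address(int64_t imm, uint64_t base, uint64_t index,
                           unsigned scale)
{
    /* Unsigned arithmetic wraps modulo 2^64, like address arithmetic. */
    return (uint64_t)imm + base + index * (uint64_t)scale;
}
```

With R[r_b] = 0x100 and R[r_i] = 0x3, the operand 0x10(r_b) refers to address 0x110, (r_b, r_i, 4) to 0x10c, and 0xfc(, r_i, 4) to 0x108.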

Data Transfer Instructions

Instruction	Effect	Description
MOV S, D	D ← S	Move
movabsq I, R	R ← I	Move absolute quadword
MOVZ S, R	R ← Zero-extend(S)	Move with zero extension
MOVS S, R	R ← Sign-extend(S)	Move with sign extension
movsbw S, R	R ← Sign-extend(S)	Move sign-extended byte to word
cltq	%rax ← Sign-extend(%eax)	Extend %eax to %rax
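The difference between zero extension and sign extension only shows when the top bit of the source is set. The sketch below mirrors what movzbl, movsbl, and cltq compute, using plain C conversions; the function names are made up for this example.

```c
#include <stdint.h>

/* Like movzbl: the upper 24 bits of the result are zero. */
uint32_t zero_extend_byte(uint8_t b)
{
    return (uint32_t)b;
}

/* Like movsbl: the upper 24 bits are copies of bit 7 of the source.
 * Written out arithmetically so it does not rely on how the compiler
 * converts out-of-range values to a signed type. */
int32_t sign_extend_byte(uint8_t b)
{
    return (b & 0x80) ? (int32_t)b - 0x100 : (int32_t)b;
}

/* Like cltq: widen a signed 32-bit value to 64 bits. */
int64_t sign_extend_long(int32_t x)
{
    return (int64_t)x;
}
```

For the byte 0xF0, zero extension yields 0x000000F0 (240) while sign extension yields 0xFFFFFFF0 (-16).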

Arithmetic and Logical Operations

Instruction	Effect	Description
leaq S, D	D ← &S	Load effective address
INC D	D ← D + 1	Increment
DEC D	D ← D - 1	Decrement
NEG D	D ← -D	Negate
NOT D	D ← ~D	Bitwise NOT
ADD S, D	D ← D + S	Add
SUB S, D	D ← D - S	Subtract
IMUL S, D	D ← D * S	Multiply
XOR S, D	D ← D ^ S	Bitwise XOR
OR S, D	D ← D | S	Bitwise OR
AND S, D	D ← D & S	Bitwise AND
SAL k, D	D ← D << k	Shift left
SHL k, D	D ← D << k	Shift left (same as SAL)
SAR k, D	D ← D >>_A k	Arithmetic shift right
SHR k, D	D ← D >>_L k	Logical shift right
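SAR and SHR differ only in what they shift into the vacated high bits: SAR copies the sign bit, SHR fills with zeros. Here is a small sketch that models both on 64-bit patterns. The function names are hypothetical, and the count is masked to its low 6 bits, which is how x86-64 treats shift counts for 64-bit operands.

```c
#include <stdint.h>

/* Models SHR: vacated high bits are filled with zeros. */
uint64_t shift_right_logical(uint64_t x, unsigned k)
{
    k &= 63;                 /* 64-bit shifts use only the low 6 count bits */
    return x >> k;
}

/* Models SAR: vacated high bits are filled with copies of the sign bit.
 * Works on the raw bit pattern so the result does not depend on how the
 * compiler implements >> for negative signed values. */
uint64_t shift_right_arithmetic(uint64_t x, unsigned k)
{
    k &= 63;
    uint64_t r = x >> k;
    if ((x >> 63) != 0 && k > 0)
        r |= ~(UINT64_MAX >> k);   /* set the top k bits */
    return r;
}
```

Shifting 0xF000000000000000 right by 4 gives 0x0F00000000000000 with the logical shift and 0xFF00000000000000 with the arithmetic shift.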

Special Arithmetic Operations

These instructions support the full 128-bit (oct word) product of two 64-bit numbers and integer division with a 128-bit dividend. The 128-bit value is held in a register pair, with the high 64 bits in %rdx and the low 64 bits in %rax.

Instruction	Effect	Description
imulq S	R[%rdx]:R[%rax] ← S × R[%rax]	Signed full multiplication
mulq S	R[%rdx]:R[%rax] ← S × R[%rax]	Unsigned full multiplication
cqto	R[%rdx]:R[%rax] ← Sign-extend(R[%rax])	Convert to oct word
idivq S	R[%rdx] ← R[%rdx]:R[%rax] mod S; R[%rax] ← R[%rdx]:R[%rax] ÷ S	Signed division
divq S	R[%rdx] ← R[%rdx]:R[%rax] mod S; R[%rax] ← R[%rdx]:R[%rax] ÷ S	Unsigned division
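To make the high/low split concrete, here is a portable sketch of what mulq produces, built from 32-bit partial products. The type u128_parts and the function mul_full are names invented for this example; the hardware does this in a single instruction.

```c
#include <stdint.h>

/* A 128-bit value as two 64-bit halves, like the %rdx:%rax pair. */
typedef struct {
    uint64_t hi;   /* plays the role of %rdx */
    uint64_t lo;   /* plays the role of %rax */
} u128_parts;

/* Unsigned 64 x 64 -> 128 multiplication, modelling mulq.
 * Hypothetical helper: splits each operand into 32-bit halves so that
 * every partial product fits in 64 bits. */
u128_parts mul_full(uint64_t a, uint64_t b)
{
    const uint64_t mask = 0xffffffffULL;
    uint64_t a_lo = a & mask, a_hi = a >> 32;
    uint64_t b_lo = b & mask, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Sum of everything that lands in bits 32..63, with its carries. */
    uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);

    u128_parts r;
    r.lo = (mid << 32) | (p0 & mask);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

For example, 2^32 × 2^32 = 2^64 does not fit in 64 bits: the result has hi = 1 and lo = 0.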

Control Flow

Condition Codes

CF: Carry flag. Set when most significant bit generates a carry. Used for unsigned overflow detection.
ZF: Zero flag. Set when result is zero.
SF: Sign flag. Set when result is negative.
OF: Overflow flag. Set when two's complement overflow occurs.

Condition codes are modified by:

Arithmetic and logical operations
Comparison and test instructions
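As an illustration of the four flags, the sketch below computes them for a 64-bit addition. The struct add_flags and the function flags_after_add are hypothetical names, and the code models only ADD, not every instruction that touches the flags.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool cf;   /* carry out of the most significant bit */
    bool zf;   /* result is zero */
    bool sf;   /* result is negative (top bit set) */
    bool of;   /* two's complement overflow */
} add_flags;

/* Flags as they would be set by a 64-bit ADD of a and b. */
add_flags flags_after_add(uint64_t a, uint64_t b)
{
    uint64_t r = a + b;            /* wraps modulo 2^64 */
    add_flags f;
    f.cf = r < a;                  /* wrapped around: unsigned overflow */
    f.zf = (r == 0);
    f.sf = (r >> 63) != 0;
    /* Signed overflow: operands share a sign and the result's differs. */
    f.of = ((~(a ^ b) & (a ^ r)) >> 63) != 0;
    return f;
}
```

Adding 1 to the largest signed value sets SF and OF but not CF, while adding 1 to the all-ones pattern sets CF and ZF but not OF.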

Comparison and Test Instructions

These instructions set condition codes without modifying register values.

Instruction	Effect	Description
CMP S_1, S_2	S_2 - S_1	Compare
TEST S_1, S_2	S_1 & S_2	Test

Condition Code Access

Instruction	Synonym	Effect	Description
sete D	setz	D ← ZF	Equal/zero
setne D	setnz	D ← ~ZF	Not equal/non-zero
sets D		D ← SF	Negative
setns D		D ← ~SF	Non-negative
setg D	setnle	D ← ~(SF ^ OF) & ~ZF	Greater (signed >)
setge D	setnl	D ← ~(SF ^ OF)	Greater or equal (signed >=)
setl D	setnge	D ← SF ^ OF	Less (signed <)
setle D	setng	D ← (SF ^ OF) | ZF	Less or equal (signed <=)
seta D	setnbe	D ← ~CF & ~ZF	Greater (unsigned >)
setae D	setnb	D ← ~CF	Greater or equal (unsigned >=)
setb D	setnae	D ← CF	Less (unsigned <)
setbe D	setna	D ← CF | ZF	Less or equal (unsigned <=)
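The signed conditions combine SF and OF with exclusive OR (written ^ here) because a subtraction that overflows produces a result with the wrong sign. The sketch below models CMP followed by a few of the SET conditions; all names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } cmp_flags;

/* Flags after CMP S_1, S_2, which computes S_2 - S_1 and discards it. */
cmp_flags flags_after_cmp(uint64_t s1, uint64_t s2)
{
    uint64_t r = s2 - s1;          /* wraps modulo 2^64 */
    cmp_flags f;
    f.cf = s2 < s1;                /* unsigned borrow */
    f.zf = (r == 0);
    f.sf = (r >> 63) != 0;
    /* Signed overflow: operands differ in sign and the result's sign
     * differs from that of S_2. */
    f.of = (((s2 ^ s1) & (s2 ^ r)) >> 63) != 0;
    return f;
}

bool cond_l(cmp_flags f)  { return f.sf != f.of; }             /* SF ^ OF        */
bool cond_le(cmp_flags f) { return (f.sf != f.of) || f.zf; }   /* (SF ^ OF) | ZF */
bool cond_g(cmp_flags f)  { return !(f.sf != f.of) && !f.zf; } /* ~(SF ^ OF) & ~ZF */
bool cond_b(cmp_flags f)  { return f.cf; }                     /* CF             */
bool cond_a(cmp_flags f)  { return !f.cf && !f.zf; }           /* ~CF & ~ZF      */
```

Comparing the bit pattern of -1 against 1 shows why both families exist: the signed condition says less, the unsigned condition says above.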

Jump Instructions

Instruction	Synonym	Jump Condition	Description
jmp Label		1	Unconditional jump
jmp *Operand		1	Indirect jump
je Label	jz	ZF	Equal/zero
jne Label	jnz	~ZF	Not equal/non-zero
js Label		SF	Negative
jns Label		~SF	Non-negative
jg Label	jnle	~(SF ^ OF) & ~ZF	Greater (signed >)
jge Label	jnl	~(SF ^ OF)	Greater or equal (signed >=)
jl Label	jnge	SF ^ OF	Less (signed <)
jle Label	jng	(SF ^ OF) | ZF	Less or equal (signed <=)
ja Label	jnbe	~CF & ~ZF	Greater (unsigned >)
jae Label	jnb	~CF	Greater or equal (unsigned >=)
jb Label	jnae	CF	Less (unsigned <)
jbe Label	jna	CF | ZF	Less or equal (unsigned <=)

Jump instructions typically encode the difference between the target address and the instruction following the jump. Consider this C code:

int example_function() {
    for (int i = 0; i < 3; i++)
        if (i == 1)
            return 1;
    return 0;
}

The corresponding assembly:

example_function:
    push %rbp
    mov %rsp, %rbp
    movl $0x0, -0x4(%rbp)
    jmp 1e <example_function+0x1e>
    cmpl $0x1, -0x4(%rbp)
    jne 1a <example_function+0x1a>
    mov $0x1, %eax
    jmp 29 <example_function+0x29>
    addl $0x1, -0x4(%rbp)
    cmpl $0x2, -0x4(%rbp)
    jle d <example_function+0xd>
    mov $0x0, %eax
    pop %rbp
    retq

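The addresses below are offsets from the start of example_function, as in the <example_function+0x1e> annotations. The first jump, at address 0xb, targets 0x1e. Its encoding stores the offset 0x11, which is added to 0xd, the address of the instruction that follows the jump: 0xd + 0x11 = 0x1e. Note that:

Addresses are unsigned
Relative offsets are signed

For the jle instruction at address 0x22, the encoded offset byte is 0xe9, which is -0x17 as a signed 8-bit value. The next instruction is at 0x24, so the target is 0x24 - 0x17 = 0xd. Equivalently, the offset can be sign-extended and added to the address as an unsigned number, discarding any carry out. Offsets are taken relative to the next instruction because the PC already points there when the jump executes. Since the encoding depends only on the distance between the jump and its target, the code can be placed at a different address without changing these bytes.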
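The target computation for a short jump can be sketched in a few lines of C. The function jump_target is a hypothetical helper; it covers only the one-byte offset form used in this listing.

```c
#include <stdint.h>

/* Target of a short (one-byte offset) PC-relative jump.
 * next_instr is the address of the instruction after the jump,
 * rel8 is the offset byte taken from the encoding. */
uint64_t jump_target(uint64_t next_instr, uint8_t rel8)
{
    /* Sign-extend the offset byte: 0x80..0xff mean -128..-1. */
    int64_t disp = (rel8 & 0x80) ? (int64_t)rel8 - 0x100 : (int64_t)rel8;

    /* Unsigned addition wraps, which gives the right result for
     * negative displacements as well. */
    return next_instr + (uint64_t)disp;
}
```

With the values from the listing, jump_target(0xd, 0x11) gives 0x1e and jump_target(0x24, 0xe9) gives 0xd.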

Procedure Calls

Procedures (functions) require machine-level support for:

Control transfer: Setting PC to procedure entry and returning to caller
Parameter passing: Transferring arguments between caller and callee
Memory management: Allocating and freeing local variables

Stack Operations

Instruction	Effect	Description
pushq S	R[%rsp] ← R[%rsp] - 8; M[R[%rsp]] ← S	Push quadword
popq D	D ← M[R[%rsp]]; R[%rsp] ← R[%rsp] + 8	Pop quadword
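The two effects in each row happen in the order shown: a push decrements %rsp before storing, a pop loads before incrementing. Below is a minimal model of that behaviour on a small simulated stack. The stack_model type and its functions are invented for this example and perform no overflow or underflow checks.

```c
#include <stdint.h>

enum { STACK_SLOTS = 16 };           /* arbitrary size for the sketch */

typedef struct {
    uint64_t mem[STACK_SLOTS];       /* simulated memory, one quadword per slot */
    uint64_t rsp;                    /* byte offset of the stack top within mem */
} stack_model;

/* An empty stack starts just past the highest slot and grows downward. */
void stack_init(stack_model *s)
{
    s->rsp = STACK_SLOTS * 8;
}

/* pushq: R[%rsp] <- R[%rsp] - 8, then M[R[%rsp]] <- value */
void model_pushq(stack_model *s, uint64_t value)
{
    s->rsp -= 8;
    s->mem[s->rsp / 8] = value;
}

/* popq: result <- M[R[%rsp]], then R[%rsp] <- R[%rsp] + 8 */
uint64_t model_popq(stack_model *s)
{
    uint64_t value = s->mem[s->rsp / 8];
    s->rsp += 8;
    return value;
}
```

Pushing 7 and then 8 lowers rsp by 16 bytes; the first pop returns 8, the second returns 7, and rsp is back where it started.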

Consider this example:

long calculate_sum(long x1, long x2, long x3) {
    return x1 + x2 + x3;
}

long aggregate(long p1, long p2, long p3, long p4, long p5, long p6, long p7, long p8) {
    long t1 = calculate_sum(p1, p2, p3);
    long t2 = calculate_sum(p4, p5, p6);
    long t3 = calculate_sum(p7, p8, 0);
    long total = t1 + t2 + t3;
    return total;
}

int main() {
    long result = aggregate(1, 2, 3, 4, 5, 6, 7, 8);
}

Assembly output:

calculate_sum:
    push %rbp
    mov %rsp, %rbp
    mov %rdi, -0x8(%rbp)
    mov %rsi, -0x10(%rbp)
    mov %rdx, -0x18(%rbp)
    mov -0x8(%rbp), %rdx
    mov -0x10(%rbp), %rax
    add %rax, %rdx
    mov -0x18(%rbp), %rax
    add %rdx, %rax
    pop %rbp
    retq

aggregate:
    push %rbp
    mov %rsp, %rbp
    sub $0x50, %rsp
    mov %rdi, -0x28(%rbp)
    mov %rsi, -0x30(%rbp)
    mov %rdx, -0x38(%rbp)
    mov %rcx, -0x40(%rbp)
    mov %r8, -0x48(%rbp)
    mov %r9, -0x50(%rbp)
    mov -0x38(%rbp), %rdx
    mov -0x30(%rbp), %rcx
    mov -0x28(%rbp), %rax
    mov %rcx, %rsi
    mov %rax, %rdi
    callq calculate_sum
    mov %rax, -0x8(%rbp)
    mov -0x50(%rbp), %rdx
    mov -0x48(%rbp), %rcx
    mov -0x40(%rbp), %rax
    mov %rcx, %rsi
    mov %rax, %rdi
    callq calculate_sum
    mov %rax, -0x10(%rbp)
    mov 0x18(%rbp), %rax
    mov $0x0, %edx
    mov %rax, %rsi
    mov 0x10(%rbp), %rdi
    callq calculate_sum
    mov %rax, -0x18(%rbp)
    mov -0x8(%rbp), %rdx
    mov -0x10(%rbp), %rax
    add %rax, %rdx
    mov -0x18(%rbp), %rax
    add %rdx, %rax
    mov %rax, -0x20(%rbp)
    mov -0x20(%rbp), %rax
    leaveq
    retq

main:
    push %rbp
    mov %rsp, %rbp
    sub $0x10, %rsp
    pushq $0x8
    pushq $0x7
    mov $0x6, %r9d
    mov $0x5, %r8d
    mov $0x4, %ecx
    mov $0x3, %edx
    mov $0x2, %esi
    mov $0x1, %edi
    callq aggregate
    add $0x10, %rsp
    mov %rax, -0x8(%rbp)
    mov $0x0, %eax
    leaveq
    retq

Control Transfer

Transferring control from function P to Q involves setting PC to Q's entry point. The processor must record P's execution context, handled by the call Q instruction:

call pushes the return address (next instruction after call) onto the stack and sets PC to Q's entry point.

Runtime Stack

Stack changes during function calls can be illustrated with aggregate():

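main pushes the 7th and 8th arguments onto the stack and loads the first 6 into registers
callq aggregate pushes the return address 1205 and sets PC to 1149, the entry point of aggregate
aggregate saves the caller's base pointer (%rbp) and establishes its own stack frame with mov %rsp,%rbp and sub $0x50,%rsp
aggregate copies its register arguments into the frame, then loads the argument registers for each nested call
The same sequence repeats for each nested call to calculate_sum
Return with leaveq (equivalent to mov %rbp,%rsp followed by pop %rbp), then ret pops the return address and jumps to it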

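At addresses 119f and 11ab, aggregate reads p8 from 0x18(%rbp) and p7 from 0x10(%rbp) to set up the third calculate_sum() call. These positive offsets from %rbp lie above aggregate's own frame, in the area where main pushed the two stack arguments, demonstrating how stack organization facilitates parameter access.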
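Why the stack arguments end up at 0x10(%rbp) and 0x18(%rbp) can be checked by replaying the pushes that lead into aggregate. The sketch below is a simplified model under stated assumptions: the machine type and its helpers are invented for this example, main's own prologue is skipped, and the initial %rbp is just a stand-in value.

```c
#include <stdint.h>

enum { MEM_SLOTS = 32 };             /* arbitrary size for the sketch */

typedef struct {
    uint64_t mem[MEM_SLOTS];         /* simulated stack memory */
    uint64_t rsp, rbp, pc;
} machine;

static void push(machine *m, uint64_t v)
{
    m->rsp -= 8;
    m->mem[m->rsp / 8] = v;
}

/* Reads the quadword at offset(%rbp). */
uint64_t load_rbp_relative(const machine *m, int64_t offset)
{
    return m->mem[(m->rbp + (uint64_t)offset) / 8];
}

/* Replays main's call into aggregate up to the point where aggregate
 * has set its own base pointer. */
machine enter_aggregate(void)
{
    machine m = {0};
    m.rsp = MEM_SLOTS * 8;
    m.rbp = m.rsp;                   /* stand-in for main's base pointer */

    push(&m, 8);                     /* pushq $0x8: 8th argument */
    push(&m, 7);                     /* pushq $0x7: 7th argument */

    push(&m, 0x1205);                /* callq: push the return address ... */
    m.pc = 0x1149;                   /* ... and jump to aggregate */

    push(&m, m.rbp);                 /* push %rbp */
    m.rbp = m.rsp;                   /* mov %rsp,%rbp */
    return m;
}
```

After these steps, 0x0(%rbp) holds the saved base pointer, 0x8(%rbp) the return address, 0x10(%rbp) the value 7, and 0x18(%rbp) the value 8, which matches the two loads in the listing.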

Tags: x86-64 assembly Computer Architecture System Programming Low-Level Programming

Posted on Sun, 24 May 2026 18:01:07 +0000 by Bilbozilla
