Understanding Memory Consistency and Instruction Reordering

Instruction Reordering and Sequential Consistency

Notation Conventions

For clarity in our analysis:

R(a) denotes a read operation on variable a
W(a) denotes a write operation on variable a
We use Op(a) to represent either R(a) or W(a)
Op(a) <p Op(b) indicates a appears before b in program order
Op(a) <m Op(b) indicates a completes before b in actual execution order

The Four Types of Instruction Reordering

Read-Read Reordering

Consider this scenario where two cores communicate through shared memory:

Core 1	Core 2
W(data, NEW)	R(r1, flag)
W(flag, SET)	if r1 == SET: R(r2, data)

Without reordering, Core 2 should only read the new data value after flag is set. However, if the CPU reorders instructions within Core 2 (treating the reads as independent), it might execute them out of program order, potentially reading data before checking flag. This creates a race condition where Core 2 reads stale data.

The prevention condition for read-read reordering is:

if R(a) <p R(b), then R(a) <m R(b)

Write-Write Reordering

Modern CPUs use store buffers to decouple execution from memory operations. This introduces write-write reordering scenarios:

Time	Core 1	Core 2	System State
1	W(data, NEW) [in buffer]
2	W(flag, SET) [in buffer]
3		R(r1, flag) [r1 = SET]	flag written to memory
4		R(r2, data) [r2 = OLD]	data still OLD in memory
5			data finally written to memory

Although Core 1 issued the writes in program order, the store buffer might retire them out of order, allowing Core 2 to observe an inconsistent state where flag is set but data hasn't been updated yet.

The prevention condition for write-write reordering is:

if W(a) <p W(b), then W(a) <m W(b)

Read-Write Reordering

Consider this symmetric scenario with initial values x=0, y=0:

Core 1	Core 2
R(r1, x)	R(r2, y)
W(y, NEW)	W(x, NEW)

Under sequential execution, both cores cannot read NEW simultaneously. However, with load buffers, both reads might proceed before either write commits, resulting in r1=NEW and r2=NEW, which violates program order expectations.

The prevention condition for read-write reordering is:

if R(a) <p W(b), then R(a) <m W(b)

Write-Read Reordering

Inverting the previous scenario:

Core 1	Core 2
W(x, NEW)	W(y, NEW)
R(r1, y)	R(r2, x)

Store buffers can cause both reads to complete before writes are visible, potentially yielding r1=0 and r2=0 despite both writes occurring in program order before the reads.

The prevention condition for write-read reordering is:

if W(a) <p R(b), then W(a) <m R(b)

Sequential Consistency Model

Abstract View

Memory consistency models define the contract between hardware and programmers. From a programmer's perspective, memory operations appear to execute in some order. Sequential Consistency (SC) provides the most intuitive contract: operations execute in program order as written, with interleaving of operations from different cores being the only source of nondeterminism.

SC guarantees that all possible execution results are among those that could arise from some interleaving of sequential program executions. Any result outside this set violates SC.

SC Properties

Sequential Consistency enforces strict ordering:

No read-read reordering
No write-write reordering
No read-write reordering
No write-read reordering

Formally, SC requires:

if Op(a) <p Op(b), then Op(a) <m Op(b) for all operations

This makes SC programmer-friendly - code behaves as written without hidden reordering surprises.

Multi-core SC Execution Paths

In multi-core systems, SC allows many valid interleavings. For example:

Interleaving	Result
C1:W1→C2:W2→C1:R1→C2:R2	r1=NEW, r2=NEW
C1:W1→C1:R1→C2:W2→C2:R2	r1=0, r2=NEW
C2:W2→C1:W1→C1:R1→C2:R2	r1=NEW, r2=NEW
C2:W2→C2:R2→C1:W1→C1:R1	r1=NEW, r2=0

All these results are SC-compliant. Only results impossible under any sequential interleaving (like both cores seeing NEW when that requires both writes before reads) would violate SC.

The key insight is that SC concerns observable results, not microarchitectural implementation details. A CPU may internally reorder operations while still presenting SC-compliant results to the programmer.

Tags: memory consistency instruction reordering sequential consistency cpu architecture concurrent programming

Posted on Fri, 08 May 2026 22:22:07 +0000 by LordTyphon

Freaks City