Instruction Reordering and Sequential Consistency
Notation Conventions
For clarity in our analysis:
- R(a) denotes a read operation on variable a
- W(a) denotes a write operation on variable a
- We use Op(a) to represent either R(a) or W(a)
- Op(a) <p Op(b) indicates a appears before b in program order
- Op(a) <m Op(b) indicates a completes before b in actual execution order
The Four Types of Instruction Reordering
Read-Read Reordering
Consider this scenario where two cores communicate through shared memory:
| Core 1 | Core 2 |
|---|---|
| W(data, NEW) | R(r1, flag) |
| W(flag, SET) | if r1 == SET: R(r2, data) |
Without reordering, Core 2 should only read the new data value after flag is set. However, if the CPU reorders instructions within Core 2 (treating the reads as independent), it might execute them out of program order, potentially reading data before checking flag. This creates a race condition where Core 2 reads stale data.
The prevention condition for read-read reordering is:
if R(a) <p R(b), then R(a) <m R(b)
Write-Write Reordering
Modern CPUs use store buffers to decouple execution from memory operations. This introduces write-write reordering scenarios:
| Time | Core 1 | Core 2 | System State |
|---|---|---|---|
| 1 | W(data, NEW) [in buffer] | ||
| 2 | W(flag, SET) [in buffer] | ||
| 3 | R(r1, flag) [r1 = SET] | flag written to memory | |
| 4 | R(r2, data) [r2 = OLD] | data still OLD in memory | |
| 5 | data finally written to memory |
Although Core 1 issued the writes in program order, the store buffer might retire them out of order, allowing Core 2 to observe an inconsistent state where flag is set but data hasn't been updated yet.
The prevention condition for write-write reordering is:
if W(a) <p W(b), then W(a) <m W(b)
Read-Write Reordering
Consider this symmetric scenario with initial values x=0, y=0:
| Core 1 | Core 2 |
|---|---|
| R(r1, x) | R(r2, y) |
| W(y, NEW) | W(x, NEW) |
Under sequential execution, both cores cannot read NEW simultaneously. However, with load buffers, both reads might proceed before either write commits, resulting in r1=NEW and r2=NEW, which violates program order expectations.
The prevention condition for read-write reordering is:
if R(a) <p W(b), then R(a) <m W(b)
Write-Read Reordering
Inverting the previous scenario:
| Core 1 | Core 2 |
|---|---|
| W(x, NEW) | W(y, NEW) |
| R(r1, y) | R(r2, x) |
Store buffers can cause both reads to complete before writes are visible, potentially yielding r1=0 and r2=0 despite both writes occurring in program order before the reads.
The prevention condition for write-read reordering is:
if W(a) <p R(b), then W(a) <m R(b)
Sequential Consistency Model
Abstract View
Memory consistency models define the contract between hardware and programmers. From a programmer's perspective, memory operations appear to execute in some order. Sequential Consistency (SC) provides the most intuitive contract: operations execute in program order as written, with interleaving of operations from different cores being the only source of nondeterminism.
SC guarantees that all possible execution results are among those that could arise from some interleaving of sequential program executions. Any result outside this set violates SC.
SC Properties
Sequential Consistency enforces strict ordering:
- No read-read reordering
- No write-write reordering
- No read-write reordering
- No write-read reordering
Formally, SC requires:
- if Op(a) <p Op(b), then Op(a) <m Op(b) for all operations
This makes SC programmer-friendly - code behaves as written without hidden reordering surprises.
Multi-core SC Execution Paths
In multi-core systems, SC allows many valid interleavings. For example:
| Interleaving | Result |
|---|---|
| C1:W1→C2:W2→C1:R1→C2:R2 | r1=NEW, r2=NEW |
| C1:W1→C1:R1→C2:W2→C2:R2 | r1=0, r2=NEW |
| C2:W2→C1:W1→C1:R1→C2:R2 | r1=NEW, r2=NEW |
| C2:W2→C2:R2→C1:W1→C1:R1 | r1=NEW, r2=0 |
All these results are SC-compliant. Only results impossible under any sequential interleaving (like both cores seeing NEW when that requires both writes before reads) would violate SC.
The key insight is that SC concerns observable results, not microarchitectural implementation details. A CPU may internally reorder operations while still presenting SC-compliant results to the programmer.