Using UUIDv7 as Database Primary Keys

A persistent misconception in software engineering is that universally unique identifiers (UUIDs) are inherently unsuitable for database primary keys. The concern is valid for standard random UUIDs: because they lack sequential ordering, inserts land on arbitrary B-tree pages, causing frequent page splits and index fragmentation. Time-ordered versions such as UUIDv7 avoid most of that cost.

UUID Specifications

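Originally defined by RFC 4122 (obsoleted in 2024 by RFC 9562, which also standardizes versions 6 through 8), UUIDs are designed for decentralized identifier generation. They occupy 128 bits (16 bytes), typically rendered as a 36-character string in the 8-4-4-4-12 hexadecimal format (e.g., a1b2c3d4-e5f6-7890-abcd-ef1234567890). The field names below come from the Version 1 layout.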

Bit Range  | Field Name                  | Size
-------------------------------------------------------
0 - 31     | time_low                    | 32 bits
32 - 47    | time_mid                    | 16 bits
48 - 63    | time_hi_and_version         | 16 bits
64 - 71    | clock_seq_hi_and_reserved   | 8 bits
72 - 79    | clock_seq_low               | 8 bits
80 - 127   | node                        | 48 bits
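
  • time_low: First 32 bits; in V1, the least significant 32 bits of the timestamp.
  • time_mid: Next 16 bits; in V1, the middle 16 bits of the timestamp.
  • time_hi_and_version: Following 16 bits, containing the 4-bit version tag and, in V1, the most significant 12 bits of the timestamp.
  • clock_seq_hi_and_reserved & clock_seq_low: 16 bits total, with 2 bits for the variant and the remaining 14 bits for the clock sequence.
  • node: Final 48 bits, often a hardware address or random bits.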
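
The field boundaries in the table can be checked by slicing the 128-bit value directly. The following is a minimal sketch using Python's standard uuid module; the helper name split_fields is hypothetical and only illustrates the layout.

```python
import uuid

def split_fields(u: uuid.UUID) -> dict:
    """Slice the 128-bit integer into the six fields of the layout table.

    Bit 0 in the table is the most significant bit, so each field is
    obtained by shifting right and masking to the field width.
    """
    n = u.int
    return {
        "time_low": (n >> 96) & 0xFFFFFFFF,            # bits 0-31
        "time_mid": (n >> 80) & 0xFFFF,                # bits 32-47
        "time_hi_and_version": (n >> 64) & 0xFFFF,     # bits 48-63
        "clock_seq_hi_and_reserved": (n >> 56) & 0xFF, # bits 64-71
        "clock_seq_low": (n >> 48) & 0xFF,             # bits 72-79
        "node": n & 0xFFFFFFFFFFFF,                    # bits 80-127
    }

# The example string from the text above.
example = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")
fields = split_fields(example)

# The version is the top 4 bits of time_hi_and_version,
# the variant is the top 2 bits of clock_seq_hi_and_reserved.
version = fields["time_hi_and_version"] >> 12
variant_bits = fields["clock_seq_hi_and_reserved"] >> 6
```

The same values are available through the fields attribute of uuid.UUID; the manual slicing is shown only to make the bit ranges concrete.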

Generation Implementations

  1. Time-Based (Version 1): Relies on a 100-nanosecond interval count since the Gregorian calendar reform (Oct 15, 1582). Combines a node ID, a clock sequence to handle clock regressions, and the timestamp.
  2. Name-Based (Versions 3 & 5): Derives the identifier by hashing a namespace and a name. V3 utilizes MD5, whereas V5 relies on SHA-1.
  3. Random (Version 4): Sets the version and variant bits, filling the remaining 122 bits with pseudo-random data.
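
As a rough illustration, Python's standard uuid module exposes all three strategies. The manual_v4 helper is hypothetical and is included only to show which bits a V4 generator overwrites; the name "example.com" is a placeholder.

```python
import secrets
import uuid

# Time-based: timestamp + clock sequence + node.
v1 = uuid.uuid1()

# Name-based: same namespace and name always yield the same identifier.
v3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")  # MD5
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")  # SHA-1

# Random: 122 random bits plus fixed version and variant bits.
v4 = uuid.uuid4()

def manual_v4() -> uuid.UUID:
    """Build a V4 UUID by hand from 16 random bytes (illustrative only)."""
    raw = bytearray(secrets.token_bytes(16))
    raw[6] = (raw[6] & 0x0F) | 0x40  # version 4 in the top nibble of byte 6
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant 0b10 in the top two bits of byte 8
    return uuid.UUID(bytes=bytes(raw))
```

In practice uuid.uuid4() is the call to use; the manual version exists only to show that 6 of the 128 bits are fixed.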

The Randomness of V4

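While V4 dominates modern systems due to its simplicity and strong entropy, its complete lack of spatial locality hurts insert performance on B-tree indexes. New records are scattered randomly across the index, so consecutive inserts touch unrelated pages and trigger page splits throughout the tree instead of appending at its right edge.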
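
The scattering can be made visible with a small simulation. This sketch uses a sorted Python list as a stand-in for a B-tree index, which is a simplifying assumption: it does not model pages or splits, it only counts how often a new key lands at the right edge of the index. The function name tail_insert_ratio is hypothetical.

```python
import bisect
import uuid

def tail_insert_ratio(keys) -> float:
    """Return the fraction of inserts that land at the end of a sorted index."""
    index = []
    tail_hits = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            tail_hits += 1  # the key sorts after everything inserted so far
        index.insert(pos, key)
    return tail_hits / len(index)

n = 2000
# Random V4 keys, compared by their 16-byte representation.
random_ratio = tail_insert_ratio([uuid.uuid4().bytes for _ in range(n)])
# Strictly increasing 16-byte keys, as a sequential baseline.
sequential_ratio = tail_insert_ratio([i.to_bytes(16, "big") for i in range(n)])
```

With sequential keys every insert is an append. With V4 keys only a handful of inserts hit the tail, and the rest land somewhere in the middle of the index.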

The Temporal Flaws of V1

Although V1 offers temporal ordering, it suffers from critical flaws that prevented widespread adoption:

  1. Privacy Exposure: The node identifier typically exposes the device's MAC address, allowing attackers to trace the generating hardware.
  2. Clock Regression: If the system clock is set backwards, collision risks increase dramatically without proper clock sequence management.
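  3. Node Volatility: The node value is not stable. Devices that randomize their MAC address per network, or hosts with several network interfaces, can emit different node IDs over time, so the node field cannot be relied on to identify a generator.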
  4. Implementation Overhead: Fetching precise system time and hardware addresses is slower than generating random bytes.
  5. Superior Alternatives: V4 circumvented these privacy and complexity issues, rendering V1 obsolete for most applications.
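
To show how much a V1 identifier reveals, the sketch below recovers both the creation time and the node value from one. The helper name inspect_v1 and the node value 0x001122334455 are hypothetical; a real generator would normally use the host's own hardware address.

```python
import uuid
from datetime import datetime, timedelta, timezone

# V1 timestamps count 100 ns intervals from the Gregorian reform date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def inspect_v1(u: uuid.UUID):
    """Return (creation time, node formatted as a MAC address) for a V1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time is the 60-bit count of 100 ns intervals; 10 of them make 1 microsecond.
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    mac = ":".join(f"{b:02x}" for b in u.node.to_bytes(6, "big"))
    return created, mac

# Fixed node so the example is reproducible (assumed value, not a real device).
sample = uuid.uuid1(node=0x001122334455, clock_seq=0)
created, mac = inspect_v1(sample)
```

Anyone who sees the identifier can run the same extraction, which is the privacy exposure described in point 1.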

Evolution: UUIDv6 and UUIDv7

To address these structural flaws, RFC 9562 (published in 2024) standardizes two time-sortable versions.

UUIDv6

  • Byte Reordering: Rearranges the V1 timestamp bits so that the most significant bits come first, enabling byte-wise sorting that V1 lacked.
  • Backward Compatibility: Designed so that V1 logic could be adapted without a complete architectural rewrite.
  • Privacy Options: Allows swapping the MAC address for a random 48-bit value.
  • Sequence Refinement: Retains the 14-bit clock sequence, which a generator can use to keep identifiers distinct and ordered within the same timestamp tick; strict monotonicity depends on the implementation.
  • Same Timestamp Source: Keeps the 60-bit, 100ns Gregorian-epoch timestamp of V1; only the order of the timestamp fields changes. The move to a Unix-epoch timestamp happens in V7.
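
The reordering can be sketched as a pure bit shuffle from V1 to V6. This assumes the RFC 9562 layout for V6 (32 high timestamp bits, 16 middle bits, the version nibble, then the low 12 bits), in which the 60-bit timestamp value is the same one V1 uses. The function name v1_to_v6 is hypothetical.

```python
import uuid

def v1_to_v6(u1: uuid.UUID) -> uuid.UUID:
    """Rearrange a V1 UUID's timestamp so the most significant bits come first."""
    if u1.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u1.time                          # 60-bit timestamp, 100 ns units
    time_high = (ts >> 28) & 0xFFFFFFFF   # top 32 bits
    time_mid = (ts >> 12) & 0xFFFF        # next 16 bits
    time_low = ts & 0x0FFF                # low 12 bits
    upper = (time_high << 32) | (time_mid << 16) | (0x6 << 12) | time_low
    # Variant, clock sequence, and node occupy the lower 64 bits unchanged.
    lower = u1.int & 0xFFFFFFFFFFFFFFFF
    return uuid.UUID(int=(upper << 64) | lower)

u6 = v1_to_v6(uuid.uuid1())
```

Because the timestamp now reads from most to least significant bit, comparing two V6 values byte by byte gives the same result as comparing their timestamps.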

UUIDv7

Bit Range  | Field Name  | Size     | Description
------------------------------------------------------------------
0 - 47     | unix_ts_ms  | 48 bits  | Milliseconds since Unix epoch
48 - 51    | ver         | 4 bits   | Version indicator
52 - 63    | rand_a      | 12 bits  | Random payload A
64 - 65    | var         | 2 bits   | Variant indicator
66 - 127   | rand_b      | 62 bits  | Random payload B

  • Unix Epoch Integration: Replaces the archaic Gregorian timestamp with standard Unix epoch milliseconds (since Jan 1, 1970), simplifying parsing and generation.
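  • Native Sortability: Because the 48-bit timestamp occupies the most significant bits, identifiers sort by creation time at millisecond granularity, so new keys land near the right edge of the index instead of scattering as V4 keys do. Ordering within a single millisecond is not guaranteed unless the generator adds a monotonic counter. The layout also avoids the split, low-bits-first timestamp fields of V1.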
  • Enhanced Entropy: The 74 bits of random payload (rand_a and rand_b) are drawn fresh for every identifier, keeping collision probability negligible even when many identifiers share the same millisecond.
  • Streamlined Layout: The architecture is remarkably straightforward, pairing a 48-bit timestamp directly with version/variant bits and random data.
  • Privacy by Design: Hardware identifiers like MAC addresses are entirely excluded, relying solely on timestamps and secure random generators.
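
Putting the layout together, a generator fits in a few lines. This is a minimal sketch rather than a reference implementation: the names make_uuid7 and uuid7_timestamp are hypothetical, and no monotonic counter is included, so identifiers created within the same millisecond are ordered only by their random bits. Recent Python releases are expected to ship a built-in generator in the uuid module, which would be preferable where available.

```python
import secrets
import time
import uuid
from datetime import datetime, timezone

def make_uuid7(unix_ts_ms=None) -> uuid.UUID:
    """Assemble a V7 UUID: 48-bit ms timestamp, version, rand_a, variant, rand_b."""
    if unix_ts_ms is None:
        unix_ts_ms = time.time_ns() // 1_000_000
    rand_a = secrets.randbits(12)
    rand_b = secrets.randbits(62)
    value = (unix_ts_ms & 0xFFFFFFFFFFFF) << 80  # bits 0-47: unix_ts_ms
    value |= 0x7 << 76                           # bits 48-51: version 7
    value |= rand_a << 64                        # bits 52-63: rand_a
    value |= 0b10 << 62                          # bits 64-65: variant
    value |= rand_b                              # bits 66-127: rand_b
    return uuid.UUID(int=value)

def uuid7_timestamp(u: uuid.UUID) -> datetime:
    """Recover the creation time from the leading 48 bits."""
    return datetime.fromtimestamp((u.int >> 80) / 1000, tz=timezone.utc)

new_id = make_uuid7()
created_at = uuid7_timestamp(new_id)
```

Stored in a native 16-byte UUID or binary column, such keys compare in timestamp order, so an identifier generated in a later millisecond always sorts after an earlier one.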

Tags: UUID UUIDv7 database Primary Key Distributed Systems

Posted on Sat, 09 May 2026 17:42:27 +0000 by akelavlk