Inside Go's Garbage Collection Architecture and Evolution

Memory management in Go abstracts explicit deallocation away from developers. The compiler's escape analysis decides which values can stay on the stack and which must be allocated on the heap, while a dedicated garbage collector reclaims unused heap objects. This automation incurs overhead, and the Go runtime team has repeatedly reworked the GC to minimize pause times. The journey spans several milestones: from early stop‑the‑world (STW) mark‑sweep, through concurrent sweeping and concurrent tri‑color marking, to the hybrid write barrier that nearly eliminates STW pauses. Understanding these stages clarifies how production services achieve predictable latency today.

Mark‑and‑Sweep Foundation

The runtime uses a mark‑and‑sweep strategy rather than reference counting or a copying collector. Marking begins from the roots (global variables and the stack of every goroutine) and traverses reachable objects, recording liveness in mark bitmaps maintained alongside the managed heap. Any heap object that remains unmarked after the traversal is garbage, and sweeping returns its memory to the allocator for later reuse. Originally, the entire cycle required a stop‑the‑world pause, because a concurrent mutation could alter the pointer graph mid‑mark and cause a live object to be incorrectly freed.

// Early versions required a full STW pause for the whole cycle.
// If the program ran during the mark phase, rewiring next could hide
// a still-reachable node from the collector.
type link struct {
    next *link
}
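
The mark and sweep phases described above can be sketched on a toy heap. This is a minimal illustration, not the runtime's implementation: the object type, the marked flag, and the collect function are hypothetical names, and the real collector records marks in bitmaps rather than in the objects themselves.

```go
package main

import "fmt"

// object is a hypothetical toy heap object, not a runtime type.
type object struct {
	name   string
	refs   []*object
	marked bool
}

// mark flags every object reachable from o.
func mark(o *object) {
	if o == nil || o.marked {
		return
	}
	o.marked = true
	for _, r := range o.refs {
		mark(r)
	}
}

// collect marks from the roots, then sweeps the heap: unmarked objects are
// reported as freed, and survivors have their mark cleared for the next cycle.
func collect(heap, roots []*object) (live, freed []*object) {
	for _, r := range roots {
		mark(r)
	}
	for _, o := range heap {
		if o.marked {
			o.marked = false
			live = append(live, o)
		} else {
			freed = append(freed, o)
		}
	}
	return live, freed
}

func main() {
	a := &object{name: "a"}
	b := &object{name: "b"}
	c := &object{name: "c"} // nothing points to c
	a.refs = []*object{b}

	live, freed := collect([]*object{a, b, c}, []*object{a})
	fmt.Println(len(live), len(freed)) // prints "2 1"
}
```

Because collect assumes nothing changes the graph while it runs, it mirrors the original stop‑the‑world design: the whole cycle is one uninterrupted call.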

Concurrent Sweep

Observing that only the mark phase needed consistency, Go 1.3 kept marking under STW but allowed sweeping to proceed concurrently with user goroutines. This reduced pause length but still left the marking pause as the primary bottleneck.

Concurrent Tri‑Color Marking

Go 1.5 introduced concurrent marking via the tri‑color algorithm. Every object is in one of three abstract states:

  • White: unreached (potential garbage)
  • Grey: reached but its outgoing references not yet scanned
  • Black: reached and fully scanned

The GC walks grey nodes, turning each one black after shading its white descendants grey. To maintain the invariant during concurrent writes, a write barrier intercepts pointer updates and ensures the new referent is shaded grey before the store completes. Objects allocated during marking are born black, so they cannot be reclaimed in the cycle that is already under way.

// Without a write barrier, a mutation can hide a live object:
ptr.next = freshNode  // if ptr is already black, a white freshNode is never scanned

// With the barrier, freshNode is shaded grey before the store finishes:
shade(freshNode)
ptr.next = freshNode
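
The grey worklist at the heart of tri‑color marking can be sketched as follows. This is a single‑threaded illustration under simplifying assumptions: node, shade, and markAll are hypothetical names, and there is no concurrency or write barrier here, only the color transitions.

```go
package main

import "fmt"

type color int

const (
	white color = iota // unreached
	grey               // reached, children not yet scanned
	black              // reached and fully scanned
)

// node is a hypothetical toy object carrying its own color.
type node struct {
	name string
	refs []*node
	c    color
}

// shade turns a white node grey and queues it for scanning.
func shade(n *node, work *[]*node) {
	if n != nil && n.c == white {
		n.c = grey
		*work = append(*work, n)
	}
}

// markAll drains the grey worklist: each grey node shades its children
// and then becomes black. Nodes left white afterwards are garbage.
func markAll(roots []*node) {
	var work []*node
	for _, r := range roots {
		shade(r, &work)
	}
	for len(work) > 0 {
		n := work[len(work)-1]
		work = work[:len(work)-1]
		for _, child := range n.refs {
			shade(child, &work)
		}
		n.c = black
	}
}

func main() {
	a := &node{name: "a"}
	b := &node{name: "b"}
	c := &node{name: "c"} // unreachable from the root
	a.refs = []*node{b}
	b.refs = []*node{a} // a cycle does not confuse the worklist

	markAll([]*node{a})
	fmt.Println(a.c == black, b.c == black, c.c == white) // prints "true true true"
}
```

The check in shade is what keeps cycles from looping forever: a node that is already grey or black is never queued again.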

A critical edge case emerges when a heap pointer is written into a stack variable: stack writes were not instrumented by the write barrier. Thus, after the main mark phase, an STW re‑scan of all goroutine stacks was required to catch references the barrier had missed. The re‑scan could take tens of milliseconds when goroutine counts were high.

Hybrid Write Barrier (Go 1.8+)

To avoid the stack re‑scan, a hybrid write barrier was added. Before updating a pointer slot, the barrier shades the old value unconditionally and, if the current goroutine's stack hasn't been scanned yet, also shades the new value. This ensures any pointer that escapes detection via a stack store is still visible without a full re‑scan.

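// Pseudocode: hybrid barrier (shade and currentG.stackGrey are illustrative names)
func writePointer(slot *unsafe.Pointer, ptr unsafe.Pointer) {
    shade(*slot)              // old referent is shaded grey
    if currentG.stackGrey {   // this goroutine's stack has not been scanned yet
        shade(ptr)            // new referent is shaded grey
    }
    *slot = ptr
}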

The cost is slight conservatism: an object that becomes garbage when its last reference is overwritten (for example, by a nil write) is still shaded, so it survives the current cycle and is reclaimed only in the next one.
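
To see why shading the old value matters, the following toy simulation replays the problematic sequence: a goroutine whose stack was already scanned loads a pointer onto its stack and then erases the only heap reference to it. Everything here is an assumption made for illustration (the collector type, its stackGrey field, and the single next pointer per node); it is a sketch of the idea, not the runtime's barrier.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	grey
	black
)

// node is a hypothetical toy object with a single outgoing pointer.
type node struct {
	next *node
	c    color
}

// collector is a hypothetical stand-in for the marker's state.
type collector struct {
	work      []*node
	stackGrey bool // true until the current goroutine's stack has been scanned
}

func (gc *collector) shade(n *node) {
	if n != nil && n.c == white {
		n.c = grey
		gc.work = append(gc.work, n)
	}
}

// writePointer follows the hybrid barrier pseudocode: shade the old value,
// shade the new value only while the stack is still unscanned, then store.
func (gc *collector) writePointer(slot **node, ptr *node) {
	gc.shade(*slot)
	if gc.stackGrey {
		gc.shade(ptr)
	}
	*slot = ptr
}

// drain scans grey nodes until the worklist is empty.
func (gc *collector) drain() {
	for len(gc.work) > 0 {
		n := gc.work[len(gc.work)-1]
		gc.work = gc.work[:len(gc.work)-1]
		gc.shade(n.next)
		n.c = black
	}
}

func main() {
	b := &node{}
	g := &node{next: b}
	gc := &collector{stackGrey: false} // the stack has already been scanned
	gc.shade(g)                        // g is grey, b is still white

	onStack := g.next             // b is now referenced from the scanned stack
	gc.writePointer(&g.next, nil) // the only heap path to b disappears
	gc.drain()

	fmt.Println(onStack.c == black) // prints "true": b was kept alive
}
```

Replacing the writePointer call with a plain g.next = nil leaves b white after drain, which is exactly the lost‑object case that previously forced the stack re‑scan.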

GC Triggers and Pacing

With default settings, a GC cycle starts when the heap has roughly doubled relative to the live bytes left after the previous cycle. The GOGC environment variable (default 100) controls this ratio as a percentage of allowed growth over the live heap. A forced collection occurs if no cycle has run for two minutes, and runtime.GC() triggers one on demand. During marking, background workers are budgeted about 25% of the available CPU time; if goroutines allocate faster than marking progresses, the mark assist mechanism makes the allocating goroutines perform marking work themselves, throttling allocation so that marking can catch up.
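
The relationship between GOGC and the trigger point can be approximated with a small calculation. The heapGoal helper below is hypothetical and deliberately simplified: it considers only live heap bytes and assumes a non‑negative GOGC value, whereas the runtime's pacer accounts for more inputs. debug.SetGCPercent is the standard library call for changing the setting at run time.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// heapGoal is a hypothetical, simplified model of the heap size at which
// the next cycle would start: live bytes plus gogc percent of growth.
// It assumes gogc >= 0 and ignores everything except the live heap.
func heapGoal(liveBytes uint64, gogc int) uint64 {
	return liveBytes + liveBytes*uint64(gogc)/100
}

func main() {
	const live = 64 << 20 // assume 64 MiB survived the previous cycle

	fmt.Println(heapGoal(live, 100) >> 20) // 128: the heap may double
	fmt.Println(heapGoal(live, 50) >> 20)  // 96: collect more often
	fmt.Println(heapGoal(live, 200) >> 20) // 192: collect less often

	// SetGCPercent changes the setting for the running program and
	// returns the previous value, which is restored here on exit.
	prev := debug.SetGCPercent(200)
	defer debug.SetGCPercent(prev)
}
```

Lower values trade CPU time for a smaller heap, and higher values do the opposite, which is why the setting is usually tuned against measured latency and memory limits rather than chosen up front.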

Other Optimizations

Pointer‑free objects are marked black immediately, bypassing the grey queue, because they cannot reference other objects. The write barrier itself is heavily optimized: its fast path is a short inline check, and it does real work only while marking is active. Generational collection has also been discussed: separating young, short‑lived objects from long‑lived ones could reduce the total work per cycle, as generational collectors in other runtimes do.

Practical Impact

Despite steady improvements, deeply nested pointer graphs still increase marking cost. Data structures like chan map[string][]*string expand the reference tree and make each cycle more expensive. Application‑level object pools do not always reduce GC pressure: pooled objects stay reachable and must still be marked, so the overhead often shifts rather than disappears. An alternative is to allocate large buffers (over 32 KB), which the allocator handles as large objects backed by their own spans, and to manage the space inside them with a custom allocator, as open‑source projects like freecache and bigcache do. These approaches trade manual management complexity for fewer pointer‑rich heap objects, thereby lightening the GC load.
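
A minimal sketch of the pointer‑light idea: values are copied into one large byte slice and located through an index whose keys and values contain no pointers. The flatCache type and its methods are hypothetical and are not the API of freecache or bigcache; the sketch ignores hash collisions, eviction, and space reclamation, all of which a real cache must handle.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// entry locates a value inside the data buffer. It holds no pointers.
type entry struct {
	off, n uint32
}

// flatCache is a hypothetical append-only cache: one byte slice for all
// values plus an index keyed by the 64-bit hash of the key.
type flatCache struct {
	index map[uint64]entry
	data  []byte
}

func newFlatCache(capacity int) *flatCache {
	return &flatCache{
		index: make(map[uint64]entry),
		data:  make([]byte, 0, capacity),
	}
}

func hashKey(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

// Set copies val into the buffer and records where it landed.
// Overwriting a key leaves the old bytes in place (no reclamation).
func (c *flatCache) Set(key string, val []byte) {
	c.index[hashKey(key)] = entry{off: uint32(len(c.data)), n: uint32(len(val))}
	c.data = append(c.data, val...)
}

// Get returns a view into the buffer for the stored value, if any.
func (c *flatCache) Get(key string) ([]byte, bool) {
	e, ok := c.index[hashKey(key)]
	if !ok {
		return nil, false
	}
	return c.data[e.off : e.off+e.n], true
}

func main() {
	c := newFlatCache(64 << 10) // 64 KB buffer, above the large-object threshold
	c.Set("user:1", []byte("alice"))
	c.Set("user:2", []byte("bob"))

	v, ok := c.Get("user:2")
	fmt.Println(string(v), ok) // prints "bob true"
}
```

However many entries are stored, the collector sees one byte slice and one map without pointer fields, instead of one heap object per cached value.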

When building latency‑sensitive systems, profiling with GODEBUG=gctrace=1 and tuning GOGC remain first‑line techniques for aligning GC work with service requirements.
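
Alongside gctrace output, the same counters can be read from inside the process. The sketch below uses runtime.ReadMemStats and runtime.GC from the standard library; the gcSnapshot type and the snapshot and forceCycle helpers are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// gcSnapshot holds the few MemStats fields this sketch looks at.
type gcSnapshot struct {
	cycles     uint32 // completed GC cycles (MemStats.NumGC)
	pauseTotal uint64 // cumulative STW pause time in ns (MemStats.PauseTotalNs)
	nextGoal   uint64 // target heap size of the next cycle (MemStats.NextGC)
}

func snapshot() gcSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return gcSnapshot{cycles: m.NumGC, pauseTotal: m.PauseTotalNs, nextGoal: m.NextGC}
}

// forceCycle runs one blocking collection and returns the counters
// observed before and after it.
func forceCycle() (before, after gcSnapshot) {
	before = snapshot()
	runtime.GC()
	after = snapshot()
	return before, after
}

func main() {
	before, after := forceCycle()
	fmt.Printf("cycles: %d -> %d\n", before.cycles, after.cycles)
	fmt.Printf("total pause: %d ns, next heap goal: %d bytes\n",
		after.pauseTotal, after.nextGoal)
}
```

ReadMemStats briefly stops the world itself, so in a latency‑sensitive service it is better sampled on a timer than called on every request.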

Tags: Golang garbage-collection memory-management tri-color-marking write-barrier

Posted on Sun, 14 Jun 2026 18:17:12 +0000 by idotcom