Garbage Collection (GC) automatically identifies and reclaims memory that is no longer in use. Rather than explicitly marking objects as garbage, modern GC implementations track which objects are still in use and treat everything else as reclaimable. This fundamental inversion forms the basis of automated memory management in the JVM.
This overview covers the general characteristics, key concepts, and implementation algorithms of garbage collection, with a focus on Oracle HotSpot and OpenJDK behavior.
Manual Memory Management
Before automated garbage collection became standard, developers had to explicitly allocate and free memory. Forgetting to release memory resulted in allocated blocks that could never be reused, a condition known as a memory leak.
The following C example demonstrates the pitfalls of manual memory control:
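#include <stdlib.h>

/* Helpers assumed to be defined elsewhere. */
size_t calculate_size(void);
size_t load_data(size_t count, int *buffer);

int process_data(void) {
    size_t count = calculate_size();
    int *buffer = malloc(count * sizeof(int));
    if (buffer == NULL) {
        return -1;
    }
    if (load_data(count, buffer) < count) {
        /* Intentional bug: buffer is not freed on this early return, so it leaks. */
        return -1;
    }
    /* perform operations... */
    free(buffer);
    return 0;
}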
In complex codebases, releasing memory at the right moment becomes error-prone. This led to the development of automatic memory management mechanisms.
Reference Counting
Early garbage collection algorithms relied on reference counting. Each object maintains a counter tracking how many references point to it. When the count reaches zero, the object becomes reclaimable. C++ smart pointers illustrate this concept:
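#include <cstddef>
#include <memory>
#include <vector>

// Helpers assumed to be defined elsewhere.
std::size_t determine_length();
std::size_t populate_data(std::size_t length, std::shared_ptr<std::vector<int>> data);

int process_items() {
    std::size_t length = determine_length();
    std::shared_ptr<std::vector<int>> data
        = std::make_shared<std::vector<int>>();
    if (populate_data(length, data) < length) {
        return -1;  // no leak: data releases the vector when it goes out of scope
    }
    return 0;
}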
The shared_ptr tracks reference counts automatically. The count is incremented whenever the shared_ptr is copied, for example when it is passed by value to a function, and decremented whenever a copy goes out of scope. When the count reaches zero, the underlying vector is deallocated. Languages like Perl, Python, and PHP use similar mechanisms.
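To make the counting visible, the sketch below reads the count at each step with `use_count()`, which is part of the standard `std::shared_ptr` interface. The helper names `count_inside_call` and `demo_counts` are made up for this illustration, and the asserted values assume single-threaded use.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical helper: taking the shared_ptr by value copies it, so the
// count is one higher for the duration of the call.
long count_inside_call(std::shared_ptr<std::vector<int>> copy) {
    return copy.use_count();
}

// Hypothetical demo: walks through the increments and decrements and
// returns the final count.
long demo_counts() {
    auto data = std::make_shared<std::vector<int>>();
    assert(data.use_count() == 1);          // one owner: data
    {
        std::shared_ptr<std::vector<int>> alias = data;  // copy adds an owner
        assert(data.use_count() == 2);
    }                                       // alias is destroyed here
    assert(data.use_count() == 1);
    assert(count_inside_call(data) == 2);   // the by-value parameter is a copy
    return data.use_count();                // the parameter copy is gone again
}
```

When `data` itself goes out of scope at the end of `demo_counts`, the count reaches zero and the vector is freed.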
The diagram below illustrates this concept:
Green elements (GC Roots) represent objects currently accessible by the program. Blue circles are reachable objects with their reference counts displayed. Gray circles are objects no longer referenced within any scope; these are candidates for garbage collection.
However, reference counting suffers from a critical weakness: cyclic references. Objects may reference each other while no external code holds references to the group. This creates a detached cycle where reference counts never reach zero:
Red objects in the diagram form a cycle with references pointing only to each other. No external references exist, yet the reference counts remain non-zero, causing memory to be retained indefinitely.
Some languages address this through weak references or supplementary cycle detection algorithms.
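The cycle problem and the weak-reference remedy can be reproduced with C++ smart pointers. The following is a minimal sketch: `StrongNode`, `WeakNode`, and the two functions are hypothetical names introduced here, and a `std::weak_ptr` observer is used only to check whether the first node was freed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type whose link is an owning (strong) reference.
struct StrongNode {
    std::shared_ptr<StrongNode> next;
};

// Same shape, but the back link is a non-owning weak reference.
struct WeakNode {
    std::shared_ptr<WeakNode> next;
    std::weak_ptr<WeakNode> prev;
};

// Builds a two-node cycle of strong references, drops the external
// references, and reports whether the first node is still alive.
bool strong_cycle_leaks() {
    std::weak_ptr<StrongNode> observer;
    {
        auto a = std::make_shared<StrongNode>();
        auto b = std::make_shared<StrongNode>();
        a->next = b;
        b->next = a;                    // cycle: a -> b -> a
        observer = a;                   // a weak_ptr does not affect the count
        assert(a.use_count() == 2);     // one from a, one from b->next
    }                                   // counts drop to 1, never to 0
    bool alive = !observer.expired();
    // Break the cycle by hand so that the demo itself does not leak.
    if (auto first = observer.lock()) {
        first->next.reset();
    }
    return alive;
}

// Same structure with a weak back link: no strong cycle is formed.
bool weak_back_link_leaks() {
    std::weak_ptr<WeakNode> observer;
    {
        auto a = std::make_shared<WeakNode>();
        auto b = std::make_shared<WeakNode>();
        a->next = b;
        b->prev = a;                    // weak link: a's count stays at 1
        observer = a;
    }                                   // a's count reaches 0, which releases b too
    return !observer.expired();
}
```

With strong links in both directions the nodes outlive every external reference. Making one direction weak lets the counts fall to zero as usual.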
Mark and Sweep Algorithm
The JVM defines object reachability explicitly through Garbage Collection Roots (GC Roots), which include:
- Local variables in currently executing methods
- Active threads
- Static fields
- JNI references
- Other JVM-specific roots
The JVM employs the Mark and Sweep algorithm to identify and reclaim unreachable objects:
Marking Phase: The collector traverses all reachable objects starting from GC Roots and records their status in native memory.
Sweeping Phase: Memory occupied by unmarked (unreachable) objects becomes available for future allocations.
The garbage collection algorithms available in the JVM build on this foundation with adaptations: Parallel Scavenge, Parallel Mark+Copy, and CMS each implement the two phases differently while keeping the core principles.
A significant advantage of mark-and-sweep over pure reference counting is immunity to cyclic reference problems:
Objects in isolated cycles cannot be reached from any GC Root, so they are never marked and are correctly identified as garbage, eligible for reclamation.
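The two phases, and the way they handle cycles, can be sketched with a toy heap. This is an illustrative model only, not how HotSpot lays out or scans its heap: objects are plain indices, and `ToyHeap`, `allocate`, `link`, `mark`, `sweep`, `collect`, and `reclaim_isolated_cycle` are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap: objects are identified by index, and refs[i] lists the
// objects that object i points to.
struct ToyHeap {
    std::vector<std::vector<std::size_t>> refs;
    std::vector<bool> live;             // false once an object has been swept

    std::size_t allocate() {
        refs.emplace_back();
        live.push_back(true);
        return refs.size() - 1;
    }

    void link(std::size_t from, std::size_t to) {
        assert(from < refs.size() && to < refs.size());
        refs[from].push_back(to);
    }

    // Marking phase: visit everything reachable from the roots.
    std::vector<bool> mark(const std::vector<std::size_t>& roots) const {
        std::vector<bool> marked(refs.size(), false);
        std::vector<std::size_t> pending(roots);
        while (!pending.empty()) {
            std::size_t obj = pending.back();
            pending.pop_back();
            if (marked[obj]) continue;  // already visited; also stops cycles
            marked[obj] = true;
            for (std::size_t target : refs[obj]) pending.push_back(target);
        }
        return marked;
    }

    // Sweeping phase: reclaim every live object that was not marked.
    std::size_t sweep(const std::vector<bool>& marked) {
        std::size_t reclaimed = 0;
        for (std::size_t i = 0; i < refs.size(); ++i) {
            if (live[i] && !marked[i]) {
                live[i] = false;
                refs[i].clear();
                ++reclaimed;
            }
        }
        return reclaimed;
    }

    std::size_t collect(const std::vector<std::size_t>& roots) {
        return sweep(mark(roots));
    }
};

// Usage: one root with a child, plus an isolated two-object cycle.
std::size_t reclaim_isolated_cycle() {
    ToyHeap heap;
    std::size_t root = heap.allocate();
    std::size_t child = heap.allocate();
    std::size_t x = heap.allocate();
    std::size_t y = heap.allocate();
    heap.link(root, child);
    heap.link(x, y);
    heap.link(y, x);                    // x and y reference only each other
    return heap.collect({root});        // reclaims x and y
}
```

Because marking starts at the roots instead of counting incoming references, the mutual links between `x` and `y` do not keep them alive.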
The trade-off is Stop-The-World (STW) pauses. Because object references change continuously during normal execution, the collector cannot reliably determine reachability while the application is running. The JVM therefore suspends all application threads so that marking works against a consistent snapshot of the object graph. While various factors can trigger STW pauses, garbage collection remains the primary cause.
This guide explores JVM garbage collection implementation details and strategies for efficient memory management.