Introduction to LFU Caching
LFU (Least Frequently Used) is a caching algorithm that removes the least frequently accessed items when the cache reaches its capacity. Unlike LRU (Least Recently Used), which considers only recency, LFU prioritizes access frequency.
Comparison of LFU and LRU
Consider a cache with capacity 3 and the following access sequence: a, b, c, a, a, a, a, b, c, d.
LRU behavior: After 10 accesses, the cache contains b, c, d. Element a, despite being accessed 5 times, is evicted because it hasn't been accessed recently.
LFU behavior: After 10 accesses, the cache contains b, c, a. Element a remains because of its high access frequency, demonstrating LFU's advatnage in certain scenarios.
Naive Implementation Approach
A straightforward implementation uses a doubly linked list sorted by frequency, then by insertion time:
class Node {
int key;
int value;
int frequency;
Node prev;
Node next;
}
Operations:
- On cache hit: Increment node frequency and reposition in sorted list
- On cache miss: If cache full, remove head node (lowest frequency, oldest)
- Insert new node at appropriate position based on frequency
While functional, this approach has O(n) time complexity for operations.
Optimized O(1) Implementation
An optimal solution requires three data structures:
class LFUCache {
private final int capacity;
private int minFrequency;
private Map<Integer, Node> keyToNode;
private Map<Integer, LinkedHashSet<Node>> frequencyToNodes;
class Node {
int key;
int value;
int frequency;
}
}
Data Structure Design
- keyToNode: HashMap providing O(1) access to nodes by key
- frequencyToNodes: HashMap mapping frequencies to LinkedHashSet of nodes at that frequency
- minFrequency: Tracks current minimum frequency for eviction
Operation Analysis
Get Operation
public int get(int key) {
if (!keyToNode.containsKey(key)) return -1;
Node node = keyToNode.get(key);
updateFrequency(node);
return node.value;
}
Put Operation
public void put(int key, int value) {
if (capacity == 0) return;
if (keyToNode.containsKey(key)) {
Node node = keyToNode.get(key);
node.value = value;
updateFrequency(node);
} else {
if (keyToNode.size() == capacity) {
evict();
}
Node newNode = new Node(key, value);
newNode.frequency = 1;
minFrequency = 1;
keyToNode.put(key, newNode);
frequencyToNodes.computeIfAbsent(1, k -> new LinkedHashSet<>())
.add(newNode);
}
}
Frequency Update
private void updateFrequency(Node node) {
int currentFreq = node.frequency;
frequencyToNodes.get(currentFreq).remove(node);
if (frequencyToNodes.get(currentFreq).isEmpty()) {
frequencyToNodes.remove(currentFreq);
if (currentFreq == minFrequency) {
minFrequency++;
}
}
node.frequency++;
frequencyToNodes.computeIfAbsent(node.frequency, k -> new LinkedHashSet<>())
.add(node);
}
Eviction Process
private void evict() {
LinkedHashSet<Node> nodes = frequencyToNodes.get(minFrequency);
Node nodeToRemove = nodes.iterator().next();
nodes.remove(nodeToRemove);
if (nodes.isEmpty()) {
frequencyToNodes.remove(minFrequency);
}
keyToNode.remove(nodeToRemove.key);
}
Dubbo's LFU Implementation Analysis
Apache Dubbo implmeents LFU differently using an array-based approach:
public class LFUCache<K, V> {
private final CacheDeque<K, V>[] freqTable;
private final Map<K, CacheNode<K, V>> keyToNode;
private final float evictionFactor;
}
Array-Based Frequency Management
The freqTable array uses indices to represent frequencies:
- Index 0 stores nodes with frequency 1
- Index 1 stores nodes with frequency 2
- Index n stores nodes with frequency n+1
Each array element contains a CacheDeque (linked list) storing nodes at that frequency.
Implementation Issues
Two significant issues exist in Dubbo's implementation:
Frequency Limitation
When a node's frequency exceeds the array size, it stays at the last index. This causes high-frequency items to compete with lower-frequency items during eviction, effectively mixing LFU and LRU behaviors.
Race Condition in Eviction
The current implementation adds new items before checking capacity:
// Problematic sequence:
1. Add new item with frequency 1
2. Check if cache exceeds capacity
3. If yes, evict items starting from lowest frequency
// New item might be immediately evicted
The corrected sequence should be:
1. Check if cache at capacity
2. If yes, evict items before adding new item
3. Add new item
Performance Considerations
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Get | O(1) | O(n) where n is capacity |
| Put | O(1) | |
| Eviction | O(1) |
Practical Applications
- Database query caching
- Content delivery networks
- Operating system page replacement
- Web server session management
Testing Considerations
When testing LFU implementations, consider:
- Frequency accumulation over time
- Cache warming behavior
- Memory consumption patterns
- Concurrent access scenarios
- Edge cases with maximum frequency values