Understanding HashMap Internals: A Deep Dive into Source Code

HashMap is one of the most frequently used data structures in Java. Understanding its internal implementation helps developers make better decisions about when and how to use it effectively.

Hash Computation and Index Calculation

The quality of hash distribution directly impacts HashMap performance. The implementation applies a subtle but crucial transformation:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Consider a 32-bit integer hashCode. When the table size is small (such as 16), only the lower bits determine the bucket index. If two hashCodes differ only in their upper 16 bits, they would collide. By XORing the upper 16 bits with the lower 16 bits, the hash function ensures that high-order bits influence the final distribution.

The bitwise AND operation with (n - 1) converts a hash into a valid bucket index:

int index = (table.length - 1) & hash;

This works efficiently becuase table length is always a power of two, making (n - 1) a mask with all lower bits set to 1.

Data Structure: Linked Lists and Red-Black Trees

HashMap uses a hybrid approach: buckets initially contain linked lists. When a bucket accumulates enough elements, the list converts to a red-black tree.

static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;

The threshold of 8 is not arbitrary. Under random hash distribution, the probability of a bucket containing k elements follows a Poisson distribution with λ = 0.5:

Elements	Probability
0	0.6065
1	0.3033
2	0.0758
3	0.0126
4	0.0016
5	0.0002
6	0.00001
7	0.000001
8	0.00000006

The probability of a bucket having 8 or more elements is extremely low (6 in 100 million). Using trees only when necessary balances space and time complexity.

Core Operations

Putting Values

The putVal method handles the complexity of insertion:

V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab = table;
    int n = tab.length;
    
    if (tab == null || n == 0) {
        n = (tab = resize()).length;
    }
    
    int index = (n - 1) & hash;
    Node<K,V> first = tab[index];
    
    if (first == null) {
        tab[index] = new Node<>(hash, key, value, null);
    } else {
        Node<K,V> e = first;
        
        if (first.hash == hash && keysEqual(first.key, key)) {
            e = first;
        } else if (first instanceof TreeNode) {
            e = ((TreeNode<K,V>)first).insert(key, hash, value);
        } else {
            int count = 0;
            for (Node<K,V> curr = first; ; curr = curr.next) {
                count++;
                if (curr.next == null) {
                    curr.next = new Node<>(hash, key, value, null);
                    if (count >= TREEIFY_THRESHOLD - 1) {
                        convertToTree(tab, hash);
                    }
                    break;
                }
                if (curr.next.hash == hash && keysEqual(curr.next.key, key)) {
                    e = curr.next;
                    break;
                }
            }
        }
        
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null) {
                e.value = value;
            }
            return oldValue;
        }
    }
    
    size++;
    if (size > threshold) {
        resize();
    }
    return null;
}

The algorithm follows these steps:

If the table is uninitialized, invoke resize() to create it
Calculate the bucket index using (n-1) & hash
If the bucket is empty, create a new node directly
If the bucket contains a tree, use tree insertion
If the bucket contains a list, traverse to find a matching key or append
When list length reaches threshold, convert to a tree
If a matching key exists, update its value and return the old value
After inserting a new element, check if resizing is needed

Retrieving Values

The getNode method locates values efficiently:

V get(Object key) {
    Node<K,V> node = findNode(hash(key), key);
    return node == null ? null : node.value;
}

final Node<K,V> findNode(int hash, Object key) {
    Node<K,V>[] tab = table;
    int n = tab.length;
    
    if (tab != null && n > 0) {
        int index = (n - 1) & hash;
        Node<K,V> first = tab[index];
        
        if (first != null) {
            if (first.hash == hash && keysEqual(first.key, key)) {
                return first;
            }
            
            Node<K,V> next = first.next;
            if (next != null) {
                if (first instanceof TreeNode) {
                    return ((TreeNode<K,V>)first).search(hash, key);
                }
                
                do {
                    if (next.hash == hash && keysEqual(next.key, key)) {
                        return next;
                    }
                } while ((next = next.next) != null);
            }
        }
    }
    return null;
}

The search logic first checks the bucket head, then either searches the tree or traverses the list.

Removing Values

The removeNode method handles deletion:

V remove(Object key) {
    Node<K,V> node = deleteNode(hash(key), key, null, false, true);
    return node == null ? null : node.value;
}

final Node<K,V> deleteNode(int hash, Object key, Object value, 
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab = table;
    int n = tab.length;
    int index = (n - 1) & hash;
    Node<K,V> bucket = tab[index];
    
    if (bucket == null) return null;
    
    Node<K,V> target = null;
    Node<K,V> parent = null;
    
    if (bucket.hash == hash && keysEqual(bucket.key, key)) {
        target = bucket;
    } else if (bucket instanceof TreeNode) {
        target = ((TreeNode<K,V>)bucket).findAndRemove(hash, key);
    } else {
        for (Node<K,V> curr = bucket; curr != null; curr = curr.next) {
            if (curr.next.hash == hash && keysEqual(curr.next.key, key)) {
                parent = curr;
                target = curr.next;
                break;
            }
        }
    }
    
    if (target == null) return null;
    
    if (target instanceof TreeNode) {
        ((TreeNode<K,V>)target).remove(tab, movable);
    } else {
        if (parent == null) {
            tab[index] = target.next;
        } else {
            parent.next = target.next;
        }
    }
    
    size--;
    return target;
}

Deletion requires finding the target node and its parent, then reconnecting the chain.

Resize Mechanism

When the number of elements exceeds capacity * loadFactor, the table must grow:

static final float DEFAULT_LOAD_FACTOR = 0.75f;

final Node<K,V>[] resize() {
    Node<K,V>[] oldTable = table;
    int oldCap = oldTable == null ? 0 : oldTable.length;
    int oldThr = threshold;
    
    int newCap, newThr;
    
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTable;
        }
        newCap = oldCap << 1;
        newThr = oldThr << 1;
    } else if (oldThr > 0) {
        newCap = oldThr;
    } else {
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    
    Node<K,V>[] newTable = new Node[newCap];
    table = newTable;
    
    if (oldTable != null) {
        for (int i = 0; i < oldCap; i++) {
            Node<K,V> node = oldTable[i];
            
            if (node == null) continue;
            oldTable[i] = null;
            
            if (node.next == null) {
                newTable[node.hash & (newCap - 1)] = node;
            } else if (node instanceof TreeNode) {
                ((TreeNode<K,V>)node).split(this, newTable, i, oldCap);
            } else {
                Node<K,V> loHead = null, loTail = null;
                Node<K,V> hiHead = null, hiTail = null;
                
                for (Node<K,V> e = node; e != null; e = e.next) {
                    if ((e.hash & oldCap) == 0) {
                        if (loTail == null) loHead = e;
                        else loTail.next = e;
                        loTail = e;
                    } else {
                        if (hiTail == null) hiHead = e;
                        else hiTail.next = e;
                        hiTail = e;
                    }
                }
                
                if (loTail != null) {
                    loTail.next = null;
                    newTable[i] = loHead;
                }
                if (hiTail != null) {
                    hiTail.next = null;
                    newTable[i + oldCap] = hiHead;
                }
            }
        }
    }
    
    threshold = newThr;
    return newTable;
}

The resize operation has several important characteristics:

Capacity always doubles (oldCap << 1), maintaining power-of-two
Elements redistribute based on a additional bit of their hash
The (hash & oldCap) check determines whether an element stays in its original bucket or moves to index + oldCap
Trees may split or untreeify based on their new size

Why Power-of-Two Capacity

Using power-of-two capacities enables efficient index calculation. For any hash value:

index = hash & (capacity - 1);

When capacity is 16 (10000 in binary), capacity - 1 is 15 (01111). The AND operation extracts the lower 4 bits, which is equivalent to hash % 16 but faster. This optimization eliminates the need for modulo division during every operation.

Key Takeaways

HashMap achieves excellent average-case performance (O(1) for get and put) through careful design decisions. The hash function ensures good distribution, the power-of-two capacity enables fast indexing, and the tree conversion prevents worst-case degradation. Undesrtanding these internals helps developers diagnose performance issues and use HashMap effectively in performance-critical applications.

Tags: java hashmap Data Structures source code analysis Collections Framework

Posted on Sun, 17 May 2026 11:12:11 +0000 by st0rmer

Freaks City