Understanding Hash-Based Collections in Java: HashMap and HashSet Internals

Key-Value vs. Unique-Key Abstractions

Java’s Map and Set interfaces serve distinct roles in efficient data retrieval. Map implements a key-value association model—ideal for scenarios like counting word frequencies or mapping identifiers to attributes. In contrast, Set models uniqueness-only lookups—suitable for membership checks, such as verifying whether a username exists in a registry.

Map<K, V> is not part of the Collection hierarchy; it declares operations for associating keys with values. Keys must be unique; values may repeat. Set<E> extends Collection, enforcing element uniqueness without value pairing.

Core Map Operations

public static void demonstrateMapUsage() {
    Map<String, Integer> frequencyMap = new HashMap<>();
    
    // Insert or update entry
    frequencyMap.put("apple", 1);
    
    // Retrieve value; returns null if absent
    Integer count = frequencyMap.get("apple");
    
    // Retrieve with fallback default
    int safeCount = frequencyMap.getOrDefault("banana", 0);
    
    // Remove mapping
    frequencyMap.remove("apple");
    
    // Check existence
    boolean hasApple = frequencyMap.containsKey("apple");
    boolean hasZero = frequencyMap.containsValue(0);
}

Key behavioral notes:

  • Map implementations include HashMap (average O(1) access, permits null keys/values) and TreeMap (O(log n), sorted iteration, disallows null keys).
  • Keys are immutable in-place; updating a key requires removal + reinsertion.
  • Keys can be extracted via keySet() (a Set<K>), while values form a Collection<V>—not necessarily unique.

Core Set Operations

public static void demonstrateSetUsage() {
    Set<String> userNames = new HashSet<>();
    
    // Add element; no-op if duplicate
    userNames.add("alice");
    
    // Clear all entries
    userNames.clear();
    
    // Membership test
    boolean exists = userNames.contains("bob");
    
    // Remove element
    userNames.remove("alice");
    
    // Size and emptiness
    int size = userNames.size();
    boolean isEmpty = userNames.isEmpty();
}

Important distinctions:

  • HashSet uses hashing for average O(1) operations; TreeSet maintains sorted order via red-black tree (O(log n)).
  • LinkedHashSet preserves insertion order using a doubly-linked list overlay.
  • TreeSet rejects null; HashSet accepts it.
  • Like Map keys, Set elements cannot be mutated post-insertion without removal.

Hash Table Fundamentals

A hash table enables near-constant-time lookup by transforming keys into array indices via a hash function. For example:

int index = key.hashCode() % table.length;

When two distinct keys map to the same index—a collision—the structure must resolve it. Collisions are inevitable when the number of possible keys exceeds table capacity.

Collision Resolution Strategies

Open Addressing (Closed Hashing)

  • Probes alternate slots within the same array upon collision.
  • Linear probing: check index + 1, index + 2, etc., wrapping modulo length.
  • Quadratic probing: try index + i² for i = 1, 2, ... to reduce clustering.
  • Requires tombstone markers for deletions to preserve search integrity.

Separate Chaining (Open Hashing)

  • Each bucket holds a linked list (or other collection) of colliding entries.
  • Lookup involves computing the bucket index, then traversing its chain.
  • Scalable: chains grow dynamically; load factor controls performance degradation.

Load Factor and Resizing

Load factor α = (number of entries) / (table capacity). Java’s HashMap defaults to 0.75. Exceeding this triggers rehashing: a larger table is allocated, and every existing entry is rehashed into the new structure.

Minimal Hash Table Implementation

class SimpleHashMap {
    private static class Entry {
        final int key;
        int value;
        Entry next;
        
        Entry(int k, int v) {
            this.key = k;
            this.value = v;
        }
    }
    
    private Entry[] buckets;
    private int size;
    private static final double LOAD_THRESHOLD = 0.75;
    
    @SuppressWarnings("unchecked")
    SimpleHashMap() {
        this.buckets = new Entry[16];
    }
    
    void put(int key, int value) {
        int idx = Math.abs(key) % buckets.length;
        Entry current = buckets[idx];
        
        // Update existing key
        while (current != null) {
            if (current.key == key) {
                current.value = value;
                return;
            }
            current = current.next;
        }
        
        // Insert new entry at head
        Entry newNode = new Entry(key, value);
        newNode.next = buckets[idx];
        buckets[idx] = newNode;
        size++;
        
        if (size >= LOAD_THRESHOLD * buckets.length) {
            resize();
        }
    }
    
    int get(int key) {
        int idx = Math.abs(key) % buckets.length;
        Entry current = buckets[idx];
        
        while (current != null) {
            if (current.key == key) {
                return current.value;
            }
            current = current.next;
        }
        return -1; // Not found
    }
    
    private void resize() {
        Entry[] oldBuckets = buckets;
        @SuppressWarnings("unchecked")
        Entry[] newBuckets = new Entry[oldBuckets.length * 2];
        buckets = newBuckets;
        size = 0;
        
        for (Entry head : oldBuckets) {
            while (head != null) {
                put(head.key, head.value); // Rehash each entry
                head = head.next;
            }
        }
    }
}

This implementation uses separate chaining with linear probing via linked lists per bucket, automatic resizing based on load factor, and basic integer hashing.

Tags: java hashmap HashSet Hash Table Data Structures

Posted on Sat, 09 May 2026 09:06:09 +0000 by sigmadog