Hash Table Fundamentals
A hash table is a data structure that allows for direct access based on a specific key. It is effectively an array designed for scenarios requiring rapid lookups to determine if an element exists within a collection. The key serves as the array index, enabling O(1) average time complexity for retrieval, as opposed to the O(n) required by linear scanning.
Mapping and Hashing
To store data in a hash table, a hash function is utilized to map input content into a table index. A hashing algorithm converts complex data—such as strings—into numerical values. If the generated hash exceeds the size of the table, a modulo operation is performed to ensure the resulting index remains within the bounds of the array.
Handling Hash Collisions
Collisions occur when multiple keys map to the same index. Two primary resolution strategies exist:
- Chaining: Colliding elements are stored within a linked list at the specific index. This approach requires balancing table size to avoid excessive memory allocation for empty buckets or overly long chains.
- Open Addressing (Linear Probing): If a collision occurs, the algorithm searches for the next available slot in the array. This method requires the total table size to be strictly larger than the dataset to ensure availability.
Common Hash Table Implementations
In C++, standard libraries provide several implementations based on the underlying structure:
| Implementation | Underlaying Structure | Order | Unique Keys | Efficiency |
|---|---|---|---|---|
std::set |
Red-Black Tree | Ordered | Yes | O(log n) |
std::multiset |
Red-Black Tree | Ordered | No | O(log n) |
std::unordered_set |
Hash Table | Unordered | Yes | O(1) |
For general hash table requirements, std::unordered_set and std::unordered_map are preferred due to their superior O(1) lookup and modification performance.
Anagram Validation
To determine if two strings are anagrams, one can use an integer array of size 26 to track the frequency of lowercase letters. Increment the counts for the first string and decrement them for the second. If every index in the array returns to zero, the strings are anagrams.
class Solution {
public:
bool isAnagram(string s, string t) {
if (s.length() != t.length()) return false;
int freq[26] = {0};
for (int i = 0; i < s.length(); i++) {
freq[s[i] - 'a']++;
freq[t[i] - 'a']--;
}
for (int count : freq) {
if (count != 0) return false;
}
return true;
}
};
Intersection of Two Arrays
When finding the intersection of two arrays with unique results, std::unordered_set provides an efficient solution by storing elements from the first array for quick lookups against the second.
class Solution {
public:
vector<int> intersection(vector<int>& nums1, vector<int>& nums2) {
unordered_set<int> elements(nums1.begin(), nums1.end());
unordered_set<int> result;
for (int num : nums2) {
if (elements.count(num)) {
result.insert(num);
}
}
return vector<int>(result.begin(), result.end());
}
};
Happy Number Determination
To detect if a number is "happy," calculate the sum of the squares of its digits iteratively. If the sum eventually reaches 1, it is a happy number. If it enters an infinite cycle, store intermediate sums in an unordered_set to detect repetitive patterns.
class Solution {
public:
int getDigitSum(int n) {
int sum = 0;
while (n > 0) {
int digit = n % 10;
sum += digit * digit;
n /= 10;
}
return sum;
}
bool isHappy(int n) {
unordered_set<int> seen;
while (n != 1 && seen.find(n) == seen.end()) {
seen.insert(n);
n = getDigitSum(n);
}
return n == 1;
}
};
Two Sum Lookup
For the two-sum problem, where you must return the indices of two numbers that add up to a target, an unordered_map is optimal. It maps numerical values to their respective indices, allowing for single-pass lookups.
class Solution {
public:
vector<int> twoSum(vector<int>& nums, int target) {
unordered_map<int, int> indexMap;
for (int i = 0; i < nums.size(); i++) {
int complement = target - nums[i];
if (indexMap.find(complement) != indexMap.end()) {
return {indexMap[complement], i};
}
indexMap[nums[i]] = i;
}
return {};
}
};