1.4 Hash Maps & Hash Sets
Deep Dive: Hashing is a Trade-off Machine
Beginner tip: A HashMap (dictionary in Python) is like a phone book -- you look up a name and instantly get the number. Without it, you would have to scan every entry. HashMaps trade memory (storing the phone book) for speed (instant lookup).
The Fundamental Trade-off
Problem: "Does value X exist in my collection?"
Array scan: O(n) time, O(1) extra space
Sorted + BS: O(log n) time, O(1) space, but O(n log n) to sort
HashMap: O(1) time, O(n) extra space <-- trade space for time
Every HashMap problem is fundamentally: "I want O(1) lookup and I'm willing to pay O(n) space."
Longest Consecutive Sequence -- Why Sets Beat Sorting
nums = [100, 4, 200, 1, 3, 2]
Put all in set: {100, 4, 200, 1, 3, 2}
Check each number: is it a sequence START? (num-1 NOT in set)
100: 99 in set? No -> start! Count: 100. Length = 1.
4: 3 in set? Yes -> skip (not a start)
200: 199 in set? No -> start! Count: 200. Length = 1.
1: 0 in set? No -> start! Count: 1,2,3,4. Length = 4. <-- BEST
3: 2 in set? Yes -> skip
2: 1 in set? Yes -> skip
Why O(n) not O(n^2): Each number is visited at most twice: once in the outer loop, once in the inner counting loop. Non-starts are skipped instantly.
Group Anagrams -- Sort-as-Key Trick
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
The insight: two words are anagrams iff their sorted character lists are identical. So sort each word and use the sorted string as a HashMap key.
"eat" -> sorted = "aet" -> bucket["aet"] = ["eat"]
"tea" -> sorted = "aet" -> bucket["aet"] = ["eat", "tea"]
"tan" -> sorted = "ant" -> bucket["ant"] = ["tan"]
"ate" -> sorted = "aet" -> bucket["aet"] = ["eat", "tea", "ate"]
"nat" -> sorted = "ant" -> bucket["ant"] = ["tan", "nat"]
"bat" -> sorted = "abt" -> bucket["abt"] = ["bat"]
Result: [["eat","tea","ate"], ["tan","nat"], ["bat"]]
Why this works: sorting collapses every permutation of the same letters to the same canonical string. The HashMap then groups by that signature in one pass. Cost is O(n * k log k) where k is the average word length -- almost always faster than the O(n^2 * k) "compare every pair" approach. If k is tight, swap the sorted string for a 26-int char-count tuple and it becomes O(n * k).
Quick Reference (Cheat Sheet)
Before You Start -- How to Think About Hashing
Why Hashing Exists
The bottleneck it solves: "I need to check if something exists / look up a value, and scanning the array each time is O(n)." HashMap gives you: O(1) average lookup, insert, and delete.
The Key Insight
Every time you think "I need to search for something," your first instinct should be HashMap/Set.
When to Reach for a HashMap
1. "Find if complement exists" (Two Sum) -> Store value -> index
2. "Group items by property" (Group Anagrams) -> Property as key, items as value
3. "Count frequency" (Top K, Anagram check) -> Element -> count
4. "Track seen values" (Contains Duplicate) -> Just a HashSet
5. "Subarray sum = K" -> Prefix sum as key (HashMap combo)
Choosing Keys Wisely
The art of HashMap problems is picking the RIGHT key:
| Problem | Key | Value | Why |
|---|---|---|---|
| Two Sum | target - num | index | Finds complement |
| Group Anagrams | sorted string or char count tuple | list of words | Groups by signature |
| Subarray Sum = K | prefix sum | count | Finds valid ranges |
HashMap vs HashSet
- HashMap: Need to associate a key with a value ->
{key: value} - HashSet: Only need to check existence ->
{value}-- simpler, faster
Common Mistakes
- Using a mutable type (list) as a dict key in Python -- use
tupleinstead - Forgetting to handle the case where key doesn't exist (use
.get()with default) - HashMap is O(n) worst case if all keys collide -- but this almost never matters in interviews
Problems
| # | Problem | Difficulty | Pattern | Status |
|---|---|---|---|---|
| 1 | Two Sum | Easy | Value -> Index map | [ ] |
| 2 | Group Anagrams | Medium | Sorted string as key | [ ] |
| 3 | Valid Anagram | Easy | Frequency count | [ ] |
| 4 | Longest Consecutive Sequence | Medium | Set + expand | [ ] |
| 5 | Subarray Sum Equals K | Medium | Prefix sum + HashMap | [ ] |
| 6 | Top K Frequent Elements | Medium | HashMap + Heap | [ ] |
| 7 | Encode and Decode Strings | Medium | Delimiter design | [ ] |