DSA Crash Course
Foundation Data Structures

1.4 Hash Maps & Hash Sets

Mark when done:

Deep Dive: Hashing is a Trade-off Machine

Beginner tip: A HashMap (dictionary in Python) is like a phone book -- you look up a name and instantly get the number. Without it, you would have to scan every entry. HashMaps trade memory (storing the phone book) for speed (instant lookup).

The Fundamental Trade-off

Problem: "Does value X exist in my collection?"

Array scan:   O(n) time, O(1) extra space
Sorted + BS:  O(log n) time, O(1) space, but O(n log n) to sort
HashMap:      O(1) time, O(n) extra space   <-- trade space for time

Every HashMap problem is fundamentally: "I want O(1) lookup and I'm willing to pay O(n) space."

Longest Consecutive Sequence -- Why Sets Beat Sorting

nums = [100, 4, 200, 1, 3, 2]

Put all in set: {100, 4, 200, 1, 3, 2}

Check each number: is it a sequence START? (num-1 NOT in set)
  100: 99 in set? No  -> start! Count: 100. Length = 1.
  4:   3 in set?  Yes -> skip (not a start)
  200: 199 in set? No -> start! Count: 200. Length = 1.
  1:   0 in set?  No  -> start! Count: 1,2,3,4. Length = 4. <-- BEST
  3:   2 in set?  Yes -> skip
  2:   1 in set?  Yes -> skip

Why O(n) not O(n^2): Each number is visited at most twice: once in the outer loop, once in the inner counting loop. Non-starts are skipped instantly.

Group Anagrams -- Sort-as-Key Trick

words = ["eat", "tea", "tan", "ate", "nat", "bat"]

The insight: two words are anagrams iff their sorted character lists are identical. So sort each word and use the sorted string as a HashMap key.

"eat"  -> sorted = "aet"  -> bucket["aet"] = ["eat"]
"tea"  -> sorted = "aet"  -> bucket["aet"] = ["eat", "tea"]
"tan"  -> sorted = "ant"  -> bucket["ant"] = ["tan"]
"ate"  -> sorted = "aet"  -> bucket["aet"] = ["eat", "tea", "ate"]
"nat"  -> sorted = "ant"  -> bucket["ant"] = ["tan", "nat"]
"bat"  -> sorted = "abt"  -> bucket["abt"] = ["bat"]

Result: [["eat","tea","ate"], ["tan","nat"], ["bat"]]

Why this works: sorting collapses every permutation of the same letters to the same canonical string. The HashMap then groups by that signature in one pass. Cost is O(n * k log k) where k is the average word length -- almost always faster than the O(n^2 * k) "compare every pair" approach. If k is tight, swap the sorted string for a 26-int char-count tuple and it becomes O(n * k).


Quick Reference (Cheat Sheet)

Before You Start -- How to Think About Hashing

Why Hashing Exists

The bottleneck it solves: "I need to check if something exists / look up a value, and scanning the array each time is O(n)." HashMap gives you: O(1) average lookup, insert, and delete.

The Key Insight

Every time you think "I need to search for something," your first instinct should be HashMap/Set.

When to Reach for a HashMap

1. "Find if complement exists" (Two Sum) -> Store value -> index
2. "Group items by property" (Group Anagrams) -> Property as key, items as value
3. "Count frequency" (Top K, Anagram check) -> Element -> count
4. "Track seen values" (Contains Duplicate) -> Just a HashSet
5. "Subarray sum = K" -> Prefix sum as key (HashMap combo)

Choosing Keys Wisely

The art of HashMap problems is picking the RIGHT key:

ProblemKeyValueWhy
Two Sumtarget - numindexFinds complement
Group Anagramssorted string or char count tuplelist of wordsGroups by signature
Subarray Sum = Kprefix sumcountFinds valid ranges

HashMap vs HashSet

  • HashMap: Need to associate a key with a value -> {key: value}
  • HashSet: Only need to check existence -> {value} -- simpler, faster

Common Mistakes

  • Using a mutable type (list) as a dict key in Python -- use tuple instead
  • Forgetting to handle the case where key doesn't exist (use .get() with default)
  • HashMap is O(n) worst case if all keys collide -- but this almost never matters in interviews

Problems

#ProblemDifficultyPatternStatus
1Two SumEasyValue -> Index map[ ]
2Group AnagramsMediumSorted string as key[ ]
3Valid AnagramEasyFrequency count[ ]
4Longest Consecutive SequenceMediumSet + expand[ ]
5Subarray Sum Equals KMediumPrefix sum + HashMap[ ]
6Top K Frequent ElementsMediumHashMap + Heap[ ]
7Encode and Decode StringsMediumDelimiter design[ ]

Notes