1.8 Heaps / Priority Queues
Deep Dive: When and Why to Use a Heap
Beginner tip: A heap is like a VIP line where the most important person is always at the front. When someone new arrives, they automatically slot into the right position. In a min-heap the smallest value is always at the top. In a max-heap the largest is at the top. Unlike sorting (which organizes everything), a heap only guarantees the top element -- which is exactly what you need for "find the kth largest" or "always process the smallest next."
Kth Largest -- The Min-Heap Trick
nums = [3, 2, 1, 5, 6, 4], k = 2 (find 2nd largest = 5)
Maintain min-heap of size k=2 (keeps the k largest elements):
Process 3: heap=[3] size < k, push
Process 2: heap=[2, 3] size == k
Process 1: 1 < heap[0]=2 skip (too small, not in top-k)
Process 5: 5 > heap[0]=2 pop 2, push 5. heap=[3, 5]
Process 6: 6 > heap[0]=3 pop 3, push 6. heap=[5, 6]
Process 4: 4 < heap[0]=5 skip
Answer: heap[0] = 5 (kth largest)
Why min-heap of size k: The smallest element in the heap is exactly the kth largest overall. Anything smaller gets rejected. O(n log k) beats O(n log n) sorting when k << n.
Two Heaps -- Find Median From Data Stream
max-heap (lower half) min-heap (upper half)
[3, 1] | [5, 7]
top = 3 | top = 5
Median = (3 + 5) / 2 = 4.0
The max-heap holds the smaller half, min-heap holds the larger half. Balance their sizes so they differ by at most 1. Median is always accessible from the tops.
Quick Reference (Cheat Sheet)
Before You Start -- How to Think About Heaps
Why Heaps Exist
The bottleneck: "I need to repeatedly find the min/max element."
- Unsorted array: finding min/max is O(n) each time
- Sorted array: finding min/max is O(1), but inserting is O(n)
- Heap: O(1) access to min/max, O(log n) insert/delete. Best of both worlds.
The Heap Mindset
Whenever you see "K largest", "K smallest", "K most frequent", "median", or "schedule" -> think Heap.
Key Insight: Inverted Heap Trick
To find the K LARGEST elements: use a MIN-heap of size K.
- If heap size > K, pop the smallest. What remains = K largest.
To find the K SMALLEST elements: use a MAX-heap of size K.
- If heap size > K, pop the largest. What remains = K smallest.
Python only has min-heap (heapq). For max-heap: negate values.
The Two-Heap Pattern (Median)
For "running median of a stream":
- Max-heap for the LEFT half (smaller elements)
- Min-heap for the RIGHT half (larger elements)
- Balance so they differ in size by at most 1
- Median = top of the larger heap (or average of both tops)
When You See a Heap Problem, Ask Yourself
1. "Top K" or "Kth largest/smallest"? -> Heap of size K
2. "Median of stream"? -> Two-heap pattern
3. "Merge K sorted lists"? -> Min-heap of K heads
4. "Schedule tasks with cooldown"? -> Max-heap for greedy scheduling
5. "Always process the most/least urgent"? -> Priority queue
Common Mistakes
- Python's
heapqis a MIN-heap. For max-heap, push/pop negative values. - Forgetting to track the source when merging K lists (push
(value, list_index, element_index)) - Heappush is O(log n), not O(1). Building a heap from an array is O(n) with
heapify.
Python heapq Cheat Sheet
A few quirks bite everyone the first time. Memorize these.
1. heapq is min-heap only. There is no MaxHeap class. The module operates on a plain list and always treats the smallest value as the root.
import heapq
h = []
heapq.heappush(h, 3)
heapq.heappush(h, 1)
heapq.heappush(h, 2)
heapq.heappop(h) # -> 1 (smallest)
2. Max-heap by negation. Push the negative, pop and negate back. Three lines, no library swap.
h = []
for x in [3, 1, 5, 2]:
heapq.heappush(h, -x) # push negatives
print(-heapq.heappop(h)) # -> 5 (largest, after un-negating)
3. One-shot top-K with nlargest / nsmallest. When you just want the K biggest or smallest values from an iterable -- and you do not need to maintain a heap across calls -- skip the manual heap entirely.
heapq.nlargest(3, [5, 1, 9, 2, 7, 4]) # -> [9, 7, 5]
heapq.nsmallest(2, [5, 1, 9, 2, 7, 4]) # -> [1, 2]
heapq.nlargest(3, items, key=lambda x: x.score) # key= works too
Internally these use a size-K heap, so they're O(n log k) -- same as the rolling-heap pattern but in one line.
4. Tuples as heap entries -- watch the tie-break. Heap order is the natural order of whatever you push. For (priority, item) tuples, Python falls back to comparing item when priorities tie. If item is something Python cannot compare (e.g., a custom object, or a ListNode), you get TypeError: '<' not supported.
# BREAKS the moment two tasks share a priority:
heapq.heappush(h, (priority, task_object))
# FIX: insert a monotonic counter as the tie-breaker.
import itertools
counter = itertools.count()
heapq.heappush(h, (priority, next(counter), task_object))
The counter is unique and always comparable, so the tie-break is deterministic (FIFO among equal priorities) and task_object is never compared. Use this pattern for Dijkstra with custom nodes, task schedulers, A* -- anywhere your "items" aren't naturally orderable.
Problems
| # | Problem | Difficulty | Pattern | Status |
|---|---|---|---|---|
| 1 | Kth Largest Element | Medium | Min-heap of size K | [ ] |
| 2 | Top K Frequent Elements | Medium | HashMap + Heap | [ ] |
| 3 | Merge K Sorted Lists | Hard | Min-heap of heads | [ ] |
| 4 | Find Median from Data Stream | Hard | Two heaps | [ ] |
| 5 | Task Scheduler | Medium | Max-heap + cooldown | [ ] |
| 6 | Reorganize String | Medium | Max-heap greedy | [ ] |