DSA Crash Course
Foundation Data Structures

1.8 Heaps / Priority Queues

Mark when done:

Deep Dive: When and Why to Use a Heap

Beginner tip: A heap is like a VIP line where the most important person is always at the front. When someone new arrives, they automatically slot into the right position. In a min-heap the smallest value is always at the top. In a max-heap the largest is at the top. Unlike sorting (which organizes everything), a heap only guarantees the top element -- which is exactly what you need for "find the kth largest" or "always process the smallest next."

Kth Largest -- The Min-Heap Trick

nums = [3, 2, 1, 5, 6, 4], k = 2 (find 2nd largest = 5)

Maintain min-heap of size k=2 (keeps the k largest elements):

Process 3: heap=[3]            size < k, push
Process 2: heap=[2, 3]         size == k
Process 1: 1 < heap[0]=2       skip (too small, not in top-k)
Process 5: 5 > heap[0]=2       pop 2, push 5. heap=[3, 5]
Process 6: 6 > heap[0]=3       pop 3, push 6. heap=[5, 6]
Process 4: 4 < heap[0]=5       skip

Answer: heap[0] = 5 (kth largest)

Why min-heap of size k: The smallest element in the heap is exactly the kth largest overall. Anything smaller gets rejected. O(n log k) beats O(n log n) sorting when k << n.

Two Heaps -- Find Median From Data Stream

max-heap (lower half)     min-heap (upper half)
     [3, 1]          |         [5, 7]
   top = 3           |        top = 5

Median = (3 + 5) / 2 = 4.0

The max-heap holds the smaller half, min-heap holds the larger half. Balance their sizes so they differ by at most 1. Median is always accessible from the tops.


Quick Reference (Cheat Sheet)

Before You Start -- How to Think About Heaps

Why Heaps Exist

The bottleneck: "I need to repeatedly find the min/max element."

  • Unsorted array: finding min/max is O(n) each time
  • Sorted array: finding min/max is O(1), but inserting is O(n)
  • Heap: O(1) access to min/max, O(log n) insert/delete. Best of both worlds.

The Heap Mindset

Whenever you see "K largest", "K smallest", "K most frequent", "median", or "schedule" -> think Heap.

Key Insight: Inverted Heap Trick

To find the K LARGEST elements: use a MIN-heap of size K.
  - If heap size > K, pop the smallest. What remains = K largest.
  
To find the K SMALLEST elements: use a MAX-heap of size K.
  - If heap size > K, pop the largest. What remains = K smallest.

Python only has min-heap (heapq). For max-heap: negate values.

The Two-Heap Pattern (Median)

For "running median of a stream":
  - Max-heap for the LEFT half (smaller elements)
  - Min-heap for the RIGHT half (larger elements)
  - Balance so they differ in size by at most 1
  - Median = top of the larger heap (or average of both tops)

When You See a Heap Problem, Ask Yourself

1. "Top K" or "Kth largest/smallest"? -> Heap of size K
2. "Median of stream"? -> Two-heap pattern
3. "Merge K sorted lists"? -> Min-heap of K heads
4. "Schedule tasks with cooldown"? -> Max-heap for greedy scheduling
5. "Always process the most/least urgent"? -> Priority queue

Common Mistakes

  • Python's heapq is a MIN-heap. For max-heap, push/pop negative values.
  • Forgetting to track the source when merging K lists (push (value, list_index, element_index))
  • Heappush is O(log n), not O(1). Building a heap from an array is O(n) with heapify.

Python heapq Cheat Sheet

A few quirks bite everyone the first time. Memorize these.

1. heapq is min-heap only. There is no MaxHeap class. The module operates on a plain list and always treats the smallest value as the root.

import heapq
h = []
heapq.heappush(h, 3)
heapq.heappush(h, 1)
heapq.heappush(h, 2)
heapq.heappop(h)   # -> 1   (smallest)

2. Max-heap by negation. Push the negative, pop and negate back. Three lines, no library swap.

h = []
for x in [3, 1, 5, 2]:
    heapq.heappush(h, -x)        # push negatives
print(-heapq.heappop(h))         # -> 5  (largest, after un-negating)

3. One-shot top-K with nlargest / nsmallest. When you just want the K biggest or smallest values from an iterable -- and you do not need to maintain a heap across calls -- skip the manual heap entirely.

heapq.nlargest(3, [5, 1, 9, 2, 7, 4])    # -> [9, 7, 5]
heapq.nsmallest(2, [5, 1, 9, 2, 7, 4])   # -> [1, 2]
heapq.nlargest(3, items, key=lambda x: x.score)   # key= works too

Internally these use a size-K heap, so they're O(n log k) -- same as the rolling-heap pattern but in one line.

4. Tuples as heap entries -- watch the tie-break. Heap order is the natural order of whatever you push. For (priority, item) tuples, Python falls back to comparing item when priorities tie. If item is something Python cannot compare (e.g., a custom object, or a ListNode), you get TypeError: '<' not supported.

# BREAKS the moment two tasks share a priority:
heapq.heappush(h, (priority, task_object))

# FIX: insert a monotonic counter as the tie-breaker.
import itertools
counter = itertools.count()
heapq.heappush(h, (priority, next(counter), task_object))

The counter is unique and always comparable, so the tie-break is deterministic (FIFO among equal priorities) and task_object is never compared. Use this pattern for Dijkstra with custom nodes, task schedulers, A* -- anywhere your "items" aren't naturally orderable.

Problems

#ProblemDifficultyPatternStatus
1Kth Largest ElementMediumMin-heap of size K[ ]
2Top K Frequent ElementsMediumHashMap + Heap[ ]
3Merge K Sorted ListsHardMin-heap of heads[ ]
4Find Median from Data StreamHardTwo heaps[ ]
5Task SchedulerMediumMax-heap + cooldown[ ]
6Reorganize StringMediumMax-heap greedy[ ]

Notes