DSA Crash Course
Advanced Algorithms

5.3 String Algorithms (Bonus/Optional)

Mark when done:

Deep Dive: What to Actually Know

Expand-Around-Center (Longest Palindrome)

s = "babad". Try each character as center:

Center 'b' (i=0): expand -> "b" (len 1)
Center 'a' (i=1): expand -> "bab" (len 3, 'b'=='b')
Center 'b' (i=2): expand -> "aba" (len 3), then "babad"? no ('b'!='d')
Center 'a' (i=3): expand -> "a" -> "dad"? no ('b'!='d'). Just "a".
Center 'd' (i=4): expand -> "d" (len 1)

Also try between-character centers for even-length:
Between i=0,1: 'b'!='a' -> skip.
Between i=1,2: 'a'!='b' -> skip.
Between i=2,3: 'b'!='a' -> skip.
Between i=3,4: 'a'!='d' -> skip.

Longest: "bab" (or "aba"), length 3.

O(n^2) time, O(1) space. This is always accepted in interviews.


Before You Start -- How to Think About String Matching

When Do You Need These?

For most companies, brute force string matching O(n×m) is enough. For Google / Meta, KMP is fair game — and the failure-function construction is a learnable pattern, not magic. Read the worked example below; if you can build the failure function for one new pattern after, you have it.

KMP: "Is pattern in text?" in O(n+m). Builds a failure function to avoid re-scanning.
Rabin-Karp: Rolling hash to match substrings. Good for multiple pattern matching.
Z-Algorithm: All occurrences of pattern in O(n+m).
Manacher's: Longest palindromic substring in O(n). Interview shortcut: expand-around-center O(n^2) is always accepted.

What to Actually Know for Interviews

1. Brute force string matching: O(nxm) two nested loops. Always accepted.
2. Expand-around-center for palindromes: O(n^2). Clean and simple.
3. Trie for dictionary-based problems: far more common than KMP.
4. Sliding window for substring problems: the real high-ROI pattern.
5. KMP if Google / Meta is on your list — see the worked example below.

KMP — Worked Example

The whole algorithm is "when the pattern fails to match, don't restart from scratch in the text — use the partial match you already had to skip ahead in the pattern." This is encoded in the failure function (also called LPS — Longest Proper Prefix that's also a Suffix).

Failure function for pattern = "ABABC"

i  pattern[i]   lps[i]   why
0     A           0      single char — no proper prefix / suffix
1     B           0      "AB" has no prefix == suffix
2     A           1      "ABA": prefix "A" == suffix "A"  → 1
3     B           2      "ABAB": prefix "AB" == suffix "AB" → 2
4     C           0      "ABABC": no prefix matches suffix → 0

Building it in code (15 lines)

def build_lps(pattern):
    lps = [0] * len(pattern)
    length = 0          # length of the previous longest prefix-suffix
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]   # fall back to the next-longest prefix-suffix
        else:
            lps[i] = 0
            i += 1
    return lps

Matching pattern = "ABABC" in text = "ABABABABC"

def kmp_search(text, pattern):
    if not pattern:
        return 0
    lps = build_lps(pattern)
    i = j = 0   # i over text, j over pattern
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1; j += 1
            if j == len(pattern):
                return i - j        # full match starting at this index
        elif j > 0:
            j = lps[j - 1]          # use failure function — DON'T move i back
        else:
            i += 1
    return -1

Trace on "ABABABABC" looking for "ABABC":

text:    A B A B A B A B C
                ^ mismatch at j=4 (pattern[4]='C', text[4]='A')
                Use lps[j-1] = lps[3] = 2 → set j=2, KEEP i=4 (don't backtrack text)
text:    A B A B A B A B C
                    ^ mismatch again at j=4 (pattern[4]='C', text[6]='A')
                    j=2 again, keep i=6
text:    A B A B A B A B C
                        ^ match j=4=='C' and text[8]=='C' → MATCH found at index 4

Notice the text pointer i never moves backward — that's the linear-time guarantee. Without KMP, naive matching would have re-scanned the prefix three times.

Time: O(n + m). Space: O(m) for the LPS array.

If you can rebuild build_lps from scratch and explain why length = lps[length - 1] is the right fallback, you understand KMP. That's the entire pattern; everything else (Z-algorithm, Aho-Corasick) builds on it.

Algorithms

#AlgorithmComplexityStatus
1KMPO(n+m)[ ]
2Rabin-KarpO(n+m) avg[ ]
3Z-AlgorithmO(n+m)[ ]
4Manacher'sO(n)[ ]

Notes