Suffix Automaton
Problem Overview
| Attribute | Value |
|---|---|
| Difficulty | Hard |
| Category | String Algorithms |
| Key Technique | Suffix Automaton (SAM) |
| Space Complexity | O(n) states, O(n) transitions |
Learning Goals
After studying this topic, you will be able to:
- Understand what a suffix automaton represents and why it uses linear space
- Build a suffix automaton incrementally by adding characters
- Handle clone states correctly during construction
- Count distinct substrings in O(n) time
- Apply SAM to pattern matching and LCS problems
Concept Explanation
What is a Suffix Automaton?
A Suffix Automaton (SAM) is a minimal deterministic finite automaton (DFA) that accepts exactly all suffixes of a given string. The remarkable property is that it also accepts all substrings, since every substring is a prefix of some suffix.
Key Properties:
- Contains at most 2n - 1 states for a string of length n
- Contains at most 3n - 4 transitions
- Each state represents an equivalence class of substrings (ending at the same positions)
Why Linear Space?
Instead of storing each substring explicitly (O(n^2) space), SAM groups substrings that have the same set of ending positions into equivalence classes. Each class becomes one state.
Intuition: States and Transitions
State Representation
Each state in SAM represents a set of substrings that:
- All end at the same positions in the original string
- Form a contiguous chain by suffix relationship
| Component | Meaning |
|---|---|
len |
Length of the longest substring in this equivalence class |
link |
Suffix link - points to state representing next shorter suffix class |
trans[c] |
Transition on character c to another state |
Suffix Link Intuition:
The suffix link of state v points to state u where u represents the longest proper suffix of substrings in v that belongs to a different equivalence class.
Equivalence Classes
For string “abab”:
- Substrings “ab” (at positions 2,4) and “abab” (at position 4) end at different position sets
- Substrings ending at the same positions are grouped together
State 0 (initial): represents empty string
State 1: represents "a" (ends at 1, 3)
State 2: represents "ab" (ends at 2, 4)
State 3: represents "aba", "ba" (ends at 3)
State 4: represents "abab", "bab" (ends at 4)
Dry Run: Building SAM for “abab”
Initial State
States: { 0 }
State 0: len=0, link=-1, trans={}
last = 0
Step 1: Add ‘a’
1. Create new state cur=1 with len=1
2. Walk from last=0:
- State 0 has no 'a' transition, add trans[a]=1
- Reached initial state's link (-1)
3. Set link[1] = 0 (no existing 'a' path)
4. Set last = 1
States after 'a':
State 0: len=0, link=-1, trans={a:1}
State 1: len=1, link=0, trans={}
Step 2: Add ‘b’
1. Create new state cur=2 with len=2
2. Walk from last=1:
- State 1 has no 'b' transition, add trans[b]=2
- State 0 has no 'b' transition, add trans[b]=2
- Reached initial state's link (-1)
3. Set link[2] = 0
4. Set last = 2
States after 'ab':
State 0: len=0, link=-1, trans={a:1, b:2}
State 1: len=1, link=0, trans={b:2}
State 2: len=2, link=0, trans={}
Step 3: Add ‘a’
1. Create new state cur=3 with len=3
2. Walk from last=2:
- State 2 has no 'a' transition, add trans[a]=3
- State 0 already has trans[a]=1, stop here, q=1
3. Check: len[0] + 1 == len[1]? (0+1 == 1) YES
- Set link[3] = 1 (no clone needed)
4. Set last = 3
States after 'aba':
State 0: len=0, link=-1, trans={a:1, b:2}
State 1: len=1, link=0, trans={b:2}
State 2: len=2, link=0, trans={a:3}
State 3: len=3, link=1, trans={}
Step 4: Add ‘b’ (Clone Required!)
1. Create new state cur=4 with len=4
2. Walk from last=3:
- State 3 has no 'b' transition, add trans[b]=4
- State 1 already has trans[b]=2, stop here, q=2
3. Check: len[1] + 1 == len[2]? (1+1 == 2) YES
- Set link[4] = 2 (no clone needed)
4. Set last = 4
Final States for "abab":
State 0: len=0, link=-1, trans={a:1, b:2}
State 1: len=1, link=0, trans={b:2}
State 2: len=2, link=0, trans={a:3}
State 3: len=3, link=1, trans={b:4}
State 4: len=4, link=2, trans={}
Visual Representation
a b a b
[0] -----> [1] -----> [2] -----> [3] -----> [4]
| ^ ^
| | |
+--- b ----+--- a ----+
| |
+-------- b ----------+
Suffix Links:
4 -> 2 -> 0
3 -> 1 -> 0
Common Mistakes
Mistake 1: Forgetting to Clone
# WRONG: Assuming no clone is ever needed
if p != -1 and trans[p][c] exists:
link[cur] = trans[p][c] # WRONG! May need clone
Problem: When len[p] + 1 != len[q], we must clone state q.
Fix: Always check the length condition:
q = trans[p][c]
if len[p] + 1 == len[q]:
link[cur] = q
else:
# Clone q into new state clone
clone = create_clone(q)
Mistake 2: Not Updating All Transitions to Clone
# WRONG: Only updating one transition
trans[p][c] = clone
Problem: All ancestors of p that transition to q must be updated.
Fix: Walk up suffix links and update all:
while p != -1 and trans[p][c] == q:
trans[p][c] = clone
p = link[p]
Mistake 3: Incorrect Suffix Link for Clone
# WRONG
link[clone] = link[cur]
Problem: Clone’s suffix link should be the original’s suffix link, not cur’s.
Fix:
link[clone] = link[q] # Clone inherits q's suffix link
link[q] = clone # Original q now points to clone
link[cur] = clone # New state also points to clone
Edge Cases
| Case | Input | Behavior | Notes |
|---|---|---|---|
| Empty string | ”” | Single initial state | 0 distinct substrings |
| Single char | “a” | 2 states | 1 distinct substring |
| All same chars | “aaaa” | n+1 states, no clones | Linear chain structure |
| All distinct | “abcd” | 2n states possible | Maximum branching |
| Alternating | “abab” | May require clones | Test clone logic |
| Long repeats | “abcabc” | Multiple clones | Stress test |
When to Use Suffix Automaton
Use SAM When:
- Counting distinct substrings (optimal O(n) solution)
- Finding longest common substring of multiple strings
- Pattern matching with wildcards or complex conditions
- Computing substring statistics (count, positions)
- Online string processing (add characters incrementally)
Don’t Use When:
- Simple single pattern matching (use KMP or Z-algorithm)
- Only need suffix array properties (SA might be simpler)
- Memory is extremely constrained (consider suffix array)
- Problem only involves prefixes (simpler solutions exist)
Comparison with Alternatives
| Structure | Space | Build Time | Distinct Substrings |
|---|---|---|---|
| Suffix Automaton | O(n) | O(n) | O(n) |
| Suffix Tree | O(n) | O(n) | O(n) |
| Suffix Array | O(n) | O(n log n) | O(n) with LCP |
| Trie of Suffixes | O(n^2) | O(n^2) | O(n^2) |
Applications
1. Counting Distinct Substrings
Formula: Sum of (len[v] - len[link[v]]) for all states v except initial.
Each state contributes len[v] - len[link[v]] unique substrings.
def count_distinct_substrings(sam):
total = 0
for state in range(1, sam.size): # Skip initial state
total += sam.len[state] - sam.len[sam.link[state]]
return total
2. Longest Common Substring (LCS) of Two Strings
Build SAM on first string, then traverse with second string:
def lcs_two_strings(s, t):
sam = build_sam(s)
v, l, best = 0, 0, 0
for c in t:
while v and c not in sam.trans[v]:
v = sam.link[v]
l = sam.len[v]
if c in sam.trans[v]:
v = sam.trans[v][c]
l += 1
best = max(best, l)
return best
3. LCS of Multiple Strings
For each state, track minimum occurrence length across all strings.
Solution: Python Implementation
class SuffixAutomaton:
"""
Suffix Automaton implementation.
Time: O(n) construction
Space: O(n) states and transitions
"""
def __init__(self):
self.size = 1
self.last = 0
self.len = [0]
self.link = [-1]
self.trans = [{}]
def _add_state(self):
self.len.append(0)
self.link.append(-1)
self.trans.append({})
idx = self.size
self.size += 1
return idx
def add_char(self, c):
cur = self._add_state()
self.len[cur] = self.len[self.last] + 1
p = self.last
while p != -1 and c not in self.trans[p]:
self.trans[p][c] = cur
p = self.link[p]
if p == -1:
self.link[cur] = 0
else:
q = self.trans[p][c]
if self.len[p] + 1 == self.len[q]:
self.link[cur] = q
else:
# Clone state q
clone = self._add_state()
self.len[clone] = self.len[p] + 1
self.link[clone] = self.link[q]
self.trans[clone] = self.trans[q].copy()
# Update transitions to point to clone
while p != -1 and self.trans[p].get(c) == q:
self.trans[p][c] = clone
p = self.link[p]
self.link[q] = clone
self.link[cur] = clone
self.last = cur
def build(self, s):
for c in s:
self.add_char(c)
return self
def count_distinct_substrings(self):
"""Count number of distinct substrings."""
total = 0
for i in range(1, self.size):
total += self.len[i] - self.len[self.link[i]]
return total
def solve_distinct_substrings():
"""CSES: Distinct Substrings"""
s = input().strip()
sam = SuffixAutomaton().build(s)
print(sam.count_distinct_substrings())
def solve_lcs(s, t):
"""Longest Common Substring of two strings."""
sam = SuffixAutomaton().build(s)
v, length, best = 0, 0, 0
for c in t:
while v != 0 and c not in sam.trans[v]:
v = sam.link[v]
length = sam.len[v]
if c in sam.trans[v]:
v = sam.trans[v][c]
length += 1
best = max(best, length)
return best
Complexity Analysis
| Operation | Time | Space |
|---|---|---|
| Build SAM | O(n) | O(n) |
| Count distinct substrings | O(n) | O(1) |
| LCS of two strings | O(n + m) | O(n) |
| Pattern search | O(m) | O(1) |
Why O(n) Space?
- At most 2n-1 states (each char adds at most 2 states: cur and possibly clone)
- At most 3n-4 transitions (amortized across construction)
Related CSES Problems
| Problem | Technique |
|---|---|
| Distinct Substrings | Direct SAM application |
| Repeating Substring | SAM with occurrence counting |
| Substring Order I | SAM with lexicographic traversal |
| Substring Order II | SAM with DP on states |
| Finding Patterns | Multi-pattern matching |
Key Takeaways
- Core Idea: SAM compresses all substrings into O(n) equivalence classes based on ending positions
- Clone Mechanism: Essential for maintaining minimality when suffix link lengths mismatch
- Suffix Links: Form a tree structure enabling efficient substring navigation
- Applications: Distinct substrings, LCS, pattern matching - all in linear time
Practice Checklist
Before moving on, make sure you can:
- Explain what each SAM state represents
- Trace through SAM construction on paper
- Identify when cloning is necessary
- Implement SAM from scratch in under 15 minutes
- Solve distinct substrings problem using SAM
Additional Resources
- CP-Algorithms: Suffix Automaton
- E-Maxx: Suffix Automaton (Russian)
- CSES Distinct Substrings - Suffix automaton application