Pattern Positions

Problem Overview

Attribute	Value
Difficulty	Hard
Category	String Algorithms
Time Limit	1 second
Key Technique	Aho-Corasick Automaton
CSES Link	Pattern Positions

Learning Goals

After solving this problem, you will be able to:

Understand when to use Aho-Corasick vs KMP algorithm
Build a trie structure for multiple patterns
Construct failure links (similar to KMP’s LPS array)
Implement efficient multi-pattern matching in O(n + m + z) time

Problem Statement

Problem: Given a text string and k pattern strings, find the first occurrence position of each pattern in the text.

Input:

Line 1: Text string
Line 2: Number of patterns k
Lines 3 to k+2: Pattern strings (one per line)

Output:

For each pattern, print the 1-indexed position of its first occurrence, or -1 if not found

Constraints:

1 <= text <= 10^5
1 <= k <= 500
1 <= pattern_i <= 10^5
Total length of all patterns <= 10^6
All strings contain only lowercase English letters (a-z)

Example

Input:
aybabtu
3
bab
abc
tu

Output:
3
-1
6

Explanation:

“bab” first occurs at position 3 (1-indexed): aybabtu
“abc” does not occur in the text: -1
“tu” first occurs at position 6 (1-indexed): aybabtu

Intuition: How to Think About This Problem

Pattern Recognition

Key Question: Why can’t we just use KMP k times?

If we use KMP for each pattern separately, the time complexity becomes O(n * k + total_pattern_length). When k is large, this becomes inefficient. The key insight is that we can search for ALL patterns simultaneously in a single pass through the text.

Why Aho-Corasick?

Think of it like searching for 500 words in a book. Instead of scanning 500 times (once per word), scan ONCE while tracking all words simultaneously using a state machine.

Approach	Time Complexity	When to Use
KMP per pattern	O(n * k + m)	Few patterns (k < 10)
Rabin-Karp	O(n * k + m) avg	When patterns same length
Aho-Corasick	O(n + m + z)	Many patterns, different lengths

Where n = text length, m = total pattern length, k = number of patterns, z = total matches.

Solution 1: Brute Force (KMP per Pattern)

Idea

Apply KMP algorithm for each pattern independently. Simple but inefficient for many patterns.

Algorithm

For each pattern, build its LPS array
Run KMP matching against the text
Record the first match position (or -1)

Code

def solve_brute_force(text, patterns):
  """Apply KMP for each pattern separately. Time: O(n * k + m)"""
  def kmp_first_match(text, pattern):
    # Build LPS array
    m, lps = len(pattern), [0] * len(pattern)
    length, i = 0, 1
    while i < m:
      if pattern[i] == pattern[length]:
        length += 1
        lps[i] = length
        i += 1
      elif length:
        length = lps[length - 1]
      else:
        i += 1
    # KMP search
    i = j = 0
    while i < len(text):
      if text[i] == pattern[j]:
        i, j = i + 1, j + 1
      if j == m:
        return i - j
      elif i < len(text) and text[i] != pattern[j]:
        j = lps[j - 1] if j else 0
        if not j:
          i += 1
    return -1

  return [kmp_first_match(text, p) + 1 if kmp_first_match(text, p) != -1 else -1
      for p in patterns]

Complexity

Metric	Value	Explanation
Time	O(n * k + m)	KMP is O(n + pattern_len) per pattern
Space	O(max_pattern_len)	LPS array for each pattern

Why This Works (But Is Slow)

Correctness is guaranteed since KMP correctly finds all occurrences. However, scanning the text k times is wasteful when patterns share common prefixes or when k is large.

Solution 2: Aho-Corasick Algorithm (Optimal)

Key Insight

The Trick: Build a trie of all patterns, add failure links (like KMP’s LPS), then scan text ONCE while tracking all pattern matches simultaneously.

Component	Purpose
Trie	Store all patterns as a tree
Failure Links	Point to longest suffix that is also a prefix of some pattern
Output	Track which patterns end at each node

Algorithm: (1) Build trie, (2) Compute failure links via BFS, (3) Process text char-by-char

Dry Run Example

Let’s trace through with text = “ushers”, patterns = [“he”, “she”, “hers”]:

Trie with Failure Links:
       root(0)
      /      \
    h(1)     s(2)         Failure links:
    |          \           - node 4 (sh) --fail--> node 1 (h)
   e(3)        h(4)        - node 5 (she) --fail--> node 3 (he)
  [he]          \
    |          e(5)
   r(6)       [she]
    |
   s(7)
  [hers]

Processing "ushers":
  i=0 'u': stay at root (no child 'u')
  i=1 's': root -> node 2
  i=2 'h': node 2 -> node 4
  i=3 'e': node 4 -> node 5 [she] -> also fail to node 3 [he]
           MATCH "she" at pos 3, "he" at pos 4 (1-indexed)
  i=4 'r': node 5 has no 'r' -> fail to node 3 -> node 6
  i=5 's': node 6 -> node 7 [hers]
           MATCH "hers" at pos 4 (1-indexed)

Output: he=4, she=3, hers=4

Code

Python Solution:

from collections import deque, defaultdict

class AhoCorasick:
  def __init__(self):
    self.goto = [{}]          # Trie transitions
    self.fail = [0]           # Failure links
    self.output = [[]]        # Pattern indices ending at each node
    self.node_count = 1

  def add_pattern(self, pattern, index):
    """Add a pattern to the trie with its index."""
    node = 0
    for char in pattern:
      if char not in self.goto[node]:
        self.goto[node][char] = self.node_count
        self.goto.append({})
        self.fail.append(0)
        self.output.append([])
        self.node_count += 1
      node = self.goto[node][char]
    self.output[node].append(index)

  def build(self):
    """Build failure links using BFS."""
    queue = deque()

    # Initialize: depth-1 nodes fail to root
    for char, node in self.goto[0].items():
      queue.append(node)
      # fail[node] = 0 already set

    # BFS to compute failure links
    while queue:
      curr = queue.popleft()

      for char, next_node in self.goto[curr].items():
        queue.append(next_node)

        # Find failure link for next_node
        fail_state = self.fail[curr]
        while fail_state and char not in self.goto[fail_state]:
          fail_state = self.fail[fail_state]

        self.fail[next_node] = self.goto[fail_state].get(char, 0)

        # Merge output from failure link
        if self.fail[next_node] != next_node:
          self.output[next_node] += self.output[self.fail[next_node]]

  def search(self, text, pattern_lengths):
    """Find first occurrence of each pattern (1-indexed, -1 if not found)."""
    result = [-1] * len(pattern_lengths)
    node = 0

    for i, char in enumerate(text):
      while node and char not in self.goto[node]:
        node = self.fail[node]
      node = self.goto[node].get(char, 0)

      # Check outputs via failure chain
      temp = node
      while temp:
        for idx in self.output[temp]:
          if result[idx] == -1:
            result[idx] = i - pattern_lengths[idx] + 2  # 1-indexed start
        temp = self.fail[temp]

    return result


def solve(text, patterns):
  """Find first occurrence of each pattern using Aho-Corasick."""
  ac = AhoCorasick()
  lengths = [len(p) for p in patterns]
  for i, p in enumerate(patterns):
    ac.add_pattern(p, i)
  ac.build()
  return ac.search(text, lengths)


def main():
  text = input().strip()
  k = int(input().strip())
  patterns = [input().strip() for _ in range(k)]

  result = solve(text, patterns)
  for pos in result:
    print(pos)


if __name__ == "__main__":
  main()

Complexity

Metric	Value	Explanation
Time	O(n + m + z)	n = text length, m = total pattern length, z = matches
Space	O(m * SIGMA)	Trie storage, SIGMA = alphabet size (26)

Common Mistakes

Mistake	Problem	Fix
Only check current node outputs	Shorter patterns via failure link are missed	Traverse entire failure chain for outputs
Use end position as start	Pattern “abc” at i=5 starts at 3, not 5	`start = i - len(pattern) + 1`
Direct child access without check	KeyError when character not in trie	Follow failure links until found or root

# WRONG: Only current node    |  # CORRECT: Follow failure chain
for idx in out[node]: ...     |  temp = node
              |  while temp:
              |      for idx in out[temp]: ...
              |      temp = fail[temp]

Edge Cases

Case	Input	Expected	Why
No patterns match	text=”xyz”, patterns=[“abc”]	-1	Pattern not in text
Pattern equals text	text=”abc”, patterns=[“abc”]	1	Exact match at position 1
Overlapping patterns	text=”abab”, patterns=[“ab”,”bab”]	1, 2	Both found at different positions
Pattern is prefix of another	text=”abcd”, patterns=[“ab”,”abc”]	1, 1	Both start at same position
Single character patterns	text=”aaa”, patterns=[“a”]	1	First occurrence
Duplicate patterns	text=”abc”, patterns=[“ab”,”ab”]	1, 1	Same result for duplicates
Pattern longer than text	text=”ab”, patterns=[“abc”]	-1	Cannot match

When to Use This Pattern

Scenario	Best Algorithm
Multiple patterns (k > 10)	Aho-Corasick
Single pattern	KMP or Z-algorithm
All patterns same length	Rabin-Karp with rolling hash
Approximate matching	Edit distance DP
Streaming text + multiple patterns	Aho-Corasick (stateful)

Level	Problem	Key Concept
Easier	String Matching	Single pattern KMP
Easier	Finding Borders	LPS/failure function
Similar	Counting Patterns	Count instead of position
Similar	Word Combinations	DP + Aho-Corasick
Harder	Stream of Characters	Reverse Aho-Corasick
Harder	Word Search II	Trie + backtracking

Key Takeaways

The Core Idea: Build a trie with failure links to search for all patterns in a single text scan.
Time Optimization: From O(n * k) with separate KMP to O(n + m + z) with Aho-Corasick.
Space Trade-off: O(m) space for trie enables linear-time multi-pattern matching.
Pattern: Failure links generalize KMP’s LPS array to multiple patterns.

Practice Checklist

Explain why Aho-Corasick beats running KMP k times
Build trie and compute failure links using BFS
Trace the automaton on a sample input
Handle edge cases (overlapping patterns, no matches)

Additional Resources

CP-Algorithms: Aho-Corasick
CSES Pattern Positions - Multi-pattern string matching

Pattern Positions

Problem Overview

Learning Goals

Problem Statement

Example

Intuition: How to Think About This Problem

Pattern Recognition

Why Aho-Corasick?

Solution 1: Brute Force (KMP per Pattern)

Idea

Algorithm

Code

Complexity

Why This Works (But Is Slow)

Solution 2: Aho-Corasick Algorithm (Optimal)

Key Insight

Dry Run Example

Code

Complexity

Common Mistakes

Edge Cases

When to Use This Pattern

Related Problems

Key Takeaways

Practice Checklist

Additional Resources