String Algorithms

Welcome to the String Algorithms section! This category covers fundamental and advanced algorithms for text processing and pattern matching.

🎯 Visual Example

String Algorithm Techniques Overview

1. Pattern Matching:
   Text:    "ababcababc"
   Pattern: "ababc"
   Result:  Found at positions 0, 5

2. String Rotation:
   Original: "abacaba"
   Rotations: "bacabaa", "acabaab", "cabaaba", ...
   Minimal:   "aabacab"

3. Palindrome Detection:
   String: "racecar"
   Check:  r-a-c-e-c-a-r
   Result: True (palindrome)

4. String Compression:
   Input:  "aaabbbcc"
   Output: "a3b3c2"

Algorithm Complexity Comparison

Algorithm          | Time    | Space   | Use Case
-------------------|---------|---------|------------------
KMP                | O(n+m)  | O(m)    | Pattern matching
Z-Algorithm        | O(n+m)  | O(n+m)  | Border detection
Manacher's         | O(n)    | O(n)    | Palindrome
Booth's            | O(n)    | O(n)    | Minimal rotation
Run-length         | O(n)    | O(n)    | Compression

Key Concepts & Techniques

Basic String Operations

String Comparison

When to use: Lexicographic ordering, sorting strings
Time: O(min( s1 , s2 ))
Implementation: Compare character by character
Applications: String sorting, binary search on strings

Substring Operations

When to use: Extracting parts of strings, pattern matching
Time: O(k) for substring of length k
Implementation: Use string slicing or character iteration
Applications: Text processing, data extraction

String Concatenation

When to use: Building strings from parts
Time: O( s1 + s2 )
Implementation: Use string concatenation operators
Applications: String construction, text generation

String Rotation

When to use: Cyclic shifts, finding minimal rotation
Time: O(n) for rotation, O(n) for minimal rotation
Implementation: Use modular arithmetic for indices
Applications: String matching, lexicographic ordering

Pattern Matching Algorithms

Knuth-Morris-Pratt (KMP) Algorithm

When to use: Single pattern matching, finding all occurrences
Time: O(n + m) where n is text length, m is pattern length
Space: O(m) for failure function
Applications: Text search, DNA sequence analysis
Implementation: Precompute failure function, then match

Z Algorithm

When to use: Pattern matching, finding borders, periods
Time: O(n + m)
Space: O(n + m)
Applications: String matching, border detection
Implementation: Compute Z-array using previous values

Rabin-Karp Algorithm

When to use: Multiple pattern matching, rolling hash
Time: O(n + m) average, O(nm) worst case
Space: O(1) additional space
Applications: Plagiarism detection, multiple pattern search
Implementation: Rolling hash with collision handling

Boyer-Moore Algorithm

When to use: Single pattern matching in long texts
Time: O(n/m) best case, O(nm) worst case
Space: O(m)
Applications: Text editors, search engines
Implementation: Bad character rule and good suffix rule

Advanced String Techniques

String Hashing

When to use: Fast string comparison, rolling hash
Time: O(n) preprocessing, O(1) comparison
Space: O(n)
Applications: String matching, duplicate detection
Implementation: Polynomial rolling hash with large prime

Suffix Array

When to use: String indexing, multiple pattern matching
Time: O(n log n) construction, O(m log n) per query
Space: O(n)
Applications: Genome analysis, text compression
Implementation: Sort all suffixes, use binary search

Suffix Tree

When to use: Multiple pattern matching, string analysis
Time: O(n) construction, O(m) per query
Space: O(n)
Applications: Bioinformatics, text processing
Implementation: Ukkonen’s algorithm

Suffix Automaton

When to use: String analysis, pattern matching
Time: O(n) construction, O(m) per query
Space: O(n)
Applications: String processing, pattern analysis
Implementation: Incremental construction

Specialized String Algorithms

Manacher’s Algorithm

When to use: Finding all palindromes, longest palindrome
Time: O(n)
Space: O(n)
Applications: Palindrome detection, string analysis
Implementation: Expand around centers with optimization

Aho-Corasick Algorithm

When to use: Multiple pattern matching
Time: O(n + m + z) where z is number of matches
Space: O(m)
Applications: Virus scanning, text mining
Implementation: Trie with failure links

Trie (Prefix Tree)

When to use: Prefix matching, dictionary operations
Time: O(m) per operation where m is string length
Space: O(ALPHABET_SIZE * N * M)
Applications: Autocomplete, spell checkers
Implementation: Tree with character-based edges

Rolling Hash

When to use: Sliding window problems, string comparison
Time: O(1) per update
Space: O(1)
Applications: Pattern matching, duplicate detection
Implementation: Update hash using previous value

String Analysis Techniques

Border Detection

When to use: Finding string borders, period detection
Time: O(n)
Space: O(n)
Applications: String compression, pattern analysis
Implementation: Use KMP failure function or Z-array

Period Detection

When to use: Finding repeating patterns, string compression
Time: O(n)
Space: O(n)
Applications: Data compression, pattern recognition
Implementation: Use border properties or Z-array

Palindrome Detection

When to use: Finding palindromic substrings
Time: O(n²) naive, O(n) with Manacher’s
Space: O(n)
Applications: String analysis, text processing
Implementation: Expand around centers or use Manacher’s

String Compression

When to use: Reducing string size, pattern recognition
Time: O(n)
Space: O(n)
Applications: Data compression, text analysis
Implementation: Run-length encoding or pattern-based compression

Optimization Techniques

Space Optimization

Rolling Arrays: When only previous values needed
- When to use: DP problems on strings
- Example: Edit distance with rolling array
In-place Processing: Modify string in place
- When to use: When original string not needed
- Example: Remove duplicates in place

Time Optimization

Preprocessing: Compute values once
- When to use: Multiple queries on same string
- Example: Suffix array for multiple pattern searches
Early Termination: Stop when condition met
- When to use: When exact match not needed
- Example: Stop KMP when first match found

Memory Optimization

String Interning: Share common strings
- When to use: Many duplicate strings
- Example: Dictionary with string interning
Lazy Evaluation: Compute on demand
- When to use: When not all values needed
- Example: Lazy suffix array construction

Problem Categories

String Properties

Finding Borders - Border detection in strings
Finding Periods - Periodic patterns
Minimal Rotation - Lexicographically minimal rotation

Pattern Matching

String Matching - Basic pattern matching
Pattern Positions - Find all pattern occurrences
Longest Palindrome - Palindrome detection

String Construction

Word Combinations - Dictionary-based construction
Repeating Substring - Finding repetitions

Advanced String Problems

Distinct Strings - Counting unique strings
Distinct Substrings - Counting unique substrings

Detailed Examples and Implementations

Classic String Algorithms with Code

1. Knuth-Morris-Pratt (KMP) Algorithm

def kmp_search(text, pattern):
  def build_failure_function(pattern):
    m = len(pattern)
    failure = [0] * m
    j = 0
    
    for i in range(1, m):
      while j > 0 and pattern[i] != pattern[j]:
        j = failure[j - 1]
      
      if pattern[i] == pattern[j]:
        j += 1
      
      failure[i] = j
    
    return failure
  
  n, m = len(text), len(pattern)
  if m == 0:
    return 0
  
  failure = build_failure_function(pattern)
  j = 0
  
  for i in range(n):
    while j > 0 and text[i] != pattern[j]:
      j = failure[j - 1]
    
    if text[i] == pattern[j]:
      j += 1
    
    if j == m:
      return i - m + 1
  
  return -1

def kmp_find_all(text, pattern):
  def build_failure_function(pattern):
    m = len(pattern)
    failure = [0] * m
    j = 0
    
    for i in range(1, m):
      while j > 0 and pattern[i] != pattern[j]:
        j = failure[j - 1]
      
      if pattern[i] == pattern[j]:
        j += 1
      
      failure[i] = j
    
    return failure
  
  n, m = len(text), len(pattern)
  if m == 0:
    return list(range(n + 1))
  
  failure = build_failure_function(pattern)
  j = 0
  matches = []
  
  for i in range(n):
    while j > 0 and text[i] != pattern[j]:
      j = failure[j - 1]
    
    if text[i] == pattern[j]:
      j += 1
    
    if j == m:
      matches.append(i - m + 1)
      j = failure[j - 1]
  
  return matches

2. Z Algorithm

def z_algorithm(text):
  n = len(text)
  z = [0] * n
  l, r = 0, 0
  
  for i in range(1, n):
    if i <= r:
      z[i] = min(r - i + 1, z[i - l])
    
    while i + z[i] < n and text[z[i]] == text[i + z[i]]:
      z[i] += 1
    
    if i + z[i] - 1 > r:
      l, r = i, i + z[i] - 1
  
  return z

def z_search(text, pattern):
  combined = pattern + '$' + text
  z = z_algorithm(combined)
  m = len(pattern)
  matches = []
  
  for i in range(m + 1, len(combined)):
    if z[i] == m:
      matches.append(i - m - 1)
  
  return matches

3. Rabin-Karp Algorithm

def rabin_karp(text, pattern, base=256, mod=10**9 + 7):
  n, m = len(text), len(pattern)
  if m == 0:
    return 0
  if m > n:
    return -1
  
  # Calculate hash of pattern and first window of text
  pattern_hash = 0
  text_hash = 0
  h = 1
  
  # Calculate h = base^(m-1) % mod
  for i in range(m - 1):
    h = (h * base) % mod
  
  # Calculate initial hashes
  for i in range(m):
    pattern_hash = (base * pattern_hash + ord(pattern[i])) % mod
    text_hash = (base * text_hash + ord(text[i])) % mod
  
  # Slide the pattern over text
  for i in range(n - m + 1):
    if pattern_hash == text_hash:
      # Check character by character
      if text[i:i+m] == pattern:
        return i
    
    if i < n - m:
      # Remove leading digit, add trailing digit
      text_hash = (base * (text_hash - ord(text[i]) * h) + ord(text[i + m])) % mod
      if text_hash < 0:
        text_hash += mod
  
  return -1

def rabin_karp_multiple_patterns(text, patterns, base=256, mod=10**9 + 7):
  n = len(text)
  results = {}
  
  for pattern in patterns:
    m = len(pattern)
    if m == 0:
      results[pattern] = [0]
      continue
    if m > n:
      results[pattern] = []
      continue
    
    pattern_hash = 0
    text_hash = 0
    h = 1
    
    for i in range(m - 1):
      h = (h * base) % mod
    
    for i in range(m):
      pattern_hash = (base * pattern_hash + ord(pattern[i])) % mod
      text_hash = (base * text_hash + ord(text[i])) % mod
    
    matches = []
    for i in range(n - m + 1):
      if pattern_hash == text_hash:
        if text[i:i+m] == pattern:
          matches.append(i)
      
      if i < n - m:
        text_hash = (base * (text_hash - ord(text[i]) * h) + ord(text[i + m])) % mod
        if text_hash < 0:
          text_hash += mod
    
    results[pattern] = matches
  
  return results

4. Manacher’s Algorithm for Palindromes

def manacher_algorithm(text):
  # Transform string to handle even-length palindromes
  transformed = '#'.join('^{}$'.format(text))
  n = len(transformed)
  p = [0] * n
  center, right = 0, 0
  
  for i in range(1, n - 1):
    # Mirror index
    mirror = 2 * center - i
    
    if i < right:
      p[i] = min(right - i, p[mirror])
    
    # Try to expand palindrome centered at i
    try:
      while transformed[i + (1 + p[i])] == transformed[i - (1 + p[i])]:
        p[i] += 1
    except IndexError:
      pass
    
    # Update center and right if palindrome extends beyond right
    if i + p[i] > right:
      center, right = i, i + p[i]
  
  # Find maximum length and center
  max_len = 0
  center_index = 0
  for i in range(1, n - 1):
    if p[i] > max_len:
      max_len = p[i]
      center_index = i
  
  start = (center_index - max_len) // 2
  return text[start:start + max_len]

def find_all_palindromes(text):
  transformed = '#'.join('^{}$'.format(text))
  n = len(transformed)
  p = [0] * n
  center, right = 0, 0
  palindromes = []
  
  for i in range(1, n - 1):
    mirror = 2 * center - i
    
    if i < right:
      p[i] = min(right - i, p[mirror])
    
    try:
      while transformed[i + (1 + p[i])] == transformed[i - (1 + p[i])]:
        p[i] += 1
    except IndexError:
      pass
    
    if i + p[i] > right:
      center, right = i, i + p[i]
    
    # Store palindrome if length > 1
    if p[i] > 0:
      start = (i - p[i]) // 2
      length = p[i]
      palindromes.append((start, start + length))
  
  return palindromes

Advanced String Data Structures

1. Suffix Array Construction

def build_suffix_array(text):
  n = len(text)
  k = 1
  c = [ord(text[i]) for i in range(n)]
  sa = list(range(n))
  
  while k < n:
    # Sort suffixes by first k characters
    sa.sort(key=lambda x: (c[x], c[x + k]) if x + k < n else (c[x], -1))
    
    # Update equivalence classes
    new_c = [0] * n
    new_c[sa[0]] = 0
    
    for i in range(1, n):
      curr = (c[sa[i]], c[sa[i] + k]) if sa[i] + k < n else (c[sa[i]], -1)
      prev = (c[sa[i-1]], c[sa[i-1] + k]) if sa[i-1] + k < n else (c[sa[i-1]], -1)
      
      if curr == prev:
        new_c[sa[i]] = new_c[sa[i-1]]
      else:
        new_c[sa[i]] = new_c[sa[i-1]] + 1
    
    c = new_c
    k *= 2
  
  return sa

def build_lcp_array(text, sa):
  n = len(text)
  lcp = [0] * n
  rank = [0] * n
  
  for i in range(n):
    rank[sa[i]] = i
  
  k = 0
  for i in range(n):
    if rank[i] == n - 1:
      k = 0
      continue
    
    j = sa[rank[i] + 1]
    while i + k < n and j + k < n and text[i + k] == text[j + k]:
      k += 1
    
    lcp[rank[i]] = k
    if k > 0:
      k -= 1
  
  return lcp

2. Trie Implementation

class TrieNode:
  def __init__(self):
    self.children = {}
    self.is_end_of_word = False
    self.count = 0

class Trie:
  def __init__(self):
    self.root = TrieNode()
  
  def insert(self, word):
    node = self.root
    for char in word:
      if char not in node.children:
        node.children[char] = TrieNode()
      node = node.children[char]
      node.count += 1
    node.is_end_of_word = True
  
  def search(self, word):
    node = self.root
    for char in word:
      if char not in node.children:
        return False
      node = node.children[char]
    return node.is_end_of_word
  
  def starts_with(self, prefix):
    node = self.root
    for char in prefix:
      if char not in node.children:
        return False
      node = node.children[char]
    return True
  
  def count_prefix(self, prefix):
    node = self.root
    for char in prefix:
      if char not in node.children:
        return 0
      node = node.children[char]
    return node.count
  
  def delete(self, word):
    def delete_helper(node, word, index):
      if index == len(word):
        if not node.is_end_of_word:
          return False
        node.is_end_of_word = False
        return len(node.children) == 0
      
      char = word[index]
      if char not in node.children:
        return False
      
      should_delete = delete_helper(node.children[char], word, index + 1)
      
      if should_delete:
        del node.children[char]
        return len(node.children) == 0 and not node.is_end_of_word
      
      return False
    
    delete_helper(self.root, word, 0)

3. Aho-Corasick Algorithm

class AhoCorasickNode:
  def __init__(self):
    self.children = {}
    self.failure = None
    self.output = []
    self.is_end = False

class AhoCorasick:
  def __init__(self):
    self.root = AhoCorasickNode()
  
  def add_pattern(self, pattern):
    node = self.root
    for char in pattern:
      if char not in node.children:
        node.children[char] = AhoCorasickNode()
      node = node.children[char]
    node.is_end = True
    node.output.append(pattern)
  
  def build_failure_links(self):
    queue = []
    
    # Initialize failure links for first level
    for char, child in self.root.children.items():
      child.failure = self.root
      queue.append(child)
    
    # Build failure links for remaining levels
    while queue:
      current = queue.pop(0)
      
      for char, child in current.children.items():
        queue.append(child)
        failure = current.failure
        
        while failure is not None and char not in failure.children:
          failure = failure.failure
        
        if failure is not None:
          child.failure = failure.children[char]
        else:
          child.failure = self.root
        
        # Merge output sets
        child.output.extend(child.failure.output)
  
  def search(self, text):
    results = []
    current = self.root
    
    for i, char in enumerate(text):
      # Follow failure links until we find a match or reach root
      while current is not None and char not in current.children:
        current = current.failure
      
      if current is not None:
        current = current.children[char]
      else:
        current = self.root
      
      # Collect all matches at current position
      for pattern in current.output:
        results.append((i - len(pattern) + 1, pattern))
    
    return results

String Processing Techniques

1. Rolling Hash Implementation

class RollingHash:
  def __init__(self, text, base=256, mod=10**9 + 7):
    self.text = text
    self.base = base
    self.mod = mod
    self.n = len(text)
    self.powers = [1] * (self.n + 1)
    
    # Precompute powers
    for i in range(1, self.n + 1):
      self.powers[i] = (self.powers[i-1] * base) % mod
    
    # Precompute prefix hashes
    self.prefix_hashes = [0] * (self.n + 1)
    for i in range(self.n):
      self.prefix_hashes[i+1] = (self.prefix_hashes[i] * base + ord(text[i])) % mod
  
  def get_hash(self, start, end):
    if start >= end:
      return 0
    
    hash_value = (self.prefix_hashes[end] - 
          self.prefix_hashes[start] * self.powers[end - start]) % self.mod
    
    return hash_value if hash_value >= 0 else hash_value + self.mod
  
  def get_substring_hash(self, start, length):
    return self.get_hash(start, start + length)

2. String Compression

def run_length_encoding(text):
  if not text:
    return ""
  
  result = []
  current_char = text[0]
  count = 1
  
  for i in range(1, len(text)):
    if text[i] == current_char:
      count += 1
    else:
      result.append(f"{count}{current_char}")
      current_char = text[i]
      count = 1
  
  result.append(f"{count}{current_char}")
  return "".join(result)

def run_length_decoding(encoded):
  if not encoded:
    return ""
  
  result = []
  i = 0
  
  while i < len(encoded):
    # Extract count
    count_str = ""
    while i < len(encoded) and encoded[i].isdigit():
      count_str += encoded[i]
      i += 1
    
    if i < len(encoded):
      char = encoded[i]
      count = int(count_str)
      result.append(char * count)
      i += 1
  
  return "".join(result)

def lz77_compression(text, window_size=4096, lookahead_size=18):
  compressed = []
  i = 0
  
  while i < len(text):
    # Find longest match in sliding window
    best_match = (0, 0)  # (offset, length)
    
    # Search in sliding window
    start = max(0, i - window_size)
    for j in range(start, i):
      match_length = 0
      while (match_length < lookahead_size and 
        i + match_length < len(text) and
        j + match_length < i and
        text[j + match_length] == text[i + match_length]):
        match_length += 1
      
      if match_length > best_match[1]:
        best_match = (i - j, match_length)
    
    if best_match[1] > 0:
      # Encode as (offset, length, next_char)
      next_char = text[i + best_match[1]] if i + best_match[1] < len(text) else ""
      compressed.append((best_match[0], best_match[1], next_char))
      i += best_match[1] + 1
    else:
      # Encode as literal
      compressed.append((0, 0, text[i]))
      i += 1
  
  return compressed

Advanced String Analysis

1. Period Detection

def find_period(text):
  n = len(text)
  
  # Check all possible periods
  for period in range(1, n + 1):
    if n % period != 0:
      continue
    
    is_period = True
    for i in range(period, n):
      if text[i] != text[i % period]:
        is_period = False
        break
    
    if is_period:
      return period
  
  return n

def find_all_periods(text):
  n = len(text)
  periods = []
  
  for period in range(1, n + 1):
    if n % period != 0:
      continue
    
    is_period = True
    for i in range(period, n):
      if text[i] != text[i % period]:
        is_period = False
        break
    
    if is_period:
      periods.append(period)
  
  return periods

def border_array(text):
  n = len(text)
  border = [0] * n
  j = 0
  
  for i in range(1, n):
    while j > 0 and text[i] != text[j]:
      j = border[j - 1]
    
    if text[i] == text[j]:
      j += 1
    
    border[i] = j
  
  return border

2. String Rotation and Cyclic Shifts

def is_rotation(s1, s2):
  if len(s1) != len(s2):
    return False
  
  return s2 in s1 + s1

def find_rotation_point(text):
  n = len(text)
  left, right = 0, n - 1
  
  while left < right:
    mid = (left + right) // 2
    
    if text[mid] > text[right]:
      left = mid + 1
    else:
      right = mid
  
  return left

def lexicographically_minimal_rotation(text):
  n = len(text)
  text = text + text  # Double the string
  i, j = 0, 1
  
  while j < n:
    k = 0
    while k < n and text[i + k] == text[j + k]:
      k += 1
    
    if text[i + k] > text[j + k]:
      i = max(i + k + 1, j)
      j = i + 1
    else:
      j = j + k + 1
  
  return text[i:i + n]

Tips for Success

Master Basic Operations: String manipulation
Understand KMP: Essential for pattern matching
Learn Hashing: Fast string comparison
Practice Implementation: Code common algorithms
Study Data Structures: Trie, Suffix Array, etc.
Handle Edge Cases: Empty strings, single characters

Common Pitfalls to Avoid

Time Limits: With naive algorithms
Memory Usage: With large strings
Hash Collisions: In string hashing
Edge Cases: Empty strings, single characters
Index Errors: Off-by-one mistakes
Unicode Issues: Multi-byte characters

Advanced Topics

String Data Structures

Suffix Array: String searching
Suffix Tree: Pattern matching
Trie: Prefix matching
Aho-Corasick: Multiple pattern matching

Optimization Techniques

Rolling Hash: Fast comparison
Z Algorithm: Linear time matching
Suffix Links: Fast transitions
Failure Functions: Pattern matching

Special Cases

Palindromes: Symmetric strings
Periodic Strings: Repeating patterns
Rotations: Cyclic shifts
Substrings: Continuous subsequences

Algorithm Complexities

Pattern Matching

Naive Algorithm: O(nm) time
KMP Algorithm: O(n+m) time
Z Algorithm: O(n+m) time
Rabin-Karp: O(n+m) average time

String Processing

String Hashing: O(n) preprocessing
Suffix Array: O(n log n) construction
Trie: O(n) construction
Manacher’s: O(n) palindrome finding

📚 Additional Learning Resources

LeetCode Pattern Integration

For interview preparation and pattern recognition, complement your CSES learning with these LeetCode resources:

Awesome LeetCode Resources - Comprehensive collection of LeetCode patterns, templates, and curated problem lists
15 LeetCode Patterns - Essential patterns including Sliding Window, Two Pointers, and String Manipulation
String Algorithm Patterns - Specific string processing templates and techniques

Practice these LeetCode problems to reinforce string algorithm concepts:

Sliding Window Pattern: Longest Substring Without Repeating Characters, Minimum Window Substring
Two Pointers Pattern: Valid Palindrome, Reverse String
String Hashing: Repeated String Match, Rabin-Karp Pattern Matching

Ready to start? Begin with Finding Borders and work your way through the problems in order of difficulty!

String Algorithms

🎯 Visual Example

String Algorithm Techniques Overview

Algorithm Complexity Comparison

Key Concepts & Techniques

Basic String Operations

String Comparison

Substring Operations

String Concatenation

String Rotation

Pattern Matching Algorithms

Knuth-Morris-Pratt (KMP) Algorithm

Z Algorithm

Rabin-Karp Algorithm

Boyer-Moore Algorithm

Advanced String Techniques

String Hashing

Suffix Array

Suffix Tree

Suffix Automaton

Specialized String Algorithms

Manacher’s Algorithm

Aho-Corasick Algorithm

Trie (Prefix Tree)

Rolling Hash

String Analysis Techniques

Border Detection

Period Detection

Palindrome Detection

String Compression

Optimization Techniques

Space Optimization

Time Optimization

Memory Optimization

Problem Categories

String Properties

Pattern Matching

String Construction

Advanced String Problems

Detailed Examples and Implementations

Classic String Algorithms with Code

1. Knuth-Morris-Pratt (KMP) Algorithm

2. Z Algorithm

3. Rabin-Karp Algorithm

4. Manacher’s Algorithm for Palindromes

Advanced String Data Structures

1. Suffix Array Construction

2. Trie Implementation

3. Aho-Corasick Algorithm

String Processing Techniques

1. Rolling Hash Implementation

2. String Compression

Advanced String Analysis

1. Period Detection

2. String Rotation and Cyclic Shifts

Tips for Success

Common Pitfalls to Avoid

Advanced Topics

String Data Structures

Optimization Techniques

Special Cases

Algorithm Complexities

Pattern Matching

String Processing

📚 Additional Learning Resources

LeetCode Pattern Integration

Related LeetCode Problems