Substring Order II

Problem Overview

Attribute	Value
Problem Link	CSES 2109 - Substring Order II
Difficulty	Hard
Category	String Algorithms
Time Limit	1 second
Key Technique	Suffix Array + LCP Array + Binary Search

Learning Goals

After solving this problem, you will be able to:

Build suffix arrays efficiently using the doubling algorithm
Compute LCP arrays using Kasai’s algorithm
Count substrings (with repetitions) using suffix array and LCP
Apply binary search on cumulative substring counts
Extract the k-th lexicographically smallest substring

Problem Statement

Problem: Given a string, find the k-th lexicographically smallest substring. Substrings are counted with repetitions - if a substring appears multiple times, it is counted multiple times.

Input:

Line 1: String s (lowercase letters only)
Line 2: Integer k

Output:

The k-th smallest substring

Constraints:

1 <= s <= 10^5
1 <= k <= n(n+1)/2 (guaranteed valid)

Example

Input:
baabaa
10

Output:
baa

Explanation: Substrings in lexicographical order (with repetitions):

“a” (position 2)
“a” (position 3)
“a” (position 5)
“aa” (position 2)
“aa” (position 4 - note: only one ‘a’ follows, so this might vary)
“aab” (position 2)
“aabaa” (position 2)
“abaa” (position 3)
“b” (position 0)
“baa” (position 0) <-- Answer

Intuition: How to Think About This Problem

Pattern Recognition

Key Question: How can we enumerate all substrings in sorted order without generating them explicitly?

The suffix array gives us all suffixes in sorted order. Each suffix starting at position i contributes (n - i) substrings (all prefixes of that suffix). These prefixes are automatically sorted because the suffixes are sorted.

Breaking Down the Problem

What are we looking for? The k-th substring in lexicographical order (counting duplicates)
What information do we have? The sorted order of suffixes (via suffix array) and shared prefixes between adjacent suffixes (via LCP array)
What’s the relationship? Each suffix contributes substrings; LCP tells us how many are duplicates with the previous suffix

Key Insight

For suffix at position sa[i] with length len = n - sa[i]:

It contributes len total substrings (all prefixes)
But lcp[i] of them are duplicates of the previous suffix’s prefixes
New substrings contributed = len - lcp[i]

However, for Substring Order II we count WITH repetitions, so each suffix contributes ALL its (n - sa[i]) substrings.

Solution 1: Brute Force

Idea

Generate all substrings, sort them, and return the k-th one.

Code

def solve_brute_force(s, k):
  """
  Generate all substrings, sort, return k-th.

  Time: O(n^3 log n) - generating and sorting all substrings
  Space: O(n^2) - storing all substrings
  """
  n = len(s)
  substrings = []

  for i in range(n):
    for j in range(i + 1, n + 1):
      substrings.append(s[i:j])

  substrings.sort()
  return substrings[k - 1]

Complexity

Metric	Value	Explanation
Time	O(n^3 log n)	O(n^2) substrings, O(n log n) to sort strings
Space	O(n^2)	Store all substrings

Why This Works (But Is Slow)

Correctness is guaranteed since we enumerate everything. But with n = 10^5, we’d have ~10^10 substrings - far too many.

Solution 2: Optimal - Suffix Array + Binary Search

Key Insight

The Trick: Use suffix array to get sorted suffix order. Each suffix contributes substrings in sorted order. Binary search to find which suffix contains the k-th substring.

Algorithm Overview

Build suffix array (sorted positions of all suffixes)
For each suffix at sa[i], it contributes (n - sa[i]) substrings
Compute prefix sums of substring counts
Binary search to find which suffix contains substring k
Extract the appropriate prefix of that suffix

Dry Run Example

Let’s trace through with s = "baa":

String: "baa" (n = 3)

Step 1: Build Suffix Array
  Suffixes:
    Position 0: "baa"
    Position 1: "aa"
    Position 2: "a"

  Sorted order: "a" < "aa" < "baa"
  Suffix Array: sa = [2, 1, 0]

Step 2: Count substrings per suffix
  sa[0] = 2: suffix "a"   -> contributes 1 substring:  "a"
  sa[1] = 1: suffix "aa"  -> contributes 2 substrings: "a", "aa"
  sa[2] = 0: suffix "baa" -> contributes 3 substrings: "b", "ba", "baa"

Step 3: Prefix sums
  cumulative = [1, 3, 6]

  Substring mapping:
    k=1: "a"    (from sa[0])
    k=2: "a"    (from sa[1], length 1)
    k=3: "aa"   (from sa[1], length 2)
    k=4: "b"    (from sa[2], length 1)
    k=5: "ba"   (from sa[2], length 2)
    k=6: "baa"  (from sa[2], length 3)

Step 4: Query k=4
  Binary search: cumulative[2] = 6 >= 4, cumulative[1] = 3 < 4
  So k=4 falls in suffix sa[2] = 0 ("baa")
  Offset within this suffix: 4 - 3 = 1
  Length = 1
  Answer: s[0:1] = "b"

Visual Diagram

Suffix Array for "baabaa":

Position:  0    1    2    3    4    5
String:    b    a    a    b    a    a

Sorted Suffixes:
Rank | SA[i] | Suffix        | Substrings (count = n - SA[i])
-----|-------|---------------|--------------------------------
  0  |   5   | "a"           | 1: "a"
  1  |   2   | "abaa"        | 4: "a", "ab", "aba", "abaa"
  2  |   4   | "aa"          | 2: "a", "aa"
  3  |   1   | "aabaa"       | 5: "a", "aa", "aab", "aaba", "aabaa"
  4  |   3   | "baa"         | 3: "b", "ba", "baa"
  5  |   0   | "baabaa"      | 6: "b", "ba", "baa", "baab", "baaba", "baabaa"

Cumulative: [1, 5, 7, 12, 15, 21]

To find k=10:
- Binary search finds index 4 (cumulative[3]=12 < 10... wait, 12 >= 10)
- Actually: cumulative = [1, 5, 7, 12, 15, 21]
- k=10: cumulative[2]=7 < 10 <= cumulative[3]=12
- Falls in suffix sa[3]=1 ("aabaa")
- Offset = 10 - 7 = 3, so length = 3
- Answer = s[1:1+3] = "aab"

Code (Python)

def build_suffix_array(s):
  """
  Build suffix array using doubling algorithm.
  Time: O(n log n)
  """
  n = len(s)
  sa = list(range(n))
  rank = [ord(c) for c in s]
  tmp = [0] * n
  k = 1

  while k < n:
    def key(i):
      r1 = rank[i]
      r2 = rank[i + k] if i + k < n else -1
      return (r1, r2)

    sa.sort(key=key)

    tmp[sa[0]] = 0
    for i in range(1, n):
      tmp[sa[i]] = tmp[sa[i-1]]
      if key(sa[i]) != key(sa[i-1]):
        tmp[sa[i]] += 1

    rank = tmp[:]
    if rank[sa[n-1]] == n - 1:
      break
    k *= 2

  return sa

def solve(s, k):
  """
  Find k-th lexicographically smallest substring (with repetitions).

  Time: O(n log n)
  Space: O(n)
  """
  n = len(s)
  sa = build_suffix_array(s)

  # Compute prefix sums of substring counts
  # Each suffix sa[i] contributes (n - sa[i]) substrings
  cumulative = []
  total = 0
  for i in range(n):
    total += n - sa[i]
    cumulative.append(total)

  # Binary search for which suffix contains k-th substring
  left, right = 0, n - 1
  while left < right:
    mid = (left + right) // 2
    if cumulative[mid] < k:
      left = mid + 1
    else:
      right = mid

  # Suffix at sa[left] contains our answer
  idx = left
  prev_count = cumulative[idx - 1] if idx > 0 else 0
  offset = k - prev_count  # Which prefix of this suffix (1-indexed)

  start = sa[idx]
  length = offset

  return s[start:start + length]

# Read input and solve
import sys
input = sys.stdin.readline

s = input().strip()
k = int(input())
print(solve(s, k))

Complexity

Metric	Value	Explanation
Time	O(n log n)	Suffix array construction dominates
Space	O(n)	Store suffix array and cumulative counts

Common Mistakes

Mistake 1: Using 32-bit integers for k

Problem: With n = 10^5, total substrings = ~5 * 10^9, exceeding int range. Fix: Use long long for k and cumulative counts.

Mistake 2: Confusing with Substring Order I (distinct)

# WRONG - This counts DISTINCT substrings (Substring Order I)
new_substrings = n - sa[i] - lcp[i]

# CORRECT for Substring Order II - count ALL substrings (with repetitions)
all_substrings = n - sa[i]

Problem: Substring Order I counts distinct substrings (subtract LCP), Order II counts with repetitions. Fix: For Order II, each suffix contributes (n - sa[i]) substrings without LCP subtraction.

Mistake 3: Off-by-one in binary search

# WRONG - Wrong boundary condition
while left <= right:
  mid = (left + right) // 2
  if cumulative[mid] < k:
    left = mid + 1
  else:
    right = mid - 1  # May skip the correct answer

# CORRECT
while left < right:
  mid = (left + right) // 2
  if cumulative[mid] < k:
    left = mid + 1
  else:
    right = mid  # Keep mid as potential answer

Edge Cases

Case	Input	Expected Output	Why
Single character	`s="a", k=1`	`"a"`	Only one substring
k=1	Any string	Smallest character	First in lex order
k=max	`s="ab", k=3`	`"b"`	Substrings: “a”, “ab”, “b”
Repeated characters	`s="aaa", k=6`	`"aaa"`	All substrings exist
Large k	`s="ab", k=3`	`"b"`	Must handle cumulative sums

When to Use This Pattern

Use Suffix Array + Counting When:

Finding k-th smallest/largest substring
Problems involving lexicographical ordering of substrings
Counting substrings with specific properties
String comparison operations at scale

Don’t Use When:

Simple string matching (use KMP or Z-algorithm)
Only need to check substring existence (use hashing)
Working with multiple separate strings (consider suffix automaton)

Pattern Recognition Checklist:

Need to order ALL substrings? -> Suffix Array
Need to count distinct substrings? -> Suffix Array + LCP (Substring Order I)
Need to count with repetitions? -> Suffix Array only (Substring Order II)
Need k-th element? -> Binary search on prefix sums

Prerequisites (Do These First)

Problem	Why It Helps
CSES - Distinct Substrings	Learn suffix array basics
CSES - Substring Order I	Similar but counts distinct

Similar Difficulty

Problem	Key Difference
CSES - Longest Repeating Substring	Use max LCP value
CSES - String Matching	Pattern matching with suffix array

Harder (Do These After)

Problem	New Concept
CSES - Repeating Substring	Finding repeated patterns
CSES - Required Substring	DP + string matching

Key Takeaways

The Core Idea: Suffix array gives sorted order of all suffixes; each suffix’s prefixes are substrings in sorted order
Time Optimization: From O(n^3) brute force to O(n log n) via suffix array
Key Difference from Order I: Order II counts with repetitions (no LCP subtraction)
Pattern: Binary search on cumulative counts to find k-th element

Practice Checklist

Before moving on, make sure you can:

Build a suffix array in O(n log n) time
Explain why suffix array gives lexicographically sorted substrings
Distinguish between Substring Order I (distinct) and II (with repetitions)
Implement binary search on cumulative substring counts
Handle large k values with appropriate data types

Additional Resources

CP-Algorithms: Suffix Array
CP-Algorithms: Kasai’s LCP Algorithm
CSES Substring Order II - K-th smallest substring

Substring Order II

Problem Overview

Learning Goals

Problem Statement

Example

Intuition: How to Think About This Problem

Pattern Recognition

Breaking Down the Problem

Key Insight

Solution 1: Brute Force

Idea

Code

Complexity

Why This Works (But Is Slow)

Solution 2: Optimal - Suffix Array + Binary Search

Key Insight

Algorithm Overview

Dry Run Example

Visual Diagram

Code (Python)

Complexity

Common Mistakes

Mistake 1: Using 32-bit integers for k

Mistake 2: Confusing with Substring Order I (distinct)

Mistake 3: Off-by-one in binary search

Edge Cases

When to Use This Pattern

Use Suffix Array + Counting When:

Don’t Use When:

Pattern Recognition Checklist:

Related Problems

Prerequisites (Do These First)

Similar Difficulty

Harder (Do These After)

Key Takeaways

Practice Checklist

Additional Resources