Substring Order I

Problem Overview

Attribute	Value
CSES Link	https://cses.fi/problemset/task/2108
Difficulty	Hard
Category	String Algorithms
Time Limit	1 second
Key Technique	Suffix Array + LCP Array

Learning Goals

After solving this problem, you will be able to:

Build a suffix array and understand its lexicographical properties
Construct an LCP (Longest Common Prefix) array efficiently
Use LCP to count distinct substrings without duplication
Apply binary search to find the k-th element in a cumulative sum array

Problem Statement

Problem: Given a string and a value k, find the k-th lexicographically smallest distinct substring.

Input:

Line 1: A string s of lowercase letters
Line 2: An integer k

Output:

The k-th lexicographically smallest distinct substring

Constraints:

1 <= s <= 10^5
1 <= k <= number of distinct substrings

Example

Input:
ababab
3

Output:
aba

Explanation: The distinct substrings sorted lexicographically:

“a”
“ab”
“aba” <-- k=3
“abab”
“ababab”
“b”
“ba”
“bab”
“baba”
“babab”

Intuition: How to Think About This Problem

Pattern Recognition

Key Question: How do we efficiently enumerate all distinct substrings in sorted order?

The suffix array gives us all suffixes sorted lexicographically. Each suffix contributes multiple substrings (its prefixes). The LCP array tells us how many of those prefixes are duplicates from the previous suffix.

Breaking Down the Problem

What are we looking for? The k-th smallest distinct substring
What information do we have? A string and a position k
What’s the relationship? Suffix array orders suffixes; each suffix’s prefixes are substrings in sorted order

Key Insight

For suffix at position i in suffix array:
- Total prefixes = n - SA[i]  (all substrings starting at SA[i])
- Duplicate prefixes = LCP[i]  (shared with previous suffix)
- New distinct substrings = (n - SA[i]) - LCP[i]

Solution 1: Brute Force

Idea

Generate all substrings, remove duplicates, sort them, and return the k-th one.

Code

def brute_force(s, k):
  """
  Time: O(n^2 log n) for generation and sorting
  Space: O(n^2) to store all substrings
  """
  substrings = set()
  n = len(s)
  for i in range(n):
    for j in range(i + 1, n + 1):
      substrings.add(s[i:j])

  sorted_subs = sorted(substrings)
  return sorted_subs[k - 1]

Complexity

Metric	Value	Explanation
Time	O(n^2 log n)	Generate O(n^2) substrings, sort them
Space	O(n^2)	Store all distinct substrings

Why this is too slow: For n = 10^5, we cannot store 10^10 substrings.

Solution 2: Suffix Array + LCP (Optimal)

Key Insight

The Trick: Use suffix array for sorted order and LCP to skip duplicates. Binary search finds which suffix contains the k-th substring.

Algorithm Overview

Build suffix array SA[] - gives suffixes in sorted order
Build LCP array - tells overlap with previous suffix
For each suffix i: new substrings = (n - SA[i]) - LCP[i]
Build cumulative count array
Binary search to find which suffix contains k-th substring
Extract the substring at the correct offset

Dry Run Example

Let’s trace through s = "aba", k = 3:

Step 1: Build Suffix Array
Suffixes:     Sorted:
0: "aba"      1: "a"    (SA[0] = 2)
1: "ba"       0: "aba"  (SA[1] = 0)
2: "a"        1: "ba"   (SA[2] = 1)

SA = [2, 0, 1]

Step 2: Build LCP Array
LCP[0] = 0  (no previous suffix)
LCP[1] = 1  (lcp("a", "aba") = "a" -> length 1)
LCP[2] = 0  (lcp("aba", "ba") = "" -> length 0)

LCP = [0, 1, 0]

Step 3: Count New Substrings per Suffix
i=0: suffix "a",   new = (3-2) - 0 = 1  -> adds "a"
i=1: suffix "aba", new = (3-0) - 1 = 2  -> adds "ab", "aba"
i=2: suffix "ba",  new = (3-1) - 0 = 2  -> adds "b", "ba"

Cumulative: [1, 3, 5]

Step 4: Find k=3
Binary search: k=3 falls in suffix i=1 (cum[1]=3)
Offset within suffix = 3 - 1 = 2  (1 is cum[0])
Length = LCP[1] + offset = 1 + 2 = 3

Result: s[SA[1] : SA[1]+3] = s[0:3] = "aba"

Visual Diagram

String: "aba" (n=3)

Suffix Array (sorted suffixes):
+-------+--------+------+-------+------------------+

| Index | SA[i]  | LCP  | Suffix| New Substrings   |
+-------+--------+------+-------+------------------+

|   0   |   2    |  0   | "a"   | 1: "a"           |
|   1   |   0    |  1   | "aba" | 2: "ab", "aba"   |
|   2   |   1    |  0   | "ba"  | 2: "b", "ba"     |
+-------+--------+------+-------+------------------+

Cumulative count: [1, 3, 5]
               k=1  k=2,3  k=4,5

For k=3: Find suffix 1, extract "aba"

Code (Python)

def build_suffix_array(s):
  """Build suffix array using O(n log n) algorithm."""
  n = len(s)
  sa = list(range(n))
  rank = [ord(c) for c in s]
  tmp = [0] * n
  k = 1

  while k < n:
    def key(i):
      return (rank[i], rank[i + k] if i + k < n else -1)
    sa.sort(key=key)

    tmp[sa[0]] = 0
    for i in range(1, n):
      tmp[sa[i]] = tmp[sa[i-1]]
      if key(sa[i]) != key(sa[i-1]):
        tmp[sa[i]] += 1
    rank = tmp[:]
    k *= 1

  return sa

def build_lcp(s, sa):
  """Build LCP array using Kasai's algorithm - O(n)."""
  n = len(s)
  rank = [0] * n
  for i in range(n):
    rank[sa[i]] = i

  lcp = [0] * n
  h = 0
  for i in range(n):
    if rank[i] > 0:
      j = sa[rank[i] - 1]
      while i + h < n and j + h < n and s[i + h] == s[j + h]:
        h += 1
      lcp[rank[i]] = h
      if h > 0:
        h -= 1
  return lcp

def solve(s, k):
  """
  Find k-th lexicographically smallest distinct substring.

  Time: O(n log n) for SA + O(n) for LCP + O(log n) for query
  Space: O(n)
  """
  n = len(s)
  sa = build_suffix_array(s)
  lcp = build_lcp(s, sa)

  # Build cumulative count of new substrings
  cum = []
  total = 0
  for i in range(n):
    new_count = (n - sa[i]) - lcp[i]
    total += new_count
    cum.append(total)

  # Binary search for the suffix containing k-th substring
  lo, hi = 0, n - 1
  while lo < hi:
    mid = (lo + hi) // 2
    if cum[mid] < k:
      lo = mid + 1
    else:
      hi = mid

  # Calculate substring length
  prev_cum = cum[lo - 1] if lo > 0 else 0
  offset = k - prev_cum  # Position within this suffix's new substrings
  length = lcp[lo] + offset

  return s[sa[lo]:sa[lo] + length]

# Main
s = input().strip()
k = int(input())
print(solve(s, k))

Complexity

Metric	Value	Explanation
Time	O(n log n)	SA construction dominates
Space	O(n)	SA, LCP, and cumulative arrays

Common Mistakes

Mistake 1: Integer Overflow

Problem: With n = 10^5, total distinct substrings can reach ~5 * 10^9. Fix: Use long long for cumulative counts.

Mistake 2: Off-by-One in Length Calculation

# WRONG
length = lcp[lo] + offset - 1

# CORRECT
length = lcp[lo] + offset

Problem: The offset represents “which new substring” (1-indexed within the suffix). Fix: Add offset directly to LCP without subtracting 1.

Mistake 3: Forgetting LCP[0] = 0

Edge Cases

Case	Input	Expected Output	Why
Single char	`"a", k=1`	`"a"`	Only one substring
All same chars	`"aaa", k=3`	`"aaa"`	Only 3 distinct: a, aa, aaa
k = 1	`"cab", k=1`	`"a"`	Smallest is lexicographically first char
Max k	`"ab", k=3`	`"b"`	Substrings: a, ab, b

When to Use This Pattern

Use This Approach When:

Finding k-th lexicographically smallest substring
Counting or enumerating distinct substrings
Problems involving lexicographical ordering of substrings
Need to avoid O(n^2) substring generation

Don’t Use When:

Simple pattern matching (use KMP or Z-algorithm)
Finding longest common substring between two strings (use different SA approach)
String has special structure that allows simpler solution

Pattern Recognition Checklist:

Need substrings in sorted order? -> Suffix Array
Need to count distinct substrings? -> SA + LCP
Need k-th element in ordered set? -> Binary search on cumulative counts

Similar Difficulty (CSES)

Problem	Key Difference
Substring Order II	Counts duplicates (non-distinct)
Distinct Substrings	Count only, no need to find k-th
Repeating Substring	Find longest repeating via LCP

Prerequisite Problems (CSES)

Problem	Why It Helps
String Matching	Basic string algorithm foundation
Finding Borders	Understanding string structure

Harder Extensions

Problem	New Concept
Substring Distribution	Count substrings by length
Finding Patterns	Multiple pattern matching

Key Takeaways

Core Idea: Suffix array gives sorted suffixes; each suffix’s prefixes are substrings in sorted order
LCP Optimization: LCP[i] tells how many prefixes of suffix i are duplicates from suffix i-1
Counting Formula: New distinct substrings from suffix i = (n - SA[i]) - LCP[i]
Binary Search: Use cumulative counts to locate which suffix contains the k-th substring

Practice Checklist

Before moving on, make sure you can:

Implement suffix array construction from scratch
Build LCP array using Kasai’s algorithm
Explain why LCP helps count distinct substrings
Derive the substring extraction formula given suffix index and offset
Handle edge cases with single characters and repeated patterns

Substring Order I

Problem Overview

Learning Goals

Problem Statement

Example

Intuition: How to Think About This Problem

Pattern Recognition

Breaking Down the Problem

Key Insight

Solution 1: Brute Force

Idea

Code

Complexity

Solution 2: Suffix Array + LCP (Optimal)

Key Insight

Algorithm Overview

Dry Run Example

Visual Diagram

Code (Python)

Complexity

Common Mistakes

Mistake 1: Integer Overflow

Mistake 2: Off-by-One in Length Calculation

Mistake 3: Forgetting LCP[0] = 0

Edge Cases

When to Use This Pattern

Use This Approach When:

Don’t Use When:

Pattern Recognition Checklist:

Related Problems

Similar Difficulty (CSES)

Prerequisite Problems (CSES)

Harder Extensions

Key Takeaways

Practice Checklist