Distinct Values Queries

Problem Overview

Attribute	Value
Difficulty	Hard
Category	Range Queries
Time Limit	1 second
Key Technique	Offline Processing + BIT / Mo’s Algorithm
CSES Link	Distinct Values Queries

Learning Goals

After solving this problem, you will be able to:

Apply offline query processing to transform a hard problem into a manageable one
Use Binary Indexed Tree (BIT) for efficient range sum queries
Track “last occurrence” of elements to avoid double counting
Implement Mo’s algorithm as an alternative approach
Recognize when sorting queries by right endpoint is beneficial

Problem Statement

Problem: Given an array of n integers and q queries, for each query (l, r), count the number of distinct values in the subarray from index l to r (inclusive).

Input:

Line 1: Two integers n and q (array size and number of queries)
Line 2: n integers (the array elements)
Next q lines: Two integers l and r for each query (1-indexed)

Output:

q lines: The number of distinct values in each queried range

Constraints:

1 <= n, q <= 2 * 10^5
1 <= arr[i] <= 10^9
1 <= l <= r <= n

Example

Input:
5 3
3 2 3 1 2
1 3
2 4
1 5

Output:
2
3
3

Explanation:

Query (1,3): Subarray [3, 2, 3] has distinct values {2, 3} = 2 distinct
Query (2,4): Subarray [2, 3, 1] has distinct values {1, 2, 3} = 3 distinct
Query (1,5): Subarray [3, 2, 3, 1, 2] has distinct values {1, 2, 3} = 3 distinct

Intuition: How to Think About This Problem

Pattern Recognition

Key Question: How do we count distinct elements efficiently when we cannot merge segment tree nodes?

The critical insight is that counting distinct values is not decomposable - knowing the distinct count in [l, mid] and [mid+1, r] does not help compute the distinct count in [l, r] because elements may appear in both halves.

Breaking Down the Problem

What are we looking for? Count of unique elements in each range [l, r]
What information do we have? Static array, multiple queries
What’s the relationship between input and output? For each (l, r), count elements appearing at least once in that range

The Key Insight: Last Occurrence Tracking

Instead of counting elements directly, we can ask: “For each position i, does the element at i contribute to the distinct count for range [l, r]?”

An element at position i contributes to range [l, r] if:

l <= i <= r (it’s within the range)
There is no occurrence of the same element at position j where l <= j < i (it’s the first/leftmost occurrence in the range)

Reformulation: If we mark position i with 1 when arr[i]’s last occurrence before or at position r is exactly at position i, then the distinct count for [l, r] = sum of marks in [l, r].

Solution 1: Brute Force

Idea

For each query, iterate through the range and use a set to count distinct elements.

Algorithm

For each query (l, r)
Add all elements in range [l, r] to a set
Return the size of the set

Code

def solve_brute_force(n, arr, queries):
  """
  Brute force: Use a set for each query.

  Time: O(q * n)
  Space: O(n)
  """
  results = []
  for l, r in queries:
    distinct = set()
    for i in range(l - 1, r):  # Convert to 0-indexed
      distinct.add(arr[i])
    results.append(len(distinct))
  return results

Complexity

Metric	Value	Explanation
Time	O(q * n)	Each query scans up to n elements
Space	O(n)	Set can hold up to n elements

Why This Works (But Is Slow)

Correctness is guaranteed because a set inherently stores only unique elements. However, with q and n up to 2 * 10^5, this gives 4 * 10^10 operations - far too slow.

Solution 2: Offline Processing + BIT (Optimal)

Key Insight

The Trick: Process queries sorted by right endpoint. For each position, only mark the rightmost occurrence of each value up to the current position.

When we process queries in order of increasing right endpoint r:

As we move r from 1 to n, we update a BIT where position i has value 1 if arr[i] is the rightmost occurrence of that value among positions 1 to r
For query (l, r), the answer is sum(l, r) in the BIT

Why This Works

When processing a query with right endpoint r:

For each unique value, exactly one position (its rightmost occurrence up to r) has a 1 in the BIT
Elements appearing multiple times are counted exactly once
The range sum [l, r] counts how many “rightmost occurrences” fall within [l, r]

Algorithm

Store queries as (l, r, original_index), sort by r
Maintain a BIT and a map: value -> its current marked position
Process array left to right (position i from 1 to n):
- If arr[i] was previously marked at position p, unmark it (BIT subtract 1 at p)
- Mark position i for arr[i] (BIT add 1 at i)
- Update map: arr[i] -> i
- Answer all queries with r == i: answer = BIT.sum(l, i)
Return answers in original query order

Dry Run Example

Let’s trace through with arr = [3, 2, 3, 1, 2] and queries [(1,3), (2,4), (1,5)]:

Sorted queries by r: [(1,3,0), (2,4,1), (1,5,2)]

Initial state:
  BIT = [0, 0, 0, 0, 0] (conceptually)
  last_pos = {}

Step 1: i = 1, arr[1] = 3
  3 not in last_pos
  BIT.add(1, +1) -> marks position 1
  last_pos = {3: 1}
  BIT state: [1, 0, 0, 0, 0]
  No query with r = 1

Step 2: i = 2, arr[2] = 2
  2 not in last_pos
  BIT.add(2, +1) -> marks position 2
  last_pos = {3: 1, 2: 2}
  BIT state: [1, 1, 0, 0, 0]
  No query with r = 2

Step 3: i = 3, arr[3] = 3
  3 IS in last_pos at position 1
  BIT.add(1, -1) -> unmarks position 1
  BIT.add(3, +1) -> marks position 3
  last_pos = {3: 3, 2: 2}
  BIT state: [0, 1, 1, 0, 0]
  Query (1, 3): sum(1, 3) = 0 + 1 + 1 = 2  [Answer for query 0]

Step 4: i = 4, arr[4] = 1
  1 not in last_pos
  BIT.add(4, +1) -> marks position 4
  last_pos = {3: 3, 2: 2, 1: 4}
  BIT state: [0, 1, 1, 1, 0]
  Query (2, 4): sum(2, 4) = 1 + 1 + 1 = 3  [Answer for query 1]

Step 5: i = 5, arr[5] = 2
  2 IS in last_pos at position 2
  BIT.add(2, -1) -> unmarks position 2
  BIT.add(5, +1) -> marks position 5
  last_pos = {3: 3, 2: 5, 1: 4}
  BIT state: [0, 0, 1, 1, 1]
  Query (1, 5): sum(1, 5) = 0 + 0 + 1 + 1 + 1 = 3  [Answer for query 2]

Final answers in original order: [2, 3, 3]

Visual Diagram

Array:    [3, 2, 3, 1, 2]
Index:     1  2  3  4  5

Processing at i=5 (final state):
  Value 3: last at position 3  -> BIT[3] = 1
  Value 2: last at position 5  -> BIT[5] = 1  (moved from position 2)
  Value 1: last at position 4  -> BIT[4] = 1

BIT:      [0, 0, 1, 1, 1]
           1  2  3  4  5

Query (1,5): sum of BIT[1..5] = 3 distinct values

Code (Python)

class BIT:
  """Binary Indexed Tree for point updates and prefix sums."""
  def __init__(self, n):
    self.n = n
    self.tree = [0] * (n + 1)

  def update(self, i, delta):
    """Add delta to position i (1-indexed)."""
    while i <= self.n:
      self.tree[i] += delta
      i += i & (-i)

  def prefix_sum(self, i):
    """Sum of elements from 1 to i."""
    total = 0
    while i > 0:
      total += self.tree[i]
      i -= i & (-i)
    return total

  def range_sum(self, l, r):
    """Sum of elements from l to r (inclusive)."""
    return self.prefix_sum(r) - self.prefix_sum(l - 1)


def solve_offline_bit(n, arr, queries):
  """
  Offline processing with BIT.

  Time: O((n + q) * log(n))
  Space: O(n + q)
  """
  # Store queries with original index, sort by right endpoint
  indexed_queries = [(l, r, i) for i, (l, r) in enumerate(queries)]
  indexed_queries.sort(key=lambda x: x[1])  # Sort by r

  bit = BIT(n)
  last_pos = {}  # value -> last position where it's marked
  answers = [0] * len(queries)

  query_idx = 0
  for i in range(1, n + 1):
    val = arr[i - 1]  # Convert to 0-indexed array access

    # If this value was marked before, unmark it
    if val in last_pos:
      bit.update(last_pos[val], -1)

    # Mark current position
    bit.update(i, 1)
    last_pos[val] = i

    # Answer all queries with right endpoint == i
    while query_idx < len(indexed_queries) and indexed_queries[query_idx][1] == i:
      l, r, orig_idx = indexed_queries[query_idx]
      answers[orig_idx] = bit.range_sum(l, r)
      query_idx += 1

  return answers


def main():
  import sys
  input_data = sys.stdin.read().split()
  idx = 0

  n, q = int(input_data[idx]), int(input_data[idx + 1])
  idx += 2

  arr = [int(input_data[idx + i]) for i in range(n)]
  idx += n

  queries = []
  for _ in range(q):
    l, r = int(input_data[idx]), int(input_data[idx + 1])
    queries.append((l, r))
    idx += 2

  results = solve_offline_bit(n, arr, queries)
  print('\n'.join(map(str, results)))


if __name__ == "__main__":
  main()

Complexity

Metric	Value	Explanation
Time	O((n + q) * log n)	n BIT updates, q BIT range queries
Space	O(n + q)	BIT array, hash map, query storage

Solution 3: Mo’s Algorithm (Alternative)

Key Insight

The Trick: Process queries in a special order that minimizes pointer movement, allowing O(1) amortized add/remove operations.

Mo’s algorithm divides the array into blocks of size sqrt(n) and sorts queries by (block of l, r). This ensures the total pointer movement is O((n + q) * sqrt(n)).

Algorithm

Choose block size B = sqrt(n)
Sort queries by (l / B, r)
Maintain current range [cur_l, cur_r] and frequency map
For each query, expand/shrink the current range to match the query
Track distinct count as we add/remove elements

Code (Python)

from collections import defaultdict
import math

def solve_mo(n, arr, queries):
  """
  Mo's algorithm for offline range queries.

  Time: O((n + q) * sqrt(n))
  Space: O(n + q)
  """
  block_size = max(1, int(math.sqrt(n)))

  # Store queries with original index
  indexed_queries = [(l, r, i) for i, (l, r) in enumerate(queries)]

  # Sort by (block of l, r)
  indexed_queries.sort(key=lambda x: (x[0] // block_size, x[1]))

  freq = defaultdict(int)
  distinct_count = 0
  cur_l, cur_r = 1, 0  # Empty range initially
  answers = [0] * len(queries)

  def add(idx):
    nonlocal distinct_count
    val = arr[idx - 1]  # Convert to 0-indexed
    if freq[val] == 0:
      distinct_count += 1
    freq[val] += 1

  def remove(idx):
    nonlocal distinct_count
    val = arr[idx - 1]
    freq[val] -= 1
    if freq[val] == 0:
      distinct_count -= 1

  for l, r, orig_idx in indexed_queries:
    # Expand/shrink to reach [l, r]
    while cur_r < r:
      cur_r += 1
      add(cur_r)
    while cur_l > l:
      cur_l -= 1
      add(cur_l)
    while cur_r > r:
      remove(cur_r)
      cur_r -= 1
    while cur_l < l:
      remove(cur_l)
      cur_l += 1

    answers[orig_idx] = distinct_count

  return answers

Complexity (Mo’s Algorithm)

Metric	Value	Explanation
Time	O((n + q) * sqrt(n))	Each pointer moves O(n * sqrt(n)) total
Space	O(n + q)	Frequency map, query storage

Common Mistakes

Mistake 1: Processing Queries Online

# WRONG - Cannot answer each query independently in O(log n)
for l, r in queries:
  # No efficient way to count distinct in [l, r] online
  answer = some_magic_query(l, r)  # Doesn't exist!

Problem: Distinct value counting is not decomposable; segment trees cannot merge distinct counts. Fix: Use offline processing to handle queries in a specific order.

Mistake 2: Forgetting to Unmark Previous Occurrence

# WRONG
for i in range(1, n + 1):
  bit.update(i, 1)  # Just marks everything
  last_pos[arr[i]] = i

Problem: An element appearing at positions 2 and 5 would be counted twice. Fix: Always unmark the previous occurrence before marking the new one.

Mistake 3: Wrong Query Sorting

# WRONG - Sorting by left endpoint
indexed_queries.sort(key=lambda x: x[0])  # Sorted by l

Problem: We need to process by right endpoint to maintain the BIT invariant correctly. Fix: Sort queries by right endpoint r.

Mistake 4: Off-by-One in BIT Indexing

# WRONG - Using 0-indexed in BIT
bit.update(0, 1)  # BIT is 1-indexed!

Problem: BIT operations assume 1-indexed positions. Fix: Use 1-indexed positions throughout, convert array access carefully.

Edge Cases

Case	Input	Expected Output	Why
Single element	`n=1, arr=[5], query=(1,1)`	1	One element is always distinct
All same	`n=5, arr=[3,3,3,3,3], query=(1,5)`	1	All elements are the same value
All different	`n=5, arr=[1,2,3,4,5], query=(1,5)`	5	Each element is unique
Large values	`arr=[10^9, 10^9-1]`	Works	Hash map handles large keys
Single query full range	`query=(1,n)`	Count of unique in entire array	Maximum range
Many overlapping queries	All queries are (1, n)	Same answer for all	Offline still works

When to Use This Pattern

Use Offline + BIT When:

Queries are given upfront (not online/streaming)
Need O(log n) per query after sorting
Can afford O(n log n) preprocessing
Space for storing all queries is acceptable

Use Mo’s Algorithm When:

Add/remove operations are O(1)
O(sqrt(n)) per query is acceptable
Problem involves counting or frequency tracking
Cannot use offline + BIT due to problem structure

Don’t Use When:

Queries must be answered online (as they arrive)
Updates to the array are interleaved with queries
Need sub-linear query time without preprocessing

Pattern Recognition Checklist:

Static array with multiple range queries? -> Consider offline processing
Counting/frequency in ranges? -> Offline + BIT or Mo’s algorithm
Non-decomposable range function? -> Cannot use standard segment tree
Need to track “first/last occurrence” in range? -> Offline + BIT

Easier (Do These First)

Problem	Why It Helps
Static Range Sum Queries (CSES)	Basic range queries with prefix sums
Range Minimum Queries (CSES)	Sparse table for static RMQ

Similar Difficulty

Problem	Key Difference
Distinct Values (CSES)	Count subarrays with at most k distinct (sliding window)
Range Update Queries (CSES)	BIT with lazy propagation
K-query (SPOJ)	Count elements greater than k in range

Harder (Do These After)

Problem	New Concept
Distinct Values Queries with Updates	Handle array modifications
Count on a Tree (SPOJ)	HLD + persistent segment tree
Range Distinct Query (D-Query SPOJ)	Classic offline + BIT

Key Takeaways

The Core Idea: Transform “count distinct” into “count rightmost occurrences” using offline processing
Time Optimization: O(q * n) -> O((n + q) * log n) by processing queries in sorted order
Space Trade-off: O(n + q) space for BIT and query storage
Pattern: Offline query processing - when online is impossible, sort queries cleverly

Practice Checklist

Before moving on, make sure you can:

Explain why segment trees cannot directly solve this problem
Implement offline + BIT solution from scratch
Trace through the “last occurrence” invariant manually
Recognize when Mo’s algorithm is a better choice
Handle coordinate compression if values are very large

Additional Resources

CP-Algorithms: Fenwick Tree
CP-Algorithms: Mo’s Algorithm
CSES Distinct Values Queries - Count distinct in range

Distinct Values Queries

Problem Overview

Learning Goals

Problem Statement

Example

Intuition: How to Think About This Problem

Pattern Recognition

Breaking Down the Problem

The Key Insight: Last Occurrence Tracking

Solution 1: Brute Force

Idea

Algorithm

Code

Complexity

Why This Works (But Is Slow)

Solution 2: Offline Processing + BIT (Optimal)

Key Insight

Why This Works

Algorithm

Dry Run Example

Visual Diagram

Code (Python)

Complexity

Solution 3: Mo’s Algorithm (Alternative)

Key Insight

Algorithm

Code (Python)

Complexity (Mo’s Algorithm)

Common Mistakes

Mistake 1: Processing Queries Online

Mistake 2: Forgetting to Unmark Previous Occurrence

Mistake 3: Wrong Query Sorting

Mistake 4: Off-by-One in BIT Indexing

Edge Cases

When to Use This Pattern

Use Offline + BIT When:

Use Mo’s Algorithm When:

Don’t Use When:

Pattern Recognition Checklist:

Related Problems

Easier (Do These First)

Similar Difficulty

Harder (Do These After)

Key Takeaways

Practice Checklist

Additional Resources