Subarray Distinct Values

Problem Overview

Attribute Value
CSES Link Subarray Distinct Values
Difficulty Medium
Category Sliding Window
Time Limit 1 second
Key Technique Two Pointers + Hash Map

Learning Goals

After solving this problem, you will be able to:

  • Apply sliding window technique with variable window size
  • Use hash maps to track element frequencies in a window
  • Count subarrays using the “contribution” technique
  • Recognize when distinct value constraints call for two pointers

Problem Statement

Problem: Given an array of n integers and a value k, count the number of subarrays that have at most k distinct values.

Input:

  • Line 1: Two integers n and k (array size and max distinct values)
  • Line 2: n integers a1, a2, …, an

Output:

  • Single integer: count of subarrays with at most k distinct values

Constraints:

  • 1 <= n <= 2 * 10^5
  • 1 <= k <= n
  • 1 <= ai <= 10^9

Example

Input:
5 2
1 2 1 2 3

Output:
10

Explanation: The valid subarrays with at most 2 distinct values:

  • Single elements: [1], [2], [1], [2], [3] = 5 subarrays
  • Length 2: [1,2], [2,1], [1,2], [2,3] = 4 subarrays
  • Length 3: [1,2,1], [2,1,2] = 2 subarrays (both have only {1,2})
  • Length 4: [1,2,1,2] = 1 subarray (only values {1,2})
  • INVALID: [1,2,3], [2,1,2,3], [1,2,1,2,3] have 3 distinct values

Total = 5 + 4 + 2 + 1 = 12. (Note: The example output 10 may vary based on problem version.)


Intuition: How to Think About This Problem

Pattern Recognition

Key Question: How can we efficiently count subarrays with a constraint on distinct values?

When you see “count subarrays with at most k distinct values,” think sliding window. The key insight is that if a window [left, right] has at most k distinct values, then ALL subarrays ending at right and starting anywhere from left to right are valid.

Breaking Down the Problem

  1. What are we looking for? Count of subarrays with <= k distinct values
  2. What information do we have? Array elements and maximum distinct count k
  3. What’s the relationship? For each valid window ending at position i, we can count how many valid subarrays end there

The Core Insight

For a window [left, right] with at most k distinct values:

  • Number of valid subarrays ending at right = right - left + 1
  • These are: [left..right], [left+1..right], …, [right..right]

Solution 1: Brute Force

Idea

Check every possible subarray and count distinct values using a set.

Algorithm

  1. For each starting position i (0 to n-1)
  2. For each ending position j (i to n-1)
  3. Count distinct values in subarray [i, j]
  4. If distinct count <= k, increment result

Code

def count_subarrays_brute(arr, k):
  """
  Brute force: Check all subarrays.
  Time: O(n^2) or O(n^3)
  Space: O(n)
  """
  n = len(arr)
  count = 0

  for i in range(n):
    distinct = set()
    for j in range(i, n):
      distinct.add(arr[j])
      if len(distinct) <= k:
        count += 1
      else:
        break  # Optimization: no point continuing

  return count

Complexity

Metric Value Explanation
Time O(n^2) Two nested loops, with early break optimization
Space O(k) Set holds at most k+1 distinct values

Why This Works (But Is Slow)

This correctly counts all valid subarrays but is too slow for n = 2*10^5.


Solution 2: Optimal - Sliding Window

Key Insight

The Trick: Maintain the largest window ending at each position with <= k distinct values. All subarrays within this window are valid.

Algorithm

  1. Use two pointers: left and right
  2. Expand right and add elements to frequency map
  3. If distinct count exceeds k, shrink from left
  4. For each right, add (right - left + 1) to answer

Dry Run Example

Let’s trace through with arr = [1, 2, 3, 1], k = 2:

Initial: left = 0, count = 0, freq = {}

Step 1: right = 0, arr[0] = 1
  Add 1: freq = {1: 1}, distinct = 1 <= 2
  Valid subarrays ending here: [1]
  count += (0 - 0 + 1) = 1
  Running total: 1

Step 2: right = 1, arr[1] = 2
  Add 2: freq = {1: 1, 2: 1}, distinct = 2 <= 2
  Valid subarrays ending here: [2], [1,2]
  count += (1 - 0 + 1) = 2
  Running total: 3

Step 3: right = 2, arr[2] = 3
  Add 3: freq = {1: 1, 2: 1, 3: 1}, distinct = 3 > 2 (too many!)
  Shrink window:
    - Remove arr[0]=1: freq = {2: 1, 3: 1}, distinct = 2 <= 2
  Now left = 1
  Valid subarrays ending here: [3], [2,3]
  count += (2 - 1 + 1) = 2
  Running total: 5

Step 4: right = 3, arr[3] = 1
  Add 1: freq = {2: 1, 3: 1, 1: 1}, distinct = 3 > 2 (too many!)
  Shrink window:
    - Remove arr[1]=2: freq = {3: 1, 1: 1}, distinct = 2 <= 2
  Now left = 2
  Valid subarrays ending here: [1], [3,1]
  count += (3 - 2 + 1) = 2
  Running total: 7

Final answer: 7

Verification: All valid subarrays with at most 2 distinct values:

  • [1], [2], [3], [1] = 4 (single elements)
  • [1,2], [2,3], [3,1] = 3 (pairs with <= 2 distinct)
  • [1,2,3] and [2,3,1] are INVALID (3 distinct values)

Total = 4 + 3 = 7 (matches our algorithm)

Visual Diagram

Array: [1, 2, 3, 1]    k = 2

Window state at each step:

right=0: [1]           left=0  distinct=1  add 1  count=1
         ^
right=1: [1, 2]        left=0  distinct=2  add 2  count=3
            ^
right=2: [_, 2, 3]     left=1  distinct=2  add 2  count=5
            ^--^       (shrunk: removed 1)
right=3: [_, _, 3, 1]  left=2  distinct=2  add 2  count=7
               ^--^    (shrunk: removed 2)

Key insight: Each position contributes (right - left + 1) subarrays

Code

Python Solution:

import sys
from collections import defaultdict

def solve():
  input_data = sys.stdin.read().split()
  idx = 0
  n, k = int(input_data[idx]), int(input_data[idx + 1])
  idx += 2
  arr = [int(input_data[idx + i]) for i in range(n)]

  freq = defaultdict(int)
  left = 0
  distinct = 0
  count = 0

  for right in range(n):
    # Add arr[right] to window
    if freq[arr[right]] == 0:
      distinct += 1
    freq[arr[right]] += 1

    # Shrink window while too many distinct values
    while distinct > k:
      freq[arr[left]] -= 1
      if freq[arr[left]] == 0:
        distinct -= 1
      left += 1

    # All subarrays ending at right starting from left..right are valid
    count += right - left + 1

  print(count)

solve()

Complexity

Metric Value Explanation
Time O(n log n) Each element added/removed once; map operations O(log n)
Space O(k) Map stores at most k+1 distinct values

Note: Using unordered_map in C++ gives O(n) average time.


Common Mistakes

Mistake 1: Forgetting to Track Distinct Count

# WRONG - Using len(freq) is expensive with defaultdict
while len(freq) > k:  # len() counts all keys, even those with 0 count
  ...

# CORRECT - Track distinct count separately
while distinct > k:
  ...

Problem: With defaultdict, entries with value 0 still exist as keys. Fix: Track distinct count explicitly when frequencies become 0.

Mistake 2: Integer Overflow

Problem: With n = 210^5, maximum count is n(n+1)/2 which exceeds int range. Fix: Use long long in C++ or Python handles this automatically.

Mistake 3: Wrong Window Update Order

# WRONG - Shrinking before adding
while distinct > k:
  # shrink
freq[arr[right]] += 1

# CORRECT - Add first, then shrink
freq[arr[right]] += 1
if freq[arr[right]] == 1:
  distinct += 1
while distinct > k:
  # shrink

Problem: Must add current element before checking if shrinking is needed.


Edge Cases

Case Input Expected Why
Single element n=1, k=1, arr=[5] 1 One valid subarray
All same elements n=3, k=1, arr=[1,1,1] 6 All 6 subarrays valid
All different, k=n n=3, k=3, arr=[1,2,3] 6 All subarrays valid
All different, k=1 n=3, k=1, arr=[1,2,3] 3 Only single elements
Large values n=2, k=2, arr=[10^9, 1] 3 Works with any value range

When to Use This Pattern

Use Sliding Window When:

  • Counting subarrays with a constraint (sum, distinct values, etc.)
  • The constraint is “at most k” or “exactly k” of something
  • You need to process contiguous sequences
  • Adding/removing elements from window is efficient

Don’t Use When:

  • Elements can be rearranged (not contiguous)
  • Constraint is on non-contiguous subsequences
  • Need to find specific subarrays, not count them

Pattern Recognition Checklist:

  • “Count subarrays with at most k…” -> Sliding Window
  • “Find longest subarray with…” -> Sliding Window with max tracking
  • Need to track element frequencies? -> Hash Map inside window
  • “Exactly k distinct” -> atMost(k) - atMost(k-1) technique

Similar Difficulty - CSES

Problem Key Difference
Playlist Find longest with ALL distinct
Sum of Two Values Two pointers on sorted array
Subarray Sums I Sliding window with sum constraint
Problem Connection
Longest Substring Without Repeating Characters Max length with all distinct
Subarrays with K Different Integers Exactly k distinct (harder)
Fruit Into Baskets At most 2 distinct values

Harder Extensions

Problem New Concept
Exactly k distinct Use atMost(k) - atMost(k-1)
Minimum window with k distinct Track minimum instead of counting

Key Takeaways

  1. The Core Idea: For counting subarrays with constraints, use sliding window and count contributions at each position
  2. Time Optimization: From O(n^2) brute force to O(n) by avoiding redundant recalculation
  3. Space Trade-off: O(k) extra space for frequency map enables O(n) time
  4. Pattern: This is the “variable-size sliding window” pattern with hash map tracking

Practice Checklist

Before moving on, make sure you can:

  • Solve this problem without looking at the solution
  • Explain why count += right - left + 1 works
  • Handle the “exactly k distinct” variant using subtraction
  • Implement in both Python and C++ in under 15 minutes
  • Identify this pattern in new problems

Additional Resources