Subarrays with K Distinct Values
Problem Overview
| Attribute | Value |
|---|---|
| Difficulty | Medium |
| Category | Sliding Window / Two Pointers |
| Time Limit | 1 second |
| Key Technique | atMost(k) - atMost(k-1) Decomposition |
| Similar To | LeetCode 992 |
Learning Goals
After solving this problem, you will be able to:
- Apply the atMost(k) - atMost(k-1) technique to count exact occurrences
- Implement sliding window with hash map for distinct element tracking
- Recognize when “exactly k” problems need decomposition into “at most k”
- Count subarrays efficiently using the right pointer contribution formula
Problem Statement
Problem: Given an array of integers, count the number of contiguous subarrays that contain exactly k distinct values.
Input:
- Line 1: n (array size) and k (required distinct count)
- Line 2: n space-separated integers
Output:
- Single integer: count of subarrays with exactly k distinct values
Constraints:
- 1 <= n <= 2 * 10^5
- 1 <= k <= n
- 1 <= arr[i] <= 10^9
Example
Input:
5 2
1 2 1 2 3
Output:
7
Explanation: The 7 subarrays with exactly 2 distinct values are:
- [1,2] (indices 0-1)
- [2,1] (indices 1-2)
- [1,2] (indices 2-3)
- [2,3] (indices 3-4)
- [1,2,1] (indices 0-2)
- [2,1,2] (indices 1-3)
- [1,2,1,2] (indices 0-3)
Intuition: How to Think About This Problem
Pattern Recognition
Key Question: Why can’t we directly count subarrays with exactly k distinct values using a simple sliding window?
The problem is that “exactly k” doesn’t have a monotonic property. When you expand a window, you might go from k-1 to k distinct (good) or from k to k+1 (bad). There’s no clean way to know when to shrink.
The Breakthrough Insight
Exactly(k) = AtMost(k) - AtMost(k-1)
This decomposition works because:
- AtMost(k) counts subarrays with 1, 2, …, or k distinct values
- AtMost(k-1) counts subarrays with 1, 2, …, or k-1 distinct values
- The difference gives us subarrays with EXACTLY k distinct values
Why AtMost(k) Is Easier
“At most k” has a beautiful monotonic property:
- If a window has <= k distinct values, ALL its subarrays also have <= k distinct
- This means we can count all valid subarrays ending at each position
Solution 1: Brute Force
Idea
Check every possible subarray and count distinct elements using a set.
Code
def count_subarrays_brute(arr, k):
"""
Brute force: check all subarrays.
Time: O(n^2), Space: O(n)
"""
n = len(arr)
count = 0
for i in range(n):
distinct = set()
for j in range(i, n):
distinct.add(arr[j])
if len(distinct) == k:
count += 1
elif len(distinct) > k:
break # Optimization: no point continuing
return count
Complexity
| Metric | Value | Explanation |
|---|---|---|
| Time | O(n^2) | Nested loops, early break helps average case |
| Space | O(n) | Set can hold up to n distinct values |
Solution 2: Optimal - AtMost Decomposition
Key Insight
The Trick: Convert “exactly k” into “at most k” minus “at most k-1”
The AtMost Function
For a window ending at index right with left as the leftmost valid position:
- Number of valid subarrays ending at
right=right - left + 1 - This counts: [left..right], [left+1..right], …, [right..right]
Algorithm
- Implement
atMost(k)using sliding window - Return
atMost(k) - atMost(k-1)
Dry Run Example
Input: arr = [1, 2, 1, 2, 3], k = 2
Computing atMost(2):
right=0: arr[0]=1, window=[1], distinct=1, count += 1
Subarrays: [1]
right=1: arr[1]=2, window=[1,2], distinct=2, count += 2
Subarrays: [2], [1,2]
right=2: arr[2]=1, window=[1,2,1], distinct=2, count += 3
Subarrays: [1], [2,1], [1,2,1]
right=3: arr[3]=2, window=[1,2,1,2], distinct=2, count += 4
Subarrays: [2], [1,2], [2,1,2], [1,2,1,2]
right=4: arr[4]=3, window=[1,2,1,2,3], distinct=3 > k!
Shrink: remove arr[0]=1, window=[2,1,2,3], distinct=3
Shrink: remove arr[1]=2, window=[1,2,3], distinct=3
Shrink: remove arr[2]=1, window=[2,3], distinct=2, count += 2
Subarrays: [3], [2,3]
atMost(2) = 1 + 2 + 3 + 4 + 2 = 12
Computing atMost(1):
right=0: count += 1 (window [1])
right=1: distinct=2, shrink to [2], count += 1
right=2: distinct=2, shrink to [1], count += 1
right=3: distinct=2, shrink to [2], count += 1
right=4: distinct=2, shrink to [3], count += 1
atMost(1) = 5
Result: Exactly(2) = atMost(2) - atMost(1) = 12 - 5 = 7
Visual Diagram
Array: [1, 2, 1, 2, 3] k = 2
AtMost(2) subarrays:
[1] [2] [1] [2] [3] <- length 1: 5
[1,2] [2,1] [1,2] [2,3] <- length 2: 4
[1,2,1] [2,1,2] <- length 3: 2
[1,2,1,2] <- length 4: 1
Total: 12
AtMost(1) subarrays:
[1] [2] [1] [2] [3] <- length 1: 5
Total: 5
Exactly(2) = 12 - 5 = 7
Code
Python:
def subarrays_with_k_distinct(arr, k):
"""
Count subarrays with exactly k distinct values.
Time: O(n), Space: O(k)
"""
def at_most(k):
if k < 0:
return 0
count = {}
left = 0
result = 0
for right in range(len(arr)):
# Add right element
count[arr[right]] = count.get(arr[right], 0) + 1
# Shrink window if too many distinct
while len(count) > k:
count[arr[left]] -= 1
if count[arr[left]] == 0:
del count[arr[left]]
left += 1
# Count all valid subarrays ending at right
result += right - left + 1
return result
return at_most(k) - at_most(k - 1)
# Alternative: Find longest subarray with exactly k distinct
def longest_with_k_distinct(arr, k):
"""
Find length of longest subarray with at most k distinct.
Time: O(n), Space: O(k)
"""
count = {}
left = 0
max_len = 0
for right in range(len(arr)):
count[arr[right]] = count.get(arr[right], 0) + 1
while len(count) > k:
count[arr[left]] -= 1
if count[arr[left]] == 0:
del count[arr[left]]
left += 1
max_len = max(max_len, right - left + 1)
return max_len
Complexity
| Metric | Value | Explanation |
|---|---|---|
| Time | O(n) | Two passes through array, each element visited at most twice per pass |
| Space | O(k) | Hash map stores at most k+1 distinct elements |
Common Mistakes
Mistake 1: Forgetting to Handle k-1 Edge Case
# WRONG - crashes when k = 0
def at_most(k):
# ... code assumes k >= 0
# CORRECT
def at_most(k):
if k < 0:
return 0
# ... rest of code
Problem: When k=1, we call atMost(0), which should return 0. Fix: Add base case check at the start.
Mistake 2: Counting Logic Error
# WRONG - only counts when exactly k distinct
if len(count) == k:
result += right - left + 1
# CORRECT - counts when at most k distinct
# No condition needed; the while loop ensures len(count) <= k
result += right - left + 1
Problem: atMost must count ALL valid windows, not just those with exactly k.
Mistake 3: Not Deleting Zero-Count Entries
# WRONG - map size stays inflated
count[arr[left]] -= 1
left += 1
# CORRECT - remove entry when count reaches 0
count[arr[left]] -= 1
if count[arr[left]] == 0:
del count[arr[left]]
left += 1
Problem: Hash map size becomes unreliable if zero-count entries remain.
Edge Cases
| Case | Input | Expected | Why |
|---|---|---|---|
| k = 1 | [1,1,1], k=1 |
6 | All subarrays valid: 3+2+1=6 |
| k = n | [1,2,3], k=3 |
1 | Only full array has all 3 |
| All same | [5,5,5], k=2 |
0 | Only 1 distinct exists |
| k > distinct count | [1,2], k=3 |
0 | Impossible |
| Single element | [7], k=1 |
1 | Only [7] |
When to Use This Pattern
Use AtMost Decomposition When:
- Counting subarrays with exactly k of something
- Direct “exactly k” window doesn’t have monotonic property
- The “at most” version has clean shrinking condition
Pattern Recognition Checklist:
- Problem asks for “exactly k distinct”? -> atMost(k) - atMost(k-1)
- Problem asks for “at most k distinct”? -> Single sliding window
- Problem asks for “at least k distinct”? -> Total - atMost(k-1)
Common Applications:
- Subarrays with exactly k distinct elements
- Substrings with exactly k unique characters
- Windows with exactly k types/categories
Related Problems
Easier (Do These First)
| Problem | Why It Helps |
|---|---|
| LeetCode 3: Longest Substring Without Repeating | Basic sliding window for distinct elements |
| LeetCode 219: Contains Duplicate II | Sliding window with hash map |
Similar Difficulty
| Problem | Key Difference |
|---|---|
| LeetCode 992: Subarrays with K Different Integers | Same problem |
| LeetCode 340: Longest Substring with At Most K Distinct | Find max length instead of count |
| LeetCode 904: Fruit Into Baskets | At most 2 distinct (k=2) |
Harder (Do These After)
| Problem | New Concept |
|---|---|
| LeetCode 1248: Count Nice Subarrays | Exactly k odd numbers |
| LeetCode 930: Binary Subarrays With Sum | Exactly k ones |
| CSES: Playlist | Longest unique subsequence |
Key Takeaways
- The Core Idea: Transform “exactly k” into subtraction of two “at most” counts
- Why It Works: “At most k” has monotonic property; “exactly k” doesn’t
- Counting Formula: For window [left, right], count = right - left + 1
- Space Trade-off: O(k) space for O(n) time vs O(n^2) brute force
Practice Checklist
Before moving on, make sure you can:
- Explain why exactly(k) = atMost(k) - atMost(k-1)
- Implement atMost function from scratch
- Handle edge case when k = 1 (calls atMost(0))
- Apply this pattern to “exactly k odd numbers” variant
- Solve in under 15 minutes without reference