Distinct Colors

Problem Overview

Attribute	Value
Difficulty	Medium
Category	Tree Algorithms
Time Limit	1 second
Key Technique	Small-to-Large Merging (DSU on Tree)
CSES Link	Distinct Colors

Learning Goals

After solving this problem, you will be able to:

Apply small-to-large merging technique for efficient set operations on trees
Understand why merging smaller sets into larger sets gives O(n log n) complexity
Implement DSU on tree pattern for subtree aggregation problems
Recognize when set merging problems can be optimized with this technique

Problem Statement

Problem: Given a tree with n nodes where each node has a color, find the number of distinct colors in each node’s subtree (including the node itself).

Input:

Line 1: n (number of nodes)
Line 2: n space-separated integers - colors of nodes 1 to n
Next n-1 lines: edges (u, v) defining the tree

Output:

n space-separated integers: distinct color count for each node’s subtree

Constraints:

1 <= n <= 2 x 10^5
1 <= color[i] <= 10^9

Example

Input:
5
2 3 2 5 3
1 2
1 3
3 4
3 5

Output:
3 1 3 1 1

Explanation:

Tree Structure (node: color):
       1(2)
      /    \
    2(3)   3(2)
          /    \
        4(5)   5(3)

Subtree Analysis:
- Node 1: subtree contains all 5 nodes
  colors = {2, 3, 2, 5, 3} -> distinct = {2, 3, 5} -> count = 3
- Node 2: subtree = {2} only (leaf node)
  colors = {3} -> distinct = {3} -> count = 1
- Node 3: subtree = {3, 4, 5}
  colors = {2, 5, 3} -> distinct = {2, 3, 5} -> count = 3
- Node 4: subtree = {4} only (leaf node)
  colors = {5} -> distinct = {5} -> count = 1
- Node 5: subtree = {5} only (leaf node)
  colors = {3} -> distinct = {3} -> count = 1

Intuition: How to Think About This Problem

Pattern Recognition

Key Question: How do we efficiently count distinct colors in every subtree without recomputing from scratch?

The naive approach of running DFS from each node takes O(n^2) time. The key insight is that when we finish processing a subtree, we already have all its colors collected - we should reuse this information for the parent.

Breaking Down the Problem

What are we looking for? Distinct color count in each subtree
What information do we have? Tree structure and node colors
What’s the relationship? Parent’s distinct colors = union of all children’s distinct colors + parent’s own color

The Small-to-Large Insight

When merging sets, always merge the smaller set into the larger set. This ensures each element is moved at most O(log n) times across all operations.

Why O(log n)? Each time an element is moved, the set it joins is at least twice as large as its previous set. So an element can be moved at most log2(n) times before the set has size n.

Solution 1: Brute Force

Idea

For each node, perform DFS to collect all colors in its subtree and count distinct colors.

Algorithm

Build adjacency list from edges
For each node, run DFS to collect all colors in subtree
Use a set to count distinct colors

Code

def solve_brute_force(n, colors, edges):
  """
  Brute force: DFS from each node.

  Time: O(n^2)
  Space: O(n)
  """
  from collections import defaultdict

  graph = defaultdict(list)
  for u, v in edges:
    graph[u].append(v)
    graph[v].append(u)

  def collect_colors(node, parent):
    result = {colors[node - 1]}
    for child in graph[node]:
      if child != parent:
        result.update(collect_colors(child, node))
    return result

  # For each node, collect subtree colors
  result = []
  for node in range(1, n + 1):
    subtree_colors = collect_colors(node, -1)
    result.append(len(subtree_colors))

  return result

Complexity

Metric	Value	Explanation
Time	O(n^2)	DFS from each of n nodes, each DFS is O(n)
Space	O(n)	Recursion stack and color set

Why This Works (But Is Slow)

Each DFS correctly collects all colors but we’re redoing work. The subtree of node 1 includes all other subtrees - we’re computing overlapping information repeatedly.

Solution 2: Optimal - Small-to-Large Merging

Key Insight

The Trick: Process children first, then merge their color sets into the parent. Always merge smaller sets into larger sets to minimize total operations.

Why Small-to-Large Works

Consider what happens when we merge sets:

If we always merge smaller into larger, each element can only be in a set that doubles in size
An element starts in a set of size 1, and the maximum size is n
So each element is moved at most log2(n) times
Total operations: O(n log n)

Algorithm

Build adjacency list
DFS from root (node 1), processing children before parent (post-order)
For each node:
- Start with a set containing only its own color
- Get color sets from all children
- Merge all sets, always merging smaller into larger
- Record the distinct count
Return the color set to parent for further merging

Dry Run Example

Input: n=5, colors=[2,3,2,5,3], edges=[(1,2),(1,3),(3,4),(3,5)]

Tree:
       1(2)
      /    \
    2(3)   3(2)
          /    \
        4(5)   5(3)

DFS Execution (post-order):

Step 1: Visit node 2 (leaf)
  - Initialize: set_2 = {3}
  - No children to merge
  - result[2] = 1
  - Return {3} to parent

Step 2: Visit node 4 (leaf)
  - Initialize: set_4 = {5}
  - No children to merge
  - result[4] = 1
  - Return {5} to parent

Step 3: Visit node 5 (leaf)
  - Initialize: set_5 = {3}
  - No children to merge
  - result[5] = 1
  - Return {3} to parent

Step 4: Visit node 3
  - Initialize: set_3 = {2}
  - Children: 4 returns {5}, 5 returns {3}
  - Merge {5} into {2}: set_3 = {2,5}
  - Merge {3} into {2,5}: set_3 = {2,5,3}
  - result[3] = 3
  - Return {2,5,3} to parent

Step 5: Visit node 1
  - Initialize: set_1 = {2}
  - Children: 2 returns {3}, 3 returns {2,5,3}
  - Compare sizes: |{3}|=1, |{2}|=1, |{2,5,3}|=3
  - Merge {2} into {2,5,3}: set_1 = {2,5,3} (2 already present)
  - Merge {3} into {2,5,3}: set_1 = {2,5,3} (3 already present)
  - result[1] = 3
  - Done!

Final answer: [3, 1, 3, 1, 1]

Visual Diagram

Processing Order (bottom-up):

       1(2)          Step 5: Merge all -> {2,3,5} = 3
      /    \
    2(3)   3(2)      Step 1: {3}=1    Step 4: {2,3,5}=3
          /    \
        4(5)   5(3)  Step 2: {5}=1    Step 3: {3}=1

Merging at Node 3:
  {2} <- {5} <- {3}

  {2}     : size 1
  + {5}   : {2,5}, size 2
  + {3}   : {2,5,3}, size 3

Code (Python)

import sys
from collections import defaultdict

def solve(n, colors, edges):
  """
  Small-to-large merging solution.

  Time: O(n log n)
  Space: O(n)
  """
  sys.setrecursionlimit(300000)

  graph = defaultdict(list)
  for u, v in edges:
    graph[u].append(v)
    graph[v].append(u)

  result = [0] * (n + 1)

  def dfs(node, parent):
    # Start with current node's color
    color_set = {colors[node - 1]}

    for child in graph[node]:
      if child != parent:
        child_set = dfs(child, node)

        # Small-to-large: always merge smaller into larger
        if len(color_set) < len(child_set):
          color_set, child_set = child_set, color_set

        color_set.update(child_set)

    result[node] = len(color_set)
    return color_set

  dfs(1, -1)
  return result[1:]


def main():
  input_data = sys.stdin.read().split()
  idx = 0

  n = int(input_data[idx])
  idx += 1

  colors = []
  for i in range(n):
    colors.append(int(input_data[idx]))
    idx += 1

  edges = []
  for _ in range(n - 1):
    u = int(input_data[idx])
    v = int(input_data[idx + 1])
    edges.append((u, v))
    idx += 2

  result = solve(n, colors, edges)
  print(' '.join(map(str, result)))


if __name__ == "__main__":
  main()

Complexity

Metric	Value	Explanation
Time	O(n log n)	Each element moved O(log n) times during merges
Space	O(n)	Total elements across all sets is O(n)

Common Mistakes

Mistake 1: Merging Large into Small

# WRONG - O(n^2) in worst case
for child in children:
  child_set = dfs(child, node)
  child_set.update(color_set)  # Merging large into small!
  color_set = child_set

Problem: If we merge the parent’s (larger) set into child’s (smaller) set, we lose the O(n log n) guarantee.

Fix: Always check sizes and merge the smaller set into the larger one.

Mistake 2: Not Handling 1-indexed vs 0-indexed

# WRONG - colors array is 0-indexed
color_set = {colors[node]}  # node is 1-indexed!

# CORRECT
color_set = {colors[node - 1]}  # Adjust for 0-indexed array

Mistake 3: Stack Overflow on Deep Trees

# WRONG - default recursion limit is ~1000 in Python
def dfs(node, parent):
  ...

# CORRECT - increase limit for large inputs
import sys
sys.setrecursionlimit(300000)

Edge Cases

Case	Input	Expected Output	Why
Single node	n=1, colors=[5]	1	Only one node with one color
All same color	n=3, colors=[1,1,1]	1 1 1	Every subtree has just color 1
All different colors	n=3, colors=[1,2,3]	3 1 1	Root sees all, leaves see 1
Linear tree (chain)	1-2-3-4-5	n n-1 … 1	Each node sees colors below
Star graph	Center connected to all	n 1 1 … 1	Center sees all, leaves see 1
Large color values	colors up to 10^9	-	Use set, not array for counting

When to Use This Pattern

Use Small-to-Large Merging When:

You need to aggregate information from subtrees to parents
The aggregation involves set union or similar merge operations
Naive merging would be O(n^2) due to repeated insertions

Don’t Use When:

You can use Euler tour + data structures (sometimes faster in practice)
The problem asks for specific queries (consider Heavy-Light Decomposition)
Simple DFS with counting suffices (no set operations needed)

Pattern Recognition Checklist:

Computing something for each subtree? -> Consider tree DP
Merging sets or collections from children? -> Consider small-to-large
Need O(n log n) instead of O(n^2)? -> Small-to-large is the key
Each element appears in exactly one set at any time? -> Good fit

DSU on Tree (Sack): Similar idea, but keeps heavy child’s data structure
Heavy-Light Decomposition: For path queries
Centroid Decomposition: For distance-related queries

Easier (Do These First)

Problem	Why It Helps
Subordinates (CSES)	Basic subtree counting without merging
Tree Diameter (CSES)	DFS on trees fundamentals

Similar Difficulty

Problem	Key Difference
Lomsat gelral (Codeforces)	Find most frequent color (uses same technique)
Tree Distances I (CSES)	Different aggregation (max distance)

Harder (Do These After)

Problem	New Concept
Path Queries II (CSES)	Heavy-Light Decomposition
Tree Queries (CSES)	Euler tour + segment tree

Key Takeaways

The Core Idea: Always merge smaller sets into larger sets to ensure O(n log n) total operations
Time Optimization: From O(n^2) naive merging to O(n log n) by tracking set sizes
Space Trade-off: O(n) total space - elements exist in exactly one set at a time
Pattern: DSU on Tree / Small-to-Large Merging for subtree aggregation problems

Practice Checklist

Before moving on, make sure you can:

Explain why small-to-large gives O(n log n) complexity
Implement the solution without looking at code
Handle both 0-indexed and 1-indexed variants
Recognize this pattern in new tree problems
Implement in both Python and C++ within 15 minutes

Additional Resources

CP-Algorithms: Small-to-Large
CF Blog: DSU on Tree
CSES Distinct Colors - Small-to-large merging

Distinct Colors

Problem Overview

Learning Goals

Problem Statement

Example

Intuition: How to Think About This Problem

Pattern Recognition

Breaking Down the Problem

The Small-to-Large Insight

Solution 1: Brute Force

Idea

Algorithm

Code

Complexity

Why This Works (But Is Slow)

Solution 2: Optimal - Small-to-Large Merging

Key Insight

Why Small-to-Large Works

Algorithm

Dry Run Example

Visual Diagram

Code (Python)

Complexity

Common Mistakes

Mistake 1: Merging Large into Small

Mistake 2: Not Handling 1-indexed vs 0-indexed

Mistake 3: Stack Overflow on Deep Trees

Edge Cases

When to Use This Pattern

Use Small-to-Large Merging When:

Don’t Use When:

Pattern Recognition Checklist:

Related Techniques

Related Problems

Easier (Do These First)

Similar Difficulty

Harder (Do These After)

Key Takeaways

Practice Checklist

Additional Resources