Strongly Connected Components

Problem Overview

Aspect	Details
Problem	Find all SCCs in a directed graph and label each node
Input	n nodes, m directed edges
Output	Number of SCCs and component ID for each node
Constraints	1 <= n <= 100,000 and 1 <= m <= 200,000
Time Limit	1 second
Difficulty	Hard

Learning Goals

By completing this problem, you will understand:

SCC Definition: What makes a strongly connected component
Kosaraju’s Algorithm: Two-pass DFS approach for finding SCCs
Graph Reversal: Why reversing edges helps identify components
Finish Time Ordering: How DFS finish times reveal SCC structure

Problem Statement

Given a directed graph with n nodes and m edges, find:

The number of strongly connected components
For each node, which component it belongs to

Example Input:

n=7, m=8
Edges: 1->2, 2->3, 3->1, 3->4, 4->5, 5->6, 6->4, 7->6

Example Output:

3 components
Node labels: [1, 1, 1, 2, 2, 2, 3]

What is a Strongly Connected Component?

A Strongly Connected Component (SCC) is a maximal set of vertices where every vertex is reachable from every other vertex within that set.

Key properties:

For any two nodes u and v in the same SCC: u can reach v AND v can reach u
“Maximal” means we cannot add any more nodes while maintaining this property
Every directed graph can be uniquely partitioned into SCCs

Example: Which nodes form SCCs?

    1 --> 2
    ^     |
    |     v
    +---- 3 --> 4 --> 5
              ^     |
              |     v
              +---- 6 <-- 7

SCC 1: {1, 2, 3}  - All nodes can reach each other via the cycle
SCC 2: {4, 5, 6}  - All nodes can reach each other via the cycle
SCC 3: {7}        - Single node (can only reach itself trivially)

Kosaraju’s Algorithm

Kosaraju’s algorithm finds all SCCs in O(V + E) time using two DFS passes.

Algorithm Overview

KOSARAJU'S ALGORITHM:

Pass 1: DFS on original graph
        Record finish times (use a stack)

Pass 2: DFS on REVERSED graph
        Process nodes in decreasing finish time order
        Each DFS tree = one SCC

Step-by-Step Explanation

Pass 1: Record Finish Times

Run DFS on the original graph
When a node finishes (all descendants explored), push it to a stack
The stack now contains nodes in order of decreasing finish time (top = latest finish)

Pass 2: Find SCCs on Reversed Graph

Reverse all edges in the graph
Pop nodes from stack (decreasing finish time order)
Run DFS from each unvisited node on the reversed graph
All nodes reached in one DFS form one SCC

Why Does Kosaraju’s Algorithm Work?

The key insight is about finish times and reachability.

INTUITION:

If node A finishes after node B in Pass 1:
  - Either A can reach B (A started before B and explored B)
  - Or A and B are in different parts of the graph

In the REVERSED graph:
  - If B could reach A in original, now A can reach B
  - Nodes in same SCC can still reach each other (cycles reverse to cycles)

Processing by decreasing finish time ensures:
  - We start DFS from a "root" of an SCC
  - We cannot escape to another SCC (edges going out become edges coming in)

Formal reasoning:

If u and v are in the same SCC, they remain connected in the reversed graph
If u finishes after v and they are in different SCCs, then in the reversed graph, v cannot reach u
Therefore, starting from nodes with highest finish times isolates each SCC

Visual Diagram: Original vs Reversed Graph

ORIGINAL GRAPH:                    REVERSED GRAPH:

    1 -----> 2                         1 <----- 2
    ^        |                         |        ^
    |        v                         v        |
    +------- 3 -----> 4                +------- 3 <----- 4
                      |                                  ^
                      v                                  |
                      5 -----> 6                         5 <----- 6
                      ^        |                         |        ^
                      |        v                         v        |
                      +------- 7                         +------- 7

Pass 1 on Original:                Pass 2 on Reversed:
DFS order fills stack              Pop from stack, DFS on reversed

Finish order (bottom to top):      Processing order:
Stack: [7, 6, 5, 4, 3, 2, 1]      1 -> finds {1,2,3}
       (1 finishes last)          4 -> finds {4,5,7}
                                  6 -> finds {6}

Result: 3 SCCs: {1,2,3}, {4,5,7}, {6}

Dry Run Example

Let us trace through with a concrete example:

Graph: n=6, edges: 1->2, 2->3, 3->1, 2->4, 4->5, 5->6, 6->4

Adjacency list (original):
1: [2]
2: [3, 4]
3: [1]
4: [5]
5: [6]
6: [4]

PASS 1: DFS on original, record finish times
----------------------------------------
Start DFS from node 1:
  Visit 1 -> Visit 2 -> Visit 3 -> (1 already visited)
           -> Visit 4 -> Visit 5 -> Visit 6 -> (4 already visited)
                         Finish 6, push to stack
                       Finish 5, push to stack
                     Finish 4, push to stack
           Finish 3, push to stack
         Finish 2, push to stack
       Finish 1, push to stack

Stack (top to bottom): [1, 2, 3, 4, 5, 6]

PASS 2: DFS on reversed graph in stack order
--------------------------------------------
Reversed adjacency list:
1: [3]
2: [1]
3: [2]
4: [6, 2]
5: [4]
6: [5]

Pop 1: DFS from 1 on reversed graph
  Visit 1 -> Visit 3 -> Visit 2 -> (1 already visited)
  SCC #1 = {1, 2, 3}

Pop 2: Already visited, skip
Pop 3: Already visited, skip

Pop 4: DFS from 4 on reversed graph
  Visit 4 -> Visit 6 -> Visit 5 -> (4 already visited)
  SCC #2 = {4, 5, 6}

Pop 5: Already visited, skip
Pop 6: Already visited, skip

RESULT: 2 SCCs
  Component 1: {1, 2, 3}
  Component 2: {4, 5, 6}

Python Implementation

import sys
from collections import defaultdict

sys.setrecursionlimit(200001)

def find_sccs(n: int, edges: list) -> tuple:
  """
  Find strongly connected components using Kosaraju's algorithm.

  Args:
    n: Number of nodes (1-indexed)
    edges: List of (u, v) directed edges

  Returns:
    (num_sccs, component_id) where component_id[i] is the SCC of node i
  """
  # Build adjacency lists
  graph = defaultdict(list)      # Original graph
  reversed_graph = defaultdict(list)  # Reversed graph

  for u, v in edges:
    graph[u].append(v)
    reversed_graph[v].append(u)

  # Pass 1: DFS on original graph, record finish order
  visited = [False] * (n + 1)
  finish_stack = []

  def dfs_pass1(node):
    visited[node] = True
    for neighbor in graph[node]:
      if not visited[neighbor]:
        dfs_pass1(neighbor)
    finish_stack.append(node)

  for node in range(1, n + 1):
    if not visited[node]:
      dfs_pass1(node)

  # Pass 2: DFS on reversed graph in decreasing finish time order
  visited = [False] * (n + 1)
  component_id = [0] * (n + 1)
  current_scc = 0

  def dfs_pass2(node, scc_id):
    visited[node] = True
    component_id[node] = scc_id
    for neighbor in reversed_graph[node]:
      if not visited[neighbor]:
        dfs_pass2(neighbor, scc_id)

  while finish_stack:
    node = finish_stack.pop()
    if not visited[node]:
      current_scc += 1
      dfs_pass2(node, current_scc)

  return current_scc, component_id

# Read input
n, m = map(int, input().split())
edges = []
for _ in range(m):
  u, v = map(int, input().split())
  edges.append((u, v))

# Find SCCs
num_sccs, component_id = find_sccs(n, edges)

# Output
print(num_sccs)
print(' '.join(map(str, component_id[1:])))

Brief Overview: Tarjan’s Algorithm

Tarjan’s algorithm finds SCCs in a single DFS pass using low-link values.

TARJAN'S ALGORITHM (Single Pass):

For each node, track:
  - discovery_time[v]: When v was first visited
  - low_link[v]: Smallest discovery time reachable from v's subtree

Key idea:
  - A node v is the "root" of an SCC if low_link[v] == discovery_time[v]
  - Use a stack to track current SCC candidates
  - When we find a root, pop all nodes until v from stack -> one SCC

Advantage: Single pass O(V + E)
Disadvantage: More complex to understand and implement

When to use which:

Kosaraju’s: Easier to understand, good for learning
Tarjan’s: Slightly more efficient (single pass), preferred in competitions

Common Mistakes

Mistake	Problem	Solution
Forgetting to reverse the graph	Pass 2 runs on original graph, gives wrong SCCs	Build reversed adjacency list explicitly
Wrong order in Pass 2	Processing nodes in wrong order breaks the algorithm	Use stack from Pass 1, pop in order
0-indexed vs 1-indexed confusion	Off-by-one errors, missing nodes	Be consistent, CSES uses 1-indexed
Stack overflow on large graphs	Recursion limit exceeded	Increase recursion limit or use iterative DFS
Not handling disconnected graphs	Missing some nodes entirely	Start DFS from all unvisited nodes in Pass 1

Applications of SCCs

1. 2-SAT (Boolean Satisfiability)

Model implications as directed graph
Variables in same SCC must have same truth value
If x and NOT x in same SCC, no solution exists

2. DAG of SCCs (Condensation Graph)

Contract each SCC to a single node
Result is a DAG (no cycles between SCCs)
Useful for problems requiring topological ordering

3. Reachability Queries

Nodes in same SCC can all reach each other
Condense to DAG, then answer reachability on DAG

4. Finding Bridges and Articulation Points

Related concepts in connectivity analysis
SCC algorithms share ideas with bridge-finding

Complexity Analysis

Aspect	Kosaraju’s	Tarjan’s
Time	O(V + E)	O(V + E)
Space	O(V + E)	O(V)
DFS Passes	2	1
Extra Space	Reversed graph	Stack + arrays

Both algorithms are optimal for finding SCCs. Choose based on implementation preference.

Summary

SCC = Maximal set where every node reaches every other node
Kosaraju’s Algorithm: Two-pass DFS approach
- Pass 1: Record finish times on original graph
- Pass 2: DFS on reversed graph in decreasing finish time order
Key insight: Finish time ordering + graph reversal isolates SCCs
Complexity: O(V + E) time and space
Applications: 2-SAT, condensation graphs, reachability analysis