Lowest Common Ancestor (LCA) — Complete Guide
What Is LCA?
Given a rooted tree and two nodes u and v, their Lowest Common Ancestor is the deepest node that is an ancestor of both.
0
/ \
1 2
/ \ \
3 4 5
/ \
6 7
LCA(6, 4) = 1 (1 is the deepest node above both 6 and 4)
LCA(6, 7) = 3 (3 is parent of both)
LCA(6, 5) = 0 (have to go all the way to root)
LCA(3, 3) = 3 (a node is its own ancestor)
Why Does LCA Matter?
LCA is a building block for many tree problems:
| Problem | How LCA helps |
|---|---|
| Distance between u and v | depth[u] + depth[v] - 2*depth[LCA(u,v)] |
| Path sum/max/min on trees | Split path at LCA: u->LCA + LCA->v |
| HLD path queries | query_path internally finds LCA while climbing chains |
| Tree diff (version control) | Find common base of two branches |
| Phylogenetic trees (biology) | Find most recent common ancestor of species |
Technique 1: Brute Force (Climb Up)
Idea
Make u and v the same depth, then climb both up one step at a time until they meet.
LCA(6, 4):
Step 1: depth[6]=3, depth[4]=2 -> bring 6 up to depth 2
6 -> parent[6] = 3
Step 2: now both at depth 2: node 3 and node 4
3 != 4, climb both:
3 -> parent[3] = 1
4 -> parent[4] = 1
Step 3: both are node 1 -> LCA = 1
Implementation
def lca_brute(u, v, parent, depth):
# step 1: equalize depths
while depth[u] > depth[v]:
u = parent[u]
while depth[v] > depth[u]:
v = parent[v]
# step 2: climb together
while u != v:
u = parent[u]
v = parent[v]
return u
Complexity
| Time | Space | |
|---|---|---|
| Preprocess | O(N) -- one DFS for parent/depth | O(N) |
| Query | O(N) -- worst case climb entire tree | -- |
Verdict: Simple but too slow for many queries. A straight-line tree (path graph) always gives O(N).
Technique 2: Binary Lifting
Idea
Instead of climbing one step at a time, climb in powers of 2. Pre-compute up[node][k] = the ancestor 2^k steps above node.
up[node][0] = parent (1 step = 2^0)
up[node][1] = grandparent (2 steps = 2^1)
up[node][2] = great-great-grandpa (4 steps = 2^2)
...
up[node][k] = up[up[node][k-1]][k-1] (2^k = 2^(k-1) + 2^(k-1))
Think of it like how you’d jump 13 floors in an elevator: jump 8, then 4, then 1 (13 = 1101 in binary).
Walkthrough
Tree:
0 (depth 0)
|
1 (depth 1)
|
2 (depth 2)
|
3 (depth 3)
|
4 (depth 4)
up table:
k=0(1) k=1(2) k=2(4)
node 0: -1 -1 -1
node 1: 0 -1 -1
node 2: 1 0 -1
node 3: 2 1 -1
node 4: 3 2 0
LCA(4, 2):
depth[4]=4, depth[2]=2, diff=2
Climb 4 by 2 steps: up[4][1] = 2
Now both at node 2 -> LCA = 2
Implementation
import sys
sys.setrecursionlimit(200_000)
LOG = 20 # enough for N up to 10^6
class BinaryLifting:
def __init__(self, n, adj, root=0):
self.n = n
self.adj = adj
self.depth = [0] * n
self.up = [[-1] * n for _ in range(LOG)] # up[k][node]
self._dfs(root, -1)
self._build()
def _dfs(self, node, par):
self.up[0][node] = par # parent = 2^0 ancestor
for nb in self.adj[node]:
if nb == par:
continue
self.depth[nb] = self.depth[node] + 1
self._dfs(nb, node)
def _build(self):
"""Fill the binary lifting table bottom-up."""
for k in range(1, LOG):
for v in range(self.n):
mid = self.up[k - 1][v]
if mid == -1:
self.up[k][v] = -1
else:
self.up[k][v] = self.up[k - 1][mid]
def _lift(self, node, dist):
"""Climb `dist` steps from node."""
for k in range(LOG):
if dist & (1 << k):
node = self.up[k][node]
if node == -1:
return -1
return node
def lca(self, u, v):
# step 1: bring deeper node up
if self.depth[u] < self.depth[v]:
u, v = v, u
u = self._lift(u, self.depth[u] - self.depth[v])
# step 2: if same, we're done
if u == v:
return u
# step 3: climb both in powers of 2
for k in range(LOG - 1, -1, -1):
if self.up[k][u] != self.up[k][v]:
u = self.up[k][u]
v = self.up[k][v]
# now u and v are children of the LCA
return self.up[0][u]
Why Step 3 Works
LCA(6, 5) in this tree:
0
/ \
1 2
/ \ \
3 4 5
/ \
6 7
After equalizing depth: u=3, v=5 (both depth 2)
k=2: up[2][3]=? (4 steps up from 3 -- doesn't exist) = -1
up[2][5]=? = -1
Equal (-1 == -1), DON'T jump. (would overshoot)
k=1: up[1][3]=0, up[1][5]=0
Equal (0 == 0), DON'T jump. (would land ON the LCA)
k=0: up[0][3]=1, up[0][5]=2
Different! Jump: u=1, v=2
Now up[0][1] = up[0][2] = 0 -> LCA = 0
Key insight: We deliberately don’t jump when ancestors match. We want to land just below the LCA, then go up one final step. This avoids overshooting.
Complexity
| Time | Space | |
|---|---|---|
| Preprocess | O(N log N) | O(N log N) |
| Query | O(log N) | -- |
Verdict: The most popular technique. Great balance of simplicity and speed. Works online (queries arrive one by one).
Technique 3: Euler Tour + Sparse Table (RMQ)
Idea
Reduce LCA to a Range Minimum Query problem. Then solve RMQ with a sparse table for O(1) queries.
The reduction:
- Do an Euler Tour, recording nodes as you enter AND revisit them
- LCA(u, v) = the shallowest node between the first occurrence of u and v in the tour
Tree: 0
/ \
1 2
/ \
3 4
Euler Tour (record every visit):
Visit: 0 1 3 1 4 1 0 2 0
Depth: 0 1 2 1 2 1 0 1 0
Index: 0 1 2 3 4 5 6 7 8
First occurrence:
first[0]=0, first[1]=1, first[2]=7, first[3]=2, first[4]=4
LCA(3, 4):
first[3]=2, first[4]=4
Look at depths in range [2..4]: depths = [2, 1, 2]
Minimum depth = 1 at index 3 -> node 1
LCA = 1
LCA(3, 2):
first[3]=2, first[2]=7
Depths in [2..7]: [2, 1, 2, 1, 0, 1]
Minimum depth = 0 at index 6 -> node 0
LCA = 0
Why Does This Work?
Between visiting u for the first time and v for the first time, the DFS must pass through their LCA. The LCA is the shallowest point in that range because:
- DFS goes deeper into subtrees (away from LCA)
- The only way to get from u’s subtree to v’s subtree is to come back up through LCA
Implementation
import sys
sys.setrecursionlimit(200_000)
class SparseTable:
"""Range Minimum Query in O(1) with O(N log N) preprocessing."""
def __init__(self, arr):
n = len(arr)
self.log = [0] * (n + 1)
for i in range(2, n + 1):
self.log[i] = self.log[i // 2] + 1
k = self.log[n] + 1
self.table = [list(range(n)) for _ in range(k)]
# table[j][i] = index of minimum in arr[i..i+2^j-1]
self.arr = arr
for j in range(1, k):
for i in range(n - (1 << j) + 1):
l = self.table[j - 1][i]
r = self.table[j - 1][i + (1 << (j - 1))]
self.table[j][i] = l if arr[l] <= arr[r] else r
def query(self, l, r):
"""Index of minimum value in arr[l..r]."""
j = self.log[r - l + 1]
left = self.table[j][l]
right = self.table[j][r - (1 << j) + 1]
return left if self.arr[left] <= self.arr[right] else right
class LCA_EulerTour:
def __init__(self, n, adj, root=0):
self.n = n
self.adj = adj
self.depth = [0] * n
self.euler = [] # euler tour sequence
self.euler_depth = [] # depth at each position in tour
self.first = [0] * n # first occurrence of node in euler tour
self._dfs(root, -1)
self.sparse = SparseTable(self.euler_depth)
def _dfs(self, node, par):
self.first[node] = len(self.euler)
self.euler.append(node)
self.euler_depth.append(self.depth[node])
for nb in self.adj[node]:
if nb == par:
continue
self.depth[nb] = self.depth[node] + 1
self._dfs(nb, node)
# revisit current node after returning from child
self.euler.append(node)
self.euler_depth.append(self.depth[node])
def lca(self, u, v):
l = self.first[u]
r = self.first[v]
if l > r:
l, r = r, l
idx = self.sparse.query(l, r)
return self.euler[idx]
Complexity
| Time | Space | |
|---|---|---|
| Preprocess | O(N log N) -- Euler tour O(N) + sparse table O(N log N) | O(N log N) |
| Query | O(1) | -- |
Verdict: Fastest query time. Ideal when you have millions of queries and can afford the preprocessing.
Technique 4: HLD-Based LCA
Idea
If you already have HLD built, LCA falls out naturally -- it’s what query_path does when climbing chains!
def lca(self, u, v):
while self.chain_head[u] != self.chain_head[v]:
if self.depth[self.chain_head[u]] < self.depth[self.chain_head[v]]:
u, v = v, u
u = self.parent[self.chain_head[u]]
# same chain -- shallower node is the LCA
return u if self.depth[u] <= self.depth[v] else v
That’s it -- the same chain-climbing loop, but instead of querying the segment tree, you just return the shallower node at the end.
Complexity
| Time | Space | |
|---|---|---|
| Preprocess | O(N) -- two DFS passes | O(N) |
| Query | O(log N) | -- |
Verdict: Free if you already have HLD. No extra data structures needed.
Technique 5: Tarjan’s Offline LCA
Idea
If you have all queries upfront (offline), process them during a single DFS using Union-Find (DSU).
After finishing a subtree, union it with its parent.
When both nodes of a query are visited, the answer is
the current "find" representative of one of them.
Implementation
class DSU:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
while self.parent[x] != x:
self.parent[x] = self.parent[self.parent[x]]
x = self.parent[x]
return x
def union(self, a, b):
a, b = self.find(a), self.find(b)
if a == b:
return
if self.rank[a] < self.rank[b]:
a, b = b, a
self.parent[b] = a
if self.rank[a] == self.rank[b]:
self.rank[a] += 1
def tarjan_lca(n, adj, root, queries):
"""
queries: list of (u, v, query_index)
Returns: list of LCA answers in query order
"""
dsu = DSU(n)
ancestor = list(range(n)) # ancestor[find(x)] = current LCA candidate
visited = [False] * n
answers = [0] * len(queries)
# group queries by node
query_map = [[] for _ in range(n)]
for u, v, idx in queries:
query_map[u].append((v, idx))
query_map[v].append((u, idx))
def dfs(node, par):
visited[node] = True
for nb in adj[node]:
if nb == par:
continue
dfs(nb, node)
dsu.union(node, nb)
ancestor[dsu.find(node)] = node # after union, node is the ancestor
# check queries involving this node
for other, idx in query_map[node]:
if visited[other]:
answers[idx] = ancestor[dsu.find(other)]
dfs(root, -1)
return answers
How It Works -- Traced
Tree: 0
/ \
1 2
/ \
3 4
Query: LCA(3, 4)
DFS order: 0 -> 1 -> 3 (leaf, backtrack) -> 4 (leaf, backtrack) -> 1 done -> 2
After visiting 3:
visited[3] = True
union(1, 3), ancestor[find(1)] = 1
Visit 4:
visited[4] = True
Check query (3, 4): visited[3]? Yes!
answer = ancestor[find(3)] = ancestor[find(1)] = 1
The key insight: when you visit node 4 and check node 3, the DSU has already merged 3 into 1’s set (because we backtracked from 3). So find(3) leads to 1, which is exactly the LCA.
Complexity
| Time | Space | |
|---|---|---|
| Total | O((N + Q) a(N)) ~ O(N + Q) | O(N + Q) |
Verdict: Optimal for offline. Nearly linear. But can’t handle queries that arrive one at a time.
All Techniques Compared
Preprocess Query Space Online? Extra DS
---------- ----- ----- ------- --------
Brute Force O(N) O(N) O(N) Yes None
Binary Lifting O(N log N) O(log N) O(N log N) Yes Lifting table
Euler + Sparse O(N log N) O(1) O(N log N) Yes Sparse table
HLD O(N) O(log N) O(N) Yes HLD arrays
Tarjan (offline) O(N+Q) O(1)* O(N+Q) No DSU
*Tarjan’s amortized over all queries
Which to Pick?
+-- Yes --- Have all queries upfront?
| |
| +- Yes --> Tarjan's offline (fastest total)
| +- No --> continue below
|
Start -------------+
|
+-- Online queries needed
|
+-----+-----+
| |
Few queries? Many queries?
(Q < N) (Q >> N)
| |
v v
Binary Lifting Euler + Sparse Table
(simple, log N) (O(1) per query)
Already have HLD? --> Use HLD's LCA (it's free)
LCA Variations & Applications
Variation 1: Distance Between Nodes
dist(u, v) = depth[u] + depth[v] - 2 * depth[LCA(u, v)]
0 (depth 0)
/ \
1 2 (depth 1)
/ \
3 4 (depth 2)
dist(3, 2):
LCA(3,2) = 0
dist = 2 + 1 - 2*0 = 3
path: 3 -> 1 -> 0 -> 2 (3 edges)
Variation 2: Kth Ancestor
“What is the ancestor K steps above node u?”
Binary lifting gives this directly:
def kth_ancestor(self, u, k):
for bit in range(LOG):
if k & (1 << bit):
u = self.up[bit][u]
if u == -1:
return -1
return u
Variation 3: Kth Node on Path
“What is the Kth node on the path from u to v?”
def kth_on_path(self, u, v, k):
lca = self.lca(u, v)
dist_u = self.depth[u] - self.depth[lca] # u to LCA
dist_v = self.depth[v] - self.depth[lca] # v to LCA
if k <= dist_u:
# on the u -> LCA side
return self.kth_ancestor(u, k)
else:
# on the LCA -> v side
return self.kth_ancestor(v, dist_u + dist_v - k)
Path: 6 -> 3 -> 1 -> 4, k=2 (0-indexed)
dist_u (6->LCA=1) = 2
dist_v (4->LCA=1) = 1
k=2 <= dist_u=2 -> kth_ancestor(6, 2) = 1
k=3 > dist_u=2 -> kth_ancestor(4, 2+1-3) = kth_ancestor(4, 0) = 4
Variation 4: Path Max/Min/Sum
“What is the maximum edge weight on the path from u to v?”
Split at LCA and query each half:
Path u -> v splits into: u -> LCA -> v
max_on_path(u, v) = max(
max_on_path(u, LCA), // climbing from u to LCA
max_on_path(v, LCA) // climbing from v to LCA
)
With binary lifting, store max weight for each power-of-2 jump:
# max_edge[k][node] = max edge weight in the 2^k steps above node
# built alongside up[k][node]
Variation 5: LCA on Weighted Trees
Same algorithms work. Just store edge weights and combine them during lifting/tour.
Variation 6: LCA of Multiple Nodes
“LCA of nodes {a, b, c, d}?”
Sort by Euler Tour tin, then LCA of the set = LCA of consecutive pairs:
def lca_set(nodes):
nodes.sort(key=lambda x: tin[x])
result = nodes[0]
for i in range(1, len(nodes)):
result = lca(result, nodes[i])
return result
Variation 7: Virtual Tree (Auxiliary Tree)
When you have Q queries on specific nodes, build a virtual tree containing only the query nodes and their pairwise LCAs. Reduces an N-node tree to at most 2Q nodes.
Quick Reference Table
| Problem | Technique | Complexity |
|---|---|---|
| LCA query (online, simple) | Binary Lifting | O(log N) |
| LCA query (online, fastest) | Euler Tour + Sparse Table | O(1) |
| LCA query (offline batch) | Tarjan’s + DSU | O(N + Q) |
| Distance(u, v) | LCA + depth | O(LCA query) |
| Kth ancestor | Binary Lifting | O(log N) |
| Kth node on path | LCA + Kth ancestor | O(log N) |
| Path max/sum | Binary Lifting with weights | O(log N) |
| Path query with updates | HLD + Segment Tree | O(log^2 N) |
| LCA of K nodes | Sort by tin + pairwise LCA | O(K log N) |