Consistent Hashing
Consistent hashing is a distributed systems technique for distributing data across nodes in a way that minimizes reorganization when nodes are added or removed.
Interview context: Consistent hashing is a foundational concept that appears in many system design questions. When asked about distributed caches, databases, or load balancing, this is often the underlying mechanism.
Table of Contents
- The Problem with Regular Hashing
- How Consistent Hashing Works
- Why It’s Better
- Virtual Nodes
- Replication and Durability
- Interview Tips
- Real-World Systems
- Key Takeaways
The Problem with Regular Hashing
Interview context: Start by explaining WHY consistent hashing exists. The problem it solves is more important than the solution.
The Challenge
With traditional hashing (e.g., hash(key) % N where N = number of servers):
Server = hash("user_123") % 3 → Server 1
Problem: If you add or remove a server, N changes, and almost ALL keys get remapped to different servers. This causes massive cache invalidation or data migration.
| Operation | Traditional Hash | Consistent Hash |
|---|---|---|
| Add/remove 1 of 100 servers | ~99% keys remapped | ~1% keys remapped |
Interviewer might ask: “Why is remapping bad?”
- For caches: remapping means cache misses → database overload (“thundering herd”)
- For databases: remapping means data migration → downtime or complex operations
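To make the scale of the problem concrete, here is a small sketch (Python; the key names and server counts are illustrative) that counts how many keys land on a different server when a fourth server is added under modulo hashing:

```python
import hashlib

def server_for(key: str, num_servers: int) -> int:
    # Stable hash of the key (Python's built-in hash() is randomized per process)
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_servers

keys = [f"user_{i}" for i in range(10_000)]
moved = sum(1 for k in keys if server_for(k, 3) != server_for(k, 4))
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 75% when going from 3 to 4 servers
```

Going from 100 to 101 servers is even worse: almost every key changes owner, which is where the ~99% figure in the table comes from.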
How Consistent Hashing Works
The Hash Ring
Imagine a circular ring of hash values (0 to 2^32-1):
flowchart TB
subgraph Ring["Hash Ring (0 to 2^32-1)"]
direction TB
Top["0"]
end
Placing Nodes on the Ring
Hash each server’s identifier to position it on the ring:
flowchart TB
subgraph HashRing["Hash Ring"]
A["Server A"]
B["Server B"]
C["Server C"]
end
Placing Keys on the Ring
Hash each key and walk clockwise to find the first server:
flowchart LR
Key["Key 'user_123'<br/>hashes here"] -->|walks clockwise| ServerA["Server A"]
Key “user_123” is stored on Server A (the first server clockwise from the key’s position).
Key Lookup Process
flowchart LR
Step1["1. Hash the key<br/>hash('user_123') = 847291..."]
Step2["2. Find position<br/>Locate on ring"]
Step3["3. Walk clockwise<br/>Find first node >= hash"]
Step4["4. Query that node<br/>Ask Server A"]
Step1 --> Step2 --> Step3 --> Step4
The client (or a coordinator) computes which node owns the key, then queries only that node directly. No broadcast needed.
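A minimal sketch of that lookup, assuming the ring is kept as a sorted list of (position, server) pairs; the helper name ring_hash, the 32-bit space, and the server names are illustrative choices, not any particular library’s API:

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Map any string onto the 0 .. 2^32 - 1 ring
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

# Built once as nodes join: sorted (position, server) pairs
ring = sorted((ring_hash(name), name) for name in ["ServerA", "ServerB", "ServerC"])
positions = [pos for pos, _ in ring]

def lookup(key: str) -> str:
    # First node at a position >= the key's hash, wrapping around past 2^32 - 1
    idx = bisect.bisect_left(positions, ring_hash(key)) % len(ring)
    return ring[idx][1]

print(lookup("user_123"))
```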
Why It’s Better
When a server is added or removed, only keys between that server and its predecessor are affected:
Before: A handles keys in range [C, A]
After removing A: B now handles [C, B] (only A's keys move)
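The claim can be checked empirically with the same ingredients as the lookup sketch above: build a ring with and without a fourth server and count how many keys change owner (only the keys falling in the new server’s arc, on the order of 1/N):

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

def build_ring(names):
    ring = sorted((ring_hash(n), n) for n in names)
    return ring, [pos for pos, _ in ring]

def owner(key, ring, positions):
    idx = bisect.bisect_left(positions, ring_hash(key)) % len(ring)
    return ring[idx][1]

keys = [f"user_{i}" for i in range(10_000)]
before = build_ring(["ServerA", "ServerB", "ServerC"])
after = build_ring(["ServerA", "ServerB", "ServerC", "ServerD"])

moved = sum(1 for k in keys if owner(k, *before) != owner(k, *after))
print(f"{moved / len(keys):.0%} of keys moved")  # only keys in ServerD's new arc
```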
Virtual Nodes
Interview context: Virtual nodes are a critical optimization. Interviewers often ask about load balancing, and this is the answer.
The Challenge
Problem: With few servers, distribution can be uneven.
Solution: Map each physical server to multiple “virtual nodes” on the ring:
Server A → A1, A2, A3, A4 (4 virtual nodes spread around ring)
Server B → B1, B2, B3, B4
How Virtual Nodes Work
A virtual node is simply a hash ring position that maps back to a physical server. Instead of placing one point on the ring per server, you place many points (all pointing to the same physical server).
Without virtual nodes (3 servers):
Ring positions:
hash("ServerA") = 1000 → ServerA
hash("ServerB") = 5000 → ServerB
hash("ServerC") = 9000 → ServerC
With 4 virtual nodes each:
Ring positions:
hash("ServerA:0") = 1000 → ServerA
hash("ServerA:1") = 4200 → ServerA
hash("ServerA:2") = 7800 → ServerA
hash("ServerA:3") = 11500 → ServerA
hash("ServerB:0") = 2300 → ServerB
hash("ServerB:1") = 5600 → ServerB
... etc
Why Virtual Nodes Improve Distribution
Without virtual nodes (3 servers), the gaps between ring positions are whatever the hash function happens to produce: Server A might sit after a large gap and receive ~60% of the keys, while Server B sits after a small gap and receives only ~15%.
With virtual nodes spread around:
A1 B2 C1 A3 B1 C3 A2 B3 C2 A4 B4 C4
o---o---o---o---o---o---o---o---o---o---o---o
Keys are now evenly distributed because each server has many evenly-spaced “claim points” on the ring.
Virtual Node Counts in Production
| System | Virtual Nodes per Physical Node |
|---|---|
| Cassandra | 256 (default) |
| Typical production | 100-500 |
Data Structure in Memory
The ring is just a sorted map: hash_position → physical_server_name
Positions: 1000 → ServerA
2300 → ServerB
3100 → ServerC
4200 → ServerA (same server, different position)
5600 → ServerB
7800 → ServerA (again ServerA)
The word “virtual” means these nodes don’t exist physically—they’re simply extra entries with different string suffixes (:0, :1, :2) that produce different hash values, all pointing back to the same real server.
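A compact sketch of that sorted map in Python, with virtual nodes generated from the ":0", ":1", … suffixes; the class and method names are illustrative, not taken from any particular library:

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Map any string onto the 0 .. 2^32 - 1 ring
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

class HashRing:
    def __init__(self, vnodes_per_server: int = 100):
        self.vnodes = vnodes_per_server
        self.ring = {}          # position -> physical server name
        self.positions = []     # sorted positions, for binary search

    def add_server(self, name: str) -> None:
        for i in range(self.vnodes):
            pos = ring_hash(f"{name}:{i}")   # "ServerA:0", "ServerA:1", ...
            self.ring[pos] = name
            bisect.insort(self.positions, pos)

    def remove_server(self, name: str) -> None:
        for i in range(self.vnodes):
            pos = ring_hash(f"{name}:{i}")
            del self.ring[pos]
            self.positions.remove(pos)

    def lookup(self, key: str) -> str:
        # First virtual node clockwise from the key, wrapping past 2^32 - 1
        idx = bisect.bisect_left(self.positions, ring_hash(key)) % len(self.positions)
        return self.ring[self.positions[idx]]

ring = HashRing(vnodes_per_server=100)
for server in ["ServerA", "ServerB", "ServerC"]:
    ring.add_server(server)
print(ring.lookup("user_123"))
```

With this structure a lookup costs one hash plus a binary search (O(log V) for V virtual nodes), and adding or removing a server only touches that server’s own positions.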
Interviewer might ask: “Why not just have evenly-spaced fixed positions?”
Because nodes join and leave dynamically. Random hash positions (via hash("ServerA:0")) are simpler than coordinating even spacing every time membership changes.
Replication and Durability
Consistent hashing solves routing. Replication solves durability.
Without Replication: Data Loss
When a node is removed, its data is LOST:
Before:
flowchart TB
A["Server A<br/>(has keys X, Y, Z)"]
B["Server B"]
C["Server C"]
After removing A:
flowchart TB
Gone["(A gone!)"]
B["Server B<br/>now responsible for X, Y, Z<br/>but doesn't have the data!"]
C["Server C"]
With Replication: Data Survives
Production systems replicate each key across N consecutive nodes on the ring (the primary plus N-1 successors):
flowchart LR
Key["Key 'user_123'"] --> A["Server A (primary)"]
A -->|replicated to| B["Server B (replica 1)"]
B -->|replicated to| C["Server C (replica 2)"]
When A is removed:
- B becomes the new primary for those keys
- B already has the data (it was a replica)
- System replicates to a new third node to maintain N=3
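A hedged sketch of how such a preference list can be derived from the ring, continuing the HashRing sketch above (replicas_for is an illustrative name): walk clockwise from the key and collect the first N distinct physical servers, skipping virtual nodes that map to a server already chosen.

```python
import bisect

def replicas_for(ring: "HashRing", key: str, n: int = 3) -> list[str]:
    # Walk clockwise from the key's position, collecting distinct physical servers
    start = bisect.bisect_left(ring.positions, ring_hash(key))
    servers: list[str] = []
    for step in range(len(ring.positions)):
        pos = ring.positions[(start + step) % len(ring.positions)]
        server = ring.ring[pos]
        if server not in servers:
            servers.append(server)
        if len(servers) == n:
            break
    return servers  # servers[0] is the primary; the rest already hold copies if it fails

print(replicas_for(ring, "user_123", n=3))  # e.g. ['ServerA', 'ServerB', 'ServerC']
```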
Data Status by Scenario
| Scenario | Data Status |
|---|---|
| Node removed, no replication | Data lost |
| Node removed, with replication | Data safe on replicas, system rebalances |
| Node added | New node receives data from neighbors (gradual migration) |
Interview Tips
How to Explain Consistent Hashing (5 minutes)
1. STATE THE PROBLEM (1 min)
"With modulo hashing, adding a server remaps ~99% of keys.
This causes cache stampedes or massive data migration."
2. INTRODUCE THE RING (1 min)
"We place both servers AND keys on a circular hash space.
Keys belong to the first server clockwise."
3. SHOW THE BENEFIT (1 min)
"Adding a server only affects keys between it and its predecessor.
~1/N keys move instead of ~(N-1)/N."
4. EXPLAIN VIRTUAL NODES (1 min)
"With few servers, distribution is uneven. Virtual nodes spread
each server across many ring positions for better balance."
5. MENTION REPLICATION (1 min)
"For durability, we replicate to N successive nodes clockwise.
When a node fails, replicas already have the data."
Key Phrases That Show Depth
| Instead of… | Say… |
|---|---|
| “We use consistent hashing” | “We use a hash ring with 100-200 virtual nodes per server. Keys walk clockwise to find their owner.” |
| “It’s more efficient” | “Adding a node remaps only 1/N keys versus (N-1)/N with modulo hashing—that’s 1% vs 99% for 100 servers.” |
| “We replicate data” | “We replicate to N successor nodes on the ring. When a node fails, its successor already has the data and becomes the new primary.” |
Common Follow-up Questions
| Question | Key Points |
|---|---|
| “Why virtual nodes?” | Even distribution with few servers, smoother rebalancing |
| “How many virtual nodes?” | 100-500 typical, Cassandra uses 256 |
| “What hash function?” | MD5 for simplicity, Murmur3 for performance |
| “Hot spots?” | Virtual nodes help, but some keys are inherently hot |
| “What about replication?” | Replicate to N successive distinct physical nodes |
Real-World Systems
Interview context: Mentioning real systems shows practical knowledge.
Amazon DynamoDB
- Uses consistent hashing for partitioning data across storage nodes
- Each partition key is hashed to determine which partition stores the data
- Uses virtual nodes (vnodes) for better load distribution
- Implements quorum-based replication (N replicas, W writes, R reads)
- The underlying design comes from the famous Dynamo paper (2007), which described DynamoDB’s internal predecessor
Partition Key → MD5 Hash → Ring Position → N Replicas
Apache Cassandra
- Directly inspired by the design in Amazon’s Dynamo paper
- Uses Murmur3 hash (faster than MD5) by default
- Token ranges: Each node owns a range of hash values
- Virtual nodes (vnodes): Default 256 vnodes per physical node
- Replication factor: Configurable per keyspace
CREATE KEYSPACE my_app WITH replication = {
'class': 'NetworkTopologyStrategy',
'dc1': 3, -- 3 replicas in datacenter 1
'dc2': 2 -- 2 replicas in datacenter 2
};
Redis Cluster
- Uses hash slots (16384 fixed slots) instead of a continuous ring
- Each key is hashed: SLOT = CRC16(key) % 16384
- Slots are distributed across master nodes
- Simpler than continuous hashing but same principle
Node A: slots 0-5460
Node B: slots 5461-10922
Node C: slots 10923-16383
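A small sketch of that slot arithmetic in Python (the slot boundaries mirror the example above; the CRC16 here is the bit-by-bit XMODEM variant the Redis Cluster spec describes, written out by hand rather than taken from a library):

```python
import bisect

def crc16_xmodem(data: bytes) -> int:
    # Bit-by-bit CRC16, XMODEM variant (poly 0x1021, init 0x0000)
    crc = 0x0000
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Real Redis also honors {hash tags}; omitted here for brevity
    return crc16_xmodem(key.encode()) % 16384

# (last slot in range, owning node) -- mirrors the ranges listed above
slot_ranges = [(5460, "Node A"), (10922, "Node B"), (16383, "Node C")]

def node_for(key: str) -> str:
    ends = [end for end, _ in slot_ranges]
    return slot_ranges[bisect.bisect_left(ends, key_slot(key))][1]

print(key_slot("user_123"), node_for("user_123"))
```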
Other Notable Systems
| System | Description |
|---|---|
| Memcached | Clients implement consistent hashing via libketama |
| Akamai CDN | Pioneered consistent hashing for web caching (1997) |
| Discord | Routes messages to Elixir processes |
| Riak | Dynamo-inspired with hinted handoff and read repair |
| Apache Kafka | Partition assignment for consumer groups |
| Couchbase | 1024 vBuckets (similar to Redis hash slots) |
| Netflix EVCache | Memcached with zone-aware consistent hashing |
Implementation Comparison
| System | Hash Function | Partitioning | Replication |
|---|---|---|---|
| DynamoDB | MD5 | Continuous ring + vnodes | Quorum (N,R,W) |
| Cassandra | Murmur3 | Token ranges + vnodes | Configurable RF |
| Redis Cluster | CRC16 | 16384 hash slots | Primary + replicas |
| Memcached | Ketama (MD5) | Client-side ring | None (cache only) |
| Riak | SHA-1 | Ring + vnodes | N replicas + hinted handoff |
| Kafka | Murmur2 | Fixed partitions | ISR (In-Sync Replicas) |
| Couchbase | CRC32 | 1024 vBuckets | Configurable replicas |
Algorithm Variations
| Variation | Used By | Benefit |
|---|---|---|
| Virtual nodes | Cassandra, DynamoDB, Riak | Better load balance |
| Fixed slots/buckets | Redis, Couchbase | Simpler resharding |
| Jump consistent hash | Google (origin) | O(1) memory, no ring storage |
| Rendezvous hashing | Microsoft, Twitter | Alternative to ring-based |
Key Takeaways
Design Decisions Summary
| Decision | Choice | Why |
|---|---|---|
| Hash space | Circular ring (0 to 2^32-1) | Allows clockwise lookup |
| Key assignment | First server clockwise | Simple, deterministic |
| Virtual nodes | 100-500 per server | Even distribution |
| Replication | N successors | Durability without coordination |
Trade-offs to Discuss
| Trade-off | Option A | Option B |
|---|---|---|
| Virtual nodes | Many (even distribution) | Few (simpler, faster lookup) |
| Hash function | MD5 (simple, standard) | Murmur3 (faster) |
| Replication | On ring (N successors) | Separate replication layer |
| Hot keys | Consistent hashing can’t help | Need separate caching layer |
Core Concepts
- Hash ring - Nodes and keys share the same circular hash space
- Clockwise lookup - Keys are assigned to the next node clockwise
- Minimal disruption - Adding/removing nodes only affects neighboring keys (~1/N)
- Virtual nodes - Multiple ring positions per server improve load balance
- Replication - Consistent hashing solves routing; replication solves durability