Coordinate Compression - Efficient Value Mapping
Problem Overview
| Aspect | Details |
|---|---|
| Technique | Map large/sparse values to small consecutive integers |
| Use Case | When value range is huge but count of distinct values is small |
| Benefit | Enables array indexing, reduces memory, speeds up algorithms |
| Time | O(n log n) for sorting + O(1) per lookup |
| Space | O(n) for mapping |
What is Coordinate Compression?
Coordinate compression maps a set of values to consecutive integers [0, k-1] where k is the number of distinct values, preserving relative order.
Example:
Original: [1000000, 5, 999999, 5, 500000]
Distinct sorted: [5, 500000, 999999, 1000000]
Mapping: 5->0, 500000->1, 999999->2, 1000000->3
Compressed: [3, 0, 2, 0, 1]
Why useful:
- Original range: 0 to 10^9 (can’t create array this large)
- Compressed range: 0 to 4 (easy to index)
When to Use
| Scenario | Without Compression | With Compression |
|---|---|---|
| Segment tree on values | O(10^9) space | O(n) space |
| Counting sort on large values | Impossible | O(n) |
| 2D grid with sparse points | Memory limit | Feasible |
Implementation
Python Solution
def compress(arr):
"""
Compress array values to consecutive integers.
Returns compressed array and mappings.
"""
# Get sorted unique values
sorted_unique = sorted(set(arr))
# Create value -> compressed index mapping
val_to_idx = {v: i for i, v in enumerate(sorted_unique)}
# Compress the array
compressed = [val_to_idx[v] for v in arr]
return compressed, val_to_idx, sorted_unique
# Example usage
arr = [1000000, 5, 999999, 5, 500000]
compressed, mapping, original = compress(arr)
print(f"Compressed: {compressed}") # [3, 0, 2, 0, 1]
print(f"Mapping: {mapping}") # {5: 0, 500000: 1, 999999: 2, 1000000: 3}
Common Applications
1. Range Queries with Large Values
def count_in_range(arr, queries):
"""
Count elements in range [l, r] for each query.
Values can be up to 10^9.
"""
# Compress values
compressed, val_to_idx, sorted_vals = compress(arr)
n_unique = len(sorted_vals)
# Build frequency array on compressed values
freq = [0] * n_unique
for c in compressed:
freq[c] += 1
# Build prefix sum for range queries
prefix = [0] * (n_unique + 1)
for i in range(n_unique):
prefix[i + 1] = prefix[i] + freq[i]
results = []
for l, r in queries:
# Find compressed indices for l and r
l_idx = bisect.bisect_left(sorted_vals, l)
r_idx = bisect.bisect_right(sorted_vals, r)
results.append(prefix[r_idx] - prefix[l_idx])
return results
2. 2D Coordinate Compression
def compress_2d(points):
"""
Compress both x and y coordinates independently.
"""
xs = sorted(set(p[0] for p in points))
ys = sorted(set(p[1] for p in points))
x_map = {x: i for i, x in enumerate(xs)}
y_map = {y: i for i, y in enumerate(ys)}
compressed_points = [(x_map[x], y_map[y]) for x, y in points]
return compressed_points, len(xs), len(ys)
# Example: Points with coordinates up to 10^9
points = [(1000000000, 1), (5, 999999999), (1000000000, 999999999)]
comp_points, w, h = compress_2d(points)
# Now can create w x h grid instead of 10^9 x 10^9
3. Segment Tree on Values
class SegmentTreeOnValues:
"""
Segment tree indexed by values (after compression).
Useful for: count of elements in value range, etc.
"""
def __init__(self, arr):
self.compressed, self.mapping, self.original = compress(arr)
self.n = len(self.original)
self.tree = [0] * (4 * self.n)
# Insert compressed values
for c in self.compressed:
self._update(1, 0, self.n - 1, c, 1)
def _update(self, node, start, end, idx, delta):
if start == end:
self.tree[node] += delta
else:
mid = (start + end) // 2
if idx <= mid:
self._update(2 * node, start, mid, idx, delta)
else:
self._update(2 * node + 1, mid + 1, end, idx, delta)
self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
def count_in_range(self, l, r):
"""Count elements with value in [l, r]."""
# Map to compressed indices
import bisect
l_idx = bisect.bisect_left(self.original, l)
r_idx = bisect.bisect_right(self.original, r) - 1
if l_idx > r_idx:
return 0
return self._query(1, 0, self.n - 1, l_idx, r_idx)
Complexity Analysis
| Operation | Time | Notes |
|---|---|---|
| Sort unique values | O(n log n) | One-time preprocessing |
| Build mapping | O(n) | Hash map or sorted array |
| Compress one value | O(1) or O(log n) | Hash map vs binary search |
| Total compression | O(n log n) | Dominated by sorting |
Common Mistakes
- Not handling duplicates: Use
set()before sorting - Off-by-one in binary search: Use
lower_boundvsupper_boundcorrectly - Forgetting to decompress: Store original values if needed for output
- Compressing queries separately: Query values might not be in original array
When NOT to Use
- Values already small (0 to n-1)
- Only need to compare values (not index them)
- Single pass algorithm that doesn’t need random access
Key Takeaways
- Coordinate compression reduces large value ranges to [0, k-1]
- Essential for segment trees and BITs on value ranges
- O(n log n) preprocessing enables O(1) or O(log n) lookups
- Preserve original values if output needs them
- Common in competitive programming for handling 10^9 value ranges