SOS DP (Sum over Subsets)

Problem Overview

Attribute	Value
Difficulty	Hard
Category	Dynamic Programming / Bitmask
Key Technique	Sum over Subsets DP
Time Complexity	O(n * 2^n)

Concept Explanation

SOS DP is a technique to efficiently compute, for each bitmask mask, the sum (or count, min, max, etc.) over all subsets of that mask. Given an array A of size 2^n, we want to compute:

F[mask] = sum of A[submask] for all submask where (submask & mask) == submask

The naive approach takes O(3^n), but SOS DP reduces this to O(n * 2^n).

Learning Goals

After studying this technique, you will be able to:

Understand bitmask subset relationships and iteration
Implement SOS DP to compute subset sums in O(n * 2^n)
Apply SOS DP to AND convolution and subset counting problems
Recognize problems that can be optimized using SOS DP
Differentiate between subset DP and superset DP formulations

Problem Statement

Problem: Given an array A of 2^n elements indexed by bitmasks from 0 to 2^n - 1, compute array F where:

F[mask] = sum of A[i] for all i such that i is a subset of mask (i.e., i & mask == i)

Input:

n: Number of bits
A[0..2^n-1]: Array of values

Output:

F[0..2^n-1]: Array where F[mask] = sum over all subsets of mask

Constraints:

1 <= n <= 20 (practical limit due to 2^n space)

Example

Input: n = 2, A = [1, 2, 3, 4]  (indices: 00, 01, 10, 11)

Output: F = [1, 3, 4, 10]

Explanation:
F[00] = A[00] = 1                    (only subset is 00)
F[01] = A[00] + A[01] = 1 + 2 = 3    (subsets: 00, 01)
F[10] = A[00] + A[10] = 1 + 3 = 4    (subsets: 00, 10)
F[11] = A[00] + A[01] + A[10] + A[11] = 1 + 2 + 3 + 4 = 10

Intuition: How to Think About This Problem

Pattern Recognition

Key Question: How can we avoid recomputing overlapping subsets?

The key insight is to build up the answer dimension by dimension (bit by bit). Instead of iterating over all subsets directly, we process one bit position at a time, combining results from having that bit “on” or “off”.

Bitmask Visualization (n=3)

Bitmasks for n=3:
  Index (binary)   Index (decimal)   Subsets
  ─────────────────────────────────────────────
       000              0            {000}
       001              1            {000, 001}
       010              2            {000, 010}
       011              3            {000, 001, 010, 011}
       100              4            {000, 100}
       101              5            {000, 001, 100, 101}
       110              6            {000, 010, 100, 110}
       111              7            {all 8 masks}

Breaking Down the Problem

What are we looking for? Sum over all subsets for each mask
What information do we have? Original array A indexed by bitmasks
Key Relationship: If dp[mask][i] = sum over subsets that differ only in bits 0..i-1, then we can build this incrementally

Solution 1: Brute Force (O(3^n))

Idea

For each mask, enumerate all its subsets using the standard subset enumeration technique.

Algorithm

For each mask from 0 to 2^n - 1
Enumerate all subsets of mask
Sum up A[submask] for each subset

Code

def sos_brute_force(n: int, A: list[int]) -> list[int]:
  """
  Brute force: enumerate all subsets of each mask.

  Time: O(3^n) - each element belongs to 2^k supersets
  Space: O(2^n) for result array
  """
  size = 1 << n
  F = [0] * size

  for mask in range(size):
    submask = mask
    while True:
      F[mask] += A[submask]
      if submask == 0:
        break
      submask = (submask - 1) & mask

  return F

Complexity

Metric	Value	Explanation
Time	O(3^n)	Sum of 2^(popcount(mask)) over all masks = 3^n
Space	O(2^n)	Result array

Why This Is Slow

Each subset is counted multiple times across different masks. For n=20, 3^20 is about 3.5 billion operations.

Solution 2: SOS DP (O(n * 2^n))

Key Insight

The Trick: Process one bit at a time. Build dp[mask][i] = sum over subsets that match mask in bits >= i.

DP State Definition

State	Meaning
`dp[mask][i]`	Sum of A[s] for all s that are subsets of mask AND s agrees with mask on bits i to n-1

In plain English: After processing i bits, dp[mask][i] holds the partial sum considering only choices in the first i bit positions.

State Transition

dp[mask][i+1] = dp[mask][i]                           (if bit i of mask is 0)
dp[mask][i+1] = dp[mask][i] + dp[mask ^ (1<<i)][i]   (if bit i of mask is 1)

Why? When bit i is 1 in mask, subsets can either have bit i as 0 or 1. We combine both cases.

Space-Optimized Transition

Since we only need the previous iteration, we can use 1D array:

for i in 0 to n-1:
    for mask in 0 to 2^n - 1:
        if mask & (1 << i):
            dp[mask] += dp[mask ^ (1 << i)]

Dry Run Example (n=3)

Let’s trace through with A = [1, 2, 3, 4, 5, 6, 7, 8] (indices 000 to 111):

Initial: dp = A = [1, 2, 3, 4, 5, 6, 7, 8]

═══════════════════════════════════════════════════════
ITERATION i=0 (process bit 0):
───────────────────────────────────────────────────────
For each mask with bit 0 set, add dp[mask with bit 0 cleared]:

mask=001: dp[001] += dp[000]  =>  2 + 1 = 3
mask=011: dp[011] += dp[010]  =>  4 + 3 = 7
mask=101: dp[101] += dp[100]  =>  6 + 5 = 11
mask=111: dp[111] += dp[110]  =>  8 + 7 = 15

After i=0: dp = [1, 3, 3, 7, 5, 11, 7, 15]

═══════════════════════════════════════════════════════
ITERATION i=1 (process bit 1):
───────────────────────────────────────────────────────
For each mask with bit 1 set, add dp[mask with bit 1 cleared]:

mask=010: dp[010] += dp[000]  =>  3 + 1 = 4
mask=011: dp[011] += dp[001]  =>  7 + 3 = 10
mask=110: dp[110] += dp[100]  =>  7 + 5 = 12
mask=111: dp[111] += dp[101]  => 15 + 11 = 26

After i=1: dp = [1, 3, 4, 10, 5, 11, 12, 26]

═══════════════════════════════════════════════════════
ITERATION i=2 (process bit 2):
───────────────────────────────────────────────────────
For each mask with bit 2 set, add dp[mask with bit 2 cleared]:

mask=100: dp[100] += dp[000]  =>  5 + 1 = 6
mask=101: dp[101] += dp[001]  => 11 + 3 = 14
mask=110: dp[110] += dp[010]  => 12 + 4 = 16
mask=111: dp[111] += dp[011]  => 26 + 10 = 36

FINAL: dp = [1, 3, 4, 10, 6, 14, 16, 36]

═══════════════════════════════════════════════════════
VERIFICATION:
───────────────────────────────────────────────────────
F[111] = A[000]+A[001]+A[010]+A[011]+A[100]+A[101]+A[110]+A[111]
       = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 = 36  [Correct!]

F[101] = A[000] + A[001] + A[100] + A[101]
       = 1 + 2 + 5 + 6 = 14  [Correct!]

Visual Diagram

Processing bit-by-bit for mask = 111 (binary):

             Original values
             ┌─────────────┐
             │    A[s]     │
             └─────────────┘
                   │
        ┌──────────┴──────────┐
        ▼                     ▼
   Bit 0 = 0              Bit 0 = 1
   (000,010,100,110)      (001,011,101,111)
        │                     │
        └──────────┬──────────┘
                   ▼
             After i=0:
         dp[111] = sum of all
         masks ending in 1
        ┌──────────┴──────────┐
        ▼                     ▼
   Bit 1 = 0              Bit 1 = 1
        │                     │
        └──────────┬──────────┘
                   ▼
             After i=1:
         dp[111] includes
         all bit-1 variations
                   │
                   ▼
             After i=2:
         dp[111] = sum over
         ALL 8 subsets = 36

Code (Python)

def sos_dp(n: int, A: list[int]) -> list[int]:
  """
  SOS DP: Sum over Subsets in O(n * 2^n).

  Time: O(n * 2^n)
  Space: O(2^n)
  """
  size = 1 << n
  dp = A[:]  # Copy input array

  for i in range(n):  # Process each bit
    for mask in range(size):
      if mask & (1 << i):  # If bit i is set
        dp[mask] += dp[mask ^ (1 << i)]

  return dp

Complexity

Metric	Value	Explanation
Time	O(n * 2^n)	n iterations, each processing 2^n masks
Space	O(2^n)	Single DP array

Why O(n * 2^n) Works

The magic comes from the inclusion principle done incrementally:

Outer loop: n iterations (one per bit position)
Inner loop: 2^n masks
Total: n * 2^n operations

Compare to O(3^n): For n=20:

O(3^n) = 3,486,784,401 operations
O(n * 2^n) = 20 * 1,048,576 = 20,971,520 operations
167x faster!

Common Mistakes

Mistake 1: Wrong Iteration Order

# WRONG: Processing masks before bits
for mask in range(size):
  for i in range(n):  # Bits inside!
    if mask & (1 << i):
      dp[mask] += dp[mask ^ (1 << i)]

Problem: The dp values for smaller masks haven’t been computed correctly yet when needed. Fix: Always iterate bits in the outer loop, masks in the inner loop.

Mistake 2: Confusing Subset vs Superset DP

# SUBSET DP (sum over subsets of mask):
if mask & (1 << i):
  dp[mask] += dp[mask ^ (1 << i)]

# SUPERSET DP (sum over supersets of mask):
if not (mask & (1 << i)):
  dp[mask] += dp[mask | (1 << i)]

Problem: Using subset logic when superset is needed (or vice versa). Fix: Understand which direction you need:

Subset: smaller masks contribute to larger masks
Superset: larger masks contribute to smaller masks

Mistake 3: Not Copying Input Array

# WRONG: Modifying input directly
dp = A  # Reference, not copy!

# CORRECT:
dp = A[:]  # Python
dp = A.copy()  # Also Python
vector<ll> dp(A);  // C++

Edge Cases

Case	Input	Expected Behavior	Notes
n=0	Single element	F[0] = A[0]	Trivial case
All zeros	A = [0,0,0,…]	F = [0,0,0,…]	Sum of zeros
Single set bit	A[k]=1, rest 0	Only supersets of k are non-zero	Tests mask relationships
Max values	A[i] = 10^9	Use long long	Sum can exceed 2^31
n=1	Two elements	F = [A[0], A[0]+A[1]]	Minimal non-trivial case
Negative values	Mixed signs	Works correctly	SOS DP handles negatives

When to Use This Pattern

Use SOS DP When:

Computing sum/count over all subsets of each bitmask
AND convolution of two arrays: C[k] = sum of A[i]*B[j] where i&j=k
Counting subsets with specific properties
Problems involving “for each mask, consider all subsets”
The constraint involves n <= 20 (since 2^n must fit in memory)

Don’t Use When:

n > 20-22 (memory/time constraints)
You only need subset sum for a single mask (use simple iteration)
The problem involves OR convolution (different technique)
Subset XOR problems (use linear algebra / Gaussian elimination)

Pattern Recognition Checklist:

Array indexed by bitmasks? -> Consider SOS DP
“Sum over all subsets” phrase in problem? -> Strong indicator
AND operation combining results? -> AND convolution via SOS
Need inverse (Mobius transform)? -> SOS with subtraction
Superset sum instead? -> Reverse the bit condition

Variant: Superset Sum

To compute sum over all supersets instead of subsets:

def superset_dp(n: int, A: list[int]) -> list[int]:
  """Sum over all supersets of each mask."""
  size = 1 << n
  dp = A[:]

  for i in range(n):
    for mask in range(size):
      if not (mask & (1 << i)):  # If bit i is NOT set
        dp[mask] += dp[mask | (1 << i)]

  return dp

Prerequisites (Do These First)

Problem	Why It Helps
Hamiltonian Flights (CSES)	Bitmask DP fundamentals
Two Sets II (CSES)	Subset sum DP basics

Direct Applications

Problem	Key Concept
Codeforces - Compatible Numbers	Find subset with AND = 0
Codeforces - AND, OR and Square Sum	AND convolution application

Harder (Do These After)

Problem	New Concept
Codeforces - Inverse the Problem	SOS with tree reconstruction
AtCoder - Grouping (DP Contest)	Subset DP with grouping

Key Takeaways

The Core Idea: Process one bit at a time, accumulating contributions from having each bit on or off
Time Optimization: From O(3^n) to O(n * 2^n) by avoiding redundant subset enumeration
Space Trade-off: O(2^n) space is required and unavoidable
Pattern: This is a “dimension-by-dimension” DP technique, similar to Floyd-Warshall
Variants: Superset DP uses reversed bit condition; Mobius transform uses subtraction

Practice Checklist

Before moving on, make sure you can:

Implement SOS DP from scratch without reference
Explain why the bit-by-bit approach is correct
Convert between subset and superset formulations
Identify SOS DP problems in contest settings
Handle edge cases (n=0, negative values, overflow)
Derive time complexity O(n * 2^n) mathematically

Additional Resources

CP-Algorithms: Sum over Subsets
Codeforces Blog: SOS DP
CSES Elevator Rides - Bitmask DP application

SOS DP (Sum over Subsets)

Problem Overview

Concept Explanation

Learning Goals

Problem Statement

Example

Intuition: How to Think About This Problem

Pattern Recognition

Bitmask Visualization (n=3)

Breaking Down the Problem

Solution 1: Brute Force (O(3^n))

Idea

Algorithm

Code

Complexity

Why This Is Slow

Solution 2: SOS DP (O(n * 2^n))

Key Insight

DP State Definition

State Transition

Space-Optimized Transition

Dry Run Example (n=3)

Visual Diagram

Code (Python)

Complexity

Why O(n * 2^n) Works

Common Mistakes

Mistake 1: Wrong Iteration Order

Mistake 2: Confusing Subset vs Superset DP

Mistake 3: Not Copying Input Array

Edge Cases

When to Use This Pattern

Use SOS DP When:

Don’t Use When:

Pattern Recognition Checklist:

Variant: Superset Sum

Related Problems

Prerequisites (Do These First)

Direct Applications

Harder (Do These After)

Key Takeaways

Practice Checklist

Additional Resources