Design Ad Click Event Aggregation

An ad click event aggregation system ingests billions of click events per day, aggregates them in real time, and provides advertisers with accurate metrics for billing and analytics. The challenge is building a pipeline that is fast enough for real-time dashboards yet accurate enough for billing.

Interview context: This is a data pipeline / stream processing question. Unlike request-response systems, the focus here is on event ingestion, aggregation accuracy, and the trade-off between real-time speed and data correctness. Be ready to discuss MapReduce, stream processing (Flink/Spark), and exactly-once semantics.

Requirements
High-Level Architecture
Data Model
Core Components
Aggregation Service Deep Dive
Key Design Decisions
Scalability
Reliability and Data Accuracy
Reconciliation
Interview Tips
Key Takeaways

1. Requirements

Interview context: Always start by clarifying requirements. Ad click aggregation has very different needs from a typical CRUD application.

Questions to Ask the Interviewer

What’s the expected click volume? (events per second)
What aggregation granularity? (per minute, per hour, per day)
How fresh must the data be? (real-time dashboards vs end-of-day reports)
What dimensions should we aggregate by? (ad_id, campaign_id, geo, device)
Do we need to support ad-hoc queries, or only predefined aggregations?
Is this for billing, analytics, or both?

Functional Requirements

Requirement	Description
Click ingestion	Receive and store raw ad click events at scale
Real-time aggregation	Aggregate clicks per ad_id per minute for real-time dashboards
Multi-dimensional aggregation	Slice by campaign, geography, device type, time
Query API	Return aggregated click counts for a given ad/campaign over a time range
Filtering	Filter duplicate and fraudulent clicks before counting

Non-Functional Requirements

Requirement	Target	Rationale
Throughput	10,000 clicks/sec (peak 50K)	High-traffic ad network
Aggregation latency	< 1 minute	Near real-time dashboard
Data accuracy	100% for billing	Advertisers pay per click
Availability	99.99%	Revenue-critical pipeline
Data retention	Raw: 7 days, Aggregated: years	Compliance + trending
Query latency	p99 < 500ms	Dashboard responsiveness

Capacity Estimation

Click events:
- Daily clicks:          ~1 billion
- Average QPS:           ~10,000 clicks/sec
- Peak QPS:              ~50,000 clicks/sec (5x average)

Per click event:          ~500 bytes
- Raw storage/day:       1B × 500B = ~500 GB/day
- Raw retention (7 days): ~3.5 TB

Aggregated data:
- Unique ad_ids:          ~2 million
- Aggregations/min:       2M ad_ids × 1 record = 2M rows/min
- Aggregated storage/day: ~50 GB
- Aggregated retention:   ~18 TB/year

2. High-Level Architecture

Interview context: “This system has two distinct paths: a real-time aggregation pipeline for dashboards, and a batch reconciliation pipeline for billing accuracy. Let me draw both.”

flowchart TB
    subgraph Clients["AD CLIENTS"]
        Browser["Web Browser"]
        MobileApp["Mobile App"]
        SDK["Ad SDK"]
    end

    subgraph Ingestion["INGESTION LAYER"]
        LB["Load Balancer"]
        ClickAPI["Click Receiver<br/>(Stateless)"]
    end

    Clients -->|"Click event"| Ingestion

    subgraph Messaging["MESSAGE QUEUE"]
        Kafka["Kafka Cluster<br/>(Partitioned by ad_id)"]
    end

    Ingestion --> Kafka

    subgraph RealTime["REAL-TIME PATH (Speed Layer)"]
        Flink["Stream Processor<br/>(Flink / Spark Streaming)"]
        AggDB["Aggregation DB<br/>(OLAP Store)"]
    end

    Kafka --> Flink
    Flink --> AggDB

    subgraph Batch["BATCH PATH (Accuracy Layer)"]
        RawStore["Raw Event Store<br/>(S3 / HDFS)"]
        Spark["Batch Processor<br/>(Spark / MapReduce)"]
    end

    Kafka --> RawStore
    RawStore --> Spark
    Spark -->|"Reconcile"| AggDB

    subgraph Query["QUERY LAYER"]
        QueryAPI["Query Service"]
        Cache["Cache (Redis)"]
        Dashboard["Advertiser Dashboard"]
    end

    AggDB --> QueryAPI
    Cache --> QueryAPI
    QueryAPI --> Dashboard

Component Responsibilities

Component	Responsibility	Technology
Click Receiver	Validate and forward click events	Go / Java (stateless)
Kafka	Durable event buffer, decouples ingestion from processing	Kafka with partitioning
Stream Processor	Real-time aggregation (per-minute windows)	Apache Flink
Raw Event Store	Archive raw events for reconciliation and replay	S3 / HDFS
Batch Processor	Daily reconciliation, exact counts for billing	Apache Spark
Aggregation DB	Store pre-computed aggregated counts	ClickHouse / Druid / Cassandra
Query Service	Serve aggregated data to dashboards	Java / Go
Cache	Hot aggregation results for frequent queries	Redis

Why Two Paths (Lambda Architecture)?

Interviewer might ask: “Why not just use stream processing for everything?”

Concern	Real-Time Only	Lambda (Real-Time + Batch)
Latency	Sub-minute	Sub-minute (speed layer)
Accuracy	May miss late events	Batch corrects drift
Billing	Risky for financial data	Batch is source of truth
Complexity	Simpler	More complex, but safer

We choose Lambda because: Advertisers are billed per click. Even a 0.1% discrepancy at 1B clicks/day = 1M miscounted clicks. The batch layer acts as the source of truth for billing, while the speed layer powers real-time dashboards.

3. Data Model

Interview context: “Let me define the data model. We have raw events and aggregated tables.”

Raw Click Event

{
  "click_id":    "uuid-abc-123",
  "ad_id":       "ad_98765",
  "campaign_id": "camp_555",
  "user_id":     "user_42",
  "ip_address":  "203.0.113.42",
  "user_agent":  "Mozilla/5.0 ...",
  "country":     "US",
  "device_type": "mobile",
  "os":          "iOS",
  "referrer":    "https://example.com/article",
  "timestamp":   "2026-02-26T14:05:32.123Z"
}

Aggregated Click Table

CREATE TABLE ad_click_aggregation (
    ad_id          VARCHAR(50)   NOT NULL,
    click_minute   TIMESTAMP     NOT NULL,   -- truncated to minute
    click_count    BIGINT        NOT NULL DEFAULT 0,
    filter_count   BIGINT        NOT NULL DEFAULT 0,  -- filtered (fraud/dup)
    updated_at     TIMESTAMP     DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (ad_id, click_minute)
);

Multi-Dimensional Aggregation Table

CREATE TABLE ad_click_aggregation_extended (
    ad_id          VARCHAR(50)   NOT NULL,
    campaign_id    VARCHAR(50)   NOT NULL,
    country        VARCHAR(5)    NOT NULL,
    device_type    VARCHAR(20)   NOT NULL,
    click_minute   TIMESTAMP     NOT NULL,
    click_count    BIGINT        NOT NULL DEFAULT 0,
    PRIMARY KEY (ad_id, campaign_id, country, device_type, click_minute)
);

Why Star Schema for Analytics?

The aggregation tables follow a star schema pattern. The click_minute acts as a time dimension, and ad_id, campaign_id, country, device_type are dimension keys. This enables fast slice-and-dice queries like:

-- Clicks per campaign in the US, last hour
SELECT campaign_id, SUM(click_count)
FROM ad_click_aggregation_extended
WHERE country = 'US'
  AND click_minute >= NOW() - INTERVAL 1 HOUR
GROUP BY campaign_id;

Interviewer might ask: “Why not store raw events and query them directly?”

Querying 1B raw events per day is too slow for dashboards. Pre-aggregation reduces the query dataset by ~500x (2M aggregated rows vs 1B raw events).

4. Core Components

4.1 Click Ingestion Service

Interview context: “The ingestion layer must handle 50K clicks/sec at peak with minimal latency.”

The Challenge

Accept clicks at 50K/sec without dropping events
Validate basic fields (ad_id exists, timestamp reasonable)
Forward to Kafka quickly — this is not the place for heavy processing

Design

Client → Load Balancer → Click Receiver → Kafka

Click Receiver responsibilities:
1. Validate: ad_id exists, timestamp within ±5min
2. Deduplicate: Check click_id against Bloom filter
3. Enrich: Add server_timestamp, geo from IP
4. Produce: Send to Kafka topic (partitioned by ad_id)

Why partition by ad_id? All clicks for the same ad land on the same Kafka partition, which means the same Flink worker aggregates them. This avoids distributed counting and simplifies exactly-once semantics.

Why Not Process Clicks Synchronously?

Synchronous (API → DB)	Asynchronous (API → Kafka → Processor)
Simple	More components
DB becomes bottleneck at 50K/sec	Kafka handles 1M+ msg/sec
Single point of failure	Decoupled, resilient
Hard to add new consumers	New consumers attach to Kafka

4.2 Fraud and Duplicate Filtering

Interview context: “Before aggregating, we need to filter invalid clicks. This is critical — advertisers shouldn’t pay for fraudulent traffic.”

Types of Invalid Clicks

Type	Detection Method	Example
Duplicate clicks	click_id dedup (Bloom filter)	User double-clicked
Bot traffic	User-agent analysis, CAPTCHA signals	Automated scraper
Click farms	IP clustering, behavioral analysis	Same IP, many clicks
Self-clicking	ad_id owner detection	Publisher clicking own ads
Rapid fire	Rate limiting per user/IP	100 clicks/sec from one IP

Filtering Architecture

flowchart LR
    Raw["Raw Click<br/>Event"] --> Dedup["Dedup Filter<br/>(Bloom Filter)"]
    Dedup -->|Unique| Fraud["Fraud Filter<br/>(Rules + ML)"]
    Dedup -->|Duplicate| Filtered["Filtered<br/>(Counted separately)"]
    Fraud -->|Valid| Valid["Valid Click<br/>(Aggregated)"]
    Fraud -->|Fraudulent| Filtered

Bloom Filter for Deduplication:

In-memory, space-efficient probabilistic data structure
False positive rate of 0.1% at 1B elements ≈ 1.2 GB memory
Acceptable: A false positive means we discard a valid click (rare), but we never count a duplicate (critical)

Interviewer might ask: “Why a Bloom filter instead of a hash set?”

A hash set storing 1B click_ids (each 36 bytes UUID) = 36 GB. A Bloom filter at 0.1% false positive rate needs ~1.2 GB. At this scale, the 30x memory reduction matters.

5. Aggregation Service Deep Dive

Interview context: “This is the heart of the system. Let me explain how we aggregate clicks in real time.”

Stream Processing with Flink

flowchart TD
    subgraph KafkaSource["KAFKA SOURCE"]
        P1["Partition 1<br/>(ad_id hash 0-33%)"]
        P2["Partition 2<br/>(ad_id hash 34-66%)"]
        P3["Partition 3<br/>(ad_id hash 67-100%)"]
    end

    subgraph FlinkPipeline["FLINK PIPELINE"]
        Map1["Map: Parse +<br/>Filter"]
        Map2["Map: Parse +<br/>Filter"]
        Map3["Map: Parse +<br/>Filter"]

        Window1["Window:<br/>Tumbling 1 min"]
        Window2["Window:<br/>Tumbling 1 min"]
        Window3["Window:<br/>Tumbling 1 min"]

        Agg1["Aggregate:<br/>Count by ad_id"]
        Agg2["Aggregate:<br/>Count by ad_id"]
        Agg3["Aggregate:<br/>Count by ad_id"]
    end

    P1 --> Map1 --> Window1 --> Agg1
    P2 --> Map2 --> Window2 --> Agg2
    P3 --> Map3 --> Window3 --> Agg3

    subgraph Sink["SINK"]
        DB["Aggregation DB"]
    end

    Agg1 --> DB
    Agg2 --> DB
    Agg3 --> DB

Windowing Strategy

Window Type	Description	Use Case
Tumbling	Fixed, non-overlapping (e.g., every 1 min)	Primary aggregation
Sliding	Overlapping windows (e.g., 5 min window, slides every 1 min)	Moving averages
Session	Gap-based (close after N min of inactivity)	Per-user click sessions

We use tumbling windows (1 minute) because:

Simple, no overlap means each event counted exactly once
Aligns with minute-granularity storage
Easy to reason about for billing

Handling Late-Arriving Events

Interviewer might ask: “What if a click event arrives 3 minutes late due to network delays?”

flowchart LR
    subgraph Watermark["WATERMARK STRATEGY"]
        Event["Click Event<br/>event_time=14:02:58"]
        WM["Current Watermark<br/>= 14:03:00"]
        Late["Late? event_time < watermark"]
    end

    Late -->|"Within grace period (5 min)"| SideOutput["Side Output<br/>(Update aggregation)"]
    Late -->|"Beyond grace period"| Drop["Drop + Log<br/>(Count for monitoring)"]

Watermark + allowed lateness:

Watermark: Tracks event-time progress (e.g., “all events up to 14:03:00 have arrived”)
Grace period: 5 minutes — late events within this window trigger an aggregation update
Beyond grace: Drop the event, log it, count it for monitoring (should be < 0.01%)

Aggregation Logic (Pseudocode)

// Flink job pseudocode
clickStream
  .filter(event -> !isDuplicate(event) && !isFraud(event))
  .keyBy(event -> event.ad_id)
  .window(TumblingEventTimeWindows.of(Time.minutes(1)))
  .allowedLateness(Time.minutes(5))
  .aggregate(
    new CountAggregator(),      // in-window counting
    new WriteToAggDB()          // on window close, write result
  )
  .sideOutputLateData(lateTag)  // handle very late events

MapReduce Perspective

Interviewer might ask: “Can you explain this in terms of MapReduce?”

The aggregation is fundamentally a MapReduce operation:

MAP Phase:
  Input:  (click_id, ad_id, timestamp, ...)
  Output: (ad_id + minute_bucket) → 1

REDUCE Phase:
  Input:  (ad_id + minute_bucket) → [1, 1, 1, ..., 1]
  Output: (ad_id + minute_bucket) → count

Flink performs this continuously (stream MapReduce) rather than in batch. The keyBy(ad_id) is the shuffle/partition step, and aggregate(count) is the reduce step.

6. Key Design Decisions

6.1 Time Granularity

Interview context: “One key decision is what time granularity to aggregate at.”

Granularity	Storage/day	Query Speed	Flexibility
Per-second	~86M rows	Slow	Most flexible
Per-minute	~1.4M rows	Fast	Good balance
Per-hour	~48K rows	Very fast	Limited drill-down

We chose per-minute because: It’s a good balance — small enough for useful real-time monitoring, large enough that storage and query costs are manageable. Hourly/daily rollups are derived from minute-level data.

6.2 Push vs Pull for Aggregation Results

Approach	Pros	Cons
Pull (query on demand)	Simple, up-to-date	Higher query load
Push (pre-compute + cache)	Fast reads, low DB load	Stale data, more infra

We chose Pull with caching: Query service reads from the aggregation DB, caches hot queries (top campaigns, last hour) in Redis with a 1-minute TTL.

6.3 Database Choice for Aggregation Store

Option	Strengths	Weaknesses
ClickHouse	Extremely fast analytical queries, columnar	Mutable data harder
Apache Druid	Real-time ingestion + fast queries	Complex to operate
Cassandra	High write throughput, good for time series	Slower ad-hoc queries
TimescaleDB	Familiar SQL, good for time series	Scale ceiling

We chose ClickHouse because: Columnar storage compresses time-series aggregation data well (~10x), queries over time ranges are extremely fast, and it handles the append-heavy write pattern naturally.

Why Not a General-Purpose RDBMS?

At 2M aggregated rows per minute across multiple dimensions, the write volume (and the analytical query patterns) exceed what MySQL/PostgreSQL handle well. Columnar OLAP databases are purpose-built for this workload — they compress repeated values in columns (same ad_id across many minutes) and scan only the columns needed by each query.

Design Decisions Summary

Decision	Choice	Alternative	Rationale
Message queue	Kafka	RabbitMQ, SQS	Durable, replayable, high throughput
Stream processor	Flink	Spark Streaming, KStreams	True event-time processing, exactly-once
Aggregation DB	ClickHouse	Druid, Cassandra	Fast OLAP queries, columnar compression
Time granularity	Per-minute	Per-second, per-hour	Balance storage vs flexibility
Architecture	Lambda	Kappa (stream-only)	Batch layer for billing accuracy
Dedup method	Bloom filter	Hash set, Redis	Memory-efficient at billions of events

7. Scalability

Interview context: “Let me discuss how this system scales as click volume grows.”

Kafka Scaling

flowchart TD
    subgraph KafkaCluster["KAFKA CLUSTER"]
        subgraph Topic["ad-clicks topic"]
            P1["Partition 0"]
            P2["Partition 1"]
            P3["Partition 2"]
            PN["Partition N"]
        end
    end

    subgraph FlinkCluster["FLINK CLUSTER"]
        W1["Worker 1<br/>← Partition 0"]
        W2["Worker 2<br/>← Partition 1"]
        W3["Worker 3<br/>← Partition 2"]
        WN["Worker N<br/>← Partition N"]
    end

    P1 --> W1
    P2 --> W2
    P3 --> W3
    PN --> WN

Key insight: Kafka partitions = Flink parallelism. To scale, increase partitions and Flink workers together. Each worker processes a disjoint subset of ad_ids.

Scaling Strategies

Component	Strategy	When to Apply
Click Receivers	Horizontal scaling (stateless)	QPS > capacity per node
Kafka	Add partitions + brokers	Consumer lag increases
Flink	Add workers (1:1 with partitions)	Processing lag > 1 min
ClickHouse	Add shards (by ad_id range)	Query latency degrades
Redis cache	Cluster mode, partition by key	Cache miss rate > 10%

Hot Shard Problem

Interviewer might ask: “What if a single ad gets 1M clicks/min (viral campaign)?”

Problem: All clicks for that ad_id go to one Kafka partition → one Flink worker becomes a bottleneck.

Solutions:

Solution	How It Works	Trade-off
Salted partition key	Hash(ad_id + random_salt) spreads across partitions	Need extra aggregation step to combine
Local pre-aggregation	Each receiver counts locally, sends partial sums	Slightly delayed, but distributes load
Dedicated partition	Route hot ad_ids to dedicated workers	Operational complexity

Recommended: Local pre-aggregation in the Click Receiver. Each receiver maintains a 10-second in-memory counter per ad_id, then flushes partial aggregates to Kafka. This reduces Kafka message volume by ~100x and distributes the counting.

Without pre-aggregation:
  1M clicks/min → 1M Kafka messages → 1 Flink worker overwhelmed

With pre-aggregation (100 receivers):
  1M clicks/min → each receiver sees ~10K clicks
  → each flushes every 10s → 600 Kafka messages total
  → Flink worker handles 600 partial sums easily

8. Reliability and Data Accuracy

Interview context: “For a billing system, every click must be counted exactly once. Let me walk through how we ensure that.”

Exactly-Once Processing

flowchart LR
    subgraph E2E["END-TO-END EXACTLY-ONCE"]
        Kafka2["Kafka<br/>(Durable, replayable)"]
        Flink2["Flink<br/>(Checkpointing)"]
        Sink2["ClickHouse<br/>(Idempotent writes)"]

        Kafka2 -->|"At-least-once<br/>delivery"| Flink2
        Flink2 -->|"Checkpoint<br/>offsets + state"| Flink2
        Flink2 -->|"Idempotent<br/>upsert"| Sink2
    end

Three guarantees combined:

Kafka: Durable writes with replication factor 3. Consumer offsets committed after processing.
Flink checkpointing: Periodically snapshots processing state + Kafka offsets. On failure, restores from last checkpoint and replays from Kafka.
Idempotent sink: ClickHouse upsert with (ad_id, click_minute) as the key. Replaying the same window produces the same count → safe to retry.

Failure Scenarios

Scenario	Impact	Recovery
Click Receiver crash	Brief click loss	Load balancer routes to healthy nodes. Kafka producer retries with acks=all
Kafka broker failure	None (if RF=3)	Partition leader election, transparent to producers/consumers
Flink worker crash	Processing paused	Restore from checkpoint, replay Kafka from last committed offset
ClickHouse node failure	Query degradation	ClickHouse replicas serve reads. Writes buffer in Flink
Full datacenter outage	Major degradation	Multi-DC Kafka replication (MirrorMaker), standby Flink cluster

Data Loss Prevention

Click event lifecycle:
Click Receiver receives event
Kafka ACK (acks=all, RF=3) → Event is DURABLE ✓
Flink processes and checkpoints → State is SAFE ✓
ClickHouse writes with replication → Result is PERSISTED ✓
Raw event archived to S3 → Full history PRESERVED ✓

Key principle: Once Kafka acknowledges (step 2), the click is safe. Everything downstream can be replayed from Kafka.

9. Reconciliation

Interview context: “Even with exactly-once semantics, we run a daily reconciliation as the final safety net for billing.”

Why Reconcile?

Late events beyond the grace period
Rare Flink checkpoint edge cases
Fraud filter updates (retroactive reclassification)
Correct any streaming approximation errors

Reconciliation Pipeline

flowchart TB
    subgraph Daily["DAILY RECONCILIATION (Batch)"]
        S3["Raw Events<br/>(S3 / HDFS)"]
        Spark2["Spark Batch Job"]
        Recount["Recount from<br/>raw events"]
        Compare["Compare:<br/>Batch vs Stream"]
    end

    S3 --> Spark2
    Spark2 --> Recount
    Recount --> Compare

    Compare -->|"Match"| OK["OK ✓"]
    Compare -->|"Diff > threshold"| Alert["Alert +<br/>Auto-correct"]
    Compare -->|"Small diff"| Log["Log +<br/>Accept stream count"]

    subgraph Billing["BILLING"]
        BillingDB["Billing Database"]
    end

    Alert -->|"Use batch count"| BillingDB
    OK -->|"Use stream count"| BillingDB
    Log -->|"Use stream count"| BillingDB

Reconciliation Thresholds

Metric	Threshold	Action
Per-ad count diff	< 0.01%	Accept stream count
Per-ad count diff	0.01% - 1%	Log, use batch count for billing
Per-ad count diff	> 1%	Alert on-call, investigate
Total daily diff	> 0.1%	Escalate, potential system issue

Interviewer might ask: “What’s the source of truth for billing — the stream or the batch?”

The batch is the source of truth for billing. It processes all raw events with complete information (no late-event issues). The stream result is used for real-time dashboards only. Billing systems always wait for the reconciled batch result (T+1 day).

10. Interview Tips

Approach (45 minutes)

0-5 min:   CLARIFY REQUIREMENTS
           - What's the click volume? (affects everything)
           - Real-time, batch, or both?
           - Is this for billing? (accuracy critical)

5-10 min:  CAPACITY ESTIMATION
           - Clicks per second, storage per day
           - Number of unique ads/campaigns
           - Aggregation data volume

10-20 min: HIGH-LEVEL DESIGN
           - Draw ingestion → Kafka → stream processor → OLAP store
           - Mention Lambda architecture if billing is involved
           - Define the data model (raw events + aggregated tables)

20-35 min: DEEP DIVE (pick 2-3)
           - Stream aggregation (windowing, watermarks, late events)
           - Exactly-once semantics (Kafka + Flink checkpointing)
           - Hot shard problem and pre-aggregation
           - Fraud/duplicate filtering

35-40 min: SCALABILITY & RELIABILITY
           - Kafka partitioning = Flink parallelism
           - Failure recovery from checkpoints
           - Reconciliation for billing accuracy

40-45 min: WRAP UP
           - Summarize trade-offs
           - Mention monitoring: consumer lag, processing latency

Key Phrases That Show Depth

Instead of…	Say…
“Use Kafka”	“Partition Kafka by ad_id so all clicks for the same ad land on one partition, enabling local aggregation without distributed coordination”
“Count clicks”	“Use tumbling event-time windows in Flink with a 5-minute watermark grace period to handle late arrivals while keeping aggregation latency under 1 minute”
“Store in database”	“Write to ClickHouse with idempotent upserts keyed on (ad_id, click_minute) so replayed data from checkpoint recovery doesn’t double-count”
“Handle hot keys”	“Pre-aggregate in the Click Receiver layer — each receiver holds a 10-second local counter per ad_id and flushes partial sums, reducing Kafka volume by ~100x”
“Make it accurate”	“Lambda architecture: stream layer for real-time dashboards, batch layer reprocesses raw events from S3 daily as the billing source of truth”

Common Follow-up Questions

Question	Key Points
“How handle 10x traffic spike?”	Kafka absorbs burst (backpressure), auto-scale Flink workers, pre-aggregation reduces load
“How prevent double-counting?”	Bloom filter for click_id dedup + Flink exactly-once checkpointing + idempotent DB writes
“Stream vs batch — which to trust?”	Batch for billing (complete data), stream for dashboards (fast but approximate)
“Why not just use a database counter?”	50K writes/sec saturates any single DB. Kafka + stream processing distributes the workload
“How detect click fraud?”	Rule-based (rate limits, IP clustering) + ML-based (behavioral anomaly detection). Filter before aggregation
“What if Flink crashes?”	Restores from last checkpoint, replays Kafka from saved offsets. Idempotent sink prevents double-counting

Trade-offs to Discuss

Trade-off	Option A	Option B
Lambda vs Kappa	Lambda (batch + stream) — more accurate	Kappa (stream only) — simpler
Pre-aggregation	In receiver (reduces volume)	In Flink only (simpler, more load)
Bloom filter	Fast, memory-efficient (false positives)	Redis set — exact but expensive
Window size	1 minute (granular, more storage)	1 hour (coarse, less storage)
OLAP store	ClickHouse (fast queries)	Cassandra (easier writes)
Late event handling	Grace period (5 min)	Batch reconciliation only

11. Key Takeaways

Core Concepts

Lambda Architecture: Combine a real-time speed layer (Flink) with a batch accuracy layer (Spark) when data correctness matters for billing
Event-Time Windowing: Aggregate by event timestamp (not processing time) using watermarks to handle clock skew and network delays
Exactly-Once = At-Least-Once + Idempotency: Kafka replay + Flink checkpoints guarantee at-least-once; idempotent DB upserts eliminate duplicates
Pre-Aggregation for Hot Keys: Shift partial counting upstream (in receivers) to avoid single-partition bottlenecks in the stream processor
Bloom Filters for Scale: Use probabilistic data structures when exact answers aren’t required and the data volume is in the billions

Design Decisions Summary

Decision	Choice	Alternative	Rationale
Architecture	Lambda	Kappa	Billing needs batch accuracy
Stream processor	Flink	Spark Streaming	True event-time semantics, exactly-once
Message queue	Kafka	Pulsar, Kinesis	Durable, replayable, high throughput
OLAP database	ClickHouse	Druid, Cassandra	Columnar compression, fast time-range queries
Deduplication	Bloom filter	Redis set	30x less memory at 1B scale
Time granularity	Per-minute	Per-second, per-hour	Balance storage vs dashboard utility

Red Flags to Avoid

Don’t propose writing 50K clicks/sec directly to a relational database
Don’t forget about late-arriving events in stream processing
Don’t skip the fraud/duplicate filtering step
Don’t treat real-time aggregation as the billing source of truth
Don’t ignore the hot shard problem for popular ads
Don’t conflate processing-time and event-time windowing

References

System Design Interview Volume 2 - Alex Xu (Chapter on Ad Click Aggregation)
Apache Flink Documentation - Windowing
ClickHouse Documentation
Kafka: The Definitive Guide
Lambda Architecture - Nathan Marz

Design Ad Click Event Aggregation

Table of Contents

1. Requirements

Questions to Ask the Interviewer

Functional Requirements

Non-Functional Requirements

Capacity Estimation

2. High-Level Architecture

Component Responsibilities

Why Two Paths (Lambda Architecture)?

3. Data Model

Raw Click Event

Aggregated Click Table

Multi-Dimensional Aggregation Table

Why Star Schema for Analytics?

4. Core Components

4.1 Click Ingestion Service

The Challenge

Design

Why Not Process Clicks Synchronously?

4.2 Fraud and Duplicate Filtering

Types of Invalid Clicks

Filtering Architecture

5. Aggregation Service Deep Dive

Stream Processing with Flink

Windowing Strategy

Handling Late-Arriving Events

Aggregation Logic (Pseudocode)

MapReduce Perspective

6. Key Design Decisions

6.1 Time Granularity

6.2 Push vs Pull for Aggregation Results

6.3 Database Choice for Aggregation Store

Why Not a General-Purpose RDBMS?

Design Decisions Summary

7. Scalability

Kafka Scaling

Scaling Strategies

Hot Shard Problem

8. Reliability and Data Accuracy

Exactly-Once Processing

Failure Scenarios

Data Loss Prevention

9. Reconciliation

Why Reconcile?

Reconciliation Pipeline

Reconciliation Thresholds

10. Interview Tips

Approach (45 minutes)

Key Phrases That Show Depth

Common Follow-up Questions

Trade-offs to Discuss

11. Key Takeaways

Core Concepts

Design Decisions Summary

Red Flags to Avoid

References