A cloud file storage and synchronization service that allows users to store files, sync across devices, share with others, and collaborate in real-time.
“Before diving into the design, let me clarify the requirements to make sure I understand the scope correctly.”
| Requirement | Details |
|---|---|
| File upload/download | Upload files from any device, download anytime |
| File sync | Automatic synchronization across all user devices |
| File sharing | Share files/folders with specific users or via public links |
| File versioning | Track changes, restore previous versions |
| Notifications | Notify users of changes to shared files |
| Offline support | Access and edit files without internet, sync when online |
| Requirement | Target |
|---|---|
| Availability | 99.99% uptime (< 52 min/year downtime) |
| Durability | 99.999999999% (11 nines) - never lose user data |
| Latency | < 500ms for metadata operations, upload speed limited by network |
| Consistency | Eventual consistency for sync, strong for sharing permissions |
| Scale | 500M users, 100M DAU |
Interview tip: Always clarify scope. Google Drive is complex - focusing on core storage and sync is appropriate for a 45-minute interview.
“Let me estimate the scale to inform our design decisions.”
| Metric | Estimate |
|---|---|
| Total users | 500M |
| Daily active users (DAU) | 100M (20%) |
| Average files per user | 500 |
| Total files | 250B files |
| Operation | Calculation | Result |
|---|---|---|
| Uploads per day | 100M DAU × 2 uploads | 200M/day |
| Upload QPS | 200M / 86,400 | ~2,300 QPS |
| Peak upload QPS | 2,300 × 3 | ~7,000 QPS |
| Downloads per day | 100M DAU × 5 downloads | 500M/day |
| Download QPS | 500M / 86,400 | ~5,800 QPS |
| Sync checks per day | 100M DAU × 100 checks | 10B/day |
| Sync QPS | 10B / 86,400 | ~115,000 QPS |
| Data Type | Calculation | Result |
|---|---|---|
| Average file size | Mixed (docs, images, videos) | 1 MB |
| Storage per user | 500 files × 1 MB | 500 MB |
| Total storage | 500M users × 500 MB | 250 PB |
| Daily new storage | 200M uploads × 1 MB | 200 TB/day |
| Metadata per file | File info, versions, permissions | ~1 KB |
| Total metadata | 250B files × 1 KB | 250 TB |
| Direction | Calculation | Result |
|---|---|---|
| Upload bandwidth | 200M × 1 MB / 86,400 s ≈ 2.3 GB/s | ~19 Gbps |
| Download bandwidth | 500M × 1 MB / 86,400 s ≈ 5.8 GB/s | ~46 Gbps |
| Peak bandwidth | (19 + 46) Gbps × 3 | ~200 Gbps |
Key insight: The sync check QPS (115K) is much higher than upload/download. We need to optimize sync operations heavily.
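These estimates are worth being able to reproduce live; a quick sketch of the arithmetic behind the tables above:

```python
# Back-of-envelope math behind the capacity tables.
SECONDS_PER_DAY = 86_400

dau = 100_000_000
upload_qps = dau * 2 / SECONDS_PER_DAY          # ~2,300
peak_upload_qps = upload_qps * 3                # ~7,000
download_qps = dau * 5 / SECONDS_PER_DAY        # ~5,800
sync_qps = dau * 100 / SECONDS_PER_DAY          # ~115,000; dominates everything else

# 500M users x 500 files x 1 MB = 250 PB (1 PB = 1e9 MB)
total_storage_pb = 500_000_000 * 500 * 1 / 1e9

print(f"upload ~{upload_qps:,.0f} QPS, sync ~{sync_qps:,.0f} QPS, "
      f"storage ~{total_storage_pb:.0f} PB")
```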
“Let me start with the high-level architecture and then we can dive into specific components.”
flowchart TB
subgraph Clients["Client Devices"]
WEB[Web Browser]
DESKTOP[Desktop Client]
MOBILE[Mobile App]
end
subgraph Edge["Edge Layer"]
CDN[CDN]
LB[Load Balancer]
end
subgraph API["API Layer"]
APIGW[API Gateway]
AUTH[Auth Service]
end
subgraph Services["Core Services"]
META[Metadata Service]
UPLOAD[Upload Service]
DOWNLOAD[Download Service]
SYNC[Sync Service]
SHARE[Sharing Service]
NOTIFY[Notification Service]
end
subgraph Storage["Storage Layer"]
METADB[(Metadata DB)]
BLOCKSTORE[Block Storage]
CACHE[(Redis Cache)]
MQ[Message Queue]
end
WEB --> CDN
DESKTOP --> LB
MOBILE --> LB
CDN --> LB
LB --> APIGW
APIGW --> AUTH
APIGW --> META
APIGW --> UPLOAD
APIGW --> DOWNLOAD
APIGW --> SYNC
APIGW --> SHARE
META --> METADB
META --> CACHE
UPLOAD --> BLOCKSTORE
UPLOAD --> MQ
DOWNLOAD --> BLOCKSTORE
DOWNLOAD --> CDN
SYNC --> CACHE
SYNC --> MQ
SHARE --> METADB
NOTIFY --> MQ
POST /api/v1/files/upload
Headers:
Authorization: Bearer <token>
Content-Type: multipart/form-data
X-Upload-Id: <resumable-upload-id> // For resumable uploads
Body:
- file: binary
- parent_folder_id: string
- filename: string
Response:
{
"file_id": "f_abc123",
"name": "document.pdf",
"size": 1048576,
"checksum": "sha256:abc...",
"created_at": "2024-01-15T10:30:00Z",
"version": 1
}
GET /api/v1/files/{file_id}/download
GET /api/v1/files/{file_id}/download?version=3 // Specific version
Headers:
Authorization: Bearer <token>
Response:
- 302 Redirect to signed CDN URL (for large files)
- 200 with file content (for small files)
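The small-file/large-file split can be expressed in a few lines. A sketch assuming S3-style block storage via boto3 (`get_object` and `generate_presigned_url` are real boto3 calls; the bucket name, `storage_key` field, and 1 MB threshold are illustrative):

```python
import boto3

s3 = boto3.client("s3")
INLINE_LIMIT = 1 * 1024 * 1024  # serve files up to 1 MB inline (illustrative)

def download_response(file_meta: dict):
    """Return (status, body_or_url) for GET /files/{id}/download."""
    if file_meta["size"] <= INLINE_LIMIT:
        obj = s3.get_object(Bucket="blocks", Key=file_meta["storage_key"])
        return 200, obj["Body"].read()        # 200 with file content
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "blocks", "Key": file_meta["storage_key"]},
        ExpiresIn=600,                        # short-lived signed URL
    )
    return 302, url                           # redirect toward CDN/storage
```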
POST /api/v1/sync
Headers:
Authorization: Bearer <token>
Body:
{
"device_id": "device_123",
"last_sync_cursor": "cursor_abc",
"local_changes": [
{
"path": "/docs/file.txt",
"checksum": "sha256:...",
"modified_at": "2024-01-15T10:30:00Z",
"action": "modify"
}
]
}
Response:
{
"server_changes": [...],
"conflicts": [...],
"new_cursor": "cursor_def"
}
POST /api/v1/files/{file_id}/share
Headers:
Authorization: Bearer <token>
Body:
{
"recipients": [
{"email": "user@example.com", "role": "editor"},
{"email": "other@example.com", "role": "viewer"}
],
"link_settings": {
"enabled": true,
"access": "anyone_with_link",
"role": "viewer",
"expiry": "2024-02-15T00:00:00Z"
}
}
“Let me walk through the upload flow, which is one of the most critical paths.”
| Approach | Pros | Cons |
|---|---|---|
| Single upload | Simple | Fails completely on network error |
| Chunked upload | Resumable, parallel, efficient | More complex |
| Streaming | Low memory | Hard to retry |
Decision: Chunked upload with 4MB chunks for reliability and parallelism.
sequenceDiagram
participant C as Client
participant API as API Gateway
participant US as Upload Service
participant BS as Block Storage
participant MS as Metadata Service
participant MQ as Message Queue
C->>API: POST /upload/init (filename, size, checksum)
API->>US: Initialize upload
US->>MS: Create pending file record
MS-->>US: file_id
US-->>C: upload_id, chunk_urls[]
par Parallel chunk uploads
C->>BS: PUT chunk_1 (presigned URL)
C->>BS: PUT chunk_2 (presigned URL)
C->>BS: PUT chunk_3 (presigned URL)
end
C->>API: POST /upload/complete (upload_id, chunk_checksums)
API->>US: Finalize upload
US->>BS: Verify all chunks
US->>MS: Update file status to 'active'
US->>MQ: Publish FileUploaded event
US-->>C: Success (file metadata)
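Client-side, the chunking step is straightforward. A sketch (sequential for clarity, where a real client would upload several chunks in parallel as in the diagram; field names follow the API above, and `requests` is assumed for HTTP):

```python
import hashlib
import requests

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, per the decision above

def upload_chunks(path: str, init: dict) -> dict:
    """Upload a file chunk-by-chunk to the presigned URLs from /upload/init.

    init is assumed to look like {"upload_id": ..., "chunk_urls": [...]},
    one URL per chunk.
    """
    checksums = []
    with open(path, "rb") as f:
        for url in init["chunk_urls"]:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            checksums.append("sha256:" + hashlib.sha256(chunk).hexdigest())
            requests.put(url, data=chunk, timeout=30).raise_for_status()
    # Body for POST /upload/complete
    return {"upload_id": init["upload_id"], "chunk_checksums": checksums}
```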
“To save storage, we deduplicate at the block level, not file level.”
flowchart LR
subgraph Client["Client Side"]
FILE[File] --> CHUNK[Split into 4MB chunks]
CHUNK --> HASH[Calculate SHA-256 per chunk]
end
subgraph Server["Server Side"]
HASH --> CHECK{Block exists?}
CHECK -->|Yes| SKIP[Skip upload, reference existing]
CHECK -->|No| UPLOAD[Upload block]
UPLOAD --> STORE[(Block Storage)]
SKIP --> META[Update metadata]
STORE --> META
end
Deduplication benefits:
- 40-60% storage savings across users (see the cost optimization table later)
- Skipped uploads for blocks the server already holds, saving bandwidth and time
- Cheap versioning: a new version stores only its changed blocks
Example:
File A: [Block1][Block2][Block3][Block4]
File B: [Block1][Block2][Block5][Block4] // Modified file
Storage: Only 5 unique blocks stored, not 8
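The client-side fingerprinting for dedup is only a few lines. A sketch in which `known_block_hashes` stands in for the server's "block exists?" check:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB blocks

def plan_upload(path: str, known_block_hashes: set[str]):
    """Split a file into blocks and decide which actually need uploading."""
    to_upload, to_reference = [], []
    with open(path, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest in known_block_hashes:
                to_reference.append((index, digest))  # server already has it
            else:
                to_upload.append((index, digest))     # new block: upload
            index += 1
    return to_upload, to_reference
```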
“Sync is the most challenging part. We need to handle multiple devices editing the same file.”
| Strategy | Mechanism | Pros | Cons |
|---|---|---|---|
| Polling | Client checks every N seconds | Simple | High QPS, delayed updates |
| Long polling | Server holds request until change | Lower QPS | Connection overhead |
| WebSocket | Persistent connection | Real-time, efficient | Stateful connections |
| Delta sync | Only sync differences | Bandwidth efficient | Complex |
Decision: WebSocket for real-time notifications + Delta sync for efficient transfers.
flowchart TB
subgraph Devices["User Devices"]
D1[Desktop]
D2[Laptop]
D3[Phone]
end
subgraph Sync["Sync Infrastructure"]
WSS[WebSocket Servers]
SS[Sync Service]
subgraph State["State Management"]
REDIS[(Redis)]
CURSOR[Cursor Store]
end
end
subgraph Storage["Storage"]
META[(Metadata DB)]
BLOCKS[(Block Storage)]
end
D1 <-->|WebSocket| WSS
D2 <-->|WebSocket| WSS
D3 <-->|WebSocket| WSS
WSS --> SS
SS --> REDIS
SS --> CURSOR
SS --> META
SS --> BLOCKS
Each device maintains a cursor (checkpoint) of its last sync position:
{
"device_id": "desktop_123",
"user_id": "user_456",
"cursor": "1705312200_seq_99847",
"last_sync": "2024-01-15T10:30:00Z"
}
Sync flow:
1. On connect (or on a change event over WebSocket), the device sends its last cursor plus any local changes, per the /api/v1/sync body above.
2. The server returns every change recorded after that cursor, along with any detected conflicts.
3. The device applies the server changes, downloads only the missing blocks (delta sync), and persists the new cursor.
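A minimal sketch of the server side of this exchange, assuming a per-user change log ordered by sequence number (`change_log` and the cursor encoding are illustrative):

```python
def handle_sync(user_id: str, device_cursor: str, change_log: dict) -> dict:
    """Return the changes a device has not yet seen, plus its new cursor.

    change_log maps user_id -> ordered list of (seq, change) tuples; the
    cursor's trailing segment is the last sequence number applied
    (mirroring the "..._seq_99847" example above).
    """
    last_seq = int(device_cursor.rsplit("_", 1)[-1]) if device_cursor else 0
    log = change_log.get(user_id, [])
    pending = [change for seq, change in log if seq > last_seq]
    new_seq = log[-1][0] if log else last_seq
    return {
        "server_changes": pending,
        "conflicts": [],  # populated by checking against the device's local_changes
        "new_cursor": f"cursor_seq_{new_seq}",
    }
```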
“What happens when two devices edit the same file offline?”
stateDiagram-v2
[*] --> Detecting: Changes received
Detecting --> NoConflict: Timestamps don't overlap
Detecting --> Conflict: Same file modified
NoConflict --> Apply: Apply changes
Apply --> [*]
Conflict --> Strategy: Determine resolution
Strategy --> LastWriteWins: Non-critical files
Strategy --> CreateCopy: Important files
Strategy --> UserChoice: Manual resolution
LastWriteWins --> Apply
CreateCopy --> Apply
UserChoice --> UserInput
UserInput --> Apply
Conflict strategies:
| Strategy | When to use | Example |
|---|---|---|
| Last-write-wins | Low-risk files, single user | Cache files |
| Create conflict copy | Important files | “document (conflicted copy).txt” |
| Merge | Text files with no overlapping edits | Code files |
| User decision | Critical conflicts | Prompt user to choose |
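The conflict-copy strategy in particular is simple to implement; a sketch whose naming convention mirrors the table above (everything else is illustrative):

```python
import os

def resolve_conflict(path: str, local_version: dict, remote_version: dict,
                     strategy: str = "create_copy") -> list[dict]:
    """Resolve a same-file conflict between two device edits."""
    if strategy == "last_write_wins":
        winner = max(local_version, remote_version,
                     key=lambda v: v["modified_at"])
        return [{"path": path, "keep": winner}]
    if strategy == "create_copy":
        # Keep the server version at the original path; save the local
        # edit under a conflicted-copy name, as in the table above.
        base, ext = os.path.splitext(path)
        copy_path = f"{base} (conflicted copy){ext}"
        return [{"path": path, "keep": remote_version},
                {"path": copy_path, "keep": local_version}]
    raise ValueError(f"unknown strategy: {strategy}")
```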
“Metadata operations are read-heavy. Let’s design the data model.”
erDiagram
USER ||--o{ FILE : owns
USER ||--o{ DEVICE : has
FILE ||--o{ FILE_VERSION : has
FILE ||--o{ FILE_BLOCK : contains
FILE ||--o{ SHARING : shared_via
FOLDER ||--o{ FILE : contains
FOLDER ||--o{ FOLDER : contains
USER {
string user_id PK
string email
string name
bigint storage_quota
bigint storage_used
timestamp created_at
}
FILE {
string file_id PK
string user_id FK
string folder_id FK
string name
bigint size
string checksum
int current_version
string status
timestamp created_at
timestamp modified_at
}
FILE_VERSION {
string version_id PK
string file_id FK
int version_number
bigint size
string checksum
string modified_by
timestamp created_at
}
FILE_BLOCK {
string block_id PK
string file_id FK
int version_number
int block_index
string block_hash
bigint block_size
}
FOLDER {
string folder_id PK
string user_id FK
string parent_id FK
string name
timestamp created_at
}
DEVICE {
string device_id PK
string user_id FK
string device_name
string device_type
string sync_cursor
timestamp last_sync
}
SHARING {
string share_id PK
string file_id FK
string shared_with
string role
string share_type
timestamp expires_at
}
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| PostgreSQL | ACID, relations, mature | Scaling complexity | Good for metadata |
| MongoDB | Flexible schema, easy scaling | Weaker consistency | Could work |
| Sharded PostgreSQL | Best of both | Operational complexity | Best choice |
Sharding strategy: Shard by user_id so that all of a user’s files live on the same shard.
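Routing then becomes a pure function of user_id. A minimal sketch (a production router would use consistent hashing or a lookup table so shards can be rebalanced):

```python
import hashlib

SHARD_COUNT = 64  # illustrative

def shard_for_user(user_id: str) -> int:
    """Map a user to a metadata shard; all of a user's files land together."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % SHARD_COUNT
```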
flowchart LR
subgraph Read["Read Path"]
REQ[Request] --> L1[L1: Local Cache]
L1 -->|Miss| L2[L2: Redis Cluster]
L2 -->|Miss| DB[(PostgreSQL)]
DB --> L2
L2 --> L1
end
subgraph Invalidation["Cache Invalidation"]
WRITE[Write] --> DB2[(PostgreSQL)]
DB2 --> PUB[Pub/Sub]
PUB --> INV1[Invalidate L2]
PUB --> INV2[Invalidate L1s]
end
Cache policies:
| Data | TTL | Invalidation |
|---|---|---|
| User metadata | 1 hour | On profile update |
| File metadata | 5 minutes | On any file change |
| Folder structure | 10 minutes | On folder change |
| Sharing permissions | 1 minute | On permission change |
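A read-through lookup against these policies might look like the following sketch, using redis-py's standard get/setex/delete (the `db.fetch_file` accessor is illustrative):

```python
import json
import redis

r = redis.Redis()
FILE_META_TTL = 300  # 5 minutes, per the policy table

def get_file_metadata(file_id: str, db) -> dict:
    """Read-through lookup: Redis (L2) first, PostgreSQL on a miss."""
    key = f"file_meta:{file_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    row = db.fetch_file(file_id)              # illustrative DB accessor
    r.setex(key, FILE_META_TTL, json.dumps(row))
    return row

def invalidate_file_metadata(file_id: str) -> None:
    """Called from the pub/sub invalidation path on any file change."""
    r.delete(f"file_meta:{file_id}")
```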
“Let’s discuss how we store the actual file content.”
flowchart TB
subgraph Upload["Upload Path"]
CLIENT[Client] --> UPLOAD[Upload Service]
UPLOAD --> DEDUP{Deduplicate}
DEDUP -->|New block| ENCRYPT[Encrypt]
DEDUP -->|Exists| REF[Reference existing]
ENCRYPT --> COMPRESS[Compress]
COMPRESS --> STORE[Store]
end
subgraph Storage["Storage Tiers"]
STORE --> HOT[Hot: SSD]
HOT -->|After 30 days| WARM[Warm: HDD]
WARM -->|After 90 days| COLD[Cold: Glacier]
end
subgraph Durability["Durability"]
HOT --> REP1[Replica 1]
HOT --> REP2[Replica 2]
HOT --> REP3[Replica 3]
end
| Tier | Storage Type | Access Time | Cost | Use Case |
|---|---|---|---|---|
| Hot | SSD | < 10ms | $$$ | Recently accessed |
| Warm | HDD | < 100ms | $$ | Older files |
| Cold | Glacier | Minutes-hours | $ | Archived/deleted |
Block Storage Layout:
/blocks/
/{hash_prefix}/ # First 2 chars of hash (256 buckets)
/{block_hash} # Full SHA-256 hash
- data # Encrypted, compressed block
- metadata.json # Size, references, created_at
Example:
/blocks/a7/a7b3c4d5e6f7...
- data: [encrypted binary]
- metadata.json: {
"size": 4194304,
"compressed_size": 3145728,
"reference_count": 47,
"created_at": "2024-01-15T10:30:00Z"
}
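Deriving a block's path from its content hash is deterministic; a sketch matching the layout above:

```python
import hashlib

def block_path(block: bytes) -> str:
    """Content-addressed path: /blocks/<first 2 hex chars>/<full SHA-256>."""
    digest = hashlib.sha256(block).hexdigest()
    return f"/blocks/{digest[:2]}/{digest}"
```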
“Sharing is more complex than it appears. We need to handle permissions, links, and notifications.”
flowchart TB
subgraph Permissions["Permission Hierarchy"]
OWNER[Owner] --> EDITOR[Editor]
EDITOR --> COMMENTER[Commenter]
COMMENTER --> VIEWER[Viewer]
end
subgraph Capabilities["Capabilities"]
OWNER --> |Can| O1[Delete]
OWNER --> |Can| O2[Share]
OWNER --> |Can| O3[Transfer ownership]
EDITOR --> |Can| E1[Edit]
EDITOR --> |Can| E2[Add files]
COMMENTER --> |Can| C1[Comment]
VIEWER --> |Can| V1[View]
VIEWER --> |Can| V2[Download]
end
| Type | Description | Use Case |
|---|---|---|
| Direct share | Share with specific user | Team collaboration |
| Link (restricted) | Anyone with link + account | Semi-private sharing |
| Link (public) | Anyone with link | Public distribution |
| Domain share | Anyone in organization | Enterprise |
sequenceDiagram
participant U1 as Owner
participant API as API Gateway
participant SS as Sharing Service
participant MS as Metadata Service
participant NS as Notification Service
participant U2 as Recipient
U1->>API: Share file with user@example.com
API->>SS: Create share
SS->>MS: Verify file ownership
MS-->>SS: Confirmed
SS->>SS: Create sharing record
SS->>NS: Send notification
NS->>U2: Email: "File shared with you"
SS-->>U1: Share successful
U2->>API: Access shared file
API->>SS: Check permissions
SS-->>API: Viewer access granted
API-->>U2: File content
“Users need to know when files change. Let’s design real-time notifications.”
flowchart TB
subgraph Events["Event Sources"]
UPLOAD[File Upload]
EDIT[File Edit]
SHARE[File Shared]
COMMENT[Comment Added]
end
subgraph Processing["Event Processing"]
MQ[Message Queue]
PROCESSOR[Notification Processor]
ROUTER[Recipient Router]
end
subgraph Delivery["Delivery Channels"]
PUSH[Push Notification]
EMAIL[Email]
WS[WebSocket]
INAPP[In-App]
end
UPLOAD --> MQ
EDIT --> MQ
SHARE --> MQ
COMMENT --> MQ
MQ --> PROCESSOR
PROCESSOR --> ROUTER
ROUTER --> PUSH
ROUTER --> EMAIL
ROUTER --> WS
ROUTER --> INAPP
| Event Type | Default Channel | User Configurable |
|---|---|---|
| File shared with me | Email + Push | Yes |
| Comment on my file | Push + In-app | Yes |
| File edited | In-app only | Yes |
| Storage warning | Email + In-app | No |
“Let me discuss how this system scales to handle 500M users.”
flowchart TB
subgraph Global["Global Architecture"]
DNS[DNS] --> GLB[Global Load Balancer]
GLB --> R1[Region: US]
GLB --> R2[Region: EU]
GLB --> R3[Region: Asia]
end
subgraph Region["Per Region"]
LB[Load Balancer]
subgraph Services["Stateless Services"]
API1[API 1]
API2[API 2]
APIn[API N]
end
subgraph Data["Data Layer"]
SHARD1[(Shard 1)]
SHARD2[(Shard 2)]
SHARDN[(Shard N)]
end
end
R1 --> LB
LB --> API1
LB --> API2
LB --> APIn
API1 --> SHARD1
API2 --> SHARD2
APIn --> SHARDN
| Component | Strategy | Capacity per Unit |
|---|---|---|
| API Gateway | Horizontal, auto-scale | 10K req/s |
| Metadata Service | Horizontal + sharding | 50K req/s |
| Upload Service | Horizontal | 1K uploads/s |
| Block Storage | Distributed (S3-like) | Effectively unlimited |
| Sync Service | Horizontal + partitioned | 100K connections |
| WebSocket Servers | Horizontal, sticky | 50K connections |
flowchart TB
subgraph Sharding["Shard by User ID"]
ROUTER[Shard Router]
ROUTER --> S1[Shard 1: Users A-D]
ROUTER --> S2[Shard 2: Users E-H]
ROUTER --> S3[Shard 3: Users I-L]
ROUTER --> SN[Shard N: Users ...]
end
subgraph Benefits["Benefits"]
B1[User's files co-located]
B2[No cross-shard queries]
B3[Easy to add shards]
end
Shard key: user_id
flowchart TB
subgraph CDN["CDN Distribution"]
ORIGIN[Origin: Block Storage]
ORIGIN --> POP1[PoP: Americas]
ORIGIN --> POP2[PoP: Europe]
ORIGIN --> POP3[PoP: Asia]
ORIGIN --> POP4[PoP: Africa]
end
subgraph Caching["Cache Strategy"]
POP1 --> HOT1[Hot files cached]
HOT1 --> USERS1[Local users]
end
CDN policies:
- Cache hot, frequently downloaded files at the PoPs closest to their users
- Serve downloads through short-lived signed URLs (as in the download API above)
- Evict cold content via TTL/LRU so edge caches track current demand
“For a file storage system, data durability is non-negotiable.”
flowchart TB
subgraph Write["Write Path"]
BLOCK[New Block] --> W1[Write to Primary]
W1 --> W2[Sync to Replica 1]
W1 --> W3[Sync to Replica 2]
W2 --> ACK{All replicas ACK?}
W3 --> ACK
ACK -->|Yes| SUCCESS[Return success]
ACK -->|No| RETRY[Retry/Fail]
end
subgraph Distribution["Geographic Distribution"]
PRIMARY[Primary: US-West]
REP1[Replica 1: US-East]
REP2[Replica 2: EU-West]
end
| Scenario | Protection |
|---|---|
| Disk failure | 3 replicas per block |
| Server failure | Blocks distributed across servers |
| Rack failure | Replicas in different racks |
| Data center failure | Cross-region replication |
| Bit rot | Periodic checksum verification |
| Operation | Consistency | Why |
|---|---|---|
| File upload | Strong | Must confirm write |
| File read | Strong | Must return latest |
| Sync cursor | Eventual | Performance |
| Sharing permissions | Strong | Security critical |
| Usage stats | Eventual | Not critical |
stateDiagram-v2
[*] --> Normal: System healthy
Normal --> Degraded: Replica failure
Degraded --> Normal: Replica recovered
Degraded --> Critical: Multiple failures
Normal --> Failover: Primary failure
Failover --> Normal: Failover complete
Critical --> Recovery: Manual intervention
Recovery --> Degraded: Partial recovery
sequenceDiagram
participant C as Client
participant US as Upload Service
participant BS as Block Storage
C->>US: Upload chunk 5 (fails)
US-->>C: Error: timeout
Note over C: Client retries
C->>US: Resume upload (upload_id)
US->>BS: Check existing chunks
BS-->>US: Chunks 1-4 complete
US-->>C: Resume from chunk 5
C->>BS: Upload chunk 5
C->>BS: Upload chunk 6
C->>US: Complete upload
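Server-side, resume is a set difference over chunk indices. A sketch in which `list_uploaded_chunks` is an illustrative block-storage query:

```python
def chunks_to_resume(upload_id: str, total_chunks: int, block_store) -> list[int]:
    """Return the chunk indices the client still needs to upload."""
    done = set(block_store.list_uploaded_chunks(upload_id))  # e.g. {0, 1, 2, 3}
    return [i for i in range(total_chunks) if i not in done]
```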
| Layer | Method | Key Management |
|---|---|---|
| In transit | TLS 1.3 | Auto-rotated certs |
| At rest | AES-256 | Per-user keys in KMS |
| Block level | AES-256 | Content-derived keys |
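"Content-derived keys" points at convergent encryption: keying each block by a hash of its own plaintext, so identical blocks encrypt identically and block-level dedup survives encryption. A simplified sketch using the `cryptography` package's AESGCM (the deterministic nonce is a known weakness of convergent schemes, shown here only for illustration):

```python
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_block(block: bytes) -> tuple[bytes, bytes]:
    """Convergent encryption: key the block by a hash of its own plaintext,
    so identical blocks yield identical ciphertext and dedup still works.
    The returned key would itself be stored wrapped by per-user keys in KMS."""
    key = hashlib.sha256(block).digest()                     # content-derived 256-bit key
    nonce = hashlib.sha256(b"nonce:" + block).digest()[:12]  # deterministic; illustration only
    ciphertext = AESGCM(key).encrypt(nonce, block, None)
    return ciphertext, key
```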
flowchart TB
REQ[Request] --> AUTH[Authentication]
AUTH --> AUTHZ[Authorization]
AUTHZ --> OWNER{Is owner?}
OWNER -->|Yes| ALLOW[Allow]
OWNER -->|No| SHARE{Has share?}
SHARE -->|Yes| PERM{Check permission}
SHARE -->|No| DENY[Deny]
PERM -->|Sufficient| ALLOW
PERM -->|Insufficient| DENY
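The flowchart translates directly into an ordered role check. A sketch whose role ranking mirrors the permission hierarchy above (the `shares` lookup shape is illustrative):

```python
# Role hierarchy from the sharing section: owner > editor > commenter > viewer.
ROLE_RANK = {"viewer": 0, "commenter": 1, "editor": 2, "owner": 3}

def is_allowed(user_id: str, file_meta: dict, shares: dict, required: str) -> bool:
    """Authorization per the flowchart: owner short-circuits; otherwise
    look up the user's share record and compare roles."""
    if file_meta["user_id"] == user_id:                 # owner can do anything
        return True
    role = shares.get((file_meta["file_id"], user_id))  # direct-share lookup
    if role is None:
        return False                                    # no share: deny
    return ROLE_RANK[role] >= ROLE_RANK[required]
```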
| Metric | Alert Threshold | Action |
|---|---|---|
| Upload latency P99 | > 5s | Scale upload service |
| Sync lag | > 30s | Investigate sync service |
| Storage utilization | > 80% | Add storage capacity |
| Error rate | > 0.1% | Page on-call |
| Replication lag | > 1 minute | Critical alert |
| Strategy | Savings | Trade-off |
|---|---|---|
| Block deduplication | 40-60% | CPU for hashing |
| Compression | 20-30% | CPU overhead |
| Tiered storage | 50-70% | Access latency |
| Regional storage | 30-40% | Cross-region access |
| Question | Key Points |
|---|---|
| “How do you handle large files?” | Chunked upload, parallel transfers, resumable |
| “What about conflicts?” | Cursor-based sync, conflict copies, user resolution |
| “How do you ensure durability?” | 3 replicas, cross-region, checksums |
| “How does sharing work?” | ACL model, signed URLs, permission inheritance |
| “What about versioning?” | Block-level versioning, efficient storage |
| Decision | Trade-off | Reasoning |
|---|---|---|
| Chunk size (4MB) | Parallelism vs. overhead | Balance of resume granularity and request overhead |
| Sync strategy | Latency vs. bandwidth | WebSocket for speed, delta for efficiency |
| Consistency model | Performance vs. correctness | Strong for writes, eventual for sync |
| Deduplication scope | Storage vs. privacy | User-level only for privacy |
| Situation | Phrase |
|---|---|
| Starting upload design | “Let me walk through the upload flow, which is critical for user experience…” |
| Discussing sync | “Sync is the most challenging part because we need to handle offline edits and conflicts…” |
| Addressing durability | “For a file storage system, data durability is non-negotiable. Let me explain our strategy…” |
| Trade-off moment | “We have a trade-off here between storage cost and privacy. I’d lean toward…” |
flowchart TB
subgraph Client["Client Layer"]
WEB[Web]
DESKTOP[Desktop]
MOBILE[Mobile]
end
subgraph Edge["Edge"]
CDN[CDN]
LB[Load Balancer]
end
subgraph API["API Layer"]
GW[API Gateway]
AUTH[Auth]
end
subgraph Core["Core Services"]
META[Metadata]
UPLOAD[Upload]
DOWNLOAD[Download]
SYNC[Sync]
SHARE[Share]
NOTIFY[Notify]
end
subgraph Data["Data Layer"]
METADB[(Metadata DB)]
BLOCKS[(Block Storage)]
CACHE[(Cache)]
MQ[Queue]
end
Client --> Edge
Edge --> API
API --> Core
Core --> Data
| Decision | Choice | Rationale |
|---|---|---|
| Upload strategy | Chunked, resumable | Reliability for large files |
| Storage | Block-level dedup | 40-60% storage savings |
| Sync | WebSocket + delta | Real-time + efficient |
| Consistency | Strong for writes | Data integrity |
| Durability | 3 replicas, cross-region | 11 nines requirement |
| Metric | Value |
|---|---|
| Users | 500M |
| Files | 250B |
| Storage | 250 PB |
| Upload QPS | 2.3K (7K peak) |
| Sync QPS | 115K |
| Durability | 99.999999999% |