System Design
Overview
Principles for designing systems that handle scale, remain available, and perform well under load.
Scalability Fundamentals
Vertical vs Horizontal Scaling
Vertical Scaling (Scale Up):
┌─────────────────────┐
│ Bigger Server │
│ - More CPU │
│ - More RAM │
│ - Faster disk │
└─────────────────────┘
Limit: Hardware ceiling
Horizontal Scaling (Scale Out):
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│Server│ │Server│ │Server│ │Server│
└──────┘ └──────┘ └──────┘ └──────┘
↑
Load Balancer
Limit: Coordination complexity
Stateless Services
// ❌ Stateful - stores session in memory
class BadService {
private sessions = new Map();
login(userId: string) {
this.sessions.set(userId, { loggedIn: true });
}
}
// ✅ Stateless - external session store
class GoodService {
constructor(private sessionStore: Redis) {}
async login(userId: string) {
await this.sessionStore.set(`session:${userId}`, { loggedIn: true });
}
}
Load Balancing
Strategies
| Strategy | Description | Use Case | |----------|-------------|----------| | Round Robin | Cycle through servers | Equal capacity servers | | Weighted RR | Based on server capacity | Mixed capacity | | Least Connections | Route to least busy | Long-lived connections | | IP Hash | Same IP → same server | Session stickiness | | URL Hash | Same URL → same server | Cache optimization |
Health Checks
# Kubernetes-style health checks
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 3
periodSeconds: 10
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
// Health check endpoints
app.get('/health/live', (req, res) => {
// Am I running?
res.status(200).json({ status: 'alive' });
});
app.get('/health/ready', async (req, res) => {
// Can I serve traffic?
const dbOk = await checkDatabase();
const cacheOk = await checkCache();
if (dbOk && cacheOk) {
res.status(200).json({ status: 'ready' });
} else {
res.status(503).json({ status: 'not ready', db: dbOk, cache: cacheOk });
}
});
Caching Strategies
Cache Patterns
Cache-Aside (Lazy Loading):
┌────────┐ miss ┌────────┐
│ App │ ──────────→ │ Cache │
│ │ ←────────── │ │
└────────┘ null └────────┘
│
│ read
↓
┌────────┐
│ DB │ ──── write ──→ Cache
└────────┘
Write-Through:
App → Cache → DB (synchronous)
Write-Behind (Write-Back):
App → Cache → (async) → DB
// Cache-aside implementation
class CachedUserService {
constructor(
private cache: Redis,
private db: Database
) {}
async getUser(id: string): Promise<User> {
// Try cache first
const cached = await this.cache.get(`user:${id}`);
if (cached) return JSON.parse(cached);
// Cache miss - read from DB
const user = await this.db.users.findById(id);
if (user) {
// Store in cache with TTL
await this.cache.set(`user:${id}`, JSON.stringify(user), 'EX', 3600);
}
return user;
}
async updateUser(id: string, data: Partial<User>): Promise<User> {
const user = await this.db.users.update(id, data);
// Invalidate cache
await this.cache.del(`user:${id}`);
return user;
}
}
Cache Invalidation
| Strategy | Description | Complexity | |----------|-------------|------------| | TTL | Expire after time | Simple | | Event-based | Invalidate on write | Medium | | Version-based | Key includes version | Medium | | Tag-based | Group related keys | Complex |
Database Scaling
Read Replicas
┌─────────────────┐
Writes ──────→ │ Primary DB │
└────────┬────────┘
│ replication
┌─────────────────┼─────────────────┐
↓ ↓ ↓
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Replica 1│ │ Replica 2│ │ Replica 3│
└──────────┘ └──────────┘ └──────────┘
↑ ↑ ↑
└─────── Reads ───┴────────────────┘
Sharding (Partitioning)
// Hash-based sharding
function getShard(userId: string, numShards: number): number {
const hash = crypto.createHash('md5').update(userId).digest('hex');
return parseInt(hash.slice(0, 8), 16) % numShards;
}
// Range-based sharding
function getShardByDate(date: Date): string {
const year = date.getFullYear();
const month = date.getMonth() + 1;
return `orders_${year}_${month.toString().padStart(2, '0')}`;
}
// Consistent hashing for dynamic shards
class ConsistentHash {
private ring: Map<number, string> = new Map();
addNode(node: string) {
for (let i = 0; i < 150; i++) { // Virtual nodes
const hash = this.hash(`${node}:${i}`);
this.ring.set(hash, node);
}
}
getNode(key: string): string {
const hash = this.hash(key);
// Find next node on ring
for (const [nodeHash, node] of [...this.ring.entries()].sort()) {
if (nodeHash >= hash) return node;
}
return this.ring.values().next().value;
}
}
Message Queues
Patterns
Point-to-Point:
Producer → Queue → Consumer
Pub/Sub:
┌─→ Subscriber 1
Publisher → Topic ─→ Subscriber 2
└─→ Subscriber 3
Work Queue:
┌─→ Worker 1
Producer → Queue ─→ Worker 2 (competing consumers)
└─→ Worker 3
Delivery Guarantees
| Guarantee | Description | Implementation | |-----------|-------------|----------------| | At-most-once | May lose messages | Fire and forget | | At-least-once | May duplicate | Ack after process | | Exactly-once | No loss, no dupe | Idempotency + dedup |
// Idempotent processing
async function processOrder(event: OrderEvent) {
// Check if already processed
const processed = await redis.get(`processed:${event.id}`);
if (processed) {
console.log(`Already processed ${event.id}`);
return;
}
// Process the order
await db.orders.create(event.order);
// Mark as processed (with TTL for cleanup)
await redis.set(`processed:${event.id}`, '1', 'EX', 86400 * 7);
}
High Availability
Redundancy Patterns
Active-Active:
┌────────┐ ┌────────┐
│Server A│ ←─→ │Server B│ Both handle traffic
└────────┘ └────────┘
Active-Passive:
┌────────┐ ┌────────┐
│ Active │ ──→ │Standby │ Failover on failure
└────────┘ └────────┘
Circuit Breaker
class CircuitBreaker {
private failures = 0;
private lastFailure: Date | null = null;
private state: 'closed' | 'open' | 'half-open' = 'closed';
constructor(
private threshold: number = 5,
private timeout: number = 30000
) {}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === 'open') {
if (Date.now() - this.lastFailure!.getTime() > this.timeout) {
this.state = 'half-open';
} else {
throw new Error('Circuit is open');
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
private onSuccess() {
this.failures = 0;
this.state = 'closed';
}
private onFailure() {
this.failures++;
this.lastFailure = new Date();
if (this.failures >= this.threshold) {
this.state = 'open';
}
}
}
CAP Theorem
Consistency
/\
/ \
/ \
/ \
/ CA \
/──────────\
/ \
/ CP AP \
/________________\
Partition Availability
Tolerance
CA: Single node (RDBMS)
CP: MongoDB, HBase (may reject writes during partition)
AP: Cassandra, DynamoDB (eventual consistency)
Consistency Models
| Model | Description | Example | |-------|-------------|---------| | Strong | Read sees latest write | RDBMS | | Eventual | Eventually consistent | DNS, Cassandra | | Causal | Respects causality | Chat apps | | Read-your-writes | See your own writes | Social feeds |
Rate Limiting
Algorithms
// Token Bucket
class TokenBucket {
private tokens: number;
private lastRefill: number;
constructor(
private capacity: number,
private refillRate: number // tokens per second
) {
this.tokens = capacity;
this.lastRefill = Date.now();
}
consume(tokens: number = 1): boolean {
this.refill();
if (this.tokens >= tokens) {
this.tokens -= tokens;
return true;
}
return false;
}
private refill() {
const now = Date.now();
const elapsed = (now - this.lastRefill) / 1000;
this.tokens = Math.min(
this.capacity,
this.tokens + elapsed * this.refillRate
);
this.lastRefill = now;
}
}
// Sliding Window
class SlidingWindowRateLimiter {
constructor(
private redis: Redis,
private limit: number,
private window: number // seconds
) {}
async isAllowed(key: string): Promise<boolean> {
const now = Date.now();
const windowStart = now - this.window * 1000;
const pipe = this.redis.pipeline();
pipe.zremrangebyscore(key, 0, windowStart);
pipe.zadd(key, now, `${now}`);
pipe.zcard(key);
pipe.expire(key, this.window);
const results = await pipe.exec();
const count = results[2][1] as number;
return count <= this.limit;
}
}
Common System Designs
| System | Key Components | |--------|----------------| | URL Shortener | Hash function, Redis cache, DB | | Twitter Feed | Fan-out, Redis timeline, Kafka | | Chat App | WebSocket, Presence, Message queue | | E-commerce | Cart service, Inventory, Payment | | Video Streaming | CDN, Chunking, Adaptive bitrate |
Related Skills
- [[architecture-patterns]] - Microservices, event-driven
- [[database]] - Database optimization
- [[caching-implementation]] - Cache strategies
- [[reliability-engineering]] - SRE practices