System Design Skill
Quick Reference
| Pattern | Best For | Complexity | Scaling |
|---------|----------|------------|---------|
| Monolith | Startups, MVPs | Low | Limited |
| Microservices | Large teams | High | Excellent |
| Serverless | Event-driven | Medium | Auto |
| Event-Driven | High throughput | High | Excellent |
Scalability Progression
Level 1: Single Server
│
▼ Bottleneck: CPU/Memory
Level 2: Load Balancer + Multiple Servers
│
▼ Bottleneck: Database reads
Level 3: Caching Layer (Redis)
│
▼ Bottleneck: Read load the cache cannot absorb
Level 4: Read Replicas
│
▼ Bottleneck: Database writes, single-primary limits
Level 5: Sharding / Partitioning
│
▼ Bottleneck: Cross-shard queries
Level 6: CQRS + Event Sourcing
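As an illustration of Level 3, here is a minimal cache-aside sketch. It uses an in-process dictionary as a stand-in for Redis, and `loader` is a hypothetical function representing the database read; the class name, the TTL default, and the injectable `clock` are assumptions made for the example, not part of any library.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside helper: check the cache first, call the loader on a miss."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # hypothetical function that reads from the database
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: no database read
        self.misses += 1
        value = self._loader(key)  # cache miss or expired entry: one database read
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after a write so the next read reloads fresh data."""
        self._store.pop(key, None)
```

Repeated reads of the same key within the TTL reach the loader only once, which is why the next bottleneck tends to be whatever read load the cache cannot absorb.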
Architecture Decision Tree
What's your team size and product stage?
│
├─► Team < 10, product unclear
│   └─► Monolith (start simple)
│
├─► Team > 10, clear domain boundaries
│   └─► Microservices
│
├─► Variable workloads, pay-per-use
│   └─► Serverless
│
└─► High throughput, async workflows
    └─► Event-Driven
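The tree above can be read as a small function. This is only a sketch: the function name and parameters are hypothetical, and the tree does not say which branch wins when several apply, so the precedence below (workload traits first, then team size, monolith as the fallback) is an assumption.

```python
def recommend_architecture(team_size: int,
                           clear_domain_boundaries: bool = False,
                           variable_workload: bool = False,
                           high_throughput_async: bool = False) -> str:
    """Return the pattern suggested by the decision tree for the given traits."""
    # Assumption: workload-driven branches take precedence over team-size branches.
    if high_throughput_async:
        return "Event-Driven"
    if variable_workload:
        return "Serverless"
    if team_size > 10 and clear_domain_boundaries:
        return "Microservices"
    # Small team, unclear product, or any case the tree leaves open: start simple.
    return "Monolith"
```

For example, `recommend_architecture(25, clear_domain_boundaries=True)` returns `"Microservices"`, while the same team without clear boundaries falls back to `"Monolith"`.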
API Design
REST Best Practices
GET /api/v1/users # List
GET /api/v1/users/{id} # Get
POST /api/v1/users # Create
PUT /api/v1/users/{id} # Replace
PATCH /api/v1/users/{id} # Partial update
DELETE /api/v1/users/{id} # Delete
GET /api/v1/users/{id}/orders # Nested
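The difference between PUT and PATCH is easy to blur, so here is a small sketch of the two semantics against an in-memory store. The `users` dictionary and both function names are hypothetical; a real service would sit behind a web framework and a database.

```python
from typing import Any, Dict

# Hypothetical in-memory store standing in for a database table.
users: Dict[int, Dict[str, Any]] = {
    1: {"name": "Ada", "email": "ada@example.com"},
}


def put_user(user_id: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """PUT replaces the whole resource; fields missing from the body are dropped."""
    users[user_id] = dict(body)
    return users[user_id]


def patch_user(user_id: int, changes: Dict[str, Any]) -> Dict[str, Any]:
    """PATCH merges the supplied fields into the existing resource."""
    users[user_id].update(changes)
    return users[user_id]
```

Patching only `email` leaves `name` intact, whereas a PUT with only `name` removes `email` from the stored resource.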
HTTP Status Codes
| Code | Meaning | Use When |
|------|---------|----------|
| 200 | OK | GET/PUT/PATCH success |
| 201 | Created | POST success |
| 204 | No Content | DELETE success |
| 400 | Bad Request | Invalid input |
| 401 | Unauthorized | No/invalid auth |
| 403 | Forbidden | No permission |
| 404 | Not Found | Resource missing |
| 429 | Too Many Requests | Rate limited |
| 500 | Internal Server Error | Server failure |
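One way to keep these codes consistent across handlers is to map application errors to statuses in a single place. The sketch below uses Python's standard `http.HTTPStatus`; the exception classes and `status_for` are hypothetical names introduced for illustration.

```python
from http import HTTPStatus


# Hypothetical application-level errors.
class ValidationError(Exception): ...
class AuthenticationError(Exception): ...
class PermissionDenied(Exception): ...
class NotFound(Exception): ...
class RateLimited(Exception): ...


_STATUS_BY_ERROR = {
    ValidationError: HTTPStatus.BAD_REQUEST,         # 400: invalid input
    AuthenticationError: HTTPStatus.UNAUTHORIZED,    # 401: no/invalid auth
    PermissionDenied: HTTPStatus.FORBIDDEN,          # 403: no permission
    NotFound: HTTPStatus.NOT_FOUND,                  # 404: resource missing
    RateLimited: HTTPStatus.TOO_MANY_REQUESTS,       # 429: rate limited
}


def status_for(exc: Exception) -> HTTPStatus:
    """Translate an exception into an HTTP status; unknown errors become 500."""
    for error_type, status in _STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            return status
    return HTTPStatus.INTERNAL_SERVER_ERROR
```

Defaulting to 500 for anything unmapped keeps unexpected failures from being reported as client errors.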
Database Selection
| Use Case | Best Choice | Notes |
|----------|-------------|-------|
| Transactions | PostgreSQL | ACID, most versatile |
| High write | Cassandra | Write-optimized |
| Caching | Redis | Sub-millisecond |
| Search | Elasticsearch | Full-text search |
| Analytics | BigQuery | Column-store |
| Time-series | TimescaleDB | Time-based data |
| Graph | Neo4j | Relationships |
Security: OWASP Top 10 (2021)
| # | Vulnerability | Prevention |
|---|---------------|------------|
| 1 | Broken Access Control | Enforce authorization checks on every request |
| 2 | Cryptographic Failures | TLS 1.3, AES-256, Argon2 |
| 3 | Injection | Parameterized queries |
| 4 | Insecure Design | Threat modeling |
| 5 | Security Misconfiguration | Harden defaults |
| 6 | Vulnerable and Outdated Components | Dependency scanning |
| 7 | Identification and Authentication Failures | MFA, rate limiting |
| 8 | Software and Data Integrity Failures | Sign data, verify sources |
| 9 | Security Logging and Monitoring Failures | Comprehensive logging |
| 10 | Server-Side Request Forgery (SSRF) | Allowlist URLs |
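To make the "Parameterized queries" entry concrete, the sketch below contrasts string interpolation with placeholder binding using Python's built-in `sqlite3` module. The table, the sample rows, and both function names are made up for the example; the unsafe variant is shown only to demonstrate the failure.

```python
import sqlite3

# Throwaway in-memory database with illustrative data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(name: str):
    """DO NOT USE: interpolating input lets it rewrite the SQL statement."""
    query = f"SELECT name FROM users WHERE name = '{name}' ORDER BY id"
    return conn.execute(query).fetchall()


def find_user_safe(name: str):
    """Placeholder binding: the driver treats the input strictly as a value."""
    query = "SELECT name FROM users WHERE name = ? ORDER BY id"
    return conn.execute(query, (name,)).fetchall()


payload = "x' OR '1'='1"
leaked = find_user_unsafe(payload)   # the OR clause matches every row
matched = find_user_safe(payload)    # no user has this literal name
```

With the injection payload, the unsafe version returns every row in the table, while the parameterized version returns an empty list.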
Encryption Standards
| Layer | Standard | Notes |
|-------|----------|-------|
| In Transit | TLS 1.3 | HTTPS everywhere |
| At Rest | AES-256 | Encrypt sensitive data |
| Passwords | Argon2id | bcrypt acceptable |
| API Keys | SHA-256 | Store hashed |
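The "API Keys" row can be sketched with the standard library alone: generate a random key, store only its SHA-256 digest, and compare digests in constant time. The function names are hypothetical. A plain SHA-256 digest is reasonable here only because the key is long and random; passwords need a slow hash such as Argon2id, which requires a third-party package and is not shown.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def hash_api_key(key: str) -> str:
    """Return the hex SHA-256 digest that gets stored instead of the key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def generate_api_key() -> Tuple[str, str]:
    """Create a key to show the client once, plus the digest to persist."""
    key = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoded
    return key, hash_api_key(key)


def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare in constant time."""
    return hmac.compare_digest(hash_api_key(presented), stored_digest)
```

Because only the digest is persisted, a leaked database row does not reveal a usable key.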
Threat Modeling: STRIDE
┌─────────────────────────────────────────┐
│ STRIDE MODEL │
├─────────────────────────────────────────┤
│ S - Spoofing │
│ → Strong auth, MFA │
│ │
│ T - Tampering │
│ → Integrity checks, signatures │
│ │
│ R - Repudiation │
│ → Audit logging │
│ │
│ I - Information Disclosure │
│ → Encryption, access control │
│ │
│ D - Denial of Service │
│ → Rate limiting, DDoS protection │
│ │
│ E - Elevation of Privilege │
│ → Least privilege, RBAC │
└─────────────────────────────────────────┘
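As one example of the Denial of Service mitigation, here is a minimal token-bucket rate limiter. It is a single-process sketch: the class name, parameters, and injectable `clock` are assumptions, and a production limiter would typically keep per-client state in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock            # injectable so tests need not sleep
        self._tokens = float(capacity) # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Return True and spend a token if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that passed, never exceeding capacity.
        self._tokens = min(float(self.capacity), self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A caller would respond with 429 Too Many Requests whenever `allow()` returns `False`.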
Compliance Requirements
| Standard | Domain | Key Requirements |
|----------|--------|------------------|
| GDPR | EU personal data | Consent, right to erasure |
| HIPAA | Healthcare | PHI encryption, audit logs |
| SOC 2 | Services | Security controls |
| PCI DSS | Payments | Card data protection |
| CCPA | California privacy | Consumer rights |
Disaster Recovery
| Strategy | RTO | RPO | Cost |
|----------|-----|-----|------|
| Backup/Restore | Hours | Hours | Low |
| Pilot Light | Tens of minutes | Minutes | Medium |
| Warm Standby | Minutes | Seconds | High |
| Active-Active | Seconds | Near zero | Very High |
Troubleshooting
System not scaling?
├─► Database bottleneck? → Add caching, replicas
├─► Single point of failure? → Add redundancy
├─► Stateful services? → Make stateless
└─► Network limits? → CDN, optimize payloads
Security incident response?
├─► 1. CONTAIN: Isolate affected systems
├─► 2. IDENTIFY: Scope and entry point
├─► 3. ERADICATE: Remove threat, patch
├─► 4. RECOVER: Restore from clean backup
└─► 5. LEARN: Post-mortem, improve
Common Failure Modes
| Symptom | Root Cause | Recovery |
|---------|------------|----------|
| Cascading failures | Tight coupling | Circuit breakers |
| Works locally, fails elsewhere | Environment differences | Containers, IaC |
| Data breach | Missing controls | Audit, RBAC |
| Audit failed | Missing compliance | Gap analysis |
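The circuit breaker named as the recovery for cascading failures can be sketched in a few lines. This is a simplified single-threaded version; the class names, the threshold and timeout defaults, and the injectable `clock` are assumptions for illustration.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Once open, the breaker fails fast instead of piling more requests onto a struggling dependency, which is what stops one failure from cascading.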
Next Actions
Describe your system requirements for architecture recommendations.