System Architecture Skill

Design systems for change. Every architecture decision answers: "How will this scale and evolve?"

Core Principle

There are no best practices—only trade-offs in context. The best architecture is the simplest one that meets current needs while enabling future growth.

Critical Rules

| Rule | Enforcement |
|------|-------------|
| Trade-offs over absolutes | No "best" - only "best for this context" |
| Simplicity that scales | Earn complexity, don't assume it |
| Decisions with rationale | ADRs for significant choices |
| Boundaries and contracts | Enable teams to move independently |

Architecture Evaluation

When evaluating architecture, always:

1. Understand context first
   - Business requirements
   - Team capabilities
   - Constraints (time, budget, skills)
2. Identify 2-3 valid approaches
   - Never present only one option
3. Analyze trade-offs explicitly
   - What do you gain?
   - What do you give up?
4. Think long-term
   - What will be hard to change later?
5. Document the decision
   - Use ADR format

Architecture Patterns

When to Use What

| Pattern | When | Trade-offs |
|---------|------|------------|
| Monolith | Small team, unclear domain boundaries, speed matters | Simple deployment, harder to scale teams |
| Modular Monolith | Growing team, clearer boundaries, want deployment simplicity | Structure without operational complexity |
| Microservices | Large org, independent team deployment, clear bounded contexts | Team autonomy, operational complexity |
| Serverless | Event-driven, variable load, minimal ops desire | Scaling built-in, cold start latency |

WRONG - Follow the Trend

"We should use microservices because that's what Netflix does."

Problem: Following trends without understanding context.

CORRECT - Context-Driven Decision

Given:
- Team of 5 developers
- Single deployment target
- Unclear domain boundaries still evolving

Recommendation: Modular monolith

Rationale: Microservices would add operational complexity
(service mesh, distributed tracing, deployment coordination)
without the benefit of independent team scaling.

When to revisit: If team grows >15 or we identify clear
bounded contexts with different scaling requirements.

Trade-Off Analysis Framework

For every significant decision, document:

| Dimension | Option A | Option B |
|-----------|----------|----------|
| Development speed | | |
| Operational complexity | | |
| Team independence | | |
| Consistency guarantees | | |
| Scaling characteristics | | |
| Cost (infra + people) | | |
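
One way to make the comparison explicit is to score each option per dimension and weight the dimensions by what matters in the current context. The sketch below is a minimal illustration, not a prescribed method: the `Option` class, the 1 to 5 scale, and all weights and scores are assumptions made up for this example. Only the dimension names come from the table above.

```python
from dataclasses import dataclass

# Dimensions copied from the table above. Everything else here
# (class names, weights, scores) is an illustrative assumption.
DIMENSIONS = [
    "Development speed",
    "Operational complexity",
    "Team independence",
    "Consistency guarantees",
    "Scaling characteristics",
    "Cost (infra + people)",
]


@dataclass
class Option:
    name: str
    # 1 (worst) to 5 (best) per dimension; higher is always better,
    # so a high "Operational complexity" score means LOW complexity.
    scores: dict[str, int]


def weighted_total(option: Option, weights: dict[str, float]) -> float:
    """Sum score * weight over every dimension.

    A missing entry raises KeyError, which forces each option
    to be scored on every dimension.
    """
    return sum(option.scores[d] * weights[d] for d in DIMENSIONS)


def rank(options: list[Option], weights: dict[str, float]) -> list[Option]:
    """Return options ordered from highest to lowest weighted total."""
    return sorted(options, key=lambda o: weighted_total(o, weights), reverse=True)


# Hypothetical numbers for a small team that values speed and low ops burden.
weights = dict(zip(DIMENSIONS, [3, 3, 1, 2, 1, 2]))
modular_monolith = Option("Modular monolith", dict(zip(DIMENSIONS, [4, 4, 2, 5, 2, 4])))
microservices = Option("Microservices", dict(zip(DIMENSIONS, [2, 1, 5, 3, 5, 2])))

for option in rank([modular_monolith, microservices], weights):
    print(f"{option.name}: {weighted_total(option, weights)}")
```

The totals are an input to the discussion, not the decision itself. Changing the weights to reflect a different context (for example, a large organization that values team independence) can reverse the ranking, which is the point of the exercise.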

ADR Template

# ADR-XXX: [Decision Title]

**Status:** Proposed | Accepted | Deprecated | Superseded
**Date:** YYYY-MM-DD

## Context

[What issue are we facing? What constraints exist?]

## Decision

[What did we decide?]

## Consequences

### Positive
- [Benefit]

### Negative
- [Drawback]

## Alternatives Considered

### [Option Name]
**Why rejected:** [Reason]
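
If ADRs are generated or validated by tooling, the template above maps naturally onto a small data structure. The following is a minimal sketch under that assumption; the `ADR` and `Alternative` classes and the `render` function are hypothetical names, not part of any existing ADR tool.

```python
from dataclasses import dataclass, field
from datetime import date

# Status values taken from the template above.
VALID_STATUSES = {"Proposed", "Accepted", "Deprecated", "Superseded"}


@dataclass
class Alternative:
    name: str
    why_rejected: str


@dataclass
class ADR:
    number: int
    title: str
    status: str
    decided_on: date
    context: str
    decision: str
    positive: list[str] = field(default_factory=list)
    negative: list[str] = field(default_factory=list)
    alternatives: list[Alternative] = field(default_factory=list)


def render(adr: ADR) -> str:
    """Render an ADR as Markdown following the template's section order."""
    if adr.status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {adr.status!r}")
    lines = [
        f"# ADR-{adr.number:03d}: {adr.title}",
        "",
        f"**Status:** {adr.status}",
        f"**Date:** {adr.decided_on.isoformat()}",
        "",
        "## Context", "", adr.context, "",
        "## Decision", "", adr.decision, "",
        "## Consequences", "",
        "### Positive",
        *[f"- {item}" for item in adr.positive],
        "",
        "### Negative",
        *[f"- {item}" for item in adr.negative],
        "",
        "## Alternatives Considered",
    ]
    for alt in adr.alternatives:
        lines += ["", f"### {alt.name}", f"**Why rejected:** {alt.why_rejected}"]
    return "\n".join(lines) + "\n"
```

Rejecting unknown statuses at render time is a small guard that keeps the set of ADRs queryable by status later.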

Example ADR

# ADR-001: Use PostgreSQL for primary data store

**Status:** Accepted
**Date:** 2024-01-15

## Context

We need a primary data store for user data, orders, and inventory.
Requirements: ACID transactions, complex queries, team familiarity.

## Decision

Use PostgreSQL 15 as the primary data store.

## Consequences

### Positive
- ACID guarantees for financial data
- Team has 5+ years PostgreSQL experience
- Rich ecosystem (PostGIS, pg_trgm, etc.)
- Proven at our expected scale (100k users)

### Negative
- Vertical scaling limits (can address with read replicas)
- Schema migrations require coordination

## Alternatives Considered

### MongoDB
**Why rejected:** Team lacks experience, eventual consistency
problematic for order processing.

### DynamoDB
**Why rejected:** Complex queries (reporting) would require
additional infrastructure. Cost unpredictable with access patterns.

Database Selection

| Type | Use When | Trade-offs |
|------|----------|------------|
| Relational (Postgres) | ACID needed, complex queries | Scaling complexity |
| Document (MongoDB) | Flexible schemas, embedded data | Weaker consistency |
| Key-Value (Redis) | Caching, sessions, fast lookups | Limited queries |
| Graph (Neo4j) | Relationship-heavy queries | Specialized |
| Time-Series (InfluxDB) | Metrics, events, IoT | Append-optimized |

Scalability Patterns

Order of Consideration

1. Vertical scaling - Bigger machine (simplest)
2. Caching - CDN → Application → Database
3. Read replicas - Separate read/write traffic
4. Horizontal scaling - Multiple instances
5. Sharding - Partition data (most complex)
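
As an illustration of the application-level caching step, here is a minimal cache-aside sketch with a time-to-live. The `TTLCache` class and its `get_or_load` method are hypothetical names for this example; a production system would more likely use a shared cache such as Redis rather than an in-process dictionary.

```python
import time
from typing import Any, Callable


class TTLCache:
    """In-process cache-aside helper: read the cache first, fall back to a loader."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: entry exists and has not expired
        value = loader(key)  # miss or expired: go to the source of truth
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry, for example after the underlying record is written."""
        self._entries.pop(key, None)
```

The TTL bounds how stale a read can be. Calling `invalidate` on writes tightens that bound at the cost of coupling the write path to the cache.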

Resilience Patterns

| Pattern | Purpose |
|---------|---------|
| Retry with backoff | Handle transient failures |
| Circuit breaker | Prevent cascade failures |
| Bulkhead | Isolate failure domains |
| Timeout | Bound waiting time |
| Graceful degradation | Partial service over no service |
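
The first two patterns in the table can be sketched in a few lines. This is a simplified, single-threaded illustration: `retry_with_backoff`, `CircuitBreaker`, and `CircuitOpenError` are hypothetical names, and the thresholds and delays are arbitrary defaults rather than recommendations.

```python
import random
import time
from typing import Any, Callable


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without being tried."""


def retry_with_backoff(fn: Callable[[], Any], attempts: int = 3, base_delay: float = 0.1,
                       max_delay: float = 2.0, sleep: Callable[[float], None] = time.sleep,
                       retry_on: tuple = (Exception,)) -> Any:
    """Call fn, retrying on failure with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay ceiling doubles each attempt; jitter spreads out retries.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))


class CircuitBreaker:
    """Fail fast after repeated failures, then allow a trial call after a timeout."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn: Callable[[], Any]) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let this one call through as a trial.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result
```

When combining the two, pass a narrower `retry_on` (for example, only connection errors) so that retries do not keep hammering a breaker that is already open.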

API Design Principles

REST

| Principle | Requirement |
|-----------|-------------|
| Resource modeling | Nouns, not verbs |
| HTTP semantics | GET reads, POST creates, PUT replaces |
| Versioning | URI (/v1/) or header |
| Pagination | Cursor-based for large sets |
| Error responses | Problem Details (RFC 9457, which obsoletes RFC 7807) |
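
To make the pagination row concrete, here is a minimal sketch of cursor-based paging over an in-memory list. The function names, the `after_id` cursor field, and the response shape are assumptions for illustration; with a database, the same idea is typically expressed as a `WHERE id > :after_id ORDER BY id LIMIT :limit` query.

```python
import base64
import json
from typing import Optional


def encode_cursor(last_id: int) -> str:
    """Pack the last seen id into an opaque, URL-safe token."""
    raw = json.dumps({"after_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    """Recover the last seen id from a token produced by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after_id"]


def list_page(items: list[dict], limit: int, cursor: Optional[str] = None) -> dict:
    """Return one page of items plus the cursor for the next page.

    Assumes items are sorted by a unique, ascending integer "id".
    """
    after_id = decode_cursor(cursor) if cursor else None
    remaining = [i for i in items if after_id is None or i["id"] > after_id]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "next_cursor": next_cursor}
```

Unlike offset-based paging, the cursor stays valid when rows are inserted or deleted before the current position, which is why it suits large or frequently changing sets.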

Event-Driven

| Consideration | Guidance |
|---------------|----------|
| Event schema | Version events, use schema registry |
| Ordering | Partition key for ordering guarantees |
| Idempotency | Handle duplicate delivery |
| Dead letter | Handle poison messages |
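
The idempotency and dead-letter rows can be sketched together as a consumer that tracks which event IDs it has already settled. This is a simplified in-memory illustration: `Event`, `Consumer`, and the returned status strings are hypothetical, and a real consumer would persist the seen IDs and publish dead letters to a separate queue or topic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    event_id: str  # unique per logical event; redeliveries reuse the same id
    payload: dict


@dataclass
class Consumer:
    handler: Callable[[Event], None]
    max_attempts: int = 3
    seen_ids: set = field(default_factory=set)       # processed or dead-lettered
    attempts: dict = field(default_factory=dict)     # event_id -> failed attempts
    dead_letters: list = field(default_factory=list)

    def consume(self, event: Event) -> str:
        # Idempotency: a redelivered event that was already settled is skipped.
        if event.event_id in self.seen_ids:
            return "duplicate"
        try:
            self.handler(event)
        except Exception:
            failed = self.attempts.get(event.event_id, 0) + 1
            self.attempts[event.event_id] = failed
            if failed >= self.max_attempts:
                # Poison message: park it so it stops blocking the stream.
                self.dead_letters.append(event)
                self.seen_ids.add(event.event_id)
                return "dead_lettered"
            return "retry"
        self.seen_ids.add(event.event_id)
        return "processed"
```

Note that the handler runs before the ID is recorded, so a crash between the two steps still causes a repeat. Closing that gap requires the handler's side effect and the ID write to share a transaction, or the handler itself to be idempotent.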

Distributed Systems Fundamentals

CAP Theorem

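The popular summary is "choose two of Consistency, Availability, Partition Tolerance", but partition tolerance is not optional in a distributed system because network partitions will happen.

In practice: During a network partition, choose consistency OR availability.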
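
The choice can be illustrated with a toy replica that is cut off from its primary. This sketch is deliberately simplistic and is not a replication protocol: the `Replica` class, its `mode` flag, and the `Unavailable` exception are assumptions made up for this example.

```python
class Unavailable(Exception):
    """Raised by a consistency-first replica that cannot confirm the latest value."""


class Replica:
    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False

    def replicate(self, value) -> None:
        """Update pushed by the primary; it is lost while the replica is partitioned."""
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk returning stale data.
            raise Unavailable("cannot confirm latest value during partition")
        # Availability first (or no partition): answer with what we have.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
for replica in (cp, ap):
    replica.replicate("v1")
    replica.partitioned = True
    replica.replicate("v2")  # never arrives

print(ap.read())  # answers, but with the stale value
try:
    cp.read()
except Unavailable as err:
    print(f"CP replica refused: {err}")
```

Neither behavior is wrong in itself. The consistency-first replica suits data where a stale answer is worse than no answer, while the availability-first replica suits data where a slightly old answer is acceptable.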

Consistency Models

| Model | Meaning | Use When |
|-------|---------|----------|
| Strong | All reads see latest write | Financial data |
| Eventual | All reads eventually see latest write | Social feeds, caches |
| Causal | Cause-effect ordering preserved | Collaborative editing |

Integration

| Skill | Relationship |
|-------|--------------|
| design-principles | Apply to architecture decisions |
| pattern-enforcement | Enforce boundaries with tooling |
| documentation-standards | Document architectural decisions |

Anti-Patterns

| Anti-Pattern | Why It's Wrong |
|--------------|----------------|
| Architecture astronauting | Designing for problems you don't have |
| Premature optimization | Optimizing without data |
| Trend following | "Netflix does it" isn't a reason |
| Undocumented decisions | Become mysterious legacy constraints |
| Over-engineering | Complexity without justification |
| Ignoring team capabilities | Architecture must match team |

Decision Checklist

When making architecture decisions:

- [ ] Did I understand the context first?
- [ ] Did I identify multiple valid approaches?
- [ ] Did I analyze trade-offs explicitly?
- [ ] Did I consider what's hard to change later?
- [ ] Did I document the rationale (ADR)?
- [ ] Does this match team capabilities?
- [ ] Is this the simplest solution that works?
