Agent Skills: Database Design Skill

Production-grade database design skill for SQL/NoSQL selection, schema design, indexing, and query optimization

Tags: database-design, sql, nosql, schema-design, query-optimization, database
ID: pluginagentmarketplace/custom-plugin-system-design/database-design

Database Design Skill

Purpose: Atomic skill for database architecture with comprehensive parameter validation and optimization patterns.

Skill Identity

| Attribute | Value |
|-----------|-------|
| Scope | Schema Design, Indexing, Query Optimization |
| Responsibility | Single: Data modeling and storage patterns |
| Invocation | Skill("database-design") |

Parameter Schema

Input Validation

parameters:
  database_context:
    type: object
    required: true
    properties:
      use_case:
        type: string
        enum: [oltp, olap, htap, time_series, document, graph]
        required: true
      data_characteristics:
        type: object
        required: true
        properties:
          volume:
            type: string
            pattern: "^\\d+[KMGTP]B$"
          velocity:
            type: string
            pattern: "^\\d+[KM]?\\s*(reads?|writes?)/s$"
          variety:
            type: string
            enum: [structured, semi_structured, unstructured]
      access_patterns:
        type: array
        items:
          type: object
          properties:
            query_type: { type: string }
            frequency: { type: string, enum: [high, medium, low] }
            latency_sla: { type: string, pattern: "^\\d+ms$" }
      consistency:
        type: string
        enum: [strong, eventual, read_your_writes]
        default: strong

validation_rules:
  - name: "oltp_latency"
    rule: "use_case == 'oltp' implies latency_sla <= '100ms'"
    error: "OLTP workloads typically require <100ms latency"
  - name: "volume_velocity_match"
    rule: "high_volume implies adequate_velocity"
    error: "Volume and velocity constraints may be incompatible"

Output Schema

output:
  type: object
  properties:
    database_recommendation:
      type: object
      properties:
        primary: { type: string }
        secondary: { type: string }
        rationale: { type: string }
    schema:
      type: object
      properties:
        tables: { type: array }
        relationships: { type: array }
        constraints: { type: array }
    indexes:
      type: array
      items:
        type: object
        properties:
          table: { type: string }
          columns: { type: array }
          type: { type: string }
          rationale: { type: string }
    optimization_tips:
      type: array
      items: { type: string }
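
An illustrative instance conforming to this output schema (all values are hypothetical):

database_recommendation:
  primary: PostgreSQL
  secondary: Redis
  rationale: "OLTP workload: structured data, ACID, sub-10ms point lookups"
schema:
  tables: [users, orders, order_items]
  relationships: ["orders.user_id -> users.id", "order_items.order_id -> orders.id"]
  constraints: ["FK orders.user_id", "UNIQUE users.email"]
indexes:
  - table: orders
    columns: [user_id, status, created_at]
    type: composite
    rationale: "Covers the dominant list-orders-by-user query"
optimization_tips:
  - "Use a connection pooler (e.g. PgBouncer) in front of PostgreSQL"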

Core Patterns

Database Selection Matrix

OLTP (Transactional):
├── PostgreSQL
│   ├── Best for: Complex queries, ACID, JSON support
│   ├── Scale: Vertical + read replicas
│   └── Limit: ~10TB effective, 10K TPS
├── MySQL/MariaDB
│   ├── Best for: Web applications, simple queries
│   ├── Scale: Read replicas, ProxySQL
│   └── Limit: Similar to PostgreSQL
└── CockroachDB/TiDB
    ├── Best for: Distributed ACID
    ├── Scale: Horizontal auto-sharding
    └── Trade-off: Higher latency

OLAP (Analytical):
├── ClickHouse
│   ├── Best for: Real-time analytics, columnar
│   ├── Performance: 1B+ rows/second
│   └── Use: Log analysis, metrics
├── Snowflake/BigQuery
│   ├── Best for: Data warehouse, serverless
│   ├── Scale: Unlimited, pay-per-query
│   └── Use: BI, reporting
└── DuckDB
    ├── Best for: Embedded analytics
    └── Use: Edge, single-machine analytics

NoSQL:
├── MongoDB
│   ├── Best for: Documents, flexible schema
│   └── Pattern: Embedding, denormalization
├── Cassandra
│   ├── Best for: Write-heavy, wide-column
│   └── Pattern: Partition key design
├── Redis
│   ├── Best for: Cache, sessions, pub/sub
│   └── Limit: Memory-bound
└── Neo4j
    ├── Best for: Graph relationships
    └── Pattern: Cypher queries
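
For orientation, a small sketch of how this matrix could be encoded as a lookup. Pairings not spelled out above (htap, time_series fallbacks) are assumptions; a full recommender would also weigh volume, velocity, and consistency.

SELECTION_MATRIX = {
    "oltp": ("PostgreSQL", "CockroachDB"),
    "olap": ("ClickHouse", "Snowflake"),
    "htap": ("TiDB", "PostgreSQL"),            # assumed pairing
    "time_series": ("ClickHouse", "Cassandra"),  # assumed pairing
    "document": ("MongoDB", "PostgreSQL (JSONB)"),
    "graph": ("Neo4j", "PostgreSQL (recursive CTEs)"),
}

def recommend_database(use_case: str, needs_horizontal_scale: bool = False) -> dict:
    primary, secondary = SELECTION_MATRIX[use_case]
    # Per the matrix, single-node OLTP tops out around ~10TB / 10K TPS;
    # beyond that, prefer distributed ACID and accept higher latency.
    if use_case == "oltp" and needs_horizontal_scale:
        primary, secondary = "CockroachDB", "PostgreSQL"
    return {"primary": primary, "secondary": secondary}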

Schema Design Patterns

Normalization (OLTP):
├── 1NF: Atomic values
├── 2NF: No partial dependencies
├── 3NF: No transitive dependencies
└── BCNF: Strict key dependencies

Denormalization (OLAP/NoSQL):
├── Embedding
│   ├── Nest related documents
│   ├── Best for: 1:few relationships
│   └── Avoid: Large arrays, frequent updates
├── Bucketing
│   ├── Group time-series data
│   ├── Reduce document count
│   └── Improve write performance
└── Materialized Views
    ├── Pre-computed aggregates
    ├── Trade: Storage vs query time
    └── Refresh: Incremental or full
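
A minimal sketch of the bucketing pattern for time-series data, assuming a MongoDB-style document store; field names are illustrative.

from datetime import datetime, timezone

def bucket_id(sensor_id: str, ts: datetime) -> str:
    # One bucket per sensor per hour keeps documents bounded in size
    return f"{sensor_id}:{ts.strftime('%Y-%m-%dT%H')}"

reading = {"ts": datetime(2025, 1, 15, 10, 42, tzinfo=timezone.utc), "value": 21.7}

bucket = {
    "_id": bucket_id("sensor-42", reading["ts"]),
    "sensor_id": "sensor-42",
    "hour": reading["ts"].strftime("%Y-%m-%dT%H"),
    "count": 1,
    "readings": [reading],  # appended via $push; bounded by the hourly key
}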

Indexing Strategies

Index Types:
├── B-tree (default)
│   ├── Range queries, sorting
│   ├── Overhead: ~10% storage
│   └── Example: WHERE date BETWEEN x AND y
├── Hash
│   ├── Equality only
│   ├── Faster lookup
│   └── Example: WHERE id = 123
├── GIN (PostgreSQL)
│   ├── Full-text, JSONB, arrays
│   ├── Higher write cost
│   └── Example: WHERE tags @> '{redis}'
├── BRIN (Block Range)
│   ├── Large sorted tables
│   ├── Minimal storage
│   └── Example: Time-series data
└── Composite
    ├── Multi-column index
    ├── Order matters (leftmost prefix)
    └── Example: (user_id, created_at DESC)

Index Selection Rules:
├── Query frequency > 10% of total
├── Selectivity: predicate matches <10% of rows (index filters out 90%+)
├── Write overhead acceptable
└── Index size < table size
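
A sketch of the composite-index ordering rule exercised by the unit tests later in this document: equality predicates first, sort columns last (leftmost-prefix rule). The naive parsing and the plain-list return value are assumptions.

import re

def optimize_index_order(queries: list[str]) -> list[str]:
    equality, sort_cols = [], []
    for q in queries:
        # Collect equality predicates (col = ?), preserving first-seen order
        for col in re.findall(r"(\w+)\s*=\s*\?", q):
            if col not in equality:
                equality.append(col)
        # Collect ORDER BY columns; they go last in the composite index
        for col in re.findall(r"ORDER BY\s+(\w+)", q, re.IGNORECASE):
            if col not in sort_cols:
                sort_cols.append(col)
    return equality + [c for c in sort_cols if c not in equality]

# optimize_index_order(["WHERE user_id = ? AND status = ?",
#                       "WHERE user_id = ? ORDER BY created_at"])
# -> ["user_id", "status", "created_at"]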

Retry Logic

Connection & Query Retry

retry_config:
  connection:
    max_attempts: 3
    initial_delay_ms: 100
    max_delay_ms: 5000
    multiplier: 2.0

  query:
    max_attempts: 2
    timeout_ms: 30000
    retry_on:
      - CONNECTION_LOST
      - DEADLOCK
      - LOCK_TIMEOUT
    abort_on:
      - SYNTAX_ERROR
      - CONSTRAINT_VIOLATION
      - PERMISSION_DENIED

  transaction:
    retry_on_deadlock: true
    max_deadlock_retries: 3
    isolation_level: READ_COMMITTED
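
A backoff sketch implementing the connection settings above; the exception `code` attribute is a placeholder for whatever error types your driver raises.

import random
import time

RETRYABLE = ("CONNECTION_LOST", "DEADLOCK", "LOCK_TIMEOUT")

def with_retries(operation, max_attempts=3, initial_delay_ms=100,
                 max_delay_ms=5000, multiplier=2.0):
    delay = initial_delay_ms
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            code = getattr(exc, "code", None)
            if code not in RETRYABLE or attempt == max_attempts:
                raise  # abort_on errors and exhausted attempts propagate
            # Full jitter avoids thundering-herd reconnects
            time.sleep(random.uniform(0, min(delay, max_delay_ms)) / 1000)
            delay *= multiplier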

Logging & Observability

Log Format

log_schema:
  level: { type: string }
  timestamp: { type: string, format: ISO8601 }
  skill: { type: string, value: "database-design" }
  event:
    type: string
    enum:
      - schema_analyzed
      - index_recommended
      - query_optimized
      - capacity_estimated
      - migration_planned
  context:
    type: object
    properties:
      table: { type: string }
      query: { type: string }
      execution_time_ms: { type: number }
      rows_affected: { type: integer }

example:
  level: INFO
  event: index_recommended
  context:
    table: orders
    columns: [user_id, status, created_at]
    type: composite
    rationale: "Covers 80% of queries"
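
One way to emit entries in this shape with the Python stdlib (a sketch; `log_event` is an illustrative helper, not part of the skill):

import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("database-design")

def log_event(event: str, context: dict, level: int = logging.INFO) -> None:
    # Serialize the log_schema fields as one JSON line
    logger.log(level, json.dumps({
        "level": logging.getLevelName(level),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "skill": "database-design",
        "event": event,
        "context": context,
    }))

log_event("index_recommended", {
    "table": "orders",
    "columns": ["user_id", "status", "created_at"],
    "type": "composite",
    "rationale": "Covers 80% of queries",
})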

Metrics

metrics:
  - name: query_duration_seconds
    type: histogram
    labels: [query_type, table]
    buckets: [0.001, 0.01, 0.1, 1, 10]

  - name: index_usage_ratio
    type: gauge
    labels: [index_name, table]

  - name: table_size_bytes
    type: gauge
    labels: [table, database]

  - name: connection_pool_usage
    type: gauge
    labels: [pool_name]
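
Declaring these with prometheus_client, as a sketch; any metrics library with histogram and gauge types works the same way.

from prometheus_client import Gauge, Histogram

QUERY_DURATION = Histogram(
    "query_duration_seconds", "Query latency",
    ["query_type", "table"], buckets=[0.001, 0.01, 0.1, 1, 10],
)
INDEX_USAGE = Gauge("index_usage_ratio", "Index scans / total scans",
                    ["index_name", "table"])
TABLE_SIZE = Gauge("table_size_bytes", "On-disk table size",
                   ["table", "database"])
POOL_USAGE = Gauge("connection_pool_usage", "Connections in use / pool size",
                   ["pool_name"])

# Usage:
# with QUERY_DURATION.labels("point_lookup", "orders").time():
#     run_query(...)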

Troubleshooting

Common Issues

| Issue | Cause | Resolution |
|-------|-------|------------|
| Slow queries | Missing index | Add appropriate index |
| Lock waits | Long transactions | Split or optimize |
| Connection exhaustion | Pool too small | Increase pool, add pooler |
| Replication lag | Heavy writes | Add replicas, optimize queries |
| Disk full | Bloat, no vacuum | VACUUM, archival |

Debug Checklist

□ EXPLAIN ANALYZE run?
□ Index usage verified?
□ Connection pool monitored?
□ Lock contention checked?
□ Table statistics current?
□ Slow query log enabled?
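
Working the first two checklist items in PostgreSQL, as a psycopg2-style sketch; the connection string and table names are illustrative.

import psycopg2

conn = psycopg2.connect("dbname=app")
with conn.cursor() as cur:
    # 1. EXPLAIN ANALYZE: look for Seq Scan nodes on large tables
    cur.execute("EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = %s", (123,))
    for (line,) in cur.fetchall():
        print(line)

    # 2. Index usage: unused indexes show idx_scan = 0
    cur.execute("""
        SELECT indexrelname, idx_scan
        FROM pg_stat_user_indexes
        WHERE relname = 'orders'
        ORDER BY idx_scan
    """)
    print(cur.fetchall())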

Unit Test Templates

Schema Validation Tests

# test_database_design.py
# Assumes the skill's helpers are importable; the module path below is illustrative.
from database_design import (
    analyze_explain,
    estimate_capacity,
    optimize_index_order,
    recommend_index,
    validate_parameters,
)

def test_valid_oltp_schema():
    params = {
        "database_context": {
            "use_case": "oltp",
            "data_characteristics": {
                "volume": "100GB",
                "velocity": "10K writes/s",
                "variety": "structured"
            },
            "access_patterns": [
                {"query_type": "point_lookup", "frequency": "high", "latency_sla": "10ms"}
            ]
        }
    }
    result = validate_parameters(params)
    assert result.valid is True

def test_oltp_latency_validation():
    params = {
        "database_context": {
            "use_case": "oltp",
            "access_patterns": [
                {"latency_sla": "500ms"}  # Too high for OLTP
            ]
        }
    }
    result = validate_parameters(params)
    assert "latency" in result.warnings[0]

def test_index_recommendation():
    query = "SELECT * FROM orders WHERE user_id = ? AND status = ? ORDER BY created_at DESC"
    access_pattern = {"frequency": "high", "latency_sla": "10ms"}

    result = recommend_index(query, access_pattern)
    assert result.columns == ["user_id", "status", "created_at"]
    assert result.type == "composite"

def test_capacity_estimation():
    result = estimate_capacity(
        rows=10_000_000,
        row_size_bytes=500,
        growth_rate_monthly=0.05,
        months=12
    )
    assert result.current_size == "5GB"
    assert result.projected_size == "8.98GB"  # 5GB * 1.05**12 ≈ 8.98GB

Query Optimization Tests

def test_explain_analysis():
    explain_output = """
    Seq Scan on orders (rows=1000000)
      Filter: (user_id = 123)
    """
    result = analyze_explain(explain_output)
    assert result.issues[0].type == "SEQ_SCAN"
    assert result.suggestions[0] == "Add index on orders(user_id)"

def test_composite_index_order():
    queries = [
        "WHERE user_id = ? AND status = ?",
        "WHERE user_id = ? ORDER BY created_at"
    ]
    result = optimize_index_order(queries)
    # user_id should be first (equality), created_at last (sort)
    assert result.recommended_order == ["user_id", "status", "created_at"]

Version History

| Version | Date | Changes |
|---------|------|---------|
| 2.0.0 | 2025-01 | Production-grade rewrite with optimization patterns |
| 1.0.0 | 2024-12 | Initial release |