Database Workflow
Language-agnostic guidelines for database design, migrations, schema management, ORM patterns, and query optimization.
CRITICAL: No Premature Optimization
Default to simplicity. Optimize only when you have measured evidence of a problem.
- Don't add indexes "just in case"
- Don't over-normalize without understanding query patterns
- Don't implement complex caching without profiling first
- Measure before optimizing (EXPLAIN, query logs, monitoring)
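For example, before adding any index, confirm where time is actually going. A minimal PostgreSQL sketch; the second query assumes the pg_stat_statements extension is installed, and its column names vary slightly by version:
-- Inspect the plan and real runtime of a suspect query
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123;
-- Rank queries by total time spent (requires pg_stat_statements)
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;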
Migration Strategies
Version Control for Schema Changes
Treat migrations as code:
- Store in version control alongside application code
- Timestamp or sequential numbering (001, 002, 003...)
- Atomic, single-responsibility changes
- Document WHY in migration files, not just WHAT
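A hypothetical migration header showing WHY alongside WHAT (the file name and comment are illustrative):
-- 002_add_email_index.sql
-- WHY: login looks users up by email; the slow-query log showed this as the
--      most expensive lookup and there was no covering index.
CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);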
Naming convention:
migrations/
├── 001_create_users_table.sql
├── 002_add_email_index.sql
├── 003_create_orders_table.sql
└── 004_add_user_fk_to_orders.sql
Up/Down Migrations
Every migration must be reversible:
-- UP: Create table
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
username VARCHAR(255) UNIQUE NOT NULL,
email VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- DOWN: Drop table
DROP TABLE users;
Complex reversals require care:
-- UP: Add constraint
ALTER TABLE orders ADD CONSTRAINT fk_user
FOREIGN KEY (user_id) REFERENCES users(id);
-- DOWN: Remove constraint (PostgreSQL)
ALTER TABLE orders DROP CONSTRAINT fk_user;
Idempotent Migrations
Migrations must be safely re-runnable:
-- ✅ Good: Idempotent (safe to run multiple times)
CREATE TABLE IF NOT EXISTS users (...);
CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);
-- ❌ Bad: Fails if run twice
CREATE TABLE users (...);
CREATE INDEX idx_users_email ON users(email);
Rollback Strategies
Two approaches:
- Down migrations (reversible):
  - Run DOWN SQL to undo changes
  - Works if migration is reversible
  - Better for critical systems
- Snapshot migrations (not reversible):
  - Never roll back; always migrate forward
  - Create new migration to fix issues
  - Simpler, safer in practice
  - Better for high-availability systems
Best practice: Design migrations to be reversible when possible, but plan for forward-only fixes in production.
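A sketch of a forward-only fix: instead of reverting a bad migration, ship a new one that corrects it (file names and the specific change are illustrative):
-- 012_add_orders_status_not_null.sql shipped a NOT NULL column that broke inserts.
-- 013_relax_orders_status.sql -- forward-only fix, no rollback
ALTER TABLE orders ALTER COLUMN status DROP NOT NULL;
ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'pending';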
Migration Naming Conventions
Use descriptive, action-oriented names:
✅ Good:
- 001_create_users_table
- 002_add_email_unique_constraint
- 003_create_index_users_email
- 004_rename_column_user_id_to_author_id
- 005_add_soft_delete_columns
❌ Bad:
- 001_update
- 002_fix
- 003_schema_change
- 004_v2
Include timestamp + sequence:
2024_11_17_001_create_users_table.sql
2024_11_17_002_add_email_index.sql
Schema Design Principles
Normalization (1NF, 2NF, 3NF)
First Normal Form (1NF):
- Eliminate repeating groups
- All columns contain atomic (non-divisible) values
- Each row is unique
-- ❌ NOT 1NF: phone_numbers is a repeating group
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255),
phone_numbers VARCHAR(255) -- "555-1234, 555-5678"
);
-- ✅ 1NF: Separate table for phone numbers
CREATE TABLE user_phones (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT REFERENCES users(id),
phone_number VARCHAR(20)
);
Second Normal Form (2NF):
- Meets 1NF
- Remove partial dependencies (every non-key column must depend on the whole primary key, not just part of it)
-- ❌ NOT 2NF: course_name depends only on course_id, not on (student_id, course_id)
CREATE TABLE enrollments (
student_id BIGINT,
course_id BIGINT,
course_name VARCHAR(255), -- Should be in courses table
grade CHAR(1),
PRIMARY KEY (student_id, course_id)
);
-- ✅ 2NF: Move course_name to separate table
CREATE TABLE courses (
id BIGINT PRIMARY KEY,
name VARCHAR(255)
);
CREATE TABLE enrollments (
student_id BIGINT,
course_id BIGINT REFERENCES courses(id),
grade CHAR(1),
PRIMARY KEY (student_id, course_id)
);
Third Normal Form (3NF):
- Meets 2NF
- Remove transitive dependencies (non-key columns don't depend on other non-key columns)
-- ❌ NOT 3NF: city and state depend on zip_code, not directly on the primary key
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
zip_code VARCHAR(5),
city VARCHAR(100),
state CHAR(2)
);
-- ✅ 3NF: Move location info to separate table
CREATE TABLE zip_codes (
code VARCHAR(5) PRIMARY KEY,
city VARCHAR(100),
state CHAR(2)
);
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
zip_code VARCHAR(5) REFERENCES zip_codes(code)
);
Denormalization (When and Why)
Denormalize when:
- Query patterns are read-heavy (reads vastly outnumber writes)
- Measurement shows join performance is a bottleneck
- Reporting queries need fast access to aggregated data
- Keeping redundant copies in sync is cheaper than paying the join cost on every read
Common denormalization patterns:
- Stored aggregates:
-- Track order count on user without JOIN
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
order_count INT DEFAULT 0
);
-- Keep in sync with a trigger or application code (see the trigger sketch after this list)
- Purposeful redundancy:
-- Store user email on order to avoid JOIN at read time
CREATE TABLE orders (
id BIGINT PRIMARY KEY,
user_id BIGINT REFERENCES users(id),
user_email VARCHAR(255), -- Denormalized for reporting
total_amount DECIMAL(10,2)
);
- Pre-computed views:
-- Materialized view for dashboards
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT
DATE_TRUNC('month', created_at) as month,
COUNT(*) as order_count,
SUM(total_amount) as total_revenue
FROM orders
GROUP BY DATE_TRUNC('month', created_at);
-- Refresh periodically, not on every insert
REFRESH MATERIALIZED VIEW monthly_sales;
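The stored-aggregate pattern above relies on keeping order_count in sync; here is a minimal PostgreSQL trigger sketch (function and trigger names are illustrative, and application-level updates work just as well):
CREATE FUNCTION sync_order_count() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE users SET order_count = order_count + 1 WHERE id = NEW.user_id;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE users SET order_count = order_count - 1 WHERE id = OLD.user_id;
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER orders_sync_order_count AFTER INSERT OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION sync_order_count();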
Indexing Strategies
Index only when needed. Measure first.
-- ✅ Create index for frequently searched columns
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);
-- ❌ Avoid: Index on every column
-- ❌ Avoid: Index on low-cardinality columns (boolean flags)
-- ❌ Avoid: Duplicate indexes
Index types:
- Single-column index (most common):
CREATE INDEX idx_users_email ON users(email);
- Composite index (for multi-column WHERE/JOIN):
-- Good for: WHERE user_id = X AND created_at > Y
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
- Partial index (index subset of rows):
-- Index only active users (avoid indexing soft-deleted rows)
CREATE INDEX idx_active_users_email ON users(email)
WHERE deleted_at IS NULL;
- Unique index (enforce constraint):
CREATE UNIQUE INDEX idx_users_email ON users(email);
Index maintenance:
- Monitor for unused indexes (query database statistics)
- Drop unused indexes after measurement
- ANALYZE/VACUUM regularly to update statistics
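To find candidates for removal, query the database's own statistics. A PostgreSQL example using the pg_stat_user_indexes view (measure over a representative period, and exclude indexes that back constraints before dropping anything):
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname, indexrelname;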
Primary Keys
Use surrogate keys (auto-incrementing ID) by default:
-- ✅ Recommended: Surrogate key
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(255)
);
-- ✅ Good for high-volume tables
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
event_type VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Use natural keys only if the column(s):
- Are guaranteed immutable
- Are never reassigned or repurposed
- Are short and stable
-- ✅ Natural key (country code is stable, never changes)
CREATE TABLE countries (
code CHAR(2) PRIMARY KEY,
name VARCHAR(255)
);
-- ❌ Natural key (email changes, should use surrogate)
CREATE TABLE users (
email VARCHAR(255) PRIMARY KEY
);
Foreign Keys and Constraints
Always define foreign key relationships:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
total_amount DECIMAL(10,2) NOT NULL
);
-- With explicit constraint name for cleaner error messages
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL,
product_id BIGINT NOT NULL,
quantity INT NOT NULL CHECK (quantity > 0),
CONSTRAINT fk_order FOREIGN KEY (order_id) REFERENCES orders(id),
CONSTRAINT fk_product FOREIGN KEY (product_id) REFERENCES products(id)
);
Cascade behaviors:
-- DELETE CASCADE: Delete orders when user is deleted (use with caution)
CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
-- RESTRICT: Prevent user deletion if orders exist (safer default)
CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT
-- SET NULL: Set user_id to NULL if user deleted (for optional relationships)
CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
Constraints
Use constraints to enforce data integrity at database level:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
age INT CHECK (age >= 18),
role VARCHAR(50) DEFAULT 'user' CHECK (role IN ('user', 'admin', 'moderator')),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
status VARCHAR(50) NOT NULL DEFAULT 'pending',
total_amount DECIMAL(10,2) NOT NULL CHECK (total_amount > 0),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
-- CHECK constraint on allowed status values
CONSTRAINT valid_status CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled'))
);
-- Expression-based uniqueness (e.g. one order per user per day) requires a unique index,
-- not a table constraint: UNIQUE (...) accepts column names only
CREATE UNIQUE INDEX unique_user_order_date ON orders (user_id, (created_at::date));
ORM Patterns
Active Record vs Data Mapper
Active Record:
- Model contains both data and database logic
- Simple for small projects
- Model directly calls database
- Example: Rails, Django ORM
# Active Record pattern
class User(Model):
    name = CharField()
    email = CharField()

    def save(self):
        # Object knows how to save itself
        db.insert('users', {...})

    @staticmethod
    def find_by_email(email):
        return db.query('SELECT * FROM users WHERE email = ?', email)

# Usage
user = User(name='John', email='john@example.com')
user.save()
found_user = User.find_by_email('john@example.com')
Data Mapper:
- Model contains only data; repository handles database logic
- Better separation of concerns
- Model is a plain object
- Example: SQLAlchemy, TypeORM
# Data Mapper pattern
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class UserRepository:
    def save(self, user):
        # Repository handles persistence
        db.insert('users', {'name': user.name, 'email': user.email})

    def find_by_email(self, email):
        row = db.query('SELECT * FROM users WHERE email = ?', email)
        return User(row['name'], row['email']) if row else None

# Usage
user = User('John', 'john@example.com')
repo = UserRepository()
repo.save(user)
found_user = repo.find_by_email('john@example.com')
Choose based on project scale:
- Small apps: Active Record (simpler)
- Medium to large: Data Mapper (more flexible)
Query Builders
Use query builders to avoid string concatenation and SQL injection:
# ❌ Vulnerable to SQL injection
query = f"SELECT * FROM users WHERE email = '{email}'"
result = db.execute(query)

# ✅ Safe: Parameterized query
result = db.query('SELECT * FROM users WHERE email = ?', [email])

# ✅ Better: Query builder
result = (
    db.select(User)
    .where(User.email == email)
    .where(User.active == True)
    .order_by(User.created_at.desc())
    .limit(10)
    .execute()
)
Benefits of query builders:
- Prevent SQL injection
- Cleaner, more maintainable code
- Database-agnostic (can switch databases)
- Compose queries dynamically
Eager Loading vs Lazy Loading
Lazy Loading (default, but can cause N+1):
# Each user fetch triggers a separate query for posts
users = db.query(User).limit(10).all()
for user in users:
    print(user.posts)  # Additional query per user = 10+ queries!
Eager Loading (prevents N+1):
# One query for the users plus one batched query for all their posts
users = db.query(User).options(selectinload(User.posts)).limit(10).all()
for user in users:
    print(user.posts)  # No additional queries
# Or fetch everything in a single JOIN (best for one-to-one or small collections)
users = db.query(User).options(joinedload(User.posts)).all()
N+1 Query Problem
Recognize and fix N+1 problems:
# ❌ N+1 Problem: 1 query for users + N queries for posts
users = db.query(User).all()  # 1 query
for user in users:
    posts = db.query(Post).filter(Post.user_id == user.id).all()  # N more queries

# ✅ Fix 1: JOIN-based eager loading
users = db.query(User).options(joinedload(User.posts)).all()

# ✅ Fix 2: Use an IN clause for batch loading
user_ids = [u.id for u in users]
posts = db.query(Post).filter(Post.user_id.in_(user_ids)).all()
# Now merge posts back onto users in memory

# ✅ Fix 3: Batched eager loading via the ORM
users = db.query(User).options(selectinload(User.posts)).all()
Query Optimization
Index Usage
Write queries that use indexes:
-- ✅ Uses index on email
SELECT * FROM users WHERE email = 'john@example.com';
-- ✅ Uses index on user_id, created_at
SELECT * FROM orders
WHERE user_id = 123 AND created_at > '2024-01-01'
ORDER BY created_at DESC;
-- ❌ Can't use index (function on column)
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Fix: CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- ❌ Can't use index (leading wildcard)
SELECT * FROM users WHERE email LIKE '%@example.com';
-- Fix: Use full-text search or store domain separately
-- ✅ Can use index (trailing wildcard)
SELECT * FROM users WHERE email LIKE 'john%';
Query Analysis (EXPLAIN)
Always analyze slow queries before optimizing:
-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.email = 'john@example.com'
ORDER BY o.created_at DESC
LIMIT 10;
-- MySQL
EXPLAIN
SELECT * FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.email = 'john@example.com'
ORDER BY o.created_at DESC
LIMIT 10;
Look for:
- Seq Scan (full table scan) on a large table - consider adding an index
- Joins processing large row counts - consider an index on the join column
- Sort steps - an index matching the ORDER BY can eliminate the sort
- High execution time - identify which plan node is slow
Avoiding SELECT *
Fetch only columns you need:
-- ❌ Fetches all columns (slower, more bandwidth)
SELECT * FROM orders WHERE user_id = 123;
-- ✅ Fetch only needed columns
SELECT id, user_id, total_amount, created_at
FROM orders
WHERE user_id = 123;
-- ✅ Reduces memory/bandwidth especially for large text columns
SELECT id, user_id, total_amount
FROM orders
WHERE user_id = 123;
Connection Pooling
Use connection pooling in production:
# ❌ Anti-pattern: New connection per query
def get_user(user_id):
    conn = connect()  # New connection!
    user = conn.query('SELECT * FROM users WHERE id = ?', user_id)
    conn.close()
    return user

# ✅ Use connection pool
pool = ConnectionPool(
    host='localhost',
    database='myapp',
    min_size=5,
    max_size=20,
    timeout=30
)

def get_user(user_id):
    with pool.get_connection() as conn:  # Reuses a connection from the pool
        return conn.query('SELECT * FROM users WHERE id = ?', user_id)
Pool configuration (tune for your workload):
- min_size: Minimum idle connections (default 5-10)
- max_size: Maximum concurrent connections (default 20-50)
- timeout: Connection acquisition timeout
- idle_timeout: Close idle connections after N seconds
SQL vs NoSQL Considerations
When to Use SQL (ACID, Relations)
Use SQL when:
- Data has strong relationships (orders → users → addresses)
- Consistency is critical (financial transactions, inventory)
- Need complex queries with JOINs
- Data is structured and schema is stable
- Multi-record transactions (ACID guarantees)
Example scenarios:
- E-commerce: Products, Orders, Users with complex relationships
- Banking: Transactions must be atomic and consistent
- CRM: Complex queries across related entities
- Content management: Structured articles with authors, tags, categories
When to Use NoSQL (Scale, Flexibility)
Use NoSQL when:
- Data is unstructured or semi-structured
- Schema evolves rapidly (different object shapes)
- Horizontal scaling is priority (sharding)
- High write throughput needed
- Document-oriented data (JSON-like)
Example scenarios:
- Real-time analytics: Time-series data
- Content platforms: Variable document structures
- Caching layer: Key-value store
- Event streaming: Logs, events, activity feeds
- Catalog systems: Products with variable attributes
Document database (MongoDB, Firebase):
✅ Good: User profiles with flexible attributes
✅ Good: Product catalog with variable specs
❌ Bad: Complex multi-entity queries and JOINs
Document structure:
{
_id: 1,
name: "John",
email: "john@example.com",
preferences: {
language: "en",
theme: "dark",
notifications: true
}
}
Key-value store (Redis, Memcached):
✅ Good: Caching, sessions, rate limiting
✅ Good: Real-time leaderboards
❌ Bad: Complex queries, relationships
Structure:
user:123 → { name, email, created_at }
session:abc123 → { user_id, expires_at }
Graph database (Neo4j):
✅ Good: Social networks, recommendations
✅ Good: Complex relationship queries
❌ Bad: Simple CRUD operations
Relationships:
User -[:FOLLOWS]-> User
User -[:COMMENTED_ON]-> Post
Testing Database Code
Test Databases
Use separate test database:
# config.py
if os.getenv('ENV') == 'test':
    DATABASE_URL = 'postgresql://test_user:test_pass@localhost/test_db'
else:
    DATABASE_URL = os.getenv('DATABASE_URL')

# conftest.py (pytest)
@pytest.fixture(autouse=True)
def setup_test_db():
    """Create test database and tables before each test."""
    # Create tables
    db.create_all()
    yield
    # Cleanup
    db.drop_all()
Fixtures and Seeds
Use fixtures for test data:
# conftest.py
import pytest
from app.models import User, Order

@pytest.fixture
def sample_user(db):
    """Create a test user."""
    user = User(name='John Doe', email='john@example.com')
    db.add(user)
    db.commit()
    return user

@pytest.fixture
def sample_orders(db, sample_user):
    """Create orders for the test user."""
    orders = [
        Order(user_id=sample_user.id, total_amount=100.00),
        Order(user_id=sample_user.id, total_amount=200.00),
    ]
    db.add_all(orders)
    db.commit()
    return orders

# test_orders.py
def test_get_user_orders(sample_user, sample_orders):
    """Test fetching user orders."""
    orders = Order.query.filter_by(user_id=sample_user.id).all()
    assert len(orders) == 2
    assert sum(o.total_amount for o in orders) == 300.00
Transaction Rollback
Rollback transactions to isolate tests:
# conftest.py - Automatic rollback after each test
@pytest.fixture(autouse=True)
def db_transaction(db):
    """Wrap each test in a transaction that rolls back."""
    transaction = db.begin_nested()
    yield
    transaction.rollback()  # Undo all changes from this test

# Or use an explicit transaction that is never committed
def test_create_user():
    db.begin()
    user = User(name='John', email='john@example.com')
    db.add(user)
    db.flush()  # Sends the INSERT and assigns user.id without committing
    assert user.id is not None
    db.rollback()
    # Changes are undone, database is clean for next test
Common Patterns (SQL and NoSQL)
Soft Deletes
Mark records as deleted instead of removing:
-- Add deleted_at column
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP NULL;
-- Soft delete (update)
UPDATE users SET deleted_at = CURRENT_TIMESTAMP WHERE id = 123;
-- Query only active records
SELECT * FROM users WHERE deleted_at IS NULL;
-- Create index on deleted_at for efficient queries
CREATE INDEX idx_users_active ON users(deleted_at) WHERE deleted_at IS NULL;
Pros: Recoverable, audit trail, can restore data
Cons: Need to remember to filter deleted records everywhere
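Two additions that soften the "remember to filter" cost (index and view names are illustrative):
-- Enforce uniqueness only among active rows, so a deleted user doesn't block re-registration
CREATE UNIQUE INDEX idx_users_email_active ON users(email) WHERE deleted_at IS NULL;
-- A view that bakes in the filter for read paths
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;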
Audit Logs
Track all data changes:
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
entity_type VARCHAR(100),
entity_id BIGINT,
action VARCHAR(20), -- INSERT, UPDATE, DELETE
old_values JSONB,
new_values JSONB,
changed_by BIGINT REFERENCES users(id),
changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- PostgreSQL trigger to auto-log changes
CREATE FUNCTION audit_trigger() RETURNS TRIGGER AS $$
BEGIN
INSERT INTO audit_log (entity_type, entity_id, action, new_values, changed_at)
VALUES ('user', NEW.id, TG_OP, row_to_json(NEW), NOW());
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER users_audit AFTER UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger();
Timestamps (Created/Updated)
Track record creation and modification:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Auto-update updated_at on changes (PostgreSQL)
CREATE FUNCTION update_timestamp() RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER users_update_timestamp BEFORE UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION update_timestamp();
Hierarchical Data (Trees)
Store tree structures in relational database:
-- Option 1: Adjacency List (simple, slow to query)
CREATE TABLE categories (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255),
parent_id BIGINT REFERENCES categories(id)
);
-- Query children
SELECT * FROM categories WHERE parent_id = 5;
-- Query ancestors (recursive, expensive)
WITH RECURSIVE ancestors AS (
SELECT id, name, parent_id FROM categories WHERE id = 5
UNION ALL
SELECT c.id, c.name, c.parent_id
FROM categories c
JOIN ancestors a ON c.id = a.parent_id
)
SELECT * FROM ancestors;
-- Option 2: Closure Table (trade space for query speed)
CREATE TABLE categories (id BIGSERIAL PRIMARY KEY, name VARCHAR(255));
CREATE TABLE category_closure (
ancestor_id BIGINT REFERENCES categories(id),
descendant_id BIGINT REFERENCES categories(id),
depth INT,
PRIMARY KEY (ancestor_id, descendant_id)
);
-- Query all descendants
SELECT c.* FROM categories c
JOIN category_closure cc ON c.id = cc.descendant_id
WHERE cc.ancestor_id = 5 AND cc.depth > 0;
Pagination
Implement efficient pagination:
-- ❌ OFFSET is slow for large offsets
SELECT * FROM orders LIMIT 10 OFFSET 100000;
-- ✅ Better: Keyset pagination (cursor-based)
SELECT * FROM orders
WHERE id > 12345 -- Last ID from previous page
ORDER BY id
LIMIT 10;
-- ✅ With composite key
SELECT * FROM orders
WHERE (user_id, created_at) > (123, '2024-11-17')
ORDER BY user_id, created_at
LIMIT 10;
Transactions
Use transactions for data consistency:
# ❌ No transaction - inconsistent state if error occurs
user = db.query(User).get(123)
user.balance -= 50
db.commit()
account = db.query(Account).get(456)
account.balance += 50
db.commit()  # If this fails, money disappears!

# ✅ Transaction - all-or-nothing
try:
    with db.transaction():
        user = db.query(User).get(123)
        user.balance -= 50
        account = db.query(Account).get(456)
        account.balance += 50
        db.flush()
except Exception:
    # Everything rolls back automatically
    raise
Common Pitfalls
N+1 Queries
Problem: One query per item instead of batching
Fix: Use eager loading or batch queries
Missing Indexes
Problem: Slow queries on large tables
Fix: Analyze slow queries with EXPLAIN, add indexes on WHERE/JOIN columns
No Transaction Boundaries
Problem: Inconsistent data if error occurs mid-operation
Fix: Wrap multi-step operations in transactions
Over-Normalization
Problem: Too many JOINs make queries slow and complex
Fix: Denormalize strategically where proven necessary
Unbounded Queries
Problem: SELECT * without LIMIT causes memory exhaustion
Fix: Always LIMIT and paginate large result sets
Wrong Cascade Rules
Problem: Accidental data loss or orphaned records
Fix: Choose CASCADE, RESTRICT, or SET NULL deliberately
Storing Passwords in Plain Text
Problem: Security breach if database compromised
Fix: Store bcrypt hashes, never plain passwords
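Hashing normally happens in application code with a bcrypt library; the same principle sketched in SQL, assuming PostgreSQL's pgcrypto extension and a password_hash column (both assumptions, not part of the schemas above):
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Store only a bcrypt hash
INSERT INTO users (email, password_hash)
VALUES ('john@example.com', crypt('s3cret-password', gen_salt('bf')));
-- Verify: re-hash the candidate password using the stored hash as the salt
SELECT id FROM users
WHERE email = 'john@example.com'
  AND password_hash = crypt('s3cret-password', password_hash);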
Accidental Type Mismatches
Problem: VARCHAR(255) used for numbers, can't query properly
Fix: Use correct data types (INTEGER, BIGINT, DECIMAL, not VARCHAR)
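The fix for an existing column is usually an in-place type change with an explicit conversion. A PostgreSQL sketch (table and column names are hypothetical):
-- Fails loudly if any row holds a non-numeric value, which is what you want
ALTER TABLE payments
    ALTER COLUMN amount_cents TYPE BIGINT USING amount_cents::bigint;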
Not Testing Migrations
Problem: Migration fails in production, downtime
Fix: Test migrations on production-like data before deploying
Schema Changes Without Backward Compatibility
Problem: Old application code breaks with new schema
Fix: Support both old and new columns temporarily, add a deprecation period
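A typical expand/contract sequence for renaming users.name to full_name without breaking running code (column names are illustrative):
-- Step 1 (old code still running): add the new column and backfill
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
UPDATE users SET full_name = name WHERE full_name IS NULL;
-- Step 2: deploy application code that reads full_name (and writes both columns)
-- Step 3 (after the deprecation period): drop the old column
ALTER TABLE users DROP COLUMN name;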
No Query Timeouts
Problem: Slow query locks database, cascading failures
Fix: Set statement timeouts and query timeouts in production
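Examples of server-side timeouts (the role name and values are placeholders; most drivers and pools also expose client-side query timeouts):
-- PostgreSQL: per-session or per-role statement timeout
SET statement_timeout = '5s';
ALTER ROLE app_user SET statement_timeout = '5s';
-- MySQL: cap SELECT execution time (milliseconds)
SET SESSION max_execution_time = 5000;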
See project-specific database configuration in .claude/CLAUDE.md if present.