Database Workflow
Language-agnostic guidelines for database design, migrations, schema management, ORM patterns, and query optimization.
CRITICAL: No Premature Optimization
Default to simplicity. Optimize only when you have measured evidence of a problem.
- Don't add indexes "just in case"
- Don't over-normalize without understanding query patterns
- Don't implement complex caching without profiling first
- Measure before optimizing (EXPLAIN, query logs, monitoring)
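For example, before adding any index, confirm where time is actually going. A minimal PostgreSQL sketch; the second query assumes the pg_stat_statements extension is installed, and its column names vary slightly by version:
-- Inspect the plan and real runtime of a suspect query
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123;
-- Rank queries by total time spent (requires pg_stat_statements)
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;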
Migration Strategies
Version Control for Schema Changes
Treat migrations as code:
- Store in version control alongside application code
- Timestamp or sequential numbering (001, 002, 003...)
- Atomic, single-responsibility changes
- Document WHY in migration files, not just WHAT
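A hypothetical migration header showing WHY alongside WHAT (the file name and comment are illustrative):
-- 002_add_email_index.sql
-- WHY: login looks users up by email; the slow-query log showed this as the
--      most expensive lookup and there was no covering index.
CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);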
Naming convention:
migrations/
├── 001_create_users_table.sql
├── 002_add_email_index.sql
├── 003_create_orders_table.sql
└── 004_add_user_fk_to_orders.sql
Up/Down Migrations
Every migration must be reversible:
-- UP: Create table
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
username VARCHAR(255) UNIQUE NOT NULL,
email VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- DOWN: Drop table
DROP TABLE users;
Complex reversals require care:
-- UP: Add constraint
ALTER TABLE orders ADD CONSTRAINT fk_user
FOREIGN KEY (user_id) REFERENCES users(id);
-- DOWN: Remove constraint (PostgreSQL)
ALTER TABLE orders DROP CONSTRAINT fk_user;
Idempotent Migrations
Migrations must be safely re-runnable:
-- ✅ Good: Idempotent (safe to run multiple times)
CREATE TABLE IF NOT EXISTS users (...);
CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);
-- ❌ Bad: Fails if run twice
CREATE TABLE users (...);
CREATE INDEX idx_users_email ON users(email);
Rollback Strategies
Two approaches:
- Down migrations (reversible):
  - Run DOWN SQL to undo changes
  - Works if migration is reversible
  - Better for critical systems
- Snapshot migrations (not reversible):
  - Never roll back; always migrate forward
  - Create new migration to fix issues
  - Simpler, safer in practice
  - Better for high-availability systems
Best practice: Design migrations to be reversible when possible, but plan for forward-only fixes in production.
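A sketch of a forward-only fix: instead of reverting a bad migration, ship a new one that corrects it (file names and the specific change are illustrative):
-- 012_add_orders_status_not_null.sql shipped a NOT NULL column that broke inserts.
-- 013_relax_orders_status.sql -- forward-only fix, no rollback
ALTER TABLE orders ALTER COLUMN status DROP NOT NULL;
ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'pending';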
Migration Naming Conventions
Use descriptive, action-oriented names:
✅ Good:
- 001_create_users_table
- 002_add_email_unique_constraint
- 003_create_index_users_email
- 004_rename_column_user_id_to_author_id
- 005_add_soft_delete_columns
❌ Bad:
- 001_update
- 002_fix
- 003_schema_change
- 004_v2
Include timestamp + sequence:
2024_11_17_001_create_users_table.sql
2024_11_17_002_add_email_index.sql
Schema Design Principles
Normalization (1NF, 2NF, 3NF)
First Normal Form (1NF):
- Eliminate repeating groups
- All columns contain atomic (non-divisible) values
- Each row is unique
-- ❌ NOT 1NF: phone_numbers is a repeating group
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255),
phone_numbers VARCHAR(255) -- "555-1234, 555-5678"
);
-- ✅ 1NF: Separate table for phone numbers
CREATE TABLE user_phones (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT REFERENCES users(id),
phone_number VARCHAR(20)
);
Second Normal Form (2NF):
- Meets 1NF
- Remove partial dependencies (every non-key column must depend on the whole primary key, not just part of it)
-- ❌ NOT 2NF: course_name depends only on course_id, not on (student_id, course_id)
CREATE TABLE enrollments (
student_id BIGINT,
course_id BIGINT,
course_name VARCHAR(255), -- Should be in courses table
grade CHAR(1),
PRIMARY KEY (student_id, course_id)
);
-- ✅ 2NF: Move course_name to separate table
CREATE TABLE courses (
id BIGINT PRIMARY KEY,
name VARCHAR(255)
);
CREATE TABLE enrollments (
student_id BIGINT,
course_id BIGINT REFERENCES courses(id),
grade CHAR(1),
PRIMARY KEY (student_id, course_id)
);
Third Normal Form (3NF):
- Meets 2NF
- Remove transitive dependencies (non-key columns don't depend on other non-key columns)
-- ❌ NOT 3NF: city and state depend on zip_code, not directly on the primary key
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
zip_code VARCHAR(5),
city VARCHAR(100),
state CHAR(2)
);
-- ✅ 3NF: Move location info to separate table
CREATE TABLE zip_codes (
code VARCHAR(5) PRIMARY KEY,
city VARCHAR(100),
state CHAR(2)
);
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
zip_code VARCHAR(5) REFERENCES zip_codes(code)
);
Denormalization (When and Why)
Denormalize when:
- Query patterns are read-heavy (reads vastly outnumber writes)
- Measurement shows join performance is a bottleneck
- Reporting queries need fast access to aggregated data
- Keeping redundant copies in sync is cheaper than paying the join cost on every read
Common denormalization patterns:
- Stored aggregates:
-- Track order count on user without JOIN
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
order_count INT DEFAULT 0
);
-- Keep in sync with a trigger or application code (see the trigger sketch after this list)
- Purposeful redundancy:
-- Store user email on order to avoid JOIN at read time
CREATE TABLE orders (
id BIGINT PRIMARY KEY,
user_id BIGINT REFERENCES users(id),
user_email VARCHAR(255), -- Denormalized for reporting
total_amount DECIMAL(10,2)
);
- Pre-computed views:
-- Materialized view for dashboards
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT
DATE_TRUNC('month', created_at) as month,
COUNT(*) as order_count,
SUM(total_amount) as total_revenue
FROM orders
GROUP BY DATE_TRUNC('month', created_at);
-- Refresh periodically, not on every insert
REFRESH MATERIALIZED VIEW monthly_sales;
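The stored-aggregate pattern above relies on keeping order_count in sync; here is a minimal PostgreSQL trigger sketch (function and trigger names are illustrative, and application-level updates work just as well):
CREATE FUNCTION sync_order_count() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE users SET order_count = order_count + 1 WHERE id = NEW.user_id;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE users SET order_count = order_count - 1 WHERE id = OLD.user_id;
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER orders_sync_order_count AFTER INSERT OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION sync_order_count();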
Indexing Strategies
Index only when needed. Measure first.
-- ✅ Create index for frequently searched columns
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);
-- ❌ Avoid: Index on every column
-- ❌ Avoid: Index on low-cardinality columns (boolean flags)
-- ❌ Avoid: Duplicate indexes
Index types:
- Single-column index (most common):
CREATE INDEX idx_users_email ON users(email);
- Composite index (for multi-column WHERE/JOIN):
-- Good for: WHERE user_id = X AND created_at > Y
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
- Partial index (index subset of rows):
-- Index only active users (avoid indexing soft-deleted rows)
CREATE INDEX idx_active_users_email ON users(email)
WHERE deleted_at IS NULL;
- Unique index (enforce constraint):
CREATE UNIQUE INDEX idx_users_email ON users(email);
Index maintenance:
- Monitor for unused indexes (query database statistics)
- Drop unused indexes after measurement
- ANALYZE/VACUUM regularly to update statistics
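To find candidates for removal, query the database's own statistics. A PostgreSQL example using the pg_stat_user_indexes view (measure over a representative period, and exclude indexes that back constraints before dropping anything):
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname, indexrelname;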
Primary Keys
Use surrogate keys (auto-incrementing ID) by default:
-- ✅ Recommended: Surrogate key
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(255)
);
-- ✅ Good for high-volume tables
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
event_type VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Use natural keys only if the column(s):
- Are guaranteed immutable
- Are never reassigned or repurposed
- Are short and stable
-- ✅ Natural key (country code is stable, never changes)
CREATE TABLE countries (
code CHAR(2) PRIMARY KEY,
name VARCHAR(255)
);
-- ❌ Natural key (email changes, should use surrogate)
CREATE TABLE users (
email VARCHAR(255) PRIMARY KEY
);
Foreign Keys and Constraints
Always define foreign key relationships:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
total_amount DECIMAL(10,2) NOT NULL
);
-- With explicit constraint name for cleaner error messages
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL,
product_id BIGINT NOT NULL,
quantity INT NOT NULL CHECK (quantity > 0),
CONSTRAINT fk_order FOREIGN KEY (order_id) REFERENCES orders(id),
CONSTRAINT fk_product FOREIGN KEY (product_id) REFERENCES products(id)
);
Cascade behaviors:
-- DELETE CASCADE: Delete orders when user is deleted (use with caution)
CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
-- RESTRICT: Prevent user deletion if orders exist (safer default)
CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT
-- SET NULL: Set user_id to NULL if user deleted (for optional relationships)
CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
Constraints
Use constraints to enforce data integrity at database level:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
age INT CHECK (age >= 18),
role VARCHAR(50) DEFAULT 'user' CHECK (role IN ('user', 'admin', 'moderator')),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
status VARCHAR(50) NOT NULL DEFAULT 'pending',
total_amount DECIMAL(10,2) NOT NULL CHECK (total_amount > 0),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
-- CHECK constraint on allowed status values
CONSTRAINT valid_status CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled'))
);
-- Expression-based uniqueness (e.g. one order per user per day) requires a unique index,
-- not a table constraint: UNIQUE (...) accepts column names only
CREATE UNIQUE INDEX unique_user_order_date ON orders (user_id, (created_at::date));
ORM Patterns
Active Record vs Data Mapper
Active Record:
- Model contains both data and database logic
- Simple for small projects
- Model directly calls database
- Example: Rails, Django ORM
# Active Record pattern
class User(Model):
    name = CharField()
    email = CharField()

    def save(self):
        # Object knows how to save itself
        db.insert('users', {...})

    @staticmethod
    def find_by_email(email):
        return db.query('SELECT * FROM users WHERE email = ?', email)

# Usage
user = User(name='John', email='john@example.com')
user.save()
found_user = User.find_by_email('john@example.com')
Data Mapper:
- Model contains only data; repository handles database logic
- Better separation of concerns
- Model is a plain object
- Example: SQLAlchemy, TypeORM
# Data Mapper pattern
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class UserRepository:
    def save(self, user):
        # Repository handles persistence
        db.insert('users', {'name': user.name, 'email': user.email})

    def find_by_email(self, email):
        row = db.query('SELECT * FROM users WHERE email = ?', email)
        return User(row['name'], row['email']) if row else None

# Usage
user = User('John', 'john@example.com')
repo = UserRepository()
repo.save(user)
found_user = repo.find_by_email('john@example.com')
Choose based on project scale:
- Small apps: Active Record (simpler)
- Medium to large: Data Mapper (more flexible)
Query Builders
Use query builders to avoid string concatenation and SQL injection:
# ❌ Vulnerable to SQL injection
query = f"SELECT * FROM users WHERE email = '{email}'"
result = db.execute(query)

# ✅ Safe: Parameterized query
result = db.query('SELECT * FROM users WHERE email = ?', [email])

# ✅ Better: Query builder
result = (
    db.select(User)
    .where(User.email == email)
    .where(User.active == True)
    .order_by(User.created_at.desc())
    .limit(10)
    .execute()
)
Benefits of query builders:
- Prevent SQL injection
- Cleaner, more maintainable code
- Database-agnostic (can switch databases)
- Compose queries dynamically
Eager Loading vs Lazy Loading
Lazy Loading (default, but can cause N+1):
# Each user fetch triggers a separate query for posts
users = db.query(User).limit(10).all()
for user in users:
    print(user.posts)  # Additional query per user = 10+ queries!
Eager Loading (prevents N+1):
# One query for the users plus one batched query for all their posts
users = db.query(User).options(selectinload(User.posts)).limit(10).all()
for user in users:
    print(user.posts)  # No additional queries
# Or fetch everything in a single JOIN (best for one-to-one or small collections)
users = db.query(User).options(joinedload(User.posts)).all()
N+1 Query Problem
Recognize and fix N+1 problems:
# ❌ N+1 Problem: 1 query for users + N queries for posts
users = db.query(User).all()  # 1 query
for user in users:
    posts = db.query(Post).filter(Post.user_id == user.id).all()  # N more queries

# ✅ Fix 1: JOIN-based eager loading
users = db.query(User).options(joinedload(User.posts)).all()

# ✅ Fix 2: Use an IN clause for batch loading
user_ids = [u.id for u in users]
posts = db.query(Post).filter(Post.user_id.in_(user_ids)).all()
# Now merge posts back onto users in memory

# ✅ Fix 3: Batched eager loading via the ORM
users = db.query(User).options(selectinload(User.posts)).all()
Query Optimization
Index Usage
Write queries that use indexes:
-- ✅ Uses index on email
SELECT * FROM users WHERE email = 'john@example.com';
-- ✅ Uses index on user_id, created_at
SELECT * FROM orders
WHERE user_id = 123 AND created_at > '2024-01-01'
ORDER BY created_at DESC;
-- ❌ Can't use index (function on column)
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Fix: CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- ❌ Can't use index (leading wildcard)
SELECT * FROM users WHERE email LIKE '%@example.com';
-- Fix: Use full-text search or store domain separately
-- ✅ Can use index (trailing wildcard)
SELECT * FROM users WHERE email LIKE 'john%';
Query Analysis (EXPLAIN)
Always analyze slow queries before optimizing:
-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.email = 'john@example.com'
ORDER BY o.created_at DESC
LIMIT 10;
-- MySQL
EXPLAIN
SELECT * FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.email = 'john@example.com'
ORDER BY o.created_at DESC
LIMIT 10;
Look for:
- Seq Scan (full table scan) on a large table - consider adding an index
- Joins processing large row counts - consider an index on the join column
- Sort steps - an index matching the ORDER BY can eliminate the sort
- High execution time - identify which plan node is slow
Avoiding SELECT *
Fetch only columns you need:
-- ❌ Fetches all columns (slower, more bandwidth)
SELECT * FROM orders WHERE user_id = 123;
-- ✅ Fetch only needed columns
SELECT id, user_id, total_amount, created_at
FROM orders
WHERE user_id = 123;
-- ✅ Reduces memory/bandwidth especially for large text columns
SELECT id, user_id, total_amount
FROM orders
WHERE user_id = 123;
Connection Pooling
Use connection pooling in production:
# ❌ Anti-pattern: New connection per query
def get_user(user_id):
    conn = connect()  # New connection!
    user = conn.query('SELECT * FROM users WHERE id = ?', user_id)
    conn.close()
    return user

# ✅ Use connection pool
pool = ConnectionPool(
    host='localhost',
    database='myapp',
    min_size=5,
    max_size=20,
    timeout=30
)

def get_user(user_id):
    with pool.get_connection() as conn:  # Reuses a connection from the pool
        return conn.query('SELECT * FROM users WHERE id = ?', user_id)
Pool configuration (tune for your workload):
- min_size: Minimum idle connections (default 5-10)
- max_size: Maximum concurrent connections (default 20-50)
- timeout: Connection acquisition timeout
- idle_timeout: Close idle connections after N seconds
SQL vs NoSQL Considerations
When to Use SQL (ACID, Relations)
Use SQL when:
- Data has strong relationships (orders → users → addresses)
- Consistency is critical (financial transactions, inventory)
- Need complex queries with JOINs
- Data is structured and schema is stable
- Multi-record transactions (ACID guarantees)
Example scenarios:
- E-commerce: Products, Orders, Users with complex relationships
- Banking: Transactions must be atomic and consistent
- CRM: Complex queries across related entities
- Content management: Structured articles with authors, tags, categories
When to Use NoSQL (Scale, Flexibility)
Use NoSQL when:
- Data is unstructured or semi-structured
- Schema evolves rapidly (different object shapes)
- Horizontal scaling is priority (sharding)
- High write throughput needed
- Document-oriented data (JSON-like)
Example scenarios:
- Real-time analytics: Time-series data
- Content platforms: Variable document structures
- Caching layer: Key-value store
- Event streaming: Logs, events, activity feeds
- Catalog systems: Products with variable attributes
Document database (MongoDB, Firebase):
✅ Good: User profiles with flexible attributes
✅ Good: Product catalog with variable specs
❌ Bad: Complex multi-entity queries and JOINs
Document structure:
{
_id: 1,
name: "John",
email: "john@example.com",
preferences: {
language: "en",
theme: "dark",
notifications: true
}
}
Key-value store (Redis, Memcached):
✅ Good: Caching, sessions, rate limiting
✅ Good: Real-time leaderboards
❌ Bad: Complex queries, relationships
Structure:
user:123 → { name, email, created_at }
session:abc123 → { user_id, expires_at }
Graph database (Neo4j):
✅ Good: Social networks, recommendations
✅ Good: Complex relationship queries
❌ Bad: Simple CRUD operations
Relationships:
User -[:FOLLOWS]-> User
User -[:COMMENTED_ON]-> Post
Testing Database Code
Test Databases
Use separate test database:
# config.py
if os.getenv('ENV') == 'test':
    DATABASE_URL = 'postgresql://test_user:test_pass@localhost/test_db'
else:
    DATABASE_URL = os.getenv('DATABASE_URL')

# conftest.py (pytest)
@pytest.fixture(autouse=True)
def setup_test_db():
    """Create test database and tables before each test."""
    # Create tables
    db.create_all()
    yield
    # Cleanup
    db.drop_all()
Fixtures and Seeds
Use fixtures for test data:
# conftest.py
import pytest
from app.models import User, Order

@pytest.fixture
def sample_user(db):
    """Create a test user."""
    user = User(name='John Doe', email='john@example.com')
    db.add(user)
    db.commit()
    return user

@pytest.fixture
def sample_orders(db, sample_user):
    """Create orders for the test user."""
    orders = [
        Order(user_id=sample_user.id, total_amount=100.00),
        Order(user_id=sample_user.id, total_amount=200.00),
    ]
    db.add_all(orders)
    db.commit()
    return orders

# test_orders.py
def test_get_user_orders(sample_user, sample_orders):
    """Test fetching user orders."""
    orders = Order.query.filter_by(user_id=sample_user.id).all()
    assert len(orders) == 2
    assert sum(o.total_amount for o in orders) == 300.00
Transaction Rollback
Rollback transactions to isolate tests:
# conftest.py - Automatic rollback after each test
@pytest.fixture(autouse=True)
def db_transaction(db):
    """Wrap each test in a transaction that rolls back."""
    transaction = db.begin_nested()
    yield
    transaction.rollback()  # Undo all changes from this test

# Or use an explicit transaction that is never committed
def test_create_user():
    db.begin()
    user = User(name='John', email='john@example.com')
    db.add(user)
    db.flush()  # Sends the INSERT and assigns user.id without committing
    assert user.id is not None
    db.rollback()
    # Changes are undone, database is clean for next test
Common Patterns (SQL and NoSQL)
Soft Deletes
Mark records as deleted instead of removing:
-- Add deleted_at column
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP NULL;
-- Soft delete (update)
UPDATE users SET deleted_at = CURRENT_TIMESTAMP WHERE id = 123;
-- Query only active records
SELECT * FROM users WHERE deleted_at IS NULL;
-- Create index on deleted_at for efficient queries
CREATE INDEX idx_users_active ON users(deleted_at) WHERE deleted_at IS NULL;
Pros: Recoverable, audit trail, can restore data
Cons: Need to remember to filter deleted records everywhere
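Two additions that soften the "remember to filter" cost (index and view names are illustrative):
-- Enforce uniqueness only among active rows, so a deleted user doesn't block re-registration
CREATE UNIQUE INDEX idx_users_email_active ON users(email) WHERE deleted_at IS NULL;
-- A view that bakes in the filter for read paths
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;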
Audit Logs
Track all data changes:
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
entity_type VARCHAR(100),
entity_id BIGINT,
action VARCHAR(20), -- INSERT, UPDATE, DELETE
old_values JSONB,
new_values JSONB,
changed_by BIGINT REFERENCES users(id),
changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- PostgreSQL trigger to auto-log changes
CREATE FUNCTION audit_trigger() RETURNS TRIGGER AS $$
BEGIN
INSERT INTO audit_log (entity_type, entity_id, action, new_values, changed_at)
VALUES ('user', NEW.id, TG_OP, row_to_json(NEW), NOW());
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER users_audit AFTER UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger();
Timestamps (Created/Updated)
Track record creation and modification:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Auto-update updated_at on changes (PostgreSQL)
CREATE FUNCTION update_timestamp() RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER users_update_timestamp BEFORE UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION update_timestamp();
Hierarchical Data (Trees)
Store tree structures in relational database:
-- Option 1: Adjacency List (simple, slow to query)
CREATE TABLE categories (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255),
parent_id BIGINT REFERENCES categories(id)
);
-- Query children
SELECT * FROM categories WHERE parent_id = 5;
-- Query ancestors (recursive, expensive)
WITH RECURSIVE ancestors AS (
SELECT id, name, parent_id FROM categories WHERE id = 5
UNION ALL
SELECT c.id, c.name, c.parent_id
FROM categories c
JOIN ancestors a ON c.id = a.parent_id
)
SELECT * FROM ancestors;
-- Option 2: Closure Table (trade space for query speed)
CREATE TABLE categories (id BIGSERIAL PRIMARY KEY, name VARCHAR(255));
CREATE TABLE category_closure (
ancestor_id BIGINT REFERENCES categories(id),
descendant_id BIGINT REFERENCES categories(id),
depth INT,
PRIMARY KEY (ancestor_id, descendant_id)
);
-- Query all descendants
SELECT c.* FROM categories c
JOIN category_closure cc ON c.id = cc.descendant_id
WHERE cc.ancestor_id = 5 AND cc.depth > 0;
Pagination
Implement efficient pagination:
-- ❌ OFFSET is slow for large offsets
SELECT * FROM orders LIMIT 10 OFFSET 100000;
-- ✅ Better: Keyset pagination (cursor-based)
SELECT * FROM orders
WHERE id > 12345 -- Last ID from previous page
ORDER BY id
LIMIT 10;
-- ✅ With composite key
SELECT * FROM orders
WHERE (user_id, created_at) > (123, '2024-11-17')
ORDER BY user_id, created_at
LIMIT 10;
Transactions
Use transactions for data consistency:
# ❌ No transaction - inconsistent state if error occurs
user = db.query(User).get(123)
user.balance -= 50
db.commit()
account = db.query(Account).get(456)
account.balance += 50
db.commit()  # If this fails, money disappears!

# ✅ Transaction - all-or-nothing
try:
    with db.transaction():
        user = db.query(User).get(123)
        user.balance -= 50
        account = db.query(Account).get(456)
        account.balance += 50
        db.flush()
except Exception:
    # Everything rolls back automatically
    raise
Common Pitfalls
N+1 Queries
Problem: One query per item instead of batching
Fix: Use eager loading or batch queries
Missing Indexes
Problem: Slow queries on large tables
Fix: Analyze slow queries with EXPLAIN, add indexes on WHERE/JOIN columns
No Transaction Boundaries
Problem: Inconsistent data if error occurs mid-operation
Fix: Wrap multi-step operations in transactions
Over-Normalization
Problem: Too many JOINs make queries slow and complex
Fix: Denormalize strategically where proven necessary
Unbounded Queries
Problem: SELECT * without LIMIT causes memory exhaustion
Fix: Always LIMIT and paginate large result sets
Wrong Cascade Rules
Problem: Accidental data loss or orphaned records
Fix: Choose CASCADE, RESTRICT, or SET NULL deliberately
Storing Passwords in Plain Text
Problem: Security breach if database compromised
Fix: Store bcrypt hashes, never plain passwords
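Hashing normally happens in application code with a bcrypt library; the same principle sketched in SQL, assuming PostgreSQL's pgcrypto extension and a password_hash column (both assumptions, not part of the schemas above):
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Store only a bcrypt hash
INSERT INTO users (email, password_hash)
VALUES ('john@example.com', crypt('s3cret-password', gen_salt('bf')));
-- Verify: re-hash the candidate password using the stored hash as the salt
SELECT id FROM users
WHERE email = 'john@example.com'
  AND password_hash = crypt('s3cret-password', password_hash);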
Accidental Type Mismatches
Problem: VARCHAR(255) used for numbers, can't query properly
Fix: Use correct data types (INTEGER, BIGINT, DECIMAL, not VARCHAR)
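The fix for an existing column is usually an in-place type change with an explicit conversion. A PostgreSQL sketch (table and column names are hypothetical):
-- Fails loudly if any row holds a non-numeric value, which is what you want
ALTER TABLE payments
    ALTER COLUMN amount_cents TYPE BIGINT USING amount_cents::bigint;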
Not Testing Migrations
Problem: Migration fails in production, downtime
Fix: Test migrations on production-like data before deploying
Schema Changes Without Backward Compatibility
Problem: Old application code breaks with new schema
Fix: Support both old and new columns temporarily, add a deprecation period
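A typical expand/contract sequence for renaming users.name to full_name without breaking running code (column names are illustrative):
-- Step 1 (old code still running): add the new column and backfill
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
UPDATE users SET full_name = name WHERE full_name IS NULL;
-- Step 2: deploy application code that reads full_name (and writes both columns)
-- Step 3 (after the deprecation period): drop the old column
ALTER TABLE users DROP COLUMN name;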
No Query Timeouts
Problem: Slow query locks database, cascading failures
Fix: Set statement timeouts and query timeouts in production
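Examples of server-side timeouts (the role name and values are placeholders; most drivers and pools also expose client-side query timeouts):
-- PostgreSQL: per-session or per-role statement timeout
SET statement_timeout = '5s';
ALTER ROLE app_user SET statement_timeout = '5s';
-- MySQL: cap SELECT execution time (milliseconds)
SET SESSION max_execution_time = 5000;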
See project-specific database configuration in .claude/CLAUDE.md if present.