SQL Fundamentals Skill | Agent Skills

SQL Fundamentals

Foundational SQL patterns covering query writing, schema design, and dialect differences across PostgreSQL, MySQL, and SQLite. This skill serves as a base for specialized SQL skills and SQL-to-X conversions.

Overview

┌─────────────────────────────────────────────────────────────────┐
│                        SQL Skill Hierarchy                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                      ┌──────────────┐                           │
│                      │   lang-sql   │ ◄── You are here          │
│                      │ (foundation) │                           │
│                      └──────┬───────┘                           │
│                             │                                   │
│         ┌───────────────────┼───────────────────┐               │
│         │                   │                   │               │
│         ▼                   ▼                   ▼               │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
│  │ sql-to-     │    │    sql-     │    │   data-     │         │
│  │  polars     │    │optimization │    │  postgres   │         │
│  └─────────────┘    └─────────────┘    └─────────────┘         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

This skill covers:

Query writing (SELECT, JOIN, CTEs, window functions)
Schema design (tables, constraints, normalization)
Dialect differences (PostgreSQL, MySQL, SQLite)
Performance basics (EXPLAIN, indexing fundamentals)
SQL syntax patterns useful for conversion

This skill does NOT cover:

Deep optimization strategies - see sql-optimization-patterns
ORM usage (SQLAlchemy, Prisma, etc.)
Database administration (backups, replication, users)
Platform-specific features (stored procedures, triggers)

Quick Reference

| Task | PostgreSQL | MySQL | SQLite | |------|------------|-------|--------| | Show tables | \dt | SHOW TABLES | .tables | | Describe table | \d table | DESCRIBE table | .schema table | | Current database | SELECT current_database() | SELECT DATABASE() | N/A (file-based) | | List indexes | \di | SHOW INDEX FROM table | .indexes table | | Explain query | EXPLAIN ANALYZE | EXPLAIN | EXPLAIN QUERY PLAN |

Query Patterns

SELECT Fundamentals

-- Basic SELECT with filtering
SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column1 DESC
LIMIT 10;

-- Column aliases
SELECT
    first_name AS "First Name",
    last_name AS "Last Name",
    salary * 12 AS annual_salary
FROM employees;

JOIN Types

-- INNER JOIN (matching rows only)
SELECT o.id, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;

-- LEFT JOIN (all from left, matching from right)
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.name;

-- Multiple JOINs
SELECT
    o.id,
    c.name as customer,
    p.name as product
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id;

Common Table Expressions (CTEs)

-- Basic CTE
WITH active_users AS (
    SELECT id, name, email
    FROM users
    WHERE status = 'active'
)
SELECT * FROM active_users WHERE email LIKE '%@company.com';

-- Multiple CTEs
WITH
    recent_orders AS (
        SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '30 days'
    ),
    order_totals AS (
        SELECT customer_id, SUM(amount) as total
        FROM recent_orders
        GROUP BY customer_id
    )
SELECT c.name, ot.total
FROM order_totals ot
JOIN customers c ON ot.customer_id = c.id;

-- Recursive CTE (hierarchical data)
WITH RECURSIVE org_chart AS (
    -- Base case: top-level managers
    SELECT id, name, manager_id, 1 as level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: employees under managers
    SELECT e.id, e.name, e.manager_id, oc.level + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY level, name;

Window Functions

-- ROW_NUMBER: Sequential numbering within partition
SELECT
    department,
    name,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank
FROM employees;

-- Running totals
SELECT
    date,
    amount,
    SUM(amount) OVER (ORDER BY date) as running_total
FROM transactions;

-- Compare to previous row
SELECT
    date,
    revenue,
    LAG(revenue) OVER (ORDER BY date) as prev_revenue,
    revenue - LAG(revenue) OVER (ORDER BY date) as change
FROM daily_sales;

-- Percentile ranking
SELECT
    name,
    score,
    PERCENT_RANK() OVER (ORDER BY score) as percentile
FROM test_results;

Subqueries

-- Scalar subquery (returns single value)
SELECT name, salary,
    (SELECT AVG(salary) FROM employees) as avg_salary
FROM employees;

-- IN subquery
SELECT * FROM products
WHERE category_id IN (
    SELECT id FROM categories WHERE name LIKE '%Electronics%'
);

-- EXISTS subquery (often more efficient)
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.id
    AND o.created_at > NOW() - INTERVAL '30 days'
);

-- Correlated subquery (references outer query)
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = e.department_id
);

Aggregation Patterns

GROUP BY

-- Basic aggregation
SELECT
    department,
    COUNT(*) as employee_count,
    AVG(salary) as avg_salary,
    MIN(salary) as min_salary,
    MAX(salary) as max_salary
FROM employees
GROUP BY department;

-- HAVING (filter groups)
SELECT department, COUNT(*) as count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;

-- Multiple grouping columns
SELECT
    department,
    job_title,
    COUNT(*) as count
FROM employees
GROUP BY department, job_title
ORDER BY department, count DESC;

CASE Expressions

-- Conditional aggregation
SELECT
    COUNT(CASE WHEN status = 'active' THEN 1 END) as active_count,
    COUNT(CASE WHEN status = 'inactive' THEN 1 END) as inactive_count,
    COUNT(*) as total
FROM users;

-- Bucketing data
SELECT
    CASE
        WHEN age < 18 THEN 'minor'
        WHEN age < 65 THEN 'adult'
        ELSE 'senior'
    END as age_group,
    COUNT(*) as count
FROM users
GROUP BY 1;  -- Group by first select column

Schema Design

Table Creation

-- Basic table with constraints
CREATE TABLE users (
    id SERIAL PRIMARY KEY,           -- PostgreSQL
    -- id INT AUTO_INCREMENT PRIMARY KEY,  -- MySQL
    -- id INTEGER PRIMARY KEY AUTOINCREMENT,  -- SQLite

    email VARCHAR(255) NOT NULL UNIQUE,
    name VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Table with foreign key
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    status VARCHAR(20) DEFAULT 'pending',
    total DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Many-to-many junction table
CREATE TABLE user_roles (
    user_id INT REFERENCES users(id) ON DELETE CASCADE,
    role_id INT REFERENCES roles(id) ON DELETE CASCADE,
    granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, role_id)
);

Normalization Quick Guide

| Form | Rule | Example | |------|------|---------| | 1NF | No repeating groups | Split phone1, phone2 into separate rows | | 2NF | No partial dependencies | Move dept_name to departments table if key is (emp_id, dept_id) | | 3NF | No transitive dependencies | Move city, state to addresses if they depend on zip_code |

Common Constraints

-- CHECK constraint
ALTER TABLE products
ADD CONSTRAINT positive_price CHECK (price > 0);

-- UNIQUE constraint on multiple columns
ALTER TABLE subscriptions
ADD CONSTRAINT unique_user_plan UNIQUE (user_id, plan_id);

-- NOT NULL with default
ALTER TABLE users
ALTER COLUMN status SET DEFAULT 'active',
ALTER COLUMN status SET NOT NULL;

Index Basics

When to Index

| Index | Use Case | |-------|----------| | Primary key | Automatic, unique identifier | | Foreign key | Speed up JOINs | | Frequently filtered columns | WHERE clauses | | Frequently sorted columns | ORDER BY clauses | | Composite | Multi-column WHERE/ORDER |

Index Creation

-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)
-- Good for: WHERE status = 'active' AND created_at > '2024-01-01'
CREATE INDEX idx_orders_status_created ON orders(status, created_at);

-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

-- Expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

Reading EXPLAIN

-- PostgreSQL: Full analysis with timing
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

-- Key things to look for:
-- ✓ Index Scan / Index Only Scan - good
-- ✗ Seq Scan on large tables - investigate
-- ✗ High "actual rows" vs "estimated rows" - stale statistics

Dialect Differences

String Concatenation

-- PostgreSQL
SELECT first_name || ' ' || last_name AS full_name FROM users;

-- MySQL
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM users;

-- SQLite (both work)
SELECT first_name || ' ' || last_name AS full_name FROM users;

Date/Time Operations

-- Current timestamp
-- PostgreSQL: NOW(), CURRENT_TIMESTAMP
-- MySQL: NOW(), CURRENT_TIMESTAMP
-- SQLite: datetime('now')

-- Date arithmetic
-- PostgreSQL
SELECT created_at + INTERVAL '7 days' FROM orders;

-- MySQL
SELECT DATE_ADD(created_at, INTERVAL 7 DAY) FROM orders;

-- SQLite
SELECT datetime(created_at, '+7 days') FROM orders;

-- Extract parts
-- PostgreSQL
SELECT EXTRACT(YEAR FROM created_at) FROM orders;

-- MySQL
SELECT YEAR(created_at) FROM orders;

-- SQLite
SELECT strftime('%Y', created_at) FROM orders;

UPSERT (Insert or Update)

-- PostgreSQL
INSERT INTO users (email, name)
VALUES ('test@example.com', 'Test User')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;

-- MySQL
INSERT INTO users (email, name)
VALUES ('test@example.com', 'Test User')
ON DUPLICATE KEY UPDATE name = VALUES(name);

-- SQLite (3.24+)
INSERT INTO users (email, name)
VALUES ('test@example.com', 'Test User')
ON CONFLICT (email) DO UPDATE SET name = excluded.name;

Pagination

-- Standard (PostgreSQL, MySQL, SQLite)
SELECT * FROM products ORDER BY id LIMIT 10 OFFSET 20;

-- Keyset pagination (more efficient for large offsets)
SELECT * FROM products
WHERE id > 1000  -- Last seen ID
ORDER BY id
LIMIT 10;

Boolean Handling

-- PostgreSQL: Native BOOLEAN
SELECT * FROM users WHERE is_active = true;

-- MySQL: TINYINT(1) or BOOLEAN (alias)
SELECT * FROM users WHERE is_active = 1;

-- SQLite: INTEGER (0/1)
SELECT * FROM users WHERE is_active = 1;

SQL for Conversion

When converting SQL to DataFrame operations (Pandas, Polars), map these patterns:

| SQL | DataFrame Equivalent | |-----|---------------------| | SELECT col1, col2 | .select(["col1", "col2"]) | | WHERE condition | .filter(condition) | | ORDER BY col DESC | .sort("col", descending=True) | | LIMIT n | .head(n) or .limit(n) | | GROUP BY | .group_by() | | JOIN | .join() | | DISTINCT | .unique() or .distinct() |

See derivative skills for specific conversion patterns:

sql-to-polars - SQL to Polars DataFrame
sql-to-pandas - SQL to Pandas DataFrame

Anti-Patterns to Avoid

1. SELECT *

-- Bad: Fetches unnecessary data
SELECT * FROM users;

-- Good: Only fetch needed columns
SELECT id, name, email FROM users;

2. N+1 Queries

-- Bad: Query per user (in application loop)
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... repeated N times

-- Good: Single query with JOIN or IN
SELECT * FROM orders WHERE user_id IN (1, 2, 3, ...);

3. Functions on Indexed Columns

-- Bad: Prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- Good: Store normalized, or use expression index
SELECT * FROM users WHERE email = 'test@example.com';

4. Implicit Type Conversion

-- Bad: String compared to integer
SELECT * FROM users WHERE id = '123';

-- Good: Matching types
SELECT * FROM users WHERE id = 123;

5. Missing WHERE on UPDATE/DELETE

-- DANGEROUS: Affects all rows!
UPDATE users SET status = 'inactive';

-- Safe: Always include WHERE
UPDATE users SET status = 'inactive' WHERE last_login < '2024-01-01';

Troubleshooting

Query Returns No Results

Check WHERE conditions: Test each condition separately
Verify JOIN keys: Ensure matching data types and values
NULL handling: Use IS NULL not = NULL
Case sensitivity: Check collation settings

Query Too Slow

Run EXPLAIN: Look for Seq Scans on large tables
Check indexes: Are filtered/joined columns indexed?
Reduce data early: Filter before JOINing
**Avoid SELECT ***: Fetch only needed columns

Unexpected Duplicates

Missing DISTINCT: Add if needed
Many-to-many JOINs: Each match creates a row
GROUP BY missing columns: All non-aggregated columns must be grouped

NULL Surprises

-- NULL comparisons always return NULL (unknown)
SELECT * FROM users WHERE department = NULL;  -- Returns nothing!
SELECT * FROM users WHERE department IS NULL; -- Correct

-- NULL in aggregations
SELECT AVG(salary) FROM employees;  -- NULLs ignored
SELECT COUNT(*) vs COUNT(column);   -- COUNT(*) includes NULL rows

References

PostgreSQL Documentation
MySQL Reference Manual
SQLite Documentation
sql-optimization-patterns - Deep performance optimization
sql-expert - Advanced query patterns

Agent Skills: SQL Fundamentals

Install this agent skill to your local

Skill Files