Agent Skills: SQL Optimization — Comprehensive Reference

SQL optimization for OLTP systems: EXPLAIN analysis, indexing, schema design, migrations, HA, and security across major SQL platforms.

UncategorizedID: vasilyu1983/ai-agents-public/data-sql-optimization

Install this agent skill to your local

pnpm dlx add-skill https://github.com/vasilyu1983/AI-Agents-public/tree/HEAD/frameworks/shared-skills/skills/data-sql-optimization

Skill Files

Browse the full folder contents for data-sql-optimization.

Download Skill

Loading file tree…

frameworks/shared-skills/skills/data-sql-optimization/SKILL.md

Skill Metadata

Name
data-sql-optimization
Description
"SQL optimization for OLTP systems: EXPLAIN analysis, indexing, schema design, migrations, HA, and security across major SQL platforms."

SQL Optimization — Comprehensive Reference

This skill provides actionable checklists, patterns, and templates for transactional (OLTP) SQL optimization: measurement-first triage, EXPLAIN/plan interpretation, balanced indexing (avoiding over-indexing), performance monitoring, schema evolution, migrations, backup/recovery, high availability, and security.

Supported Platforms: PostgreSQL, MySQL, SQL Server, Oracle, SQLite

For OLAP/Analytics: See data-lake-platform (ClickHouse, DuckDB, Doris, StarRocks)


Quick Reference

| Task | Tool/Framework | Command | When to Use | |------|----------------|---------|-------------| | Query Performance Analysis | EXPLAIN ANALYZE | EXPLAIN (ANALYZE, BUFFERS) SELECT ... (PG) / EXPLAIN ANALYZE SELECT ... (MySQL) | Diagnose slow queries, identify missing indexes | | Find Slow Queries | pg_stat_statements / slow query log | SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10; | Identify performance bottlenecks in production | | Index Analysis | pg_stat_user_indexes / SHOW INDEX | SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0; | Find unused indexes, validate index coverage | | Schema Migration | Flyway / Liquibase | flyway migrate / liquibase update | Version-controlled database changes | | Backup & Recovery | pg_dump / mysqldump | pg_dump -Fc dbname > backup.dump | Point-in-time recovery, disaster recovery | | Replication Setup | Streaming / GTID | Configure postgresql.conf / my.cnf | High availability, read scaling | | Safe Tuning Loop | Measure -> Explain -> Change -> Verify | Use tuning worksheet template | Reduce latency/cost without regressions |


Decision Tree: Choosing the Right Approach

Query performance issue?
    ├─ Identify slow queries first?
    │   ├─ PostgreSQL -> pg_stat_statements (top queries by total_exec_time)
    │   └─ MySQL -> Performance Schema / slow query log
    │
    ├─ Analyze execution plan?
    │   ├─ PostgreSQL -> EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
    │   ├─ MySQL -> EXPLAIN FORMAT=JSON or EXPLAIN ANALYZE
    │   └─ SQL Server -> SET STATISTICS IO ON; SET STATISTICS TIME ON;
    │
    ├─ Need indexing strategy?
    │   ├─ PostgreSQL -> B-tree (default), GIN (JSONB), GiST (spatial), partial indexes
    │   ├─ MySQL -> BTREE (default), FULLTEXT (text search), SPATIAL
    │   └─ Check: Table >10k rows AND selectivity <10% AND 10x+ speedup verified
    │
    ├─ Schema changes needed?
    │   ├─ New database -> template-schema-design.md
    │   ├─ Modify schema -> template-migration.md (Flyway/Liquibase)
    │   └─ Large tables (MySQL) -> gh-ost / pt-online-schema-change (avoid locks)
    │
    ├─ High availability setup?
    │   ├─ PostgreSQL -> Streaming replication (template-replication-ha.md)
    │   └─ MySQL -> GTID-based replication (template-replication-ha.md)
    │
    ├─ Backup/disaster recovery?
    │   └─ template-backup-restore.md (pg_dump, mysqldump, PITR)
    │
    └─ Analytics on large datasets (OLAP)?
        └─ See data-lake-platform (ClickHouse, DuckDB, Doris, StarRocks)

When to Use This Skill

Codex should invoke this skill when users ask for:

Query Optimization (Modern Approaches)

  • SQL query performance review and tuning
  • EXPLAIN/plan interpretation with optimization suggestions
  • Index creation strategies with balanced approach (avoiding over-indexing)
  • Troubleshooting slow queries using pg_stat_statements or Performance Schema
  • Identifying and remediating SQL anti-patterns with operational fixes
  • Query rewrite suggestions or migration from slow to fast patterns
  • Statistics maintenance and auto-analyze configuration

Database Operations

  • Schema design with normalization and performance trade-offs
  • Database migrations with version control (Liquibase, Flyway)
  • Backup and recovery strategies (point-in-time recovery, automated testing)
  • High availability and replication setup (streaming, GTID-based)
  • Database security auditing (access controls, encryption, SQL injection prevention)
  • Lock analysis and deadlock troubleshooting
  • Connection pooling (pgBouncer, Pgpool-II, ProxySQL)

Performance Tuning (Modern Standards)

  • Memory configuration (work_mem, shared_buffers, effective_cache_size)
  • Automated monitoring with pg_stat_statements and query pattern analysis
  • Index health monitoring (unused index detection, index bloat analysis)
  • Vacuum strategy and autovacuum tuning (PostgreSQL)
  • InnoDB buffer pool optimization (MySQL)
  • Partition pruning improvements (PostgreSQL 18+)

Resources (Best Practices Guides)

Find detailed operational patterns and quick references in:

Each file includes:

  • Copy-paste ready checklists (e.g., "query review", "index design", "explain review")
  • Anti-patterns with operational fixes and alternatives
  • Query rewrite and indexing strategies with examples
  • Troubleshooting guides (step-by-step)

Templates (Copy-Paste Ready)

Templates are organized by database technology for precision and clarity:

Cross-Platform Templates (All Databases)

PostgreSQL Templates

MySQL Templates

Microsoft SQL Server Templates

Oracle Templates

SQLite Templates


Related Skills

Infrastructure & Operations:

Application Integration:

Quality & Security:

Data Engineering:


Navigation

Resources

Templates

Data


Operational Deep Dives

See references/operational-patterns.md for:

  • End-to-end optimization checklists and anti-pattern fixes
  • Database-specific quick references (PostgreSQL, MySQL, SQL Server, Oracle, SQLite)
  • Slow query troubleshooting workflow and reliability drills
  • Template selection decision tree and platform migration notes

Do / Avoid

GOOD: Do

  • Measure baseline before any optimization
  • Change one variable at a time
  • Verify results match after query changes
  • Update statistics before concluding "needs index"
  • Test with production-like data volumes
  • Document all optimization decisions
  • Include performance tests in CI/CD

BAD: Avoid

  • Adding indexes without checking if they'll be used
  • Using SELECT * in production queries
  • Optimizing for test data (use representative volumes)
  • Ignoring write performance impact of indexes
  • Skipping EXPLAIN analysis before changes
  • Multiple simultaneous changes (can't attribute improvement)
  • N+1 query patterns in application code

Anti-Patterns Quick Reference

| Anti-Pattern | Problem | Fix | |--------------|---------|-----| | **SELECT *** | Reads unnecessary columns | Explicit column list | | N+1 queries | Multiplied round trips | JOIN or batch fetch | | Missing WHERE | Full table scan | Add predicates | | Function on indexed column | Can't use index | Move function to RHS | | Implicit type conversion | Index bypass | Match types explicitly | | LIKE '%prefix' | Leading wildcard = scan | Full-text search | | Unbounded result set | Memory explosion | Add LIMIT/pagination | | OR conditions | Index may not be used | UNION or rewrite |

See references/sql-antipatterns.md for detailed fixes.


OLTP vs OLAP Decision Tree

Is your query for...?
├─ Point lookups (by ID/key)?
│   └─ OLTP database (this skill)
│       - Ensure proper indexes
│       - Use connection pooling
│       - Optimize for low latency
│
├─ Aggregations over recent data (dashboard)?
│   └─ OLTP database (this skill)
│       - Consider materialized views
│       - Index common filter columns
│       - Watch for lock contention
│
├─ Full table scans or historical analysis?
│   └─ OLAP database (data-lake-platform)
│       - ClickHouse, DuckDB, Doris
│       - Columnar storage
│       - Partitioning by date
│
└─ Mixed workload (both)?
    └─ Separate OLTP and OLAP
        - OLTP for transactions
        - Replicate to OLAP for analytics
        - Avoid running analytics on primary

Optional: AI/Automation

Note: AI tools assist but require human validation of correctness.

  • EXPLAIN summarization — Identify bottlenecks from complex plans
  • Query rewrite suggestions — Must verify result equivalence
  • Index recommendations — Check selectivity and write impact first

Bounded Claims

  • AI cannot determine correct query results
  • Automated index suggestions may miss workload context
  • Human review required for production changes

Analytical Databases (OLAP)

For OLAP databases and data lake infrastructure, see data-lake-platform:

  • Query engines: ClickHouse, DuckDB, Apache Doris, StarRocks
  • Table formats: Apache Iceberg, Delta Lake, Apache Hudi
  • Transformation: SQLMesh, dbt (staging/marts layers)
  • Ingestion: dlt, Airbyte (connectors)
  • Streaming: Apache Kafka patterns

This skill focuses on transactional database optimization (PostgreSQL, MySQL, SQL Server, Oracle, SQLite). Use data-lake-platform for analytical workloads.


Related Skills

This skill focuses on query optimization within a single database. For related workflows:

SQL Transformation & Analytics Engineering: -> ai-ml-data-science skill

  • SQLMesh templates for building staging/intermediate/marts layers
  • Incremental models (FULL, INCREMENTAL_BY_TIME_RANGE, INCREMENTAL_BY_UNIQUE_KEY)
  • DAG management and model dependencies
  • Unit tests and audits for SQL transformations

Data Ingestion (Loading into Warehouses): -> ai-mlops skill

  • dlt templates for extracting from REST APIs, databases
  • Loading to Snowflake, BigQuery, Redshift, Postgres, DuckDB
  • Incremental loading patterns (timestamp, ID-based, merge/upsert)
  • Database replication (Postgres, MySQL, MongoDB -> warehouse)

Data Lake Infrastructure: -> data-lake-platform skill

  • ClickHouse, DuckDB, Doris, StarRocks query engines
  • Iceberg, Delta Lake, Hudi table formats
  • Kafka streaming, Dagster/Airflow orchestration

Use Case Decision:

  • Query is slow in production -> Use this skill (data-sql-optimization)
  • Building feature pipelines in SQL -> Use ai-ml-data-science (SQLMesh)
  • Loading data from APIs/DBs to warehouse -> Use ai-mlops (dlt)
  • Analytics on large datasets (OLAP) -> Use data-lake-platform

External Resources

See data/sources.json for 62+ curated resources including:

Core Documentation:

  • RDBMS Documentation: PostgreSQL, MySQL, SQL Server, Oracle, SQLite, DuckDB official docs
  • Query Optimization: Use The Index, Luke, SQL Performance Explained, vendor optimization guides
  • Schema Design: Database Refactoring (Fowler), normalization guides, data type selection

Modern Optimization (Current):

  • PostgreSQL: official release notes and "current" docs for planner/optimizer changes
  • MySQL: official reference manual sections for EXPLAIN, optimizer, and Performance Schema
  • SQL Server / Oracle: official docs for execution plans, indexing, and concurrency controls

Operations & Infrastructure:

  • HA & Replication: Streaming replication, GTID-based replication, failover automation
  • Migrations: Liquibase, Flyway version control and deployment patterns
  • Backup/Recovery: pgBackRest, Percona XtraBackup, point-in-time recovery
  • Monitoring: pg_stat_statements, Performance Schema, EXPLAIN visualizers (Dalibo, depesz)
  • Security: OWASP SQL Injection Prevention, Postgres hardening, encryption standards
  • Analytical Databases: DuckDB extensions, Parquet specification, columnar storage patterns

Use references/operational-patterns.md and the templates directory for detailed workflows, migration notes, and ready-to-run commands.

Fact-Checking

  • Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
  • Prefer primary sources; report source links and dates for volatile information.
  • If web access is unavailable, state the limitation and mark guidance as unverified.