SQL Query Optimizer
Analyzes and optimizes SQL queries across different data warehouse platforms with platform-specific recommendations.
Overview
This skill examines SQL queries to identify performance bottlenecks, suggest optimizations, and provide platform-specific recommendations for Snowflake, BigQuery, Redshift, and Databricks. It analyzes query execution plans, recommends indexes/clustering keys, and identifies anti-patterns.
Capabilities
- Query execution plan analysis - Parse and analyze EXPLAIN outputs
- Index recommendations - Suggest clustering keys, sort keys, partition keys
- Join optimization - Identify inefficient join patterns and suggest improvements
- Subquery elimination - Convert correlated subqueries to CTEs or joins
- CTE optimization - Materialize vs reference optimization
- Window function optimization - Frame and partition optimization
- Predicate pushdown validation - Verify filter pushdown effectiveness
- Clustering key recommendations - Platform-specific clustering strategies
- Materialized view suggestions - Identify candidates for materialized views
- Platform-specific optimizations - Snowflake, BigQuery, Redshift, Databricks
Input Schema
{
"query": {
"type": "string",
"description": "The SQL query to analyze",
"required": true
},
"platform": {
"type": "string",
"enum": ["snowflake", "bigquery", "redshift", "databricks", "postgres"],
"required": true,
"description": "Target data warehouse platform"
},
"tableStatistics": {
"type": "object",
"description": "Table statistics including row counts, column cardinality",
"properties": {
"tables": {
"type": "array",
"items": {
"name": "string",
"rowCount": "number",
"sizeGB": "number",
"columns": "array"
}
}
}
},
"executionPlan": {
"type": "object",
"description": "Query execution plan (EXPLAIN output)"
},
"queryHistory": {
"type": "object",
"description": "Historical query performance metrics"
},
"optimizationGoals": {
"type": "array",
"items": {
"type": "string",
"enum": ["latency", "cost", "throughput", "scan_reduction"]
},
"default": ["latency", "cost"]
}
}
Output Schema
{
"optimizedQuery": {
"type": "string",
"description": "The optimized SQL query"
},
"improvements": {
"type": "array",
"items": {
"type": {
"type": "string",
"enum": ["join", "predicate", "aggregation", "cte", "window", "scan", "index"]
},
"description": "string",
"impact": "high|medium|low",
"lineNumber": "number",
"originalCode": "string",
"optimizedCode": "string"
}
},
"indexRecommendations": {
"type": "array",
"items": {
"table": "string",
"type": "clustering|sort|partition|index",
"columns": "array",
"rationale": "string",
"ddl": "string"
}
},
"estimatedImprovement": {
"scanReduction": {
"type": "number",
"description": "Percentage reduction in data scanned"
},
"timeReduction": {
"type": "number",
"description": "Percentage reduction in execution time"
},
"costReduction": {
"type": "number",
"description": "Percentage reduction in query cost"
}
},
"antiPatterns": {
"type": "array",
"items": {
"pattern": "string",
"severity": "high|medium|low",
"location": "string",
"suggestion": "string"
}
},
"platformSpecificNotes": {
"type": "array",
"items": "string"
}
}
Usage Examples
Basic Query Optimization
{
"query": "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.created_at > '2024-01-01'",
"platform": "snowflake"
}
With Execution Plan Analysis
{
"query": "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id",
"platform": "bigquery",
"executionPlan": {
"stages": [...],
"totalBytesProcessed": 1073741824
},
"optimizationGoals": ["cost", "scan_reduction"]
}
With Table Statistics
{
"query": "SELECT ... complex query ...",
"platform": "redshift",
"tableStatistics": {
"tables": [
{
"name": "orders",
"rowCount": 10000000,
"sizeGB": 50,
"columns": [
{"name": "order_id", "cardinality": 10000000},
{"name": "customer_id", "cardinality": 500000}
]
}
]
}
}
Platform-Specific Optimizations
Snowflake
| Optimization | Description | |--------------|-------------| | Clustering keys | Recommend micro-partition clustering | | Result cache | Identify queries benefiting from caching | | Query acceleration | Suggest QUERY_ACCELERATION_MAX_SCALE_FACTOR | | Warehouse sizing | Right-size warehouse recommendations |
BigQuery
| Optimization | Description | |--------------|-------------| | Partitioning | DATE/TIMESTAMP partitioning recommendations | | Clustering | Up to 4 clustering columns | | BI Engine | Identify BI Engine-eligible queries | | Slots | Estimate slot usage optimization |
Redshift
| Optimization | Description | |--------------|-------------| | Sort keys | COMPOUND vs INTERLEAVED recommendations | | Distribution | KEY, EVEN, ALL distribution strategies | | Compression | Column encoding recommendations | | Vacuum | VACUUM and ANALYZE recommendations |
Databricks
| Optimization | Description | |--------------|-------------| | Z-ordering | Multi-column Z-order recommendations | | Delta cache | Caching strategy recommendations | | Photon | Photon-eligible query patterns | | Adaptive execution | AQE configuration suggestions |
Common Anti-Patterns Detected
Query Structure
| Anti-Pattern | Impact | Fix | |--------------|--------|-----| | SELECT * | High | Specify columns explicitly | | Correlated subqueries | High | Convert to JOIN or CTE | | DISTINCT on large datasets | Medium | Use GROUP BY or window functions | | Non-SARGable predicates | High | Rewrite for index usage |
Join Issues
| Anti-Pattern | Impact | Fix | |--------------|--------|-----| | Cartesian products | Critical | Add join conditions | | Implicit joins | Medium | Use explicit JOIN syntax | | Wrong join order | High | Reorder by selectivity | | Missing indexes on join keys | High | Add clustering/sort keys |
Aggregation Issues
| Anti-Pattern | Impact | Fix | |--------------|--------|-----| | GROUP BY ordinal | Low | Use column names | | Aggregating before filter | High | Filter first, then aggregate | | Over-grouping | Medium | Reduce GROUP BY columns |
Integration Points
MCP Server Integration
- Snowflake MCP - Real-time execution plan analysis
- BigQuery MCP - Cost estimation and slot analysis
- Redshift MCP - Query execution and statistics
Related Skills
- Data Quality Profiler (SK-DEA-005) - Table statistics gathering
- dbt Project Analyzer (SK-DEA-003) - Model query optimization
Applicable Processes
- Query Optimization (
query-optimization.js) - Data Warehouse Setup (
data-warehouse-setup.js) - BI Dashboard Development (
bi-dashboard.js) - OBT Creation (
obt-creation.js)
References
- Snowflake Query Performance
- BigQuery Best Practices
- Redshift Performance Tuning
- Databricks SQL Best Practices
Version History
- 1.0.0 - Initial release with multi-platform support