Agent Skills: ClickHouse Query Optimization

20+ optimization rules for ClickHouse queries. Load when writing queries, debugging slow queries, fixing memory errors, or optimizing JOINs. Achieves 10-100x faster queries, 90% less memory usage.

UncategorizedID: obsessiondb/clickhouse-plugin/clickhouse-query-optimization

Install this agent skill to your local

pnpm dlx add-skill https://github.com/obsessiondb/clickhouse-skills/tree/HEAD/skills/clickhouse-query-optimization

Skill Files

Browse the full folder contents for clickhouse-query-optimization.

Download Skill

Loading file tree…

skills/clickhouse-query-optimization/SKILL.md

Skill Metadata

Name
clickhouse-query-optimization
Description
"20+ optimization rules for ClickHouse queries. Load when writing queries, debugging slow queries, fixing memory errors, or optimizing JOINs. Achieves 10-100x faster queries, 90% less memory usage."

ClickHouse Query Optimization

Load when writing, debugging, or optimizing ClickHouse queries.

Goals

  • 10-100x faster query execution
  • 90%+ reduction in memory usage
  • Eliminate full table scans (read <1% of data)
  • JOINs that complete in milliseconds instead of minutes

Reference Documentation

Search terms: query optimization, PREWHERE, JOIN, dictionary, EXPLAIN, performance, slow query, memory limit, timeout, full table scan, dictGet, query cache, DELETE, UPDATE, mutation, aggregation, GROUP BY, COUNT DISTINCT, uniq, query_log

Critical Rules

[CRITICAL]

  1. Filter on ORDER BY prefix columns. Queries that skip the prefix scan the entire table.
  2. *Avoid SELECT . Only select columns you need.
  3. Always check EXPLAIN indexes=1. Verify index usage and row estimates before optimizing.

[HIGH]

  1. Use PREWHERE for selective filters. Reduces I/O before main WHERE clause.
  2. Prefer dictionaries over JOINs. For key-value lookups on tables <10M rows.
  3. Use IN subqueries for existence checks. Faster than JOIN when you don't need columns from the other table.

[MEDIUM]

  1. Limit JOINs when possible. Each JOIN increases complexity exponentially.

Querying basics: ORDER BY, indexes and PREWHERE

ORDER BY

The ORDER BY clause in table definition determines query performance. ClickHouse stores data sorted by these columns and uses a sparse primary index to skip irrelevant data.

How it works:

  • Data is divided into granules (default 8192 rows), can vary
  • Primary index stores first value of each granule for ORDER BY columns
  • Queries filtering on ORDER BY prefix in exact order or a prefix of it skip entire granules
-- Table: ORDER BY (tenant_id, event_date, user_id)

-- FAST: Filters on ORDER BY prefix (left to right)
WHERE tenant_id = 1                              -- Uses index
WHERE tenant_id = 1 AND event_date = '2024-01-15' -- Uses index
WHERE tenant_id = 1 AND event_date >= '2024-01-01' AND user_id = 123 -- Uses index

-- SLOW: Skips columns or filters only on later columns
WHERE event_date = '2024-01-15'                  -- Skips tenant_id, full scan
WHERE user_id = 123                              -- Skips tenant_id and event_date, full scan

Key insight: Always filter on the leftmost ORDER BY columns first for best performance.

Indexes

Skip indexes (data skipping indexes) help when filtering on columns NOT in ORDER BY:

-- Table has ORDER BY (tenant_id, event_date, user_id)
-- But you need to query by session_id

-- Add bloom filter index
ALTER TABLE events
ADD INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 4;

-- Now this query can skip granules instead of full scan
SELECT * FROM events WHERE session_id = 'abc123';

Check if index is used:

EXPLAIN indexes = 1
SELECT * FROM events WHERE session_id = 'abc123';

-- Look for "Skip" section showing granules skipped

Common index types:

  • bloom_filter - High-cardinality columns (UUIDs, session IDs)
  • minmax - Range queries on numeric/date columns
  • set(N) - Low-cardinality columns (<N unique values)

PREWHERE vs WHERE

| Clause | When columns are read | Use for | | -------- | ----------------------------------- | ---------------------------------- | | PREWHERE | Before other columns | Small columns, selective filters | | WHERE | After all selected columns are read | Large columns, complex expressions |

-- Small status column in PREWHERE, large error_message in WHERE
SELECT * FROM events
PREWHERE status = 'error'
WHERE error_message LIKE '%timeout%';

## Column Selection

### Always Specify Columns

```sql
-- BAD: Reads all columns from disk
SELECT * FROM events WHERE user_id = 123;

-- GOOD: Reads only needed columns
SELECT event_time, event_type, properties
FROM events
WHERE user_id = 123;

Full Example

-- Table definition
CREATE TABLE events (
    tenant_id UInt32,
    event_date Date,
    user_id UInt64,
    status LowCardinality(String),    -- 4 bytes per row
    error_message String,              -- 500 bytes avg per row
    payload String                     -- 2KB avg per row
) ENGINE = MergeTree()
ORDER BY (tenant_id, event_date, user_id);

-- Table has 100M rows across 12,000 granules (8192 rows each)
-- Query
SELECT user_id, error_message, payload
FROM events
PREWHERE tenant_id = 5 AND event_date = '2024-01-15' AND status = 'error'
WHERE error_message LIKE '%timeout%';

Step 1: Primary index (granule filtering)

Primary index stores first row of each granule:
  Granule 0:    tenant_id=1,  event_date=2024-01-01
  Granule 1:    tenant_id=1,  event_date=2024-01-01
  ...
  Granule 846:  tenant_id=5,  event_date=2024-01-14
  Granule 847:  tenant_id=5,  event_date=2024-01-15  ← Match
  Granule 848:  tenant_id=5,  event_date=2024-01-15  ← Match
  Granule 849:  tenant_id=5,  event_date=2024-01-15  ← Match
  Granule 850:  tenant_id=5,  event_date=2024-01-16
  ...

Result: Read 3 granules out of 12,000 (skip 99.97% of data)

Note: Skip indexes work similarly—they allow ClickHouse to skip granules for columns not in ORDER BY. Both mechanisms achieve the same goal: read less data by eliminating irrelevant granules before touching the actual rows.

Step 2: PREWHERE (column filtering within granules)

For the 3 matching granules (24,576 rows):

  1. Read tenant_id column      →  98 KB   (4 bytes × 24,576)
  2. Read event_date column     →  98 KB   (4 bytes × 24,576)
  3. Read status column         →  98 KB   (4 bytes × 24,576)
  4. Filter: 1,247 rows match PREWHERE conditions

  5. Read user_id column        →  10 KB   (8 bytes × 1,247 rows only)
  6. Read error_message column  →  623 KB  (500 bytes × 1,247 rows only)
  7. Read payload column        →  2.4 MB  (2KB × 1,247 rows only)

Total read: ~3.3 MB
Without PREWHERE: ~62 MB (read all columns for all 24,576 rows)

Step 3: WHERE (final filtering)

Apply: error_message LIKE '%timeout%'
Result: 43 rows returned

Impact

Reading fewer columns:

  • Reduces disk I/O
  • Improves compression efficiency
  • Lowers memory usage

JOIN Optimization

Decision Tree

Need data from another table?
│
├─ Only filtering rows, don't need columns from other table?
│  Example: "Get orders, but only for premium users"
│  └─ Use IN subquery
│
├─ Need columns from other table?
│  Example: "Get orders WITH user name and email"
│  │
│  ├─ Small lookup table (<10M rows), staleness OK?
│  │  └─ Use Dictionary
│  │
│  └─ Need real-time data or large table?
│     └─ Use JOIN with filtered subqueries
│
└─ Same join needed on every query?
   └─ Denormalize at insert time

Full Example: Dictionary vs JOIN

-- Tables
-- events: 500M rows, ORDER BY (tenant_id, event_date, user_id)
-- users: 2M rows (user_id, name, tier, country)

-- Task: Get user name and tier for today's events

Slow: JOIN

SELECT e.event_time, e.event_type, u.name, u.tier
FROM events e
JOIN users u ON e.user_id = u.user_id
WHERE e.tenant_id = 1 AND e.event_date = today();

-- Execution:
-- 1. Scan events: 850K rows
-- 2. Build hash table from users: 2M rows loaded into memory
-- 3. Probe hash table for each event row
-- Time: 3.2s, Memory: 890MB

Fast: Dictionary

-- Create once
CREATE DICTIONARY user_dict (
    user_id UInt64,
    name String,
    tier String
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(TABLE 'users'))
LAYOUT(HASHED())
LIFETIME(300);

-- Query
SELECT
    event_time,
    event_type,
    dictGet('user_dict', 'name', user_id) AS name,
    dictGet('user_dict', 'tier', user_id) AS tier
FROM events
WHERE tenant_id = 1 AND event_date = today();

-- Execution:
-- 1. Scan events: 850K rows
-- 2. Dictionary lookup: O(1) hash lookup per row, already in memory
-- Time: 0.4s, Memory: 120MB

Tradeoff: Dictionaries are eventually consistent. LIFETIME(300) means data can be up to 5 minutes stale. For real-time accuracy, use JOIN. Always ask the user if this delay is acceptable before implementing a dictioary.

-- Check when dictionary was last updated
SELECT name, last_successful_update_time, loading_duration
FROM system.dictionaries;

-- Force immediate reload
SYSTEM RELOAD DICTIONARY user_dict;

IN Subquery for Filtering

When you only need to filter (not fetch columns):

-- Get orders from premium users
SELECT *
FROM orders
WHERE user_id IN (
    SELECT user_id FROM users WHERE tier = 'premium'
);

-- NOT this (builds unnecessary hash table)
SELECT o.*
FROM orders o
JOIN users u ON o.user_id = u.user_id
WHERE u.tier = 'premium';

When You Must JOIN

Filter both sides first, smaller table on the right:

SELECT e.event_time, u.name, u.email
FROM (
    SELECT event_time, user_id
    FROM events
    WHERE tenant_id = 1 AND event_date = today()
) e
JOIN (
    SELECT user_id, name, email
    FROM users
    WHERE active = 1
) u ON e.user_id = u.user_id;

## Query Analysis

### EXPLAIN Queries

```sql
-- Basic execution plan
EXPLAIN SELECT ...;

-- Show index usage (most useful)
EXPLAIN indexes = 1 SELECT ...;

-- Show execution pipeline
EXPLAIN PIPELINE SELECT ...;

-- Show query AST
EXPLAIN AST SELECT ...;

Reading EXPLAIN indexes Output

EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE tenant_id = 1 AND event_date = '2024-01-15';

-- Output interpretation:
-- ReadFromMergeTree
--   Indexes:
--     PrimaryKey
--       Keys: tenant_id, event_date
--       Condition: (tenant_id = 1) AND (event_date = '2024-01-15')
--       Parts: 5/120           <- 5 parts selected out of 120
--       Granules: 42/15000     <- 42 granules out of 15000 (good!)

Find Slow Queries

-- Recent slow queries
SELECT
    query,
    query_duration_ms,
    read_rows,
    read_bytes,
    memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_duration_ms > 1000
  AND event_date >= today() - 1
ORDER BY query_duration_ms DESC
LIMIT 20;

-- Queries with full table scans
SELECT
    query,
    read_rows,
    total_rows_approx
FROM system.query_log
WHERE type = 'QueryFinish'
  AND read_rows > 0.9 * total_rows_approx  -- Reading >90% of table
  AND event_date >= today() - 1
ORDER BY read_rows DESC
LIMIT 10;

Profile Query Execution

-- Enable query profiling
SET log_queries = 1;
SET log_query_threads = 1;

-- Run your query
SELECT ...;

-- Check thread-level metrics
SELECT
    thread_name,
    sum(read_rows) AS rows,
    sum(read_bytes) AS bytes,
    max(peak_memory_usage) AS peak_memory
FROM system.query_thread_log
WHERE query_id = '<your-query-id>'
GROUP BY thread_name;

Query Cache (22.8+)

Cache query results for repeated identical queries:

-- Enable query cache for a session
SET use_query_cache = 1;

-- Run query (first execution populates cache)
SELECT count(*), avg(amount)
FROM orders
WHERE order_date >= '2024-01-01';

-- Subsequent identical queries return cached result instantly
-- Cache invalidated automatically when underlying data changes

Configuration:

-- Set cache TTL (default: 60 seconds)
SET query_cache_ttl = 300;  -- 5 minutes

-- Minimum rows to cache (skip small results)
SET query_cache_min_query_runs = 2;  -- Cache after 2nd run

-- Check cache status
SELECT * FROM system.query_cache;

Best for:

  • Dashboard queries hitting same data repeatedly
  • Reports regenerated frequently
  • Multi-tenant queries with shared dimensions
  • Queries that do not need real-time data

Update rows without full table rewrite:

-- Update specific columns
ALTER TABLE events
UPDATE status = 'processed'
WHERE event_id = 12345;

-- Can update multiple columns
ALTER TABLE events
UPDATE
    status = 'cancelled',
    cancelled_at = now()
WHERE order_id IN (SELECT order_id FROM cancelled_orders);

Limitations:

  • Cannot update columns in ORDER BY or PARTITION BY
  • Uses mutations (async, check progress in system.mutations)
-- Check mutation progress
SELECT table, command, is_done, latest_fail_reason
FROM system.mutations
WHERE database = 'default'
ORDER BY create_time DESC;

EXPLAIN ESTIMATE

Get row/byte estimates without running query:

-- Estimate data to be read
EXPLAIN ESTIMATE
SELECT * FROM events
WHERE tenant_id = 1 AND event_date = '2024-01-15';

-- Output shows:
--   estimated_rows
--   estimated_bytes
--   number of parts/granules to read

Use for:

  • Validating filter effectiveness before running
  • Cost comparison between query approaches
  • Capacity planning

Troubleshooting

Always ask for user confirmation before applying schema changes (adding indexes, creating MVs, altering tables).

Memory Limit Exceeded

Problem: Memory limit exceeded, DB::Exception: Memory limit exceeded, query killed by OOM

Diagnose:

EXPLAIN indexes = 1 SELECT ...;  -- Check if filtering is effective

Solutions:

| Cause | Fix | | ------------------------- | ------------------------------------------------------------- | | High-cardinality GROUP BY | Use approximate: uniqHLL12() instead of COUNT(DISTINCT) | | Large aggregation | Pre-aggregate with AggregatingMergeTree + Materialized View | | Full table scan | Add filters on ORDER BY columns (see below) | | Too many columns | Select only needed columns |

-- BAD: Exact distinct on high cardinality
SELECT COUNT(DISTINCT user_id) FROM events;

-- GOOD: Approximate (2% error, 10x less memory)
SELECT uniqHLL12(user_id) FROM events;

-- BETTER: Pre-aggregate with MV (see clickhouse-materialized-views skill)

Full Table Scan Despite Filters

Problem: Query reads all rows, slow despite WHERE clause, Granules: N/N in EXPLAIN

Diagnose:

EXPLAIN indexes = 1
SELECT * FROM events WHERE user_id = 123;
-- If "Granules: N/N" → index not used

Solutions:

| Cause | Fix | | ----------------------------- | ------------------------------------------------ | | Filter not on ORDER BY prefix | Add ORDER BY columns to WHERE, or add skip index | | Function on column | Store computed column, filter on that | | OR across ORDER BY columns | Rewrite as UNION ALL | | Type mismatch | Match filter type to column type |

-- BAD: Function prevents index use
WHERE toDate(event_time) = '2024-01-15'

-- GOOD: Store date column, filter on it
WHERE event_date = '2024-01-15'

-- BAD: Skips ORDER BY prefix (tenant_id, event_date, user_id)
WHERE user_id = 123

-- GOOD: Include prefix columns
WHERE tenant_id = 1 AND event_date = today() AND user_id = 123

-- ALT: Add skip index if you can't change query
ALTER TABLE events ADD INDEX idx_user user_id TYPE bloom_filter(0.01) GRANULARITY 4;

Slow JOINs

Problem: JOINs take minutes, high memory during JOIN, query hangs on JOIN step

Solutions (in order of preference):

| Approach | When to Use | | ------------------ | --------------------------------------------------- | | Dictionary | Lookup table <10M rows, staleness OK | | IN subquery | Only filtering, don't need columns from other table | | Filter before JOIN | Must JOIN, but can reduce rows first | | Denormalize | Same JOIN on every query |

-- BAD: JOIN builds hash table of entire users table
SELECT e.*, u.name FROM events e JOIN users u ON e.user_id = u.user_id;

-- GOOD: Dictionary lookup (see "Dictionary vs JOIN" section)
SELECT *, dictGet('user_dict', 'name', user_id) AS name FROM events;

-- GOOD: IN subquery when only filtering
SELECT * FROM events WHERE user_id IN (SELECT user_id FROM users WHERE tier = 'premium');

-- GOOD: Filter both sides before JOIN
SELECT e.*, u.name
FROM (SELECT * FROM events WHERE tenant_id = 1 AND event_date = today()) e
JOIN (SELECT user_id, name FROM users WHERE active = 1) u ON e.user_id = u.user_id;

High Disk I/O

Problem: Queries slow with high read_bytes, disk wait times high, reading GBs for simple queries

Solutions:

| Cause | Fix | | ------------------- | ------------------------------------------------ | | Reading all columns | Select only needed columns | | No early filtering | Use PREWHERE on small, selective columns | | Missing skip index | Add bloom_filter for high-cardinality lookups | | Poor compression | Use appropriate codecs (see schema-design skill) |

-- BAD: Reads all columns, filters late
SELECT * FROM events WHERE status = 'error';

-- GOOD: PREWHERE filters before reading large columns
SELECT event_time, error_message
FROM events
PREWHERE status = 'error'
WHERE error_message LIKE '%timeout%';
ClickHouse Query Optimization Skill | Agent Skills