DEV Community 2h ago

Day 60: ClickHouse® Query Profiling – Finding Performance Bottlenecks

When a query becomes slow, the first instinct is often to add more CPU or increase memory. In reality, the problem may have nothing to do with hardware. A query can be slow because it scans too much data, performs expensive calculations, uses excessive memory, or transfers large amounts of data across the network. Without profiling, these bottlenecks are difficult to distinguish because they often produce similar symptoms: high query latency.

ClickHouse® provides built-in profiling tools that expose detailed execution statistics, making it possible to understand why a query is slow before attempting optimization. In this article, we'll explore how to profile queries in ClickHouse®, interpret the most important metrics, and identify common performance bottlenecks.

Why Query Profiling Matters

A slow query does not necessarily mean a slow database. Common causes include:

Reading far more rows than necessary
Poor primary key filtering
Large aggregations or sorts
Memory-intensive joins
Network overhead in distributed queries
CPU-heavy expressions
Ineffective partition pruning

Profiling provides visibility into:

Execution time
Rows and bytes processed
Memory consumption
CPU utilization
Disk reads
Thread-level behavior
Execution plans
Low-level performance counters

The goal is simple: identify the real bottleneck before optimizing.

A Practical Profiling Workflow

A typical ClickHouse® profiling workflow looks like this:

Find slow queries in system.query_log.
Inspect the execution plan with EXPLAIN.
Review rows, bytes, and memory usage.
Analyze thread-level behavior if needed.
Examine ProfileEvents.
Apply a targeted optimization.
Measure the results again.

This evidence-driven approach is far more effective than making multiple changes simultaneously.

Step 1: Find Slow Queries

The system.query_log table stores execution statistics for completed queries. Key columns:

Column	Meaning
`query_duration_ms`	Total execution time
`read_rows`	Rows scanned
`read_bytes`	Data read from storage
`memory_usage`	Peak memory consumption
`normalized_query_hash`	Groups similar queries together

normalized_query_hash is especially useful because it reveals repeated expensive query patterns, not just isolated slow queries.

Step 2: Inspect the Execution Plan

Execution time alone does not explain why a query is slow. Use EXPLAIN to see how ClickHouse® plans to execute it. The output shows:

Index usage
Selected parts
Granules scanned
Predicate pushdown
Read operations

If most granules are scanned, the primary key is not pruning data effectively.

Step 3: Analyze Resource Usage

Rows Read – If a query returns 100 rows but scans hundreds of millions, filtering is likely inefficient.

Bytes Read – Large values usually indicate:

Full table scans
Reading unnecessary columns
Poor partition pruning

Peak Memory Usage – High memory consumption is common with:

JOIN operations
GROUP BY
ORDER BY
DISTINCT

Large intermediate datasets can eventually spill to disk depending on server settings.

Step 4: Use `system.query_thread_log`

Some queries are not uniformly slow across all threads. system.query_thread_log records statistics for individual execution threads, including:

CPU time
Memory usage
Read statistics
Thread duration

This is useful for diagnosing parallel execution imbalance and identifying threads doing disproportionate work.

Step 5: Examine `ProfileEvents`

One of the most valuable profiling features is the ProfileEvents map available in system.query_log. Common events:

Event	Meaning
`SelectedRows`	Rows processed
`SelectedBytes`	Bytes processed
`NetworkSendBytes`	Network traffic generated
`ReadCompressedBytes`	Compressed bytes read from storage
`UserTimeMicroseconds`	CPU time in user space
`SystemTimeMicroseconds`	CPU time in kernel space

These counters often reveal whether the dominant cost is CPU, storage I/O, or network communication.

Identifying Common Bottlenecks

1. Excessive Data Scanning

Symptoms: High read_rows, high read_bytes, long execution time

Typical causes: Missing filters, poor primary key design, ineffective partition pruning

2. CPU Bottlenecks

Symptoms: High CPU-related ProfileEvents, moderate data reads, complex expressions

Common causes: Expensive functions, large aggregations, complex JOIN conditions

3. Memory Bottlenecks

Symptoms: High memory_usage, slow aggregations, slow joins

Common causes: Large hash tables, wide intermediate datasets, massive GROUP BY operations

4. Disk I/O Bottlenecks

Symptoms: High compressed bytes read, large storage reads, long execution time despite moderate CPU usage

Typical causes: Reading unnecessary columns, large table scans, poor data locality

5. Network Bottlenecks

Symptoms: High NetworkSendBytes, large intermediate result exchange, slow distributed aggregations

Reducing data movement between nodes can significantly improve distributed query performance.

Practical Example

Suppose a query takes 20 seconds to complete.

Metric	Value
Duration	20 s
Rows Read	450 million
Result Rows	150
Memory	180 MB
CPU	Moderate
Read Bytes	Very High

This profile suggests that CPU is not the limiting factor. The query is spending most of its time scanning data, so the best optimization is likely:

Better filtering
Improved primary key design
More effective partition pruning

Adding more CPU would not solve the root cause.

Best Practices

Enable query logging in production.
Use EXPLAIN before modifying queries.
Review ProfileEvents alongside execution time.
Track recurring slow queries with normalized_query_hash.
Measure before and after every optimization.
Test with realistic datasets.
Avoid optimizing based on a single metric.

It is also useful to compare cold-cache and warm-cache performance when evaluating storage-related changes.

Conclusion

Query optimization is fundamentally about identifying bottlenecks, not applying random performance tweaks. ClickHouse® provides powerful profiling capabilities through system.query_log, system.query_thread_log, EXPLAIN, and ProfileEvents. Together, these tools reveal how a query executes and which resources it consumes.

By combining execution plans with runtime metrics, engineers can determine whether a query is limited by CPU, memory, storage, or network activity and apply targeted optimizations that improve performance without unnecessary infrastructure changes.

Day 60 takeaway: Measure first, optimize second. The fastest way to improve a slow ClickHouse® query is to understand exactly where the time is being spent.