Day 60: ClickHouse® Query Profiling – Finding Performance Bottlenecks
When a query becomes slow, the first instinct is often to add more CPU or increase memory. In reality, the problem may have nothing to do with hardware. A query can be slow because it scans too much data, performs expensive calculations, uses excessive memory, or transfers large amounts of data across the network. Without profiling, these bottlenecks are difficult to distinguish because they often produce similar symptoms: high query latency.
ClickHouse® provides built-in profiling tools that expose detailed execution statistics, making it possible to understand why a query is slow before attempting optimization. In this article, we'll explore how to profile queries in ClickHouse®, interpret the most important metrics, and identify common performance bottlenecks.
Why Query Profiling Matters
A slow query does not necessarily mean a slow database. Common causes include:
- Reading far more rows than necessary
- Poor primary key filtering
- Large aggregations or sorts
- Memory-intensive joins
- Network overhead in distributed queries
- CPU-heavy expressions
- Ineffective partition pruning
Profiling provides visibility into:
- Execution time
- Rows and bytes processed
- Memory consumption
- CPU utilization
- Disk reads
- Thread-level behavior
- Execution plans
- Low-level performance counters
The goal is simple: identify the real bottleneck before optimizing.
A Practical Profiling Workflow
A typical ClickHouse® profiling workflow looks like this:
- Find slow queries in
system.query_log. - Inspect the execution plan with
EXPLAIN. - Review rows, bytes, and memory usage.
- Analyze thread-level behavior if needed.
- Examine
ProfileEvents. - Apply a targeted optimization.
- Measure the results again.
This evidence-driven approach is far more effective than making multiple changes simultaneously.
Step 1: Find Slow Queries
The system.query_log table stores execution statistics for completed queries. Key columns:
| Column | Meaning |
|---|---|
query_duration_ms |
Total execution time |
read_rows |
Rows scanned |
read_bytes |
Data read from storage |
memory_usage |
Peak memory consumption |
normalized_query_hash |
Groups similar queries together |
normalized_query_hash is especially useful because it reveals repeated expensive query patterns, not just isolated slow queries.
Step 2: Inspect the Execution Plan
Execution time alone does not explain why a query is slow. Use EXPLAIN to see how ClickHouse® plans to execute it. The output shows:
- Index usage
- Selected parts
- Granules scanned
- Predicate pushdown
- Read operations
If most granules are scanned, the primary key is not pruning data effectively.
Step 3: Analyze Resource Usage
Rows Read – If a query returns 100 rows but scans hundreds of millions, filtering is likely inefficient.
Bytes Read – Large values usually indicate:
- Full table scans
- Reading unnecessary columns
- Poor partition pruning
Peak Memory Usage – High memory consumption is common with:
JOINoperationsGROUP BYORDER BYDISTINCT
Large intermediate datasets can eventually spill to disk depending on server settings.
Step 4: Use system.query_thread_log
Some queries are not uniformly slow across all threads. system.query_thread_log records statistics for individual execution threads, including:
- CPU time
- Memory usage
- Read statistics
- Thread duration
This is useful for diagnosing parallel execution imbalance and identifying threads doing disproportionate work.
Step 5: Examine ProfileEvents
One of the most valuable profiling features is the ProfileEvents map available in system.query_log. Common events:
| Event | Meaning |
|---|---|
SelectedRows |
Rows processed |
SelectedBytes |
Bytes processed |
NetworkSendBytes |
Network traffic generated |
ReadCompressedBytes |
Compressed bytes read from storage |
UserTimeMicroseconds |
CPU time in user space |
SystemTimeMicroseconds |
CPU time in kernel space |
These counters often reveal whether the dominant cost is CPU, storage I/O, or network communication.
Identifying Common Bottlenecks
1. Excessive Data Scanning
Symptoms: High read_rows, high read_bytes, long execution time
Typical causes: Missing filters, poor primary key design, ineffective partition pruning
2. CPU Bottlenecks
Symptoms: High CPU-related ProfileEvents, moderate data reads, complex expressions
Common causes: Expensive functions, large aggregations, complex JOIN conditions
3. Memory Bottlenecks
Symptoms: High memory_usage, slow aggregations, slow joins
Common causes: Large hash tables, wide intermediate datasets, massive GROUP BY operations
4. Disk I/O Bottlenecks
Symptoms: High compressed bytes read, large storage reads, long execution time despite moderate CPU usage
Typical causes: Reading unnecessary columns, large table scans, poor data locality
5. Network Bottlenecks
Symptoms: High NetworkSendBytes, large intermediate result exchange, slow distributed aggregations
Reducing data movement between nodes can significantly improve distributed query performance.
Practical Example
Suppose a query takes 20 seconds to complete.
| Metric | Value |
|---|---|
| Duration | 20 s |
| Rows Read | 450 million |
| Result Rows | 150 |
| Memory | 180 MB |
| CPU | Moderate |
| Read Bytes | Very High |
This profile suggests that CPU is not the limiting factor. The query is spending most of its time scanning data, so the best optimization is likely:
- Better filtering
- Improved primary key design
- More effective partition pruning
Adding more CPU would not solve the root cause.
Best Practices
- Enable query logging in production.
- Use
EXPLAINbefore modifying queries. - Review
ProfileEventsalongside execution time. - Track recurring slow queries with
normalized_query_hash. - Measure before and after every optimization.
- Test with realistic datasets.
- Avoid optimizing based on a single metric.
It is also useful to compare cold-cache and warm-cache performance when evaluating storage-related changes.
Conclusion
Query optimization is fundamentally about identifying bottlenecks, not applying random performance tweaks. ClickHouse® provides powerful profiling capabilities through system.query_log, system.query_thread_log, EXPLAIN, and ProfileEvents. Together, these tools reveal how a query executes and which resources it consumes.
By combining execution plans with runtime metrics, engineers can determine whether a query is limited by CPU, memory, storage, or network activity and apply targeted optimizations that improve performance without unnecessary infrastructure changes.
Day 60 takeaway: Measure first, optimize second. The fastest way to improve a slow ClickHouse® query is to understand exactly where the time is being spent.
Comments
No comments yet. Start the discussion.