Day 50 - How to Migrate Data from MySQL to ClickHouse®: A Step-by-Step Guide
DEV Community

Day 50 - How to Migrate Data from MySQL to ClickHouse®: A Step-by-Step Guide

Introduction

As applications grow, traditional relational databases such as MySQL may struggle with analytical workloads involving millions of records and complex aggregations. While MySQL excels at Online Transaction Processing (OLTP), ClickHouse® is purpose-built for Online Analytical Processing (OLAP), enabling lightning-fast analytical queries on massive datasets.

Migrating data from MySQL to ClickHouse® allows organizations to build high-performance reporting systems, dashboards, and real-time analytics without impacting transactional workloads. In this guide, you'll learn several approaches to migrate data from MySQL to ClickHouse®, along with their advantages, limitations, and ideal use cases.

Why Migrate from MySQL to ClickHouse®?

MySQL and ClickHouse® are designed for different workloads.

Feature MySQL ClickHouse®
Storage Model Row-based Columnar
Best For Transactions (OLTP) Analytics (OLAP)
Query Speed Fast for row lookups Extremely fast for large scans
Aggregation Performance Moderate Extremely fast
Scalability Primarily Vertical Optimized for analytical scaling
Typical Use Cases Applications and transactional systems Reporting, dashboards, and analytics

Migrating from MySQL to ClickHouse® makes sense when:

  • Analytical queries are becoming slow in MySQL.
  • You need real-time dashboards over large datasets.
  • Reporting queries are impacting your production database.
  • You regularly process millions or billions of rows.

Migration Architecture

MySQL
│
▼
Export / Synchronization
│
▼
Data Transformation
│
▼
ClickHouse®
│
▼
Dashboards / Analytics

Migration Methods

There are multiple ways to migrate data depending on your requirements.

Method 1: CSV Export and Import (Recommended for Beginners)

This is the simplest approach for performing a one-time migration of historical data.

Step 1: Export Data from MySQL

Run the following command inside MySQL:

SELECT * INTO OUTFILE '/tmp/employees.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM employees;

Alternatively, export using the MySQL CLI:

mysql -u root -p mydb \
  -e "SELECT * FROM orders" \
  | sed 's/\t/,/g' > /tmp/orders.csv

Step 2: Create the Destination Table in ClickHouse®

Before importing data, create the corresponding MergeTree table. Remember to map MySQL data types to ClickHouse® equivalents.

CREATE TABLE employees (
    id UInt32,
    name String,
    department String,
    salary Float64
) ENGINE = MergeTree
ORDER BY id;

Step 3: Import the CSV File

Using clickhouse-client:

clickhouse-client \
  --query "INSERT INTO employees FORMAT CSV" \
  < /tmp/employees.csv

Using the HTTP API:

curl -u default:password \
  "http://localhost:8123/?query=INSERT+INTO+default.employees+FORMAT+CSV" \
  --data-binary @/tmp/employees.csv

Step 4: Verify the Import

SELECT count() FROM default.employees;

Example output:

count()
5

Method 2: Using the MySQL Table Engine

ClickHouse® can directly connect to and query MySQL tables without requiring an intermediate export.

Step 1: Create a MySQL Engine Table

CREATE TABLE mysql_employees (
    id UInt32,
    name String,
    department String,
    salary Float64
) ENGINE = MySQL('localhost:3306', 'default', 'employees', 'user', 'password');

You can now query the MySQL table directly:

SELECT * FROM mysql_employees;

Step 2: Create the Destination MergeTree Table

CREATE TABLE employees (
    id UInt32,
    name String,
    department String,
    salary Float64
) ENGINE = MergeTree
ORDER BY id;

Step 3: Copy the Data

Once the MySQL engine is configured, migrate the data using a single query:

INSERT INTO employees SELECT * FROM mysql_employees;

This is one of the fastest and cleanest approaches for one-time migrations.

Step 4: Verify the Migration

SELECT * FROM employees LIMIT 5;

For most one-time migrations, this is the recommended approach because everything happens inside ClickHouse® without intermediate files.

Method 3: Using ClickPipes (ClickHouse Cloud)

If you're using ClickHouse Cloud, ClickPipes provides a fully managed ingestion service.

Migration steps:

  • Create a ClickPipe.
  • Connect your MySQL database.
  • Select the tables you want to synchronize.
  • Start continuous replication.

Advantages:

  • Fully managed service
  • Continuous incremental synchronization
  • Minimal operational maintenance

Limitation:

  • Available only for ClickHouse Cloud.

Method 4: Real-Time Migration with Kafka and Debezium

For continuous synchronization between MySQL and ClickHouse®, Change Data Capture (CDC) using Debezium is the most robust solution. Debezium captures every INSERT, UPDATE, and DELETE operation from MySQL and streams the changes into Kafka, where ClickHouse® consumes them.

Architecture:

MySQL
│
▼
Debezium (CDC)
│
▼
Kafka
│
▼
ClickHouse Kafka Engine
│
▼
Materialized View
│
▼
MergeTree Table

This architecture enables near real-time synchronization. Best suited for:

  • Event streaming
  • Real-time dashboards
  • IoT platforms
  • Monitoring systems
  • Operational analytics

MySQL to ClickHouse® Data Type Mapping

MySQL Type ClickHouse® Type
INT Int32 / UInt32
BIGINT Int64 / UInt64
VARCHAR(n) String
TEXT String
DECIMAL(p,s) Decimal(p,s)
FLOAT Float32
DOUBLE Float64
DATETIME DateTime
DATE Date
TINYINT(1) UInt8 (Boolean)
JSON String

Verifying the Migration

Always verify that both databases contain identical data after migration.

Row Count Validation

MySQL:

SELECT COUNT(*) FROM orders;

ClickHouse®:

SELECT count() FROM default.orders;

Both queries should return identical counts. If they don't, investigate:

  • Data type mismatches
  • NULL handling differences
  • Missing records during export
  • Failed import batches

Best Practices

  1. Choose the Right ORDER BY Key - Unlike MySQL's primary key, ClickHouse® uses ORDER BY to determine how data is physically sorted. Choose columns that are frequently used in filtering conditions.

  2. Use LowCardinality for Repetitive Strings - Columns with relatively few distinct values benefit from better compression.

country LowCardinality(String),
status LowCardinality(String)
  1. Partition by Time - For time-series data, partition by month or day.
PARTITION BY toYYYYMM(order_date)
  1. Remove AUTO_INCREMENT - ClickHouse® doesn't support AUTO_INCREMENT. Instead, simply store identifiers as UInt32 or UInt64.

  2. Handle NULL Values Carefully - ClickHouse® columns are NOT NULL by default. If nullable columns exist in MySQL, either use Nullable(Type) or replace NULL values during migration.

Example:

SELECT id, COALESCE(department, 'Unknown') AS department
FROM default.mysql_employee;

Common Challenges

  • Data Type Mismatches - Ensure MySQL data types are correctly mapped before migration.
  • Duplicate Records - Implement deduplication strategies or unique identifiers when performing incremental loads.
  • Large Tables - Split large datasets into smaller batches to reduce memory consumption and improve reliability.
  • Character Encoding - Use UTF-8 encoding consistently to prevent text corruption.

Performance Tips

  • Use the MergeTree engine for analytical workloads.
  • Batch inserts instead of inserting one row at a time.
  • Compress data whenever possible.
  • Choose an efficient partitioning strategy.
  • Avoid unnecessary indexes - ClickHouse® relies on sorting and data-skipping indexes instead.

Quick Reference

Scenario Recommended Method
One-time migration CSV Export & Import
Direct access to MySQL MySQL Table Engine
Full table migration INSERT INTO ... SELECT
Continuous synchronization ClickPipes
Real-time streaming Kafka + Debezium

Conclusion

Migrating data from MySQL to ClickHouse® is an effective way to accelerate analytical workloads while keeping your transactional database unchanged. In this architecture, MySQL continues handling application writes and transactional operations, while ClickHouse® powers dashboards, reporting, and large-scale analytical queries.

For most organizations, using the MySQL Table Engine together with INSERT INTO ... SELECT offers the simplest and most reliable solution for one-time migrations. If continuous synchronization is required, Debezium with Kafka provides a production-ready Change Data Capture (CDC) pipeline capable of replicating changes in near real time.

With proper schema design, efficient partitioning, and the right migration strategy, ClickHouse® can deliver dramatically faster analytical performance from the very first day.

Comments

No comments yet. Start the discussion.