ClickHouse® table engines

ClickHouse® provides multiple table engines, each designed to address different data storage and processing requirements:

MergeTree

MergeTree is the default table engine in ClickHouse®. It's designed for inserting a very large amount of data into a table. The data is written to the table part by part, then rules are applied for merging the parts in the background. This method is much more efficient than continually rewriting the data in storage during insert.

MergeTree tables are sorted by a primary key, and data is stored in sorted order, enabling high-performance range queries and efficient data compression.

MergeTree limitations for production clusters

The MergeTree engine can't synchronize data between replicas.

If you want to use replication in your cluster, use the ReplicatedMergeTree engine. It's specifically designed to handle data replication and synchronization within shards, thus providing better performance and fault tolerance.

To set up this table engine, use the following syntax in your CREATE TABLE query:

ENGINE = MergeTree()

MergeTree official ClickHouse® documentation

This table engine has multiple extensions designed to handle more specific use-cases:

ReplicatedMergeTree

ReplicatedMergeTree is an extension of the MergeTree engine designed to optimize data replication and fault tolerance. It's used when high availability and durability are essential. ReplicatedMergeTree creates multiple replicas per shard, ensuring data redundancy and automatic synchronization across replicas.

Recommended option for multi-replica sharded clusters

We suggest using this table engine when operating your multi-shard clusters with more than a single replica per shard.

Every MergeTree family table engine has a replicated version (see below)

To set up this table engine, use the following syntax in your CREATE TABLE query:

ENGINE = ReplicatedMergeTree()

ReplicatedMergeTree official ClickHouse® documentation

For in-depth step-by-step instructions on creating replicated tables, see Create replicated tables in your Managed ClickHouse® cluster.

ReplicatedAggregatingMergeTree

ReplicatedAggregatingMergeTree is an extension of the MergeTree engine designed for pre-aggregated data storage. It combines the benefits of ReplicatedMergeTree with automatic pre-aggregation, allowing you to store and query pre-computed aggregate data efficiently. It's commonly used for time-series data analysis or reporting applications.

To set up this table engine, use the following syntax in your CREATE TABLE query:

ENGINE = ReplicatedAggregatingMergeTree()

AggregatingMergeTree official ClickHouse® documentation

ReplicatedCollapsingMergeTree

ReplicatedCollapsingMergeTree is an extension of the ReplicatedMergeTree engine used to store and query time-series data with collapsing capability. It allows you to efficiently collapse duplicate or redundant rows in a dataset, reducing storage requirements.

To set up this table engine, use the following syntax in your CREATE TABLE query:

ENGINE = ReplicatedCollapsingMergeTree()

CollapsingMergeTree official ClickHouse® documentation

ReplicatedSummingMergeTree

ReplicatedSummingMergeTree is used for efficient storage and retrieval of data with pre-computed sum values. It's suitable for scenarios where aggregating and retrieving the sum of certain columns is a frequent operation.

To set up this table engine, use the following syntax in your CREATE TABLE query:

ENGINE = ReplicatedSummingMergeTree()

SummingMergeTree official ClickHouse® documentation

Replication

Logs