ClickHouse® table engines
ClickHouse® provides multiple table engines, each designed to address different data storage and processing requirements:
MergeTree
MergeTree is the default table engine in ClickHouse®. It's designed for inserting a very large amount of data into a table. The data is written to the table part by part, then rules are applied for merging the parts in the background. This method is much more efficient than continually rewriting the data in storage during insert.
MergeTree tables are sorted by a primary key, and data is stored in sorted order, enabling high-performance range queries and efficient data compression.
MergeTree limitations for production clusters
To set up this table engine, use the following syntax in your CREATE TABLE
query:
ENGINE = MergeTree()
This table engine has multiple extensions designed to handle more specific use-cases:
ReplicatedMergeTree
ReplicatedMergeTree is an extension of the MergeTree engine designed to optimize data replication and fault tolerance. It's used when high availability and durability are essential. ReplicatedMergeTree creates multiple replicas per shard, ensuring data redundancy and automatic synchronization across replicas.
Recommended option for multi-replica sharded clusters
We suggest using this table engine when operating your multi-shard clusters with more than a single replica per shard.
Every MergeTree family table engine has a replicated version (see below)
To set up this table engine, use the following syntax in your CREATE TABLE
query:
ENGINE = ReplicatedMergeTree()
For in-depth step-by-step instructions on creating replicated tables, see Create replicated tables in your Managed ClickHouse® cluster.
ReplicatedAggregatingMergeTree
ReplicatedAggregatingMergeTree is an extension of the MergeTree engine designed for pre-aggregated data storage. It combines the benefits of ReplicatedMergeTree with automatic pre-aggregation, allowing you to store and query pre-computed aggregate data efficiently. It's commonly used for time-series data analysis or reporting applications.
To set up this table engine, use the following syntax in your CREATE TABLE
query:
ENGINE = ReplicatedAggregatingMergeTree()
ReplicatedCollapsingMergeTree
ReplicatedCollapsingMergeTree is an extension of the ReplicatedMergeTree engine used to store and query time-series data with collapsing capability. It allows you to efficiently collapse duplicate or redundant rows in a dataset, reducing storage requirements.
To set up this table engine, use the following syntax in your CREATE TABLE
query:
ENGINE = ReplicatedCollapsingMergeTree()
ReplicatedSummingMergeTree
ReplicatedSummingMergeTree is used for efficient storage and retrieval of data with pre-computed sum values. It's suitable for scenarios where aggregating and retrieving the sum of certain columns is a frequent operation.
To set up this table engine, use the following syntax in your CREATE TABLE
query:
ENGINE = ReplicatedSummingMergeTree()