ClickHouse® glossary

This glossary defines the concepts and terms essential for understanding and working with ClickHouse®.

Cluster

A group of ClickHouse® nodes or servers that work together to distribute, store, and process data.

Dictionary

A key-value data structure for storing reference lists. Dictionaries serve as lookup tables, typically held in memory and loaded from an external source, which can speed up queries. They're often more efficient than JOIN operations against reference tables.
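
The lookup pattern can be illustrated with a small Python sketch. This is a toy model of the idea only, not how ClickHouse® implements dictionaries; the reference list `country_names` and the helper `enrich` are hypothetical names introduced for illustration. The default value plays a role similar in spirit to the `dictGetOrDefault` function.

```python
# Hypothetical reference list: country code -> country name.
country_names = {"DE": "Germany", "FR": "France", "JP": "Japan"}

events = [
    {"user": "u1", "country": "DE"},
    {"user": "u2", "country": "JP"},
    {"user": "u3", "country": "XX"},  # code missing from the reference list
]


def enrich(event, default="Unknown"):
    # One in-memory key lookup per row instead of joining two tables.
    name = country_names.get(event["country"], default)
    return {**event, "country_name": name}


enriched = [enrich(e) for e in events]
```

Each event is enriched with a single key lookup, and unknown keys fall back to the default instead of being dropped.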

Keeper

Also referred to as ClickHouse® Keeper. A coordination system for data replication and the execution of distributed DDL queries. Keeper can run as a standalone replacement for ZooKeeper or as an internal part of a ClickHouse® server.

Materialized view

A database object that stores query results. A materialized view is updated when data is inserted into the base table, but it doesn't reflect later changes to existing data. You can use materialized views to pre-aggregate data and then dispose of the raw data.
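
The insert-time behavior can be sketched with a toy model in plain Python. It only mirrors the semantics described in this entry and is not how ClickHouse® implements materialized views; the classes `BaseTable` and `DailyTotalsView` are hypothetical.

```python
from collections import defaultdict


class DailyTotalsView:
    """Toy materialized view: keeps a running sum of `amount` per day."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_insert(self, rows):
        # Only the newly inserted rows are aggregated.
        for row in rows:
            self.totals[row["day"]] += row["amount"]


class BaseTable:
    """Toy base table that notifies attached views on every insert."""

    def __init__(self):
        self.rows = []
        self.views = []

    def insert(self, rows):
        self.rows.extend(rows)
        for view in self.views:
            view.on_insert(rows)


table = BaseTable()
view = DailyTotalsView()
table.views.append(view)

table.insert([{"day": "2024-01-01", "amount": 10},
              {"day": "2024-01-01", "amount": 5}])
table.insert([{"day": "2024-01-02", "amount": 7}])

# Removing the raw rows does not change the pre-aggregated result,
# because the view is only updated by inserts.
table.rows.clear()
totals = dict(view.totals)
```

In this sketch the view keeps its totals after the raw rows are removed, which is why pre-aggregating and then discarding raw data works.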

Part

A physical file structure that stores a portion of data from a table. Not to be confused with a partition.

Partition

A logical division of the data in a table, defined by a partition key. Partitioning can improve performance because ClickHouse® can read only the partitions relevant to a query.
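
The following Python sketch illustrates the idea of skipping irrelevant partitions. It is a simplified model under assumed names: the month-based `partition_key` and the helper `sum_for_month` are hypothetical and not part of ClickHouse®.

```python
from collections import defaultdict
from datetime import date


def partition_key(row):
    # Hypothetical partition key: the (year, month) of the event date.
    return (row["event_date"].year, row["event_date"].month)


rows = [
    {"event_date": date(2024, 1, 15), "value": 1},
    {"event_date": date(2024, 1, 20), "value": 2},
    {"event_date": date(2024, 2, 3), "value": 3},
    {"event_date": date(2024, 3, 9), "value": 4},
]

# Group rows by partition key, as a stand-in for partitions on disk.
partitions = defaultdict(list)
for row in rows:
    partitions[partition_key(row)].append(row)


def sum_for_month(year, month):
    # Only the matching partition is read; all others are skipped.
    scanned = partitions.get((year, month), [])
    return sum(r["value"] for r in scanned), len(scanned)


total, rows_scanned = sum_for_month(2024, 1)
```

A query for January 2024 touches two of the four rows here, because the February and March partitions are never read.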

Projection

An additional copy of a table's data, stored alongside the original data and re-sorted or pre-aggregated to suit certain query patterns. Unlike materialized views, projections aren't separate database objects that you query directly; they store the table's existing data in a different way.

Primary key

Primary keys in ClickHouse® consist of one or more columns and determine how data is stored and retrieved. Unlike in traditional relational databases, primary keys in ClickHouse® don't have to be unique. Their main role is to optimize query performance rather than to ensure data integrity.
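
As a rough illustration of how a non-unique, sort-based key can speed up lookups, the sketch below builds a sparse index over sorted rows in Python. It is a simplified model and not the actual storage code: `GRANULE_SIZE`, `marks`, and `lookup` are hypothetical names, and the granule size is kept tiny for readability.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 2  # toy value, kept tiny so the example is easy to follow

# Rows are kept sorted by the primary key (the first tuple element).
# The key 3 appears twice: the key orders data, it does not enforce uniqueness.
rows = sorted([(1, "a"), (3, "b"), (3, "c"), (5, "d"), (8, "e"), (9, "f")])

# Sparse index: the key of the first row in each granule.
marks = [rows[i][0] for i in range(0, len(rows), GRANULE_SIZE)]


def lookup(key):
    """Return all rows with the given key, scanning only candidate granules."""
    # The last granule whose first key is below `key` may still contain it.
    first = max(bisect_left(marks, key) - 1, 0)
    # Granules whose first key is greater than `key` cannot contain it.
    last = bisect_right(marks, key)
    start, stop = first * GRANULE_SIZE, last * GRANULE_SIZE
    return [r for r in rows[start:stop] if r[0] == key]


matches = lookup(3)
```

The index holds one entry per granule rather than one per row, so it stays small while still narrowing a lookup to a few granules.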

Replica

A ClickHouse® host that holds a copy of the data in the cluster (or shard). Storing the same data in several replicas enables higher data availability and redundancy.

Shard

A subset of data. Sharding data across multiple servers lets you spread the load so that you don't exceed the capacity of a single server.
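
One common way to split rows across shards is to hash a sharding key and take the remainder by the number of shards. The Python sketch below shows that approach under stated assumptions: `NUM_SHARDS`, `shard_for`, and the `user_id` key are hypothetical, and this is not a description of how ClickHouse® routes data internally.

```python
import zlib

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(sharding_key: str) -> int:
    # A stable hash keeps the same key on the same shard across runs.
    return zlib.crc32(sharding_key.encode("utf-8")) % NUM_SHARDS


shards = {i: [] for i in range(NUM_SHARDS)}
rows = [{"user_id": f"user-{n}", "clicks": n} for n in range(12)]
for row in rows:
    shards[shard_for(row["user_id"])].append(row)

# Every row lands on exactly one shard.
rows_placed = sum(len(shard_rows) for shard_rows in shards.values())
```

Because the hash is deterministic, all rows for a given `user_id` end up on the same shard, so each server handles only its own share of the keys.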

Table engine

The table engine determines how data is written, stored, and accessed in ClickHouse®. The most common table engine is MergeTree, which allows quick insertion of large amounts of data and background data processing.

ZooKeeper

An open-source service for coordinating and maintaining distributed systems. ClickHouse® can use ZooKeeper for managing replication of changes and table data across the cluster.