ClickHouse® glossary

This glossary defines the concepts and terms essential for understanding and working with ClickHouse®.

Cluster

A group of ClickHouse® nodes or servers that work together to distribute, store, and process data.

Dictionary

A key-value data structure for reference lists. Dictionaries in ClickHouse® serve as lookup tables, typically loaded from an external source, and can improve query performance. They're often more efficient than JOINs against reference tables.

Granule

A batch of rows in an uncompressed block. When reading data, ClickHouse® reads whole granules rather than individual rows, which enables faster data processing in analytical workloads. By default, a granule contains 8192 rows. The primary index contains one entry per granule.

Keeper

Also referred to as ClickHouse® Keeper. A coordination system for data replication and distributed DDL query execution. Keeper can run as a standalone replacement for ZooKeeper or as an internal part of a ClickHouse® server.

Materialized view

A database object that stores the results of a query. Materialized views are calculated when data is inserted into the base table, but they don't reflect later changes to existing data, such as updates or deletions. You can use them to pre-aggregate data and then dispose of the raw data.
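
The insert-time behavior described above can be illustrated with a small Python model. This is only a conceptual sketch, not how ClickHouse® implements materialized views: the `BaseTable` and `DailyTotalsView` classes and the sample data are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class DailyTotalsView:
    """Toy stand-in for a materialized view that sums amounts per day."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_insert(self, block):
        # The view sees only the newly inserted rows, never the whole table.
        for day, amount in block:
            self.totals[day] += amount


class BaseTable:
    """Toy base table that pushes every inserted block to attached views."""

    def __init__(self):
        self.rows = []
        self.views = []

    def insert(self, block):
        self.rows.extend(block)
        for view in self.views:
            view.on_insert(block)


events = BaseTable()
daily = DailyTotalsView()
events.views.append(daily)

events.insert([("2024-01-01", 10), ("2024-01-01", 5)])
events.insert([("2024-01-02", 7)])

# Disposing of the raw rows does not touch the pre-aggregated view.
events.rows.clear()
print(dict(daily.totals))
```

In this model the view keeps its totals after the raw rows are removed, which mirrors why pre-aggregating and then dropping raw data is possible, and also why later changes to existing rows never reach the view.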

Part

A physical file structure that stores a portion of data from a table. Not to be confused with a partition.

Partition

A logical division of data in a table created with a specific partition key. Partitioning data improves performance because ClickHouse® can query only relevant partitions.
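
To make the idea of querying only relevant partitions concrete, here is a minimal Python sketch. It assumes a monthly partition key (similar in spirit to partitioning by year and month of a date column); the function names and sample rows are hypothetical and only model the concept.

```python
from collections import defaultdict
from datetime import date


def partition_id(row):
    """Monthly partition key such as '202401' (an assumed, illustrative choice)."""
    return row["day"].strftime("%Y%m")


def build_partitions(rows):
    """Group rows into partitions by their partition key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_id(row)].append(row)
    return partitions


def total_for_month(partitions, month):
    """Read only the partition that matches the filter and skip all others."""
    return sum(row["amount"] for row in partitions.get(month, []))


rows = [
    {"day": date(2024, 1, 15), "amount": 10},
    {"day": date(2024, 1, 20), "amount": 5},
    {"day": date(2024, 2, 3), "amount": 7},
]
partitions = build_partitions(rows)
print(sorted(partitions))
print(total_for_month(partitions, "202401"))
```

A query filtered on one month touches a single partition here, regardless of how many other months the table holds.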

Primary key

Primary keys in ClickHouse® consist of one or more columns and determine how table data is sorted, stored, and retrieved. Unlike in traditional relational databases, primary key values don't have to be unique. Their main role is to optimize query performance rather than to ensure data integrity.

Projection

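A projection is an additional representation of a table's data, such as a different sort order or a pre-aggregation, optimized for certain query patterns. Unlike materialized views, projections aren't separate objects that contain query results. Instead, they store the table's existing data in a different way.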

Replica

A ClickHouse® host that holds a copy of the data in the cluster (or shard). Storing the same data in several replicas enables higher data availability and redundancy.

Sampling

A method of querying large datasets faster by processing only a fraction of the data, called a sample. Sampling provides performance benefits and is especially useful when you don't need exact results.
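
The trade-off between speed and exactness can be sketched in Python. The example below assumes a deterministic, hash-based sample over a `user_id` key and scales the sampled count by the inverse of the sampling fraction. The function names, the choice of `zlib.crc32` as the hash, and the data are illustrative assumptions, not the ClickHouse® implementation.

```python
import zlib

HASH_SPACE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit integer


def in_sample(key, fraction):
    """Deterministically decide whether a row's key falls into the sample."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return zlib.crc32(str(key).encode("utf-8")) < fraction * HASH_SPACE


def estimate_count(rows, fraction, predicate):
    """Count matching rows inside the sample, then scale up by 1 / fraction."""
    matches = sum(
        1 for row in rows
        if in_sample(row["user_id"], fraction) and predicate(row)
    )
    return round(matches / fraction)


rows = [{"user_id": i, "clicks": i % 5} for i in range(10_000)]
exact = sum(1 for row in rows if row["clicks"] == 0)
approx = estimate_count(rows, 0.1, lambda row: row["clicks"] == 0)
print(exact, approx)
```

Because the decision depends only on the key's hash, the same rows are selected on every run, and a smaller fraction always selects a subset of a larger one. The estimate is approximate by design; only a fraction of 1 reproduces the exact count.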

Shard

A subset of a cluster's data, hosted on one server or one group of replicas. Sharding data across multiple servers allows you to distribute and divide the load so that you don't exceed the capacity of a single server.
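
As a rough illustration of how rows can be spread across shards, the Python sketch below routes each row by hashing a sharding key and taking the remainder modulo the number of shards. This is a simplified model with hypothetical names (`shard_for`, `distribute`) and equal shard weights assumed; real clusters may use other sharding expressions and weighted shards.

```python
import zlib


def shard_for(key, num_shards):
    """Map a sharding key to a shard index using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def distribute(rows, num_shards, key_field):
    """Split rows into per-shard lists based on the sharding key."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards


rows = [{"user_id": i, "amount": i * 2} for i in range(1000)]
shards = distribute(rows, 3, "user_id")
print([len(shard) for shard in shards])
```

Each row lands on exactly one shard, and rows with the same key always land on the same shard, so every server stores and processes only its share of the data.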

Sparse index

A type of indexing in which the primary index contains one entry per group of rows rather than one entry per row. The entry that corresponds to a group of rows is referred to as a mark.

With sparse indexes, ClickHouse® first identifies the groups of rows that potentially match the query and then scans only those groups to find matching rows. Because the primary index holds one entry per group rather than one per row, it is small enough to be loaded into memory.
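
The two-step lookup can be sketched in Python as follows. The sketch assumes that rows are sorted by key and that each mark stores the key of the first row in its group. The group size of 4 is a toy value chosen for readability, and the function names are hypothetical; this models the idea only, not the ClickHouse® storage format.

```python
from bisect import bisect_left, bisect_right

GROUP_SIZE = 4  # toy value for illustration; real groups are far larger


def build_marks(sorted_keys, group_size=GROUP_SIZE):
    """Keep only the first key of every group of rows: one mark per group."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), group_size)]


def candidate_groups(marks, key):
    """Return indexes of groups that may contain `key`, using marks only."""
    first = max(bisect_left(marks, key) - 1, 0)
    last = bisect_right(marks, key) - 1
    return range(first, last + 1)


def lookup(sorted_keys, marks, key, group_size=GROUP_SIZE):
    """Scan only the candidate groups and return matching row positions."""
    hits = []
    for group in candidate_groups(marks, key):
        start = group * group_size
        end = min(start + group_size, len(sorted_keys))
        for position in range(start, end):
            if sorted_keys[position] == key:
                hits.append(position)
    return hits


keys = [1, 2, 3, 4, 5, 5, 6, 7, 9, 10, 11, 12]
marks = build_marks(keys)
print(marks)
print(list(candidate_groups(marks, 5)))
print(lookup(keys, marks, 5))
```

Here 12 rows are indexed by only 3 marks. A lookup for key 5 inspects the marks, selects two candidate groups (the one starting at 5 and the preceding one, which could end with the same key), and scans just those rows.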

Table engine

The table engine determines how data is written, stored, and accessed in ClickHouse®. The most common table engine is MergeTree, which allows quick insertion of large amounts of data and background data processing.

TTL

TTL (time to live) is a ClickHouse® feature that automatically moves, deletes, or rolls up columns or rows after a certain time period. It allows you to manage storage more efficiently because you can delete, move, or archive the data that you no longer need to access frequently.

User-defined function

User-defined functions (UDFs) allow you to extend the functionality of ClickHouse® by creating custom functions and using them in queries. Managed Service for ClickHouse® supports only SQL UDFs.

ZooKeeper

An open-source service for coordinating and maintaining distributed systems. ClickHouse® can use ZooKeeper for managing replication of changes and table data across the cluster.