ClickHouse Keeper hosts

ClickHouse Keeper is a coordination system analogous to ZooKeeper, with a compatible interface. It's a centralized service for maintaining configuration information, naming, and providing distributed synchronization and group services.

Resource allocation

ClickHouse Keeper operates on the basis of majority quorums forming an ensemble. This approach requires setting up hardware infrastructure with the following requirements:

  • Number of hosts:

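    ClickHouse Keeper requires a minimum of three servers to form a majority quorum and ensure fault tolerance: with three servers, the ensemble keeps a majority when one of them fails. It's generally recommended to use an odd number of servers, because adding one server to an odd-sized ensemble doesn't increase the number of failures it can tolerate.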

  • CPU and Memory:

    ClickHouse Keeper resource requirements are relatively modest. However, the CPU and memory capacity should be sufficient to handle the expected workload and concurrent client requests. The specific CPU and memory requirements depend on factors such as the number of clients, the complexity of the operations, and the data size being managed.

  • Storage:

    ClickHouse Keeper's primary data storage is in-memory, which makes it well suited for read-heavy workloads. However, it also writes transaction logs to disk for durability. The disk storage capacity should be able to accommodate the transaction logs and any additional data, such as snapshots, if enabled. The required disk I/O performance depends on the write load and the rate of changes in the data.
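
The majority-quorum arithmetic behind the host count can be sketched in a few lines of Python. This is a minimal illustration, not part of ClickHouse Keeper or DoubleCloud tooling: the function names `quorum_size` and `tolerated_failures` are hypothetical and exist only for this example.

```python
def quorum_size(hosts: int) -> int:
    """Smallest number of hosts that forms a strict majority."""
    if hosts < 1:
        raise ValueError("an ensemble needs at least one host")
    return hosts // 2 + 1


def tolerated_failures(hosts: int) -> int:
    """How many hosts can fail while a majority is still available."""
    return hosts - quorum_size(hosts)


if __name__ == "__main__":
    # Compare ensemble sizes: an even size tolerates no more failures
    # than the odd size just below it.
    for hosts in range(1, 8):
        print(
            f"{hosts} host(s): quorum={quorum_size(hosts)}, "
            f"tolerated failures={tolerated_failures(hosts)}"
        )
```

Under this model, three hosts tolerate one failure and four hosts still tolerate only one, which is why an odd number of hosts is the usual choice.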

DoubleCloud offers two ClickHouse Keeper deployment types:

Embedded

The infrastructure is shared with the ClickHouse® installation. This means ClickHouse Keeper runs on the same hosts as ClickHouse® and shares their resources.

Dedicated

The infrastructure is deployed on separate hosts, ensuring improved performance, availability, and scalability.

Recommended option for ClickHouse Keeper

We recommend using dedicated hosts for high-load production clusters. Dedicated ClickHouse Keeper hosts don't use your production cluster's CPU or memory, so its performance remains unaffected under heavy loads.

High-availability configuration

High availability can be achieved for clusters with at least 3 ClickHouse Keeper hosts.

The following table describes the minimal configurations to achieve high availability when using dedicated and embedded ClickHouse Keeper hosts:

Dedicated               | Embedded
1 shard with 2 replicas | 2 shards with 3 replicas each

This is how dedicated ClickHouse Keeper hosts contribute to high availability of your Managed Service for ClickHouse® cluster:

  • Leader election

    When the ClickHouse Keeper ensemble loses its leader (the replica responsible for handling write requests), it elects a new one from among the available replicas. This process ensures that data can still be written to the cluster, even in the event of host failures.

  • Data consistency

    ClickHouse Keeper helps maintain consistency across replicas. All replicas store the metadata of their data parts in ClickHouse Keeper and use it for data replication and consistent merges.

The embedded ClickHouse Keeper solution requires at least 3 hosts running at any time, so you need to add one extra shard for enhanced fault tolerance. Because the shard configurations must be mirrored, you end up with a more expensive setup.