Work with sharding in ClickHouse®

For these step-by-step instructions, let's assume that you've created the cluster called Test-Cluster with 5 shards and the database called examples.

Create a distributed table

  1. Connect to your database.

  2. Create a table with ReplicatedMergeTree engine . For the purpose of these instructions, let's call it player_ratings-A:

    CREATE TABLE examples.player_ratings-A ON CLUSTER default
    ( 
       `<table structure>` 
    )
    ENGINE = ReplicatedMergeTree()
    PARTITION BY toYYYYMM(EventDate)
    ORDER BY (CounterID, EventDate)
    

    Let's look through this query in detail:

    default is the cluster name. ON CLUSTER default automatically creates tables on all hosts. <table structure> - describe every column and its data type according to ClickHouse documentation .

  3. Create a distributed table player_ratings-A_distributed with Distributed engine :

    CREATE TABLE examples.player_ratings_distributed ON CLUSTER default 
    AS examples.data_table1
    ENGINE = Distributed(default, examples, player_ratings-A, rand())
    

    In this case, instead of explicitly describing the table structure, you can use AS examples.player_ratings-A to copy the structure of the initial table you created earlier.

Distributed table health check

To check the availability of the distributed table you created, run a test query for a number of rows in the table:

SELECT count() FROM examples.player_ratings-A_distributed

You should receive a number as a response to this query.