Getting started with Managed Service for ClickHouse®
To get started with the service:
Before you start
Your primary tool to interact with the DoubleCloud is the console. We need to set it up and then configure it before moving on.
-
Go to the console.
-
Log in to DoubleCloud if you already have an account, or create one if you are opening the console for the first time.
Note
This tutorial shows how to connect to a cluster with a native CLI client and Docker
-
Install the software:
DockerDEB-basedRPM-based-
Open your terminal.
-
(Optional) Start Docker
sudo service docker start
-
Pull the clickhouse-client
docker pull clickhouse/clickhouse-client
-
Open your terminal.
-
Connect to the ClickHouse® official DEB repository
sudo apt update && sudo apt install -y apt-transport-https ca-certificates dirmngr && \ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754 && \ echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \ /etc/apt/sources.list.d/clickhouse.list
-
Refresh the package list and install the clickhouse-client
sudo apt update && sudo apt install -y clickhouse-client
-
Open your terminal.
-
Connect to a ClickHouse® official RPM repository
sudo yum install -y yum-utils sudo yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
-
Install the clickhouse-client
sudo yum install -y clickhouse-client
Warning
If you run a RedHat 7-based Linux distribution, including Cent OS 7, Oracle Linux 7 and others, you need to download and install trusted certificates and manually add the path to them in the clickhouse-client configuration file as follows:
-
Install the
root
certificate:curl https://letsencrypt.org/certs/isrg-root-x2-cross-signed.pem > \ /etc/pki/ca-trust/source/anchors/isrg-root-x2-cross-signed.pem
-
Install the
intermediate
certificate:curl https://letsencrypt.org/certs/lets-encrypt-r3-cross-signed.pem > \ /etc/pki/ca-trust/source/anchors/lets-encrypt-r3-cross-signed.pem
-
Update the list of trusted certificates:
sudo update-ca-trust
-
Locate your clickhouse-client configuration file (by default, you can find it at
/etc/clickhouse-client/config.xml
) and add the path to the certificates into the<openSSL>
section:<client> <!-- Used for connection to server's secure tcp port --> <loadDefaultCAFile>true</loadDefaultCAFile> <cacheSessions>true</cacheSessions> <disableProtocols>sslv2,sslv3</disableProtocols> <preferServerCiphers>true</preferServerCiphers> <caConfig>/etc/ssl/certs/ca-bundle.crt</caConfig> <!-- Use for self-signed: <verificationMode>none</verificationMode> --> <invalidCertificateHandler> <!-- Use for self-signed: <name>AcceptCertificateHandler</name> --> <name>RejectCertificateHandler</name> </invalidCertificateHandler> </client>
-
Create your cluster
ClickHouse® clusters are one or more database hosts.
-
Go to the Clusters overview
-
Click Create cluster in the upper-right corner of the page.
-
Select ClickHouse®.
-
Choose a provider and a region.
-
Under Resources:
-
Select the
s1-c2-m4
preset for CPU, RAM capacity, and storage space to create a cluster with minimal configuration.Understand your ClickHouse® resource presetA resource preset has the following structure:
<CPU platform> - C<number of CPU cores> - M<number of gigabytes of RAM>
There are three available CPU platforms:
-
g
- ARM Graviton -
i
- Intel (x86) -
s
- AMD (x86)
For example, the
i1-c2-m8
preset means that it's an Intel platform 2-core CPU with 8 gigabytes of RAM.You can see the availability of CPU platforms across our Managed Service for ClickHouse® areas and regions.
-
-
Choose the number of replicas. Let's keep it as is with a single replica.
-
Select the number of shards. Keep a single shard.
Understand shards and replicas
Shards refer to the servers that contain different parts of the data (to read all the data, you must access all the shards). Replicas are duplicating servers (to read all the data, you can access the data on any of the replicas).
-
-
Under Basic settings:
-
Enter the cluster Name, in this scenario -
tutorial-cluster
. -
From the Version drop-down list, select the ClickHouse® version the Managed ClickHouse® cluster will use. For most clusters, we recommend using the latest version.
-
-
Under Advanced:
-
Under Maintenance settings, select the scheduling type:
-
Arbitrary to delegate maintenance window selection to DoubleCloud. Usually, your cluster will perform maintenance procedure at the earliest available time slot.
Warning
We suggest not to use this scheduling type with single-host clusters, as it can lead to your cluster becoming unavailable at random.
-
By schedule to set the weekday and time (UTC) when DoubleCloud may perform maintenance on your cluster.
-
-
Under Networking → VPC, specify in which DoubleCloud VPC to locate your cluster. Use the
default
value in the previously selected region if you don't need to create this cluster in a specific network. -
Select the allocation for the ClickHouse Keeper service -
embedded
ordedicated
.We recommend using dedicated hosts for high-load production clusters. Dedicated ClickHouse Keeper hosts ensure that your production cluster's performance remains unaffected under heavy loads - they don't use its CPU or memory.
ClickHouse Keeper host location is irreversible
After creating the cluster, you won't be able to change the ClickHouse Keeper deployment type.
For dedicated ClickHouse Keeper hosts, select the appropriate resource preset. Please note that this resource preset will apply to all three hosts and will be billed accordingly.
-
Specify or adjust your cluster's DBMS settings. For more information, see the Settings reference.
-
-
Click Submit.
Your cluster will appear with the Creating
status on the Clusters page in the console. Setting everything up may take some time. When the cluster is ready, it changes its state to Alive
.
Click on the cluster to open its information page:
Tip
The DoubleCloud service creates the superuser admin
and its password automatically. You can find both the User and the Password in the Overview tab on the cluster information page.
To create users for other roles, see Manage ClickHouse® users
To create a ClickHouse® cluster, use the ClusterService
create method. The required parameters to create a functional cluster:
-
project_id
- the ID of your project. You can get this value on your project's information page. -
cloud_type
-aws
. -
region_id
- for this quickstart, specifyeu-central-1
. -
name
- your cluster's name,tutorial-cluster
. -
resources
- specify the following from the doublecloud.clickhouse.v1.ClusterResources model:
Connect to the cluster and create a database
-
Go to the Clusters overview
-
Select the name of your cluster to open its information page. By default, you will see the Overview tab.
-
Under Connection strings, find the Native interface string and click Copy.
-
Open your terminal and run a command to connect to your cluster:
DockerNative clickhouse-clientdocker run --network host --rm -it clickhouse/<Native interface connection string>
The complete Docker command structuredocker run --network host --rm -it \ clickhouse/clickhouse-client \ --host <FQDN of your cluster> \ --secure \ --user <cluster user name> \ --password <cluster user password> \ --port 9440
<Native interface connection string>
-
After you have connected to the cluster, run the CREATE
CREATE DATABASE IF NOT EXISTS first_database
After that, you can check the result by executing the following command:
SHOW DATABASES
The output will contain the name of your newly created database:
┌─name───────────────┐ │ INFORMATION_SCHEMA │ │ _system │ │ default │ │ first_database │ │ information_schema │ │ system │ └────────────────────┘
-
Now, it's time to add a table to the
first_database
on your cluster. The table will have the following structure adjusted to the dataset that we'll upload later:CREATE TABLE first_database.hits ON CLUSTER default ( Hit_ID Int32, Date Date, Time_Spent Float32, Cookie_Enabled Int32, Region_ID Int32, Gender String, Browser String, Traffic_Source String, Technology String ) ENGINE = ReplicatedMergeTree() ORDER BY (Hit_ID, Date)
ReplicatedMergeTree table engine
In the code example above, we used the ReplicatedMergeTree engine.
This is the most cost-efficient solution for clusters with multiple replicas per shard, to achieve data replication, strong consistency, fault tolerance, and automatic synchronization for your data.
Check if your query was successful and the table exists in the database:
SHOW TABLES FROM first_database
The readout will be the following:
┌─name─┐ │ hits │ └──────┘
Tip
Alternatively, you can use a GUI-based IDE of your choice to connect to a cluster and create a database.
Upload data
-
Now you can open another terminal instance or exit from your cluster by typing the
exit
command. -
Run the following query to fetch the data from our S3 bucket and combine it with the INSERT
curl https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/hits_sample.csv | docker run --network host --rm -it clickhouse/<Native interface connection string> --query="INSERT INTO first_database.hits FORMAT CSVWithNames" --format_csv_delimiter=";"
The complete Docker command structurecurl https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/hits_sample.csv | docker run --network host --rm -i clickhouse/clickhouse-client --host <FQDN of your cluster> --port 9440 --secure --user <your cluster username> --password <your cluster password> --query="INSERT INTO first_database.hits FORMAT CSVWithNames" --format_csv_delimiter=";"
-
If the query was successful, perform the
SELECT
query to see the loaded data:docker run --network host --rm -it clickhouse/<Native interface connection string> --query="SELECT * FROM first_database.hits LIMIT 20"
The complete Docker command structuredocker run --network host --rm -i clickhouse/clickhouse-client --host <FQDN of your cluster> --port 9440 --secure --user <your cluster username> --password <your cluster password> --query="SELECT * FROM first_database.hits LIMIT 20"
-
Run the following query that will fetch the data from our S3 bucket and combine it with the INSERT
<Native interface connection string> --query="INSERT INTO first_database.hits FORMAT CSVWithNames"
-
If the query was successful, perform the
SELECT
query to see the loaded data:<Native interface connection string> --query="SELECT * FROM first_database.hits LIMIT 20"
You should see your data in the table:
67112 2016-01-10 388.93976 0 184 Male Firefox Direct PC (Mac)
65411 2016-01-30 267.13525 0 229 Female Firefox Search engine PC (Windows)
57240 2016-02-05 321.99017 0 229 Female Firefox Search engine PC (Windows)
34703 2016-02-11 300.81143 1 34 Male Safari Recommendations Tablet (Android)
55134 2016-02-17 324.48718 0 2 Male Firefox Direct Other
46247 2016-02-23 307.71362 1 2 Male Safari Recommendations Smartphone (Android)
15422 2016-02-29 118.30818 1 2 Female Safari Recommendations Smartphone (Android)
36341 2016-03-06 330.35165 1 51 Male Chrome Social network PC (Windows)
54562 2016-03-12 325.20392 0 51 Female Chrome Search engine Smartphone (I0S)
63833 2016-03-18 316.09775 1 51 Female Chrome Search engine Smartphone (I0S)
43941 2016-03-24 263.73654 1 51 Male Firefox Search engine PC (Windows)
38583 2016-03-30 363.8421 0 51 Female Safari Internal Smartphone (Android)
65839 2016-04-05 284.4762 1 51 Male Firefox Direct PC (Windows)
38024 2016-04-11 305.58286 1 51 Male Edge Search engine PC (Windows)
30573 2016-04-17 170.59955 0 52 Male Edge Search engine PC (Windows)
59083 2016-04-23 512.53 0 52 Male Edge Search engine PC (Windows)
40305 2016-04-29 324.144 0 52 Female Firefox Direct Other
Now you have a Managed ClickHouse® cluster with data in it. See the links below to continue exploring!