Get started with Managed Service for ClickHouse®
ClickHouse® is a fast, resource-efficient OLAP database that can query billions of rows in milliseconds and is trusted by thousands of companies for real-time analytics. This guide walks you through creating a Managed ClickHouse® cluster on DoubleCloud, connecting to it, and uploading sample data.
To make it easier for you to test ClickHouse®, DoubleCloud provides sample datasets. For this guide, you can use a small sample dataset with website hits that's stored in an S3 bucket.
Tip
If you're already familiar with ClickHouse® and know how to configure it, refer to Create a Managed ClickHouse® cluster for more detailed instructions instead.
Before you begin
-
Log in or sign up to the DoubleCloud console
Note
If you're a new DoubleCloud user, this tutorial won't cost you anything: you can use the trial period credits to test the platform, including creating fully operational clusters.
Step 1. Create a cluster
-
Go to the Clusters
-
Select ClickHouse.
Tip
The cluster creation page contains various options that let you configure the cluster for your needs. If you're only testing ClickHouse® and DoubleCloud, you can keep the default settings, which create a fully functional cluster with a minimal resource configuration. To do that, click Submit at the bottom of the page and skip to Step 2. Connect to the cluster and create a database and table.
Otherwise, if you want to learn how you can configure the cluster, continue with the following steps.
-
Review the Provider and Region settings.
You can create Managed ClickHouse® clusters on AWS or Google Cloud in any of the available regions. By default, DoubleCloud preselects the region nearest to you.
-
Review Resources.
For this getting started guide, the defaults are enough. However, when you create a production cluster, make sure to select three replicas to ensure high availability.
-
Under Basic settings, enter the cluster name, such as clickhouse-dev. Leave the latest LTS version that's preselected.
-
Review the Advanced settings.
For this getting started guide, the defaults are enough. For a production cluster, make sure to select dedicated keeper hosts, so that they don't compete for resources with ClickHouse® itself.
-
Click Submit.
Creating a cluster usually takes five to seven minutes depending on the cloud provider and region. When the cluster is ready, its status changes from Creating to Alive.
Step 2. Connect to the cluster and create a database and table
-
After the cluster status has changed to Alive, select it in the cluster list.
-
Click WebSQL at the top right.
-
In WebSQL, click on any database in the connection manager on the left to open the query editor.
-
Create a database:
CREATE DATABASE IF NOT EXISTS website_data ON CLUSTER default
-
Make sure that the database has been created:
SHOW DATABASES
┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ _system            │
│ default            │
│ website_data       │ // your database
│ information_schema │
│ system             │
└────────────────────┘
-
Add a table to the database. The columns will match the data in the example dataset:
CREATE TABLE website_data.hits ON CLUSTER default
(
    Hit_ID Int32,
    Date Date,
    Time_Spent Float32,
    Cookie_Enabled Int32,
    Region_ID Int32,
    Gender String,
    Browser String,
    Traffic_Source String,
    Technology String
)
ENGINE = ReplicatedMergeTree()
ORDER BY (Hit_ID, Date)
-
Make sure that the table has been created:
SHOW TABLES FROM website_data
┌─name─┐
│ hits │
└──────┘
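Beyond confirming that the table exists, it can be useful to check that its columns and types match the dataset before loading anything. A minimal sketch, assuming the database and table names used in this step; both are standard ClickHouse statements, but run them one at a time in case the WebSQL editor doesn't accept several statements in a single request (an assumption, not something this guide states):

```sql
-- List the columns of the new table together with their data types
DESCRIBE TABLE website_data.hits;

-- Show the full CREATE statement as stored by the server,
-- including the engine and the ORDER BY key
SHOW CREATE TABLE website_data.hits;
```

The column names and types in the DESCRIBE output should match the CREATE TABLE statement above.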
Step 3. Insert and query data
-
To fetch sample data and insert it into the database, run the following command:
INSERT INTO website_data.hits
SELECT *
FROM s3(
    'https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/hits_sample.csv',
    CSVWithNames
)
SETTINGS format_csv_delimiter = ';'
-
To view the uploaded data, run a SELECT query:
SELECT * FROM website_data.hits LIMIT 5
The output should look as follows:
┌─Hit_ID─┬───────Date─┬─Time_Spent─┬─Cookie_Enabled─┬─Region_ID─┬─Gender─┬─Browser─┬─Traffic_Source──┬─Technology───────────┐
│  14230 │ 2017-01-30 │  265.70175 │              1 │         2 │ Female │ Firefox │ Direct          │ PC (Windows)         │
│  14877 │ 2017-04-12 │  317.82758 │              0 │       229 │ Female │ Firefox │ Direct          │ PC (Windows)         │
│  14892 │ 2017-07-29 │   191.0125 │              1 │        55 │ Female │ Safari  │ Recommendations │ Smartphone (Android) │
│  15071 │ 2017-06-11 │  148.58064 │              1 │       159 │ Female │ Chrome  │ Ad traffic      │ PC (Windows)         │
│  15110 │ 2016-09-02 │  289.48334 │              1 │       169 │ Female │ Chrome  │ Search engine   │ Smartphone (IOS)     │
└────────┴────────────┴────────────┴────────────────┴───────────┴────────┴─────────┴─────────────────┴──────────────────────┘
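Once the rows are in place, an aggregate query gives a better feel for the dataset than a five-row preview. The following is a sketch rather than part of the original walkthrough: it assumes the table and column names created in Step 2, and the aliases hit_count and avg_time_spent are illustrative names chosen here. The results depend on the contents of the sample file, so no output is shown.

```sql
-- Total number of rows loaded from the sample file
SELECT count() FROM website_data.hits;

-- Number of hits and average Time_Spent per browser,
-- with the most frequently seen browsers first
SELECT
    Browser,
    count() AS hit_count,
    round(avg(Time_Spent), 2) AS avg_time_spent
FROM website_data.hits
GROUP BY Browser
ORDER BY hit_count DESC;
```

Replacing Browser with Traffic_Source or Technology in both the SELECT list and the GROUP BY clause gives the same breakdown along a different dimension.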
Step 4 (optional). Clean up
When you no longer need the resources, it's good practice to stop or delete them so that you don't incur additional costs.
-
To stop a Managed ClickHouse® cluster, select it on the Clusters
Note
When your cluster is stopped, DoubleCloud doesn't charge you for the CPU and RAM, but you're still billed for SSD Storage.
-
To delete a cluster, select it on the Clusters
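If you'd rather keep the cluster and remove only the sample data from this guide, the objects created in Step 2 can be dropped from WebSQL. This is a minimal sketch, assuming the website_data database, the hits table, and the default cluster name used earlier; it doesn't stop billing for the cluster itself.

```sql
-- Remove the sample table on every host of the cluster
DROP TABLE IF EXISTS website_data.hits ON CLUSTER default SYNC;

-- Remove the now-empty database as well
DROP DATABASE IF EXISTS website_data ON CLUSTER default SYNC;
```

The SYNC modifier asks ClickHouse to drop the objects immediately instead of deferring the removal.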
What’s next
Now that you have learned how to create a cluster and upload sample data to it, continue exploring the DoubleCloud platform or create a production Managed ClickHouse® cluster for your needs.