Use Managed Service for ClickHouse® API
Before you start
Step 1. Create your cluster
ClickHouse® clusters are one or more database hosts.
Use the ClusterService
create method.
To create a cluster with the minimal configuration, provide the following parameters:The required parameters to create a functional cluster:
-
project_id
: ID of your project. You can get it on your project's information page. -
cloud_type
:aws
. -
region_id
:eu-central-1
. -
name
: Name of your cluster, such asclickhouse-dev
. -
resources
: Specify the parameters from the doublecloud.clickhouse.v1.ClusterResources model:-
resource_preset_id
:s2-c2-m4
. -
disk_size
:34359738368
(32 GB). -
replica_count
:1
. -
shard_count
:1
.
-
import json
import logging
from google.protobuf.wrappers_pb2 import Int64Value
import doublecloud
from doublecloud.clickhouse.v1.cluster_pb2 import ClusterResources
from doublecloud.clickhouse.v1.cluster_service_pb2 import CreateClusterRequest
from doublecloud.clickhouse.v1.cluster_service_pb2_grpc import ClusterServiceStub
def create_cluster(sdk, project_id, region_id, name, network_id):
cluster_service = sdk.client(ClusterServiceStub)
operation = cluster_service.Create(
CreateClusterRequest(
project_id=project_id,
cloud_type="aws",
region_id="eu-central-1",
name="tutorial-cluster",
resources=ClusterResources(
clickhouse=ClusterResources.Clickhouse(
resource_preset_id="s2-c2-m4",
disk_size=Int64Value(value=34359738368),
replica_count=Int64Value(value=1),
shard_count=Int64Value(value=1),
)
),
network_id=network_id,
)
)
logging.info("Creating initiated")
return operation
Step 2. Connect to the cluster and create a database and table
-
Go to the Clusters
-
If the cluster status has changed to Alive, select it in the cluster list.
-
Click WebSQL at the top right.
-
In WebSQL, click on any database in the connection manager on the left to open the query editor.
-
Create a database with the CREATE query:
CREATE DATABASE IF NOT EXISTS website_data ON CLUSTER default
-
Make sure that the database has been created:
SHOW DATABASES
┌─name───────────────┐ │ INFORMATION_SCHEMA │ │ _system │ │ default │ │ website_data │ // your database │ information_schema │ │ system │ └────────────────────┘
-
Add a table to the database. The columns will match the data in the example dataset:
CREATE TABLE website_data.hits ON CLUSTER default ( Hit_ID Int32, Date Date, Time_Spent Float32, Cookie_Enabled Int32, Region_ID Int32, Gender String, Browser String, Traffic_Source String, Technology String ) ENGINE = ReplicatedMergeTree() ORDER BY (Hit_ID, Date)
-
Make sure that the table has been created:
SHOW TABLES FROM website_data
┌─name─┐ │ hits │ └──────┘
Tip
Alternatively, you can use an IDE of your choice to connect to the cluster and create a database.
Step 3. Upload data
-
To fetch sample data and insert it into the database, run the following command:
INSERT INTO website_data.hits SELECT * FROM s3('https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/hits_sample.csv', CSVWithNames) SETTINGS format_csv_delimiter = ';'
-
To view the uploaded data, run a
SELECT
query:SELECT * FROM website_data.hits LIMIT 5
The output should look as follows:
┌─Hit_ID─┬───────Date─┬─Time_Spent─┬─Cookie_Enabled─┬─Region_ID─┬─Gender─┬─Browser─┬─Traffic_Source──┬─Technology───────────┐ │ 14230 │ 2017-01-30 │ 265.70175 │ 1 │ 2 │ Female │ Firefox │ Direct │ PC (Windows) │ │ 14877 │ 2017-04-12 │ 317.82758 │ 0 │ 229 │ Female │ Firefox │ Direct │ PC (Windows) │ │ 14892 │ 2017-07-29 │ 191.0125 │ 1 │ 55 │ Female │ Safari │ Recommendations │ Smartphone (Android) │ │ 15071 │ 2017-06-11 │ 148.58064 │ 1 │ 159 │ Female │ Chrome │ Ad traffic │ PC (Windows) │ │ 15110 │ 2016-09-02 │ 289.48334 │ 1 │ 169 │ Female │ Chrome │ Search engine │ Smartphone (IOS) │ └────────┴────────────┴────────────┴────────────────┴───────────┴────────┴─────────┴─────────────────┴──────────────────────┘
Now you have a Managed ClickHouse® cluster with data in it. See the links below to continue exploring!
Download the complete code example
You can download the full code listing for all the steps above from our Python SDK GitHub repository