Your quick-start into the DoubleCloud world
Take your first steps into the exciting new world of data management with DoubleCloud!
To understand how our service works and to start using it in your day-to-day work, let's learn the fundamentals:
- Create a cluster and a database for data storage.
- Transfer the data from a remote data warehouse to a DoubleCloud cluster.
This straightforward quick-start tutorial walks you through both tasks step by step.
Step 1. Prepare to acquire the data
The first things you need to do are to create a cluster and a database that will store the data:
- Create a Managed ClickHouse® cluster
  This is your resource allocation tool. It allows you to acquire CPU, memory, and storage quotas to operate your databases.
- Create a ClickHouse® database
  This section explains how to connect to the cluster from your browser using WebSQL and create the database.
Step 1.1. Create a Managed ClickHouse® cluster
Warning
During the trial period, you can create clusters with up to 8 cores, 32 GB RAM, and 400 GB storage. If you need to raise the quotas, don't hesitate to contact our support.
- Go to the console.
- Log in to DoubleCloud if you already have an account, or create one if you're opening the console for the first time.
- Select Clusters from the list of services on the left.
- Click Create cluster at the top right.
- Select ClickHouse®.
- Choose a provider and the region closest to your geographical location.
- Under Resources:
  - Select a preset for CPU, RAM capacity, and storage space. The minimal s2-c2-m4 preset is more than enough for this tutorial.
    A resource preset has the following structure:
    <cpu-platform>-c<number-of-cpu-cores>-m<gigabytes-of-ram>
    There are three available CPU platforms:
    - g: ARM Graviton
    - i: Intel (x86)
    - s: AMD (x86)
    For example, the i2-c2-m8 preset corresponds to an Intel-family machine with 2 virtual cores and 8 GB of RAM. Learn more about the availability of CPU platforms in areas and regions.
  - Keep 1 replica and 1 shard.
- Under Basic settings:
  - Enter the cluster Name: clickhouse-dev.
  - Keep the Version as is: this is the latest stable version of ClickHouse®.
- Under Networking → VPC, select the network where you want to create the cluster.
  If you don't need to place the cluster in a specific network, leave the preselected default option.
- Click Submit.
When the cluster is ready to operate, its status in the console changes to Alive.
Tip
DoubleCloud creates the admin superuser and its password automatically. You can find the username and password under Credentials in the Overview tab on the cluster page. To learn how to create users for other roles, refer to Manage ClickHouse® users.
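The preset naming rule from Step 1.1 is regular enough to check in code. The following Python sketch parses a preset name and tests it against the trial limits from the warning above (8 cores, 32 GB RAM). Two assumptions are ours, not DoubleCloud's: we treat the digit after the platform letter (the 2 in s2) as a generation number, since the tutorial doesn't define it, and we assume the trial limits apply to the total across all hosts (replicas × shards). parse_preset and fits_trial are hypothetical helpers, not part of any DoubleCloud SDK.

```python
import re
from dataclasses import dataclass

# Platform letters as listed in the tutorial.
PLATFORMS = {"g": "ARM Graviton", "i": "Intel (x86)", "s": "AMD (x86)"}

# <cpu-platform>-c<number-of-cpu-cores>-m<gigabytes-of-ram>, e.g. "s2-c2-m4".
# Assumption: the platform token is a letter followed by a generation number.
PRESET_PATTERN = re.compile(r"([gis])(\d+)-c(\d+)-m(\d+)")

# Trial quotas from the warning in Step 1.1 (storage is not modeled here).
TRIAL_MAX_CORES = 8
TRIAL_MAX_RAM_GB = 32


@dataclass(frozen=True)
class Preset:
    platform: str    # human-readable CPU platform name
    generation: int  # digit after the platform letter (meaning assumed)
    cores: int       # virtual CPU cores per host
    ram_gb: int      # RAM per host, in GB


def parse_preset(name: str) -> Preset:
    """Split a preset name into its parts, rejecting anything malformed."""
    match = PRESET_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid preset name: {name!r}")
    letter, generation, cores, ram = match.groups()
    return Preset(PLATFORMS[letter], int(generation), int(cores), int(ram))


def fits_trial(preset: Preset, replicas: int = 1, shards: int = 1) -> bool:
    """Assumes the trial quota counts resources summed over all hosts."""
    hosts = replicas * shards
    return (preset.cores * hosts <= TRIAL_MAX_CORES
            and preset.ram_gb * hosts <= TRIAL_MAX_RAM_GB)


if __name__ == "__main__":
    tutorial = parse_preset("s2-c2-m4")  # the preset used in this tutorial
    print(tutorial)
    print(fits_trial(tutorial))          # 1 replica and 1 shard, as in Step 1.1
```

With the tutorial's 1 replica and 1 shard, a single s2-c2-m4 host sits well inside the trial quota; the check only becomes interesting once you add hosts.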
Step 1.2. Create a ClickHouse® database
- After the cluster status has changed to Alive, select it in the cluster list.
- Click WebSQL at the top right.
- In WebSQL, click any database in the connection manager on the left to open the query editor.
- To create a database, enter the following query and click Execute:
  CREATE DATABASE IF NOT EXISTS start_db ON CLUSTER default
- Let's test if the database was created successfully. Run SHOW DATABASES. You should see start_db in the readout:
  ┌─name───────────────┐
  │ INFORMATION_SCHEMA │
  │ _system            │
  │ default            │
  │ information_schema │
  │ start_db           │
  │ system             │
  └────────────────────┘
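WebSQL is the quickest route, but the same statements can also be sent over ClickHouse®'s HTTP interface, which accepts the query as the body of a POST request and reads credentials from the X-ClickHouse-User and X-ClickHouse-Key headers. The Python sketch below only builds such a request with the standard library. The host name is a placeholder, and port 8443 (HTTPS) is our assumption about how the cluster is exposed; use the actual connection details shown for your cluster in the console.

```python
import urllib.request

# Placeholder: replace with your cluster's host from the console.
HOST = "<your-cluster-host>"
# Assumption: HTTPS interface on port 8443; check your cluster's connection details.
PORT = 8443


def build_query_request(query: str, user: str, password: str,
                        host: str = HOST, port: int = PORT) -> urllib.request.Request:
    """Build (but don't send) a POST request for the ClickHouse HTTP interface."""
    return urllib.request.Request(
        f"https://{host}:{port}/",
        data=query.encode("utf-8"),      # the SQL statement is the request body
        headers={
            "X-ClickHouse-User": user,   # e.g. the auto-created admin user
            "X-ClickHouse-Key": password,
        },
        method="POST",
    )


def run_query(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    password = "<admin-password>"  # placeholder: see Credentials on the Overview tab
    for sql in ("CREATE DATABASE IF NOT EXISTS start_db ON CLUSTER default",
                "SHOW DATABASES"):
        request = build_query_request(sql, "admin", password)
        print(request.get_method(), request.full_url, sql)
        # Uncomment once HOST and password point at a real cluster:
        # print(run_query(request))
```

The ON CLUSTER default clause asks ClickHouse® to run the DDL statement on every host of the cluster named default, so the database exists on all replicas and shards, not only on the host that received the query.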
Step 2. Transfer the data
Now it's time to set up the tools to get the data from a remote source and transfer it to your start_db ClickHouse® database. To accomplish this, complete the following steps:
- Create a source endpoint
  This is your data fetcher. It will connect to a remote source and send the data to your Managed ClickHouse® cluster.
- Create a target endpoint
  This is your receiver. It will acquire the data sent by the source endpoint and write it to the database on your Managed ClickHouse® cluster.
- Create and activate a transfer
  This is your data pipeline tool. It will connect your endpoints and ensure the integrity of the data.
Step 2.1. Create a source endpoint
- In the list of services, select Transfer.
- Click Create → Source endpoint.
- In Source type, select Object storage.
- Under Basic settings:
  - Enter the Name of the endpoint: s3-source-dev.
  - (Optional) Enter a Description of the endpoint.
- Move on to Endpoint parameters.
- Under S3: Bucket connection configuration:
  - In Bucket name, enter doublecloud-docs.
    Leave all other parameters in this block empty. This bucket is public, and Transfer can connect to it using default parameters.
- In Path pattern, enter data-sets/bookings.csv.
- Under Data format:
  - Select CSV in the dropdown.
  - Under CSV → Delimiter, select Common.
  - In the dropdown, select Semicolon ; as the common delimiter.
  Leave all other parameters in this block empty.
- Under Dataset:
  - In Schema, enter {}. This tells Transfer to auto-infer the schema.
  - In Table, enter bookings.
- (Optional) Test the source endpoint:
  - Click Test connection.
  - Select the runtime type you want Transfer to use for connecting to the database:
    - Dedicated: Transfer connects to the database using a specified internal or external network.
    - Serverless: Transfer connects to the database available from the internet using an automatically chosen network.
    Runtime compatibility warning
    Don't use endpoints with different runtime types in the same transfer. This will cause the transfer to fail.
  - If you selected the dedicated runtime, select the network in the dropdown.
  - Click Test connection.
    Testing the connection may take a few minutes.
- Click Submit. The s3-source-dev endpoint appears in your Endpoints list.
The transmitter is ready to go. Now we need an endpoint to receive the data it sends.
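Two of the source settings above, the Semicolon delimiter and the empty {} schema, decide how the CSV file is read. The tutorial doesn't say how Transfer infers column types, so the Python sketch below only illustrates the general idea with the standard csv module: split each line on ;, take column names from a header row, and pick the narrowest type that every value in a column parses as. The sample rows, column names, and the presence of a header row are made up for illustration; they are not the contents of bookings.csv.

```python
import csv
import io


def infer_type(values: list[str]) -> str:
    """Return the narrowest ClickHouse-style type name that fits every value."""
    if not values:
        return "String"  # nothing to go on, so fall back to text
    for type_name, convert in (("Int64", int), ("Float64", float)):
        try:
            for value in values:
                convert(value)
        except ValueError:
            continue  # some value didn't parse; try the next, wider type
        return type_name
    return "String"


def infer_schema(text: str, delimiter: str = ";") -> dict[str, str]:
    """Read delimited text whose first row is a header; infer one type per column."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    return {
        name: infer_type([row[i] for row in data])
        for i, name in enumerate(header)
    }


# Hypothetical sample: ';'-separated with a header row (both assumed for illustration).
SAMPLE = "booking_id;guest;nights;price\n1;Alice;3;120.5\n2;Bob;1;99\n"

if __name__ == "__main__":
    print(infer_schema(SAMPLE))
```

Choosing the wrong delimiter is easy to spot with this kind of check: parsing the sample with "," instead of ";" yields a single String column whose name is the whole header line.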
Step 2.2. Create a target endpoint
- In the list of services, select Transfer.
- Click Create → Target endpoint.
- In Target type, select ClickHouse.
- Under Basic settings:
  - Enter the Name of the endpoint: clickhouse-target-dev.
  - (Optional) Enter a Description of the endpoint.
- Move on to Endpoint parameters.
- Under Connection settings:
  - In Connection type, select Managed cluster.
  - In Managed cluster, select clickhouse-dev in the dropdown.
  - In Authentication, select Default. The endpoint will connect to the cluster as the admin user.
  - In Database, enter start_db. This is the database your data is transferred to.
- In Cleanup policy, select Drop.
- Leave all the other fields blank or with their default values.
- Click Submit. The clickhouse-target-dev endpoint appears in your Endpoints list.
Good work. Now we've created an endpoint that will receive and write the data to your ClickHouse® database. All we need now is a tool that will connect both endpoints and transfer the data.
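Before you pair the two endpoints in a transfer, it helps to recall the runtime rules from Step 2.1: mixing Dedicated and Serverless endpoints makes the transfer fail, and a Dedicated endpoint needs a network. The Python sketch below expresses those two rules as a pre-flight check. The Endpoint class, its fields, and validate_pair are hypothetical illustrations, not a DoubleCloud API; in the console, you pick the runtime while testing an endpoint's connection.

```python
from dataclasses import dataclass
from typing import Optional

RUNTIMES = ("dedicated", "serverless")


@dataclass(frozen=True)
class Endpoint:
    name: str
    runtime: str                   # "dedicated" or "serverless" (hypothetical field)
    network: Optional[str] = None  # only meaningful for the dedicated runtime


def validate_pair(source: Endpoint, target: Endpoint) -> list[str]:
    """Return a list of problems; an empty list means the pair looks usable."""
    problems = []
    for endpoint in (source, target):
        if endpoint.runtime not in RUNTIMES:
            problems.append(f"{endpoint.name}: unknown runtime {endpoint.runtime!r}")
        elif endpoint.runtime == "dedicated" and not endpoint.network:
            problems.append(f"{endpoint.name}: dedicated runtime needs a network")
    if source.runtime != target.runtime:
        problems.append(
            f"runtime mismatch: {source.name} is {source.runtime}, "
            f"{target.name} is {target.runtime}"
        )
    return problems


if __name__ == "__main__":
    source = Endpoint("s3-source-dev", "serverless")
    target = Endpoint("clickhouse-target-dev", "serverless")
    print(validate_pair(source, target) or "endpoints are compatible")
```

Returning a list of problems instead of raising on the first one reports every issue at once, which is handier when both endpoints are misconfigured.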
Step 2.3. Create and activate a transfer
- In the list of services, select Transfer.
- Click Create transfer.
- Under Endpoints:
  - From the Source dropdown menu, select s3-source-dev.
  - From Target, select clickhouse-target-dev.
- Under Basic settings:
  - Enter the transfer Name: transfer-dev.
  - (Optional) Enter the transfer Description.
- Under Transfer settings, select the Transfer type. In this use case, we choose Snapshot, which copies the source data once. That is all a static file like bookings.csv needs.
- Leave all the other fields blank or with their default values.
- Click Submit. The transfer-dev transfer appears in your Transfers tab.
- After you've created the transfer, activate it.
- Wait until your transfer status changes to Done.
- Check the data transferred to your ClickHouse® database:
  - Open WebSQL.
  - Run the following command:
    SELECT * FROM "start_db".bookings LIMIT 100
Nice work! You've transferred all the data from a remote source to your own ClickHouse® database.
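The console shows when the transfer reaches Done, but if you ever script this step, the waiting part is a plain polling loop. In the Python sketch below, get_status is a stand-in for however you read the transfer status (we don't assume any particular DoubleCloud API), and treating Error as a terminal failure status is our assumption; only Done comes from this tutorial.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    wanted: str = "Done",
    failed: tuple[str, ...] = ("Error",),  # assumption: name of a failure status
    interval_s: float = 10.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns `wanted`, a failure status, or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == wanted:
            return status
        if status in failed:
            raise RuntimeError(f"transfer ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated status sequence standing in for real status checks.
    statuses = iter(["Creating", "Running", "Done"])
    print(wait_for_status(lambda: next(statuses), sleep=lambda _: None))
```

Passing sleep in as a parameter lets you exercise the loop instantly with simulated statuses, as in the usage block, while real use keeps the default time.sleep.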
Keep exploring
For more information on what you can do with DoubleCloud, continue exploring the documentation!
You can also follow this tutorial in a short 5-minute video on our YouTube channel.