Your quick start in the DoubleCloud world

Take your first steps into the exciting new world of data management with DoubleCloud!

To understand how our service works and to start using it in your day-to-day work, let's learn the fundamentals:

  • Create a cluster and a database for data storage.
  • Transfer the data from a remote object storage bucket to a DoubleCloud cluster.

By following this step-by-step tutorial, you will learn how to:

  1. Prepare to acquire the data

    1. Create a Managed ClickHouse® cluster

    2. Create a ClickHouse® database

  2. Transfer the data

    1. Create a source endpoint

    2. Create a target endpoint

    3. Create and activate a transfer

Step 1. Prepare to acquire the data

First, create a cluster and a database to store the data:

  1. Create a Managed ClickHouse® cluster

    This is your resource allocation tool. It allows you to acquire CPU, memory, and storage quotas to operate your databases.

  2. Create a ClickHouse® database

    This step explains how to connect to the cluster from your browser with WebSQL and create a database.

Step 1.1. Create a Managed ClickHouse® cluster

Warning

During the trial period, you can create clusters with up to 8 cores, 32 GB RAM, and 400 GB storage. To raise these quotas, contact our support.
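The limits above are easy to check before you pick a preset. The sketch below is a hypothetical helper, not part of DoubleCloud. It assumes the limits apply to the cluster total, so per-host resources are multiplied by the number of hosts (replicas × shards); DoubleCloud itself enforces the real quota.

```python
# Trial-period limits from the warning above.
TRIAL_LIMITS = {"cores": 8, "ram_gb": 32, "storage_gb": 400}


def fits_trial_quota(cores: int, ram_gb: int, storage_gb: int, hosts: int = 1) -> bool:
    """Check whether a planned cluster stays within the trial limits.

    Assumption: the limits apply to the cluster total, so per-host
    resources are multiplied by the number of hosts (replicas x shards).
    """
    totals = {
        "cores": cores * hosts,
        "ram_gb": ram_gb * hosts,
        "storage_gb": storage_gb * hosts,
    }
    return all(totals[key] <= limit for key, limit in TRIAL_LIMITS.items())


# s2-c2-m4 preset (2 cores, 4 GB RAM) with an example storage size of 32 GB.
print(fits_trial_quota(2, 4, 32))             # True
print(fits_trial_quota(8, 32, 32, hosts=2))   # False: 16 cores in total
```

With the 1 replica and 1 shard used in this tutorial, `hosts` stays at its default of 1.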

  1. Go to the console.

  2. Log in to DoubleCloud if you already have an account, or create one if you open the console for the first time.

  3. Select Clusters from the list of services on the left.

  4. Click Create cluster at the top right.

    1. Select ClickHouse®.

    2. Choose a provider and the region closest to your location.

    3. Under Resources:

      • Select a preset for CPU, RAM capacity, and storage space. The minimal s2-c2-m4 preset will be more than enough for this tutorial.

        A resource preset has the following structure:

        <cpu-platform>-c<number-of-cpu-cores>-m<gigabytes-of-ram>
        

        There are three available CPU platforms:

        • g: ARM Graviton

        • i: Intel (x86)

        • s: AMD (x86)

        For example, the i2-c2-m8 preset corresponds to an Intel-family machine with 2 virtual cores and 8 GB of RAM.

        Learn more about the availability of CPU platforms in areas and regions

      • Keep 1 replica and 1 shard.

    4. Under Basic settings:

      • Enter the cluster Name: clickhouse-dev.

      • Keep the Version as is: this is the latest stable version of ClickHouse®.

    5. Under Networking → VPC, select the network where you want to create the cluster.

      If you don’t need to place the cluster in a specific network, leave the preselected default option.

    6. Click Submit.

  5. When the cluster is ready to operate, its state in the console will change to Alive:

    Screenshot of a ClickHouse® cluster page in the DoubleCloud console

    Tip

    DoubleCloud creates the admin superuser and its password automatically. You can find the username and password under Credentials in the Overview tab on the cluster page.

    To learn how to create users for other roles, refer to Manage ClickHouse® users.
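To make the preset naming scheme from the Resources step concrete, here is a minimal Python sketch that splits a preset name into its parts. The regular expression and the `parse_preset` helper are illustrations, not a DoubleCloud API. They assume the platform part is a letter followed by a number (as in `s2` or `i2`), which the naming scheme above does not explain.

```python
import re
from typing import NamedTuple

# CPU platform codes listed in this tutorial.
PLATFORMS = {"g": "ARM Graviton", "i": "Intel (x86)", "s": "AMD (x86)"}

# <cpu-platform>-c<number-of-cpu-cores>-m<gigabytes-of-ram>
# Assumption: the platform part is a platform letter followed by a number,
# as in "s2" or "i2"; the tutorial does not explain that number.
PRESET_RE = re.compile(r"([gis])(\d+)-c(\d+)-m(\d+)")


class Preset(NamedTuple):
    platform: str
    cores: int
    ram_gb: int


def parse_preset(name: str) -> Preset:
    """Split a preset name such as 'i2-c2-m8' into platform, cores, and RAM."""
    match = PRESET_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized preset name: {name!r}")
    letter, _platform_number, cores, ram_gb = match.groups()
    return Preset(PLATFORMS[letter], int(cores), int(ram_gb))


print(parse_preset("i2-c2-m8"))
# Preset(platform='Intel (x86)', cores=2, ram_gb=8)
print(parse_preset("s2-c2-m4"))
# Preset(platform='AMD (x86)', cores=2, ram_gb=4)
```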

Step 1.2. Create a ClickHouse® database

  1. After the cluster status has changed to Alive, select it in the cluster list.

  2. Click WebSQL at the top right.

  3. In WebSQL, click on any database in the connection manager on the left to open the query editor.

  4. To create a database, enter the following query and click Execute:

    CREATE DATABASE IF NOT EXISTS start_db ON CLUSTER default
    
  5. To check that the database was created, run SHOW DATABASES. You should see start_db in the output:

    ┌─name───────────────┐
    │ INFORMATION_SCHEMA │
    │ _system            │
    │ default            │
    │ information_schema │
    │ start_db           │  // your database
    │ system             │
    └────────────────────┘
    

Step 2. Transfer the data

Now set up the tools that fetch the data from a remote source and transfer it to your start_db ClickHouse® database. To do this, complete the following steps:

  • Create a source endpoint

    This is your data fetcher. It will connect to a remote source and send the data to your Managed ClickHouse® cluster.

  • Create a target endpoint

    This is your receiver. It will acquire the data sent by the source endpoint and write it to the database on your Managed ClickHouse® cluster.

  • Create and activate a transfer

    This is your data pipeline tool. It will connect your endpoints and ensure the integrity of the data.
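The three components above form a simple pipeline: the source reads, the target writes, and the transfer moves rows between them. The toy model below illustrates that flow in Python. Every class in it is hypothetical and does not talk to DoubleCloud; its only rule is the runtime compatibility warning from Step 2.1 (both endpoints must use the same runtime type).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class SourceEndpoint:
    name: str
    runtime: str                       # "dedicated" or "serverless"
    read: Callable[[], Iterable[Row]]  # fetches rows from the remote source


@dataclass
class TargetEndpoint:
    name: str
    runtime: str
    write: Callable[[List[Row]], None]  # writes rows to the target database


@dataclass
class Pipeline:
    name: str
    source: SourceEndpoint
    target: TargetEndpoint

    def __post_init__(self) -> None:
        # Mirrors the runtime compatibility rule in Step 2.1.
        if self.source.runtime != self.target.runtime:
            raise ValueError("endpoints use different runtime types")

    def run_snapshot(self) -> int:
        """Copy all rows from the source to the target once; return the row count."""
        rows = list(self.source.read())
        self.target.write(rows)
        return len(rows)


stored: List[Row] = []
source = SourceEndpoint("s3-source-dev", "serverless", lambda: [{"id": 1}, {"id": 2}])
target = TargetEndpoint("clickhouse-target-dev", "serverless", stored.extend)
print(Pipeline("transfer-dev", source, target).run_snapshot())  # 2
```

Keeping the endpoints separate from the pipeline reflects the tutorial's order: you create both endpoints first and only then connect them with a transfer.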

Step 2.1. Create a source endpoint

  1. In the list of services, select Transfer.

  2. Click Create → Source endpoint.

  3. In Source type, select Object storage.

  4. Under Basic settings:

    1. Enter the Name of the endpoint: s3-source-dev.

    2. (optional) Enter a Description of the endpoint.

  5. Move on to Endpoint parameters.

  6. Under S3: Bucket connection configuration:

    1. In Bucket name, enter doublecloud-docs.

    Leave all other parameters in this block empty. This bucket is public, and Transfer can connect to it using default parameters.

    Screenshot showing basic settings of a new Object storage source endpoint

  7. In Path pattern, enter data-sets/bookings.csv.

  8. Under Data format:

    1. Select CSV in the dropdown.

    2. Under CSV → Delimiter, select Common.

    3. In the dropdown, select Semicolon (;) as the common delimiter.

    Leave all other parameters in this block empty.

    Screenshot of the Data format block

  9. Under Dataset:

    1. In Schema, enter {}. This tells Transfer to auto-infer the schema.

    2. In Table, enter bookings.

    Screenshot of the Dataset block

  10. (Optional) Test the source endpoint:

    1. Click Test connection.

    2. Select the runtime type you want Transfer to use for connecting to the database.

      • Dedicated: Transfer connects to the database using a specified internal or external network.

      • Serverless: Transfer connects to the database available from the internet using an automatically chosen network.

      Runtime compatibility warning

      Don't use endpoints with different runtime types in the same transfer: this will cause the transfer to fail.

    3. If you selected the dedicated runtime, select the network in the dropdown.

      Screenshot of the endpoint testing dialog

    4. In the dialog, click Test connection.

    Testing the connection may take a few minutes.

  11. Click Submit. You'll see the following line in your Endpoints list:

    Screenshot of the new source endpoint in the endpoint table
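The Data format and Dataset settings tell Transfer to split each line on semicolons and to infer the column types itself (`{}` in Schema). The sketch below imitates that locally with Python's `csv` module. It assumes the file starts with a header row, and its int/float/string inference is a simplified stand-in for whatever rules Transfer actually applies.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple


def infer_type(values: List[str]) -> str:
    """Return the narrowest of "int", "float", or "string" that fits every value."""
    if not values:
        return "string"  # no data rows: nothing to infer from

    def all_parse(cast: Callable[[str], object]) -> bool:
        try:
            for value in values:
                cast(value)
        except (ValueError, TypeError):  # TypeError covers missing (None) cells
            return False
        return True

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "string"


def read_semicolon_csv(text: str) -> Tuple[List[Dict[str, str]], Dict[str, str]]:
    """Parse semicolon-delimited CSV with a header row and infer a column schema."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = list(reader)
    columns = reader.fieldnames or []
    schema = {col: infer_type([row[col] for row in rows]) for col in columns}
    return rows, schema


# Made-up sample; not the real contents of bookings.csv.
sample = "id;guest;nights;price\n1;Alice;3;120.50\n2;Bob;1;80\n"
rows, schema = read_semicolon_csv(sample)
print(schema)
# {'id': 'int', 'guest': 'string', 'nights': 'int', 'price': 'float'}
```

Without the semicolon delimiter, each line would be read as a single column, which is why the Data format block needs that setting.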

The source endpoint is ready. Next, create an endpoint to receive the data from the remote source.

Step 2.2. Create a target endpoint

  1. In the list of services, select Transfer.

  2. Click Create → Target endpoint.

  3. In Target type, select ClickHouse.

  4. Under Basic settings:

    1. Enter the Name of the endpoint: clickhouse-target-dev.

    2. (optional) Enter a Description of the endpoint.

  5. Move on to Endpoint parameters.

  6. Under Connection settings:

    1. In Connection type, select Managed cluster.

    2. In Managed cluster, select clickhouse-dev in the dropdown.

    3. In Authentication, select Default. The endpoint will connect to the cluster as the admin user.

    4. In Database, enter start_db. This is the database the data will be transferred to.

    This is what it should look like on your screen:

    Screenshot showing basic settings and endpoint configuration of a new ClickHouse® target endpoint

  7. In Cleanup policy, select Drop.

    Screenshot of the cleanup policy configuration

  8. Leave all the other fields blank or with their default values.

  9. Click Submit. You'll see the following line in your Endpoints list:

    Screenshot of the new target endpoint in the endpoint table
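The cleanup policy matters when you run the transfer more than once. As an assumption (this tutorial only names the Drop option), Drop removes the existing target table before the data is written, so a rerun starts from an empty table. The toy model below contrasts it with two other behaviors, truncating rows and doing no cleanup. The policy strings in the code are hypothetical labels, and the duplicated rows in the no-cleanup case are a property of this toy model, not a claim about ClickHouse.

```python
from typing import Dict, List

# Toy target "database": table name -> list of rows.
Database = Dict[str, List[dict]]


def apply_cleanup(db: Database, table: str, policy: str) -> None:
    """Prepare a target table before a snapshot according to a toy policy."""
    if policy == "drop":
        db.pop(table, None)      # remove the table (schema and rows) entirely
    elif policy == "truncate":
        if table in db:
            db[table].clear()    # keep the table, delete its rows
    elif policy == "disabled":
        pass                     # leave existing data in place
    else:
        raise ValueError(f"unknown cleanup policy: {policy!r}")


def snapshot(db: Database, table: str, rows: List[dict], policy: str) -> None:
    """Clean up the table, then write all rows (creating the table if needed)."""
    apply_cleanup(db, table, policy)
    db.setdefault(table, []).extend(rows)


db: Database = {}
rows = [{"id": 1}, {"id": 2}]
snapshot(db, "bookings", rows, "drop")
snapshot(db, "bookings", rows, "drop")      # rerun: still 2 rows
print(len(db["bookings"]))                  # 2
snapshot(db, "bookings", rows, "disabled")  # no cleanup: rows are appended again
print(len(db["bookings"]))                  # 4
```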

Good work. You've created an endpoint that receives the data and writes it to your ClickHouse® database. Now you need a transfer that connects both endpoints and moves the data.

Step 2.3. Create and activate a transfer

  1. In the list of services, select Transfer.

  2. Click Create transfer.

  3. Under Endpoints:

    1. From the Source dropdown menu, select s3-source-dev.

    2. From Target, select clickhouse-target-dev.

  4. Under Basic settings:

    1. Enter the transfer Name: transfer-dev.

    2. (optional) Enter the transfer Description.

  5. Under Transfer settings, select Snapshot as the Transfer type. A snapshot copies the data once, which is all this tutorial needs and the fastest way to complete the transfer.

    This is what it should look like on your screen:

    Screenshot of the filled-in transfer form

  6. Leave all the other fields blank or with their default values.

  7. Click Submit. You will see the following line in your Transfers tab:

    Screenshot of the new transfer in the Transfers tab

  8. After you've created a transfer, click Activate.

  9. Wait until your transfer status changes to Done.

  10. Check the data transferred to your ClickHouse® database:

    1. Open WebSQL.

    2. Run the following command:

      SELECT * FROM "start_db".bookings LIMIT 100
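Step 9 asks you to wait until the transfer status changes to Done. If you script this check, a polling loop with a timeout keeps it from hanging forever. In the sketch below, `get_status` is a hypothetical callable you would back with your own status check, since this tutorial only uses the console; the intermediate status names in the example are placeholders.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    target: str = "Done",
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
) -> str:
    """Call get_status() until it returns `target`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status is still {status!r} after {timeout_s} s")
        time.sleep(interval_s)


# Simulated status sequence; the intermediate names are placeholders.
statuses = iter(["Created", "Running", "Done"])
result = wait_for_status(lambda: next(statuses), interval_s=0)
print(result)  # Done
```

`time.monotonic()` is used for the deadline so that system clock changes cannot shorten or extend the wait.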
      

Nice work! You've transferred the data from a remote source to your own ClickHouse® database.

Keep exploring

For more information on what you can do with DoubleCloud, see the links below and continue exploring!