Your quick-start into the DoubleCloud world

Take your first steps into the exciting new world of data management with DoubleCloud!

To understand how our service works and to start using it in your day-to-day work, let's learn the fundamentals:

  • Create a cluster and a database for data storage.
  • Transfer the data from a remote data warehouse to a DoubleCloud cluster.

Following this straightforward step-by-step quick-start tutorial, you will learn how to:

  1. Prepare to acquire the data

    1. Create a Managed ClickHouse® cluster

    2. Create a ClickHouse® database

  2. Transfer the data

    1. Create a source endpoint

    2. Create a target endpoint

    3. Create and activate a transfer

Prepare to acquire the data

First, create a cluster and a database that will store the data:

  1. Create a Managed ClickHouse® cluster

    This is your resource allocation tool. It allows you to acquire CPU, memory, and storage quotas to operate your databases.

  2. Create a ClickHouse® database

    This section will explain how to talk to your cluster directly from the Linux terminal and use the ClickHouse® CLI toolkit.

Create a Managed ClickHouse® cluster

Warning

During the trial period, you can create clusters with up to 8 cores, 32 GB RAM, and 400 GB storage. If you need to raise the quotas, don't hesitate to contact our support.

  1. Go to the console.

  2. Log in to DoubleCloud if you already have an account, or create one if you're opening the console for the first time.

  3. Select Clusters from the list of services on the left.

  4. Click Create cluster in the upper-right corner of the page.

    1. Select ClickHouse®.

    2. Choose a provider and a region closest to your geographical location.

    3. Under Resources:

      • Select a preset for CPU, RAM capacity, and storage space. The minimal s1-c2-m4 preset will be more than enough for this tutorial.

      • Keep 1 replica and 1 shard.

    4. Under Basic settings:

      • Enter the cluster Name: doublecloud-quickstart.

      • Keep the Version as is - this is the latest stable version of ClickHouse®.

    5. Under Networking → VPC, specify the DoubleCloud VPC in which to locate your cluster. Use the default value in the previously selected region if you don't need to create this cluster in a specific network.

    6. Click Submit.

    Your cluster will appear with the Creating status on the Clusters page. Setting everything up may take some time. You can safely go to the next section of the tutorial while the cogs are moving in the background.

  5. When the cluster is ready to operate, its state in the console will change to Alive:

    [Screenshot: the cluster in the Alive state]

    Tip

    The DoubleCloud service creates the superuser admin and its password automatically. You can find both the User and the Password in the Overview tab on the cluster information page.

    To create users for other roles, see Manage ClickHouse® users.

Create a ClickHouse® database

This section gives you a glimpse into talking directly to your Managed ClickHouse® cluster from your Linux terminal.

Tip

This tutorial shows how to use a CLI client with Docker, but you can use other tools of your choice. For other connection options, see Connect to a ClickHouse® database.

Install the client using one of the following options.

Docker

  1. Open your terminal.

  2. (Optional) Start Docker if needed:

    service docker start
    
  3. Pull the clickhouse-client Docker image:

    docker pull clickhouse/clickhouse-client
    
DEB

  1. Open your terminal.

  2. Connect to the ClickHouse® official DEB repository from your Linux system:

    sudo apt update && sudo apt install -y apt-transport-https ca-certificates dirmngr && \
    sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754 && \
    echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \
    /etc/apt/sources.list.d/clickhouse.list
    
  3. Refresh the package list and install the clickhouse-client:

    sudo apt update && sudo apt install -y clickhouse-client
    
RPM

  1. Open your terminal.

  2. Connect to the ClickHouse® official RPM repository from your Linux system:

    sudo yum install -y yum-utils
    sudo yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
    
  3. Install the clickhouse-client:

    sudo yum install -y clickhouse-client
    

Warning

If you run a Red Hat 7-based Linux distribution, including CentOS 7, Oracle Linux 7, and others, you need to download and install trusted certificates and manually add the path to them in the clickhouse-client configuration file as follows:

  1. Install the root certificate:

    curl https://letsencrypt.org/certs/isrg-root-x2-cross-signed.pem > \
    /etc/pki/ca-trust/source/anchors/isrg-root-x2-cross-signed.pem
    
  2. Install the intermediate certificate:

    curl https://letsencrypt.org/certs/lets-encrypt-r3-cross-signed.pem > \
    /etc/pki/ca-trust/source/anchors/lets-encrypt-r3-cross-signed.pem
    
  3. Update the list of trusted certificates:

    sudo update-ca-trust
    
  4. Locate your clickhouse-client configuration file (by default, you can find it at /etc/clickhouse-client/config.xml) and add the path to the certificates into the <openSSL> section:

    <client> <!-- Used for connection to server's secure tcp port -->
       <loadDefaultCAFile>true</loadDefaultCAFile>
       <cacheSessions>true</cacheSessions>
       <disableProtocols>sslv2,sslv3</disableProtocols>
       <preferServerCiphers>true</preferServerCiphers>
       <caConfig>/etc/ssl/certs/ca-bundle.crt</caConfig>
       <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
       <invalidCertificateHandler>
       <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
       <name>RejectCertificateHandler</name>
       </invalidCertificateHandler>
    </client>
    

The software is ready to go. Let's connect to your new cluster:

  1. Select Clusters from the list of services on the left.

  2. Select the name of your cluster to open its information page. By default, you will see the Overview tab.

  3. Under Connection strings, find the Native interface string and click Copy.

  4. Run the following command in your Linux terminal:

    docker run --network host --rm -it clickhouse/<Native interface connection string>
    
    The complete Docker command structure:

    docker run --network host --rm -it \
                clickhouse/clickhouse-client \
                --host <FQDN of your cluster> \
                --secure \
                --user admin \
                --password <Cluster user password> \
                --port 9440

    If you installed the clickhouse-client from the DEB or RPM repository, run the copied string as is:

    <Native interface connection string>
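
For scripts, the same connection can be made non-interactively with the client's --query option. The sketch below assumes a natively installed clickhouse-client and reuses the options from the command above. CH_HOST, CH_PASSWORD, and the run and ch_query helpers are hypothetical names chosen for illustration, not part of DoubleCloud or ClickHouse®. Setting DRY_RUN=1 prints the command instead of executing it.

```shell
# Hypothetical variables: take the real values from the cluster's Overview tab.
CH_HOST="${CH_HOST:-your-cluster-fqdn}"
CH_PASSWORD="${CH_PASSWORD:-your-password}"

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run one SQL statement over the secure native port (9440).
ch_query() {
  run clickhouse-client --host "$CH_HOST" --port 9440 --secure \
    --user admin --password "$CH_PASSWORD" --query "$1"
}

# Usage: ch_query "SELECT version()"
```

Keeping the password in an environment variable rather than in the script is a design choice of this sketch; adapt it to however you manage secrets.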
    

You are now connected to your cluster via the clickhouse-client. It's time to create a database. Let's call it start_db:

    CREATE DATABASE IF NOT EXISTS "start_db"

  1. Test whether the database was created successfully. Type SHOW DATABASES. You should see start_db in the readout:

    ┌─name───────────────┐
    │ INFORMATION_SCHEMA │
    │ _system            │
    │ default            │
    │ information_schema │
    │ start_db           │
    │ system             │
    └────────────────────┘
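
The same check can be scripted. In batch mode the client prints one database name per line, so an exact-match grep is enough to confirm that start_db exists. This is a minimal sketch: has_database is a hypothetical helper, and the commented-out usage line assumes the connection options from the Docker command structure shown earlier.

```shell
# Succeed only if the exact name given as $1 appears on its own line in stdin.
has_database() {
  grep -qxF -- "$1"
}

# Usage against a live cluster (placeholders as in the connection command):
#   clickhouse-client --host <FQDN of your cluster> --port 9440 --secure \
#     --user admin --password <Cluster user password> \
#     --query "SHOW DATABASES" | has_database start_db && echo "start_db exists"

# Offline demonstration with the readout from this tutorial:
printf '%s\n' INFORMATION_SCHEMA _system default information_schema start_db system \
  | has_database start_db && echo "start_db exists"
```

The -x flag matters here: without it, a database named start_db_old would also match.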
    

Transfer the data

Now it's time to set up the tools to get the data from a remote source and transfer it to your start_db ClickHouse® database. To accomplish this, you need to complete the following steps:

  • Create a source endpoint

    This is your data fetcher. It will connect to a remote source and send the data to your Managed ClickHouse® cluster.

  • Create a target endpoint

    This is your receiver. It will acquire the data sent by the source endpoint and write it to the database on your Managed ClickHouse® cluster.

  • Create and activate a transfer

    This is your data pipeline tool. It will connect your endpoints and ensure the integrity of the data.

Create a source endpoint

  1. In the list of services, select Transfer.

  2. Select the Endpoints tab, click Create endpoint, and choose Source.

  3. Select S3 as the Source type.

  4. Under Basic settings:

    1. Enter the Name of the endpoint: s3-source-quickstart.

    2. (Optional) Enter a Description of the endpoint.

  5. Specify endpoint parameters under Endpoint settings:

    1. Specify the Dataset: bookings.

    2. Provide the Path pattern: data-sets/bookings.csv.

    3. Auto-infer the Schema by typing {}.

    4. Select the data format - CSV.

  6. Under CSV, specify the Delimiter - ;. Keep the rest of the fields with their default values.

    This is what it should look like on your screen:

    [Screenshot: the source endpoint form filled in]

  7. Under S3: Amazon Web Services, enter the name of the Bucket: doublecloud-docs. As the bucket is public, leave the rest of the fields blank.

  8. Click Submit. You'll see the following line on your Endpoints list:

    [Screenshot: the source endpoint in the Endpoints list]
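
The Delimiter setting matters because the file separates fields with semicolons rather than commas. The sketch below shows the effect on a few made-up rows. The column names and values are hypothetical and are not taken from bookings.csv, and second_column is an illustrative helper.

```shell
# Hypothetical sample in the same shape as a semicolon-delimited CSV file.
sample='id;city;price
1;Berlin;120
2;Paris;95'

# Split on ";" and print the second field of every data row (skip the header).
second_column() {
  awk -F';' 'NR > 1 { print $2 }'
}

printf '%s\n' "$sample" | second_column
# Prints:
# Berlin
# Paris
```

With a comma as the delimiter, each of these rows would be read as a single field, which is why the endpoint needs the explicit setting.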

The transmitter is ready to go. Now we need to create an endpoint that will receive the data from the remote source.

Create a target endpoint

  1. In the list of services, select Transfer.

  2. Select the Endpoints tab, click Create endpoint and choose Target.

  3. Select ClickHouse® as the Target type.

  4. Under Basic settings:

    1. Enter the Name of the endpoint: clickhouse-target-quickstart.

    2. (Optional) Enter a Description of the endpoint.

  5. Specify endpoint parameters under Endpoint settings:

    1. Select connection type. This tutorial transfers data to the Managed cluster.

    2. Specify the connection properties:

      • Under Managed cluster, select your cluster name (doublecloud-quickstart) from the drop-down list.

      • Specify the User of the database: admin.

      • Enter the Password of the database user.

      • Specify the Database name you want to transfer the data to: start_db.

    This is what it should look like on your screen:

    [Screenshot: the target endpoint form filled in]

    3. Under Cleanup policy, select Drop.

  6. Leave all the other fields blank or with their default values.

  7. Click Submit. You'll see the following line on your Endpoints list:

    [Screenshot: the target endpoint in the Endpoints list]

Good work. Now we've created an endpoint that will receive and write the data to your ClickHouse® database. All we need now is a tool that will connect both endpoints and transfer the data.

Create and activate a transfer

  1. In the list of services, select Transfer.

  2. Click Create transfer.

  3. Under Endpoints:

    1. Select s3-source-quickstart from the Source drop-down menu.

    2. Select clickhouse-target-quickstart from the Target drop-down menu.

  4. Under Basic settings:

    1. Enter the transfer Name: transfer-quickstart.

    2. (Optional) Enter the transfer Description.

  5. Under Transfer settings, select the Transfer type. In this use case, we choose Snapshot to make the transfer process as fast as possible.

    This is what it should look like on your screen:

    [Screenshot: the transfer form filled in]

  6. Leave all the other fields blank or with their default values.

  7. Click Submit. You will see the following line in your Transfers tab:

    [Screenshot: the transfer in the Transfers tab]

  8. After you've created a transfer, click Activate.

  9. Wait until your transfer status changes to Done.

  10. Check the data transferred to your ClickHouse® database:

    1. Open your Linux terminal.

    2. Connect to your cluster and type the following command:

      SELECT * FROM "start_db".bookings LIMIT 100
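
A row count is a quicker sanity check than reading 100 rows. The sketch below validates the output of a count query before reporting success. rows_ok is a hypothetical helper, and the commented-out usage assumes a natively installed clickhouse-client with the same placeholders as the connection command.

```shell
# Succeed if $1 is a positive integer row count.
rows_ok() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric output
  esac
  [ "$1" -gt 0 ]
}

# Usage against a live cluster:
#   count="$(clickhouse-client --host <FQDN of your cluster> --port 9440 --secure \
#     --user admin --password <Cluster user password> \
#     --query 'SELECT count() FROM start_db.bookings')"
#   rows_ok "$count" && echo "bookings has $count rows"
```

Rejecting empty or non-numeric output first means a failed connection is not mistaken for an empty table.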
      

Nice work! You have transferred all the data from a remote source and replicated it with complete integrity in your own ClickHouse® database.

Keep exploring

For more information on what you can do with DoubleCloud, see the links below and continue exploring!