The complete end-to-end DoubleCloud services guide

This comprehensive step-by-step scenario will give you an in-depth look at every major DoubleCloud resource. You'll familiarize yourself with how to create, manage, and make practical use of the resources in your brand new data management toolkit.

A typical use case for our service includes:

  • Creating a cluster and a database for data storage.

  • Transferring the data from the outside source to your DoubleCloud cluster.

  • Analyzing the data using Visualization, our built-in BI tool.

Let's look at the flow of this scenario:

quickstart-flow

There is a CSV file with some data we need to migrate to DoubleCloud and then analyze. To do this, we transfer the file from S3 to a Managed ClickHouse® cluster. From there, we connect our Visualization service to it, prepare the data, and build some impressive visualizations from it.

Following this straightforward step-by-step quickstart tutorial, you will learn how to:

  1. Prepare to acquire the data

    1. Create a Managed ClickHouse® cluster

    2. Create a ClickHouse® database

  2. Transfer the data

    1. Create a source endpoint

    2. Create a target endpoint

    3. Create and activate a transfer

  3. Visualize the data

    1. Create a workbook

    2. Create a connection

    3. Create a dataset

    4. Prepare your dataset for visualization

    5. Create a dashboard

    6. Create charts and selectors

    7. Create an area chart

    8. Create a bar chart

    9. Create an indicator

    10. Add a selector

Let's get to the point right now.

Tip

If you prefer learning by watching, you can complete the same tutorial following the steps from the video on our Youtube channel :

Prepare to acquire the data

First, you need to create a cluster and a database that'll store the data for further processing and visualization.

You need to complete the following steps:

  1. Create a Managed ClickHouse® cluster

    This is a resource allocation tool. It allows you to acquire CPU, memory, and storage quotas to operate your databases.

  2. Create a ClickHouse® database

    This section will give you an understanding of how to talk to your cluster directly from the Linux terminal and use ClickHouse® CLI toolkit.

Create a Managed Service for ClickHouse® Cluster

Warning

During the trial period, you can create clusters with up to 8 cores, 32 GB RAM, and 400 GB storage. If you need to raise the quotas, don't hesitate to contact our support.

  1. Go to the console. You'll see your project's Clusters page.

  2. Click Create cluster in the upper-right corner of the page.

    1. Select ClickHouse®.

    2. Choose a provider and a region closest to your geographical location.

    3. Under Resources:

      • Select a preset for CPU, RAM capacity, and storage space. The first s1-c2-m4 preset will be more than enough for our scenario.

      • Select 2 replicas to improve performance.

      • Select 2 shards to enhance fault tolerance.

    4. Under Basic settings:

      • Enter the cluster Name: doublecloud-quickstart.

      • Keep the Version as is - this is the latest stable version of ClickHouse®.

    5. Under NetworkingVPC, specify in which DoubleCloud VPC to locate your cluster. Use the default value in the previously selected region if you don't need to create this cluster in a specific network.

    6. Click Submit.

Your cluster will appear with the Creating status on the Clusters page in the console. Setting everything up may take some time. When the cluster is ready, it changes its state to Alive.

Tip

The DoubleCloud service creates the superuser admin and its password automatically. You can find both the User and the Password in the Overview tab on the cluster information page.

To create users for other roles, see Manage ClickHouse® users

Create a ClickHouse® database

This step gives you a glimpse into talking with your Managed ClickHouse® cluster directly from your Linux system.

Please note - this tutorial is based on a DEB-based Linux distribution . You can use a standalone system or a Windows Subsystem for Linux solution.

You need to install clickhouse-client software to be able to talk to your new cluster via the Linux terminal:

  1. Open your terminal.

  2. (Optional) Start Docker if needed:

    service docker start
    
  3. Pull the clickhouse-client Docker image:

    docker pull clickhouse/clickhouse-client
    
  1. Open your terminal.

  2. Connect to the ClickHouse® official DEB repository from your Linux system:

    sudo apt update && sudo apt install -y apt-transport-https ca-certificates dirmngr && \
    sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754 && \
    echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \
    /etc/apt/sources.list.d/clickhouse.list
    
  3. Refresh the package list and install the clickhouse-client :

    sudo apt update && sudo apt install -y clickhouse-client
    
  1. Open your terminal.

  2. Connect to a ClickHouse® official RPM repository from your Linux system:

    sudo yum install -y yum-utils
    sudo yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
    
  3. Install the clickhouse-client :

    sudo yum install -y clickhouse-client
    

Warning

If you run a RedHat 7-based Linux distribution, including Cent OS 7, Oracle Linux 7 and others, you need to download and install trusted certificates and manually add the path to them in the clickhouse-client configuration file as follows:

  1. Install the root certificate:

    curl https://letsencrypt.org/certs/isrg-root-x2-cross-signed.pem > \ 
    /etc/pki/ca-trust/source/anchors/isrg-root-x2-cross-signed.pem
    
  2. Install the intermediate certificate:

    curl https://letsencrypt.org/certs/lets-encrypt-r3-cross-signed.pem > \
    /etc/pki/ca-trust/source/anchors/lets-encrypt-r3-cross-signed.pem
    
  3. Update the list of trusted certificates:

    sudo update-ca-trust
    
  4. Locate your clickhouse-client configuration file (by default, you can find it at /etc/clickhouse-client/config.xml) and add the path to the certificates into the <openSSL> section:

    <client> <!-- Used for connection to server's secure tcp port -->
       <loadDefaultCAFile>true</loadDefaultCAFile>
       <cacheSessions>true</cacheSessions>
       <disableProtocols>sslv2,sslv3</disableProtocols>
       <preferServerCiphers>true</preferServerCiphers>
       <caConfig>/etc/ssl/certs/ca-bundle.crt</caConfig>
       <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
       <invalidCertificateHandler>
       <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
       <name>RejectCertificateHandler</name>
       </invalidCertificateHandler>
    </client>
    

The software is ready to go. Let's connect to your new cluster:

  1. Select Clusters from the list of services on the left.

  2. Select the name of your cluster to open its information page. By default, you will see the Overview tab.

  3. Under Connection strings, find the Native interface string and click Copy.

  4. Run the following command in your Linux terminal:

    docker run --network host --rm -it clickhouse/<Native interface connection string>
    
    The complete Docker command structure
    docker run --network host --rm -it \ 
                clickhouse/clickhouse-client \
                --host <FQDN of your cluster> \
                --secure \
                --user admin \
                --password <Cluster user password> \
                --port 9440 
    
    <Native interface connection string>
    

You are now connected to your cluster via the clickhouse-client. It's time to create the database for your Facebook Ads metrics data. Let's call it Sample-ClickHouse-DB:

  1. Type the following command in your Linux terminal:

    CREATE DATABASE IF NOT EXISTS sample_clickhouse_db
    
  2. Check if the database is available on the cluster. Type SHOW DATABASES. You'll see the following readout:

    ┌─name─────────────────┐
    │ INFORMATION_SCHEMA   │
    │ sample_clickhouse_db │
    │ _system              │
    │ default              │
    │ information_schema   │
    │ system               │
    └──────────────────────┘
    

Tip

Alternatively, you can use a visual IDE of your choice to connect to a cluster and create a database.

Transfer the data

Now it's time to set up the tools to get the data from a remote source and transfer it to your sample_clickhouse_db ClickHouse® database. To accomplish this, you need to complete the following steps:

  • Create a source endpoint

    This is your data fetcher. It will connect to a remote source and send the data to your Managed ClickHouse® cluster.

  • Create a target endpoint

    This is your receiver. It will acquire the data sent by the source endpoint and write it to the database on your Managed ClickHouse® cluster.

  • Create and activate a transfer

    This is your data pipeline tool. It will connect your endpoints and ensure the integrity of the data.

Create a source endpoint

  1. Go to your project's Transfer page.

  2. Select Endpoints tab, click Create endpoint, and choose Source.

  3. Select S3 as the Source type.

  4. Under Basic settings:

    1. Enter the Name of the endpoint: s3-source-quickstart.

    2. (optional) Enter a Description of the endpoint.

  5. Specify endpoint parameters under Endpoint settings:

    1. Specify the Dataset: bookings.

    2. Provide the Path pattern: data-sets/bookings.csv.

    3. Keep the Schema field with its default value.

    4. From the dropdown menu, select the data format - CSV.

  6. Under CSV, specify the Delimiter - ;. Keep the rest of the fields with their default values.

  7. Under S3: Amazon Web Services, enter the name of the Bucket: doublecloud-docs. As the bucket is public, leave the rest of the fields blank.

  8. Click Submit.

The transmitter is ready to go. We need to create an endpoint that will receive the data from a remote source.

Create a target endpoint

  1. In the list of services, select Transfer.

  2. Select Endpoints tab, click Create endpoint and choose Target.

  3. Select ClickHouse® as the Target type.

  4. Under Basic settings:

    1. Enter the Name of the endpoint: clickhouse-target-quickstart

    2. (optional) Enter a Description of the endpoint.

  5. Specify endpoint parameters under Endpoint settings:

    1. Select connection type. This tutorial transfers data to the Managed cluster.

    2. Specify the connection properties:

      • Under Managed cluster, select your cluster name (doublecloud-quickstart) from the drop-down list.

      • Specify the User of the database: admin.

      • Enter the Password of the database user.

      • Specify the Database name you want to transfer the data to: sample_clickhouse_db.

  6. Leave all the other fields blank or with their default values.

  7. Click Submit.

Good work. Now we've created an endpoint that will receive and write the data to your ClickHouse® database. All we need now is a tool that will connect both endpoints and transfer the data.

Create and activate a transfer

  1. In the list of services, select Transfer.

  2. Click Create transfer.

  3. Under Endpoints:

    1. Select s3-source-quickstart from the Source drop-down menu.

    2. Select clickhouse-target-quickstart from the Target.

  4. Under Basic settings:

    1. Enter the transfer Name: transfer-quickstart

    2. (optional) Enter the transfer Description.

  5. Leave all the other fields blank or with their default values.

  6. Click Submit. You will see the following line on your Transfer page:

    transfer-ready

  7. After you've created a transfer, click Activate.

  8. Wait until your transfer status changes to Done.

  9. Check the data transferred to your ClickHouse® database:

    1. Open your Linux Terminal.

    2. Connect to your cluster with the Native interface string from your cluster's Overview tab and type the following command:

    SHOW TABLES FROM sample_clickhouse_db
    

    You should see the following output:

    ┌─name─────┐
    │ bookings │
    └──────────┘
    

Nice work! You have all the data transferred from a remote source and replicated with complete integrity in your own ClickHouse® database. Now let's make this data earn its worth.

Visualize the data

After migrating the data to your Managed ClickHouse® cluster, it's time to start building data displays. This process consists of the following steps:

  1. Create a workbook.

    In Visualization it works as a container for all other visualization elements.

  2. Create a connection

    This tool connects to your Managed ClickHouse® cluster to fetch and update the data.

  3. Create a dataset based on the data from your connection and prepare it for visualization.

  4. Create a dashboard

    This tool provides a canvas on which to place and organize charts.

  5. Create charts and selectors to visualize your data.

Create a workbook

Workbooks are visualization environments for data in your projects.

  1. Go to your project's Visualization page.

  2. Click Create a workbook in the upper-right corner of the page.

  3. Name your workbook Booking Demo Workbook and click Create.

    You'll see the workbook page:

    workbook-ready

Let's fill in all its sections.

Create a connection

Connections access hosts in your cluster to fetch data for processing and visualization.

  1. Click CreateConnection in the upper-right corner of your workbook page.

  2. Select ClickHouse®.

  3. Specify the connection parameters:

    • Hostname. Copy the name of a host from the Hosts tab on your cluster's information page.

    • Username and Password from the Overview tab on the information page of your cluster.

    • Leave all the other fields with their default values.

  4. Click Check connection.

    If your connection is operational, you will see a green icon. A red icon and a popup message with information about the error will appear if there's a problem.

    Your connection parameters should look like this:

    workbook-connection-form

  5. Click Create connection in the upper-right corner of the page.

  6. In the dialog, name the connection ClickHouse DB and click Create.

Create a dataset

Datasets process data from your connections and prepare them for visualization.

Based on our tutorial, create one or more datasets:

  1. In the upper-right corner of your connection page, click Create dataset.

  2. In the Connections section, you'll see your new connection, and under it - the table you can use to create a dataset.

  3. Drag and drop the sample_clickhouse_db.bookings table to the workspace on the right.

  4. You can see the preview of the resulting dataset in the Preview section below.

    Your dataset configuration should look similar to this:

    dataset-form

  5. Click Save in the upper-right corner of your dataset page.

  6. In the dialog, name the dataset Bookings Dataset and click Create.

Prepare your dataset for visualization

Use calculated fields and aggregation to extract more value from your data.

  1. In the top-left corner of your dataset page, select Fields.

    Let's look at the list of fields we've got here. The first column is Row_num - this isn't the best option for a sorting field.

    What's the best sorting parameter for a bookings dataset? It should be some sort of time indicator

    Where do we get the appropriate values? Let's look at the dataset again. There is the booking_start_date field. It's a timestamp with the standard YYYY-MM-DD structure transferred as a string. We need to parse it into a more useful data type.

  2. Create a calculated field to extract the year number from this data column:

    1. Click + Add field button in the top right to create a calculated field.

    2. Under Field settings specify the name for the field: Booking Start Date.

    3. Use the DATE_PARSE function in the formula:

      DATE_PARSE([booking_start_date])
      
    4. Click Create in the lower-right corner.

      You will see your new calculated field on the top of the dataset field list. It will have the icon in the Field source column:

      calculated-field-year

    5. Hide the Row_num column since we won't use it:

      1. Hover the cursor over Row_num in the list of dataset fields.

      2. Click the icon.

    The field is now hidden and will be unavailable for visualization. It as also disappeared from the Preview section of the dataset.

  3. Click Save in the upper-right corner of the dataset page to save changes.

Now the data is ready for visualization. Let's get right to it.

Create a dashboard

A dashboard is a customizable data display that allows you to create and arrange charts and selectors.

  1. Click Booking Demo Workbook in the breadcrumbs.

  2. On your workbook page, click CreateDashboard in the upper-right corner of your workbook page.

  3. Call it Booking Dashboard and click Create.

    Your dashboard will open in edit mode. You're ready to start populating it with tabs and charts.

Create charts and selectors

To get you familiar with the most popular chart types in DoubleCloud Visualization, we'll create the following:

We will also add selectors to work as filters for your data visualizations.

Create a stacked area chart

A Stacked area chart is good at showing how value changes over time.

  1. Click AddChart (Wizard).

  2. Select Bookings Dataset from the list.

  3. Click the chart type field on the right of the dataset section and select Stacked area chart from the dropdown menu.

  4. Drag and drop Measures and Dimensions to the appropriate sections of the chart:

    • Booking Start Date to X - this provides the timeline over which the data was changing.

    • Price to Y - these are the value changes we want to visualize.

    Your chart fields configuration should look like this:

    dashboard-stacked-area-chart-fields

  5. Now let's change the default blue color of the chart into something with a bit more zest:

    1. Hover your cursor over the Colors field and click the icon.

    2. In the popup window, select a different color for the Price sum field.

    3. Click Apply.

  6. Click Save in the upper-right corner of the page.

  7. Let's call our chart Price and click Done.

    Your stacked area chart should look similar to this:

    stacked_area_chart

  8. Choose the right location for your chart by dragging it around and resizing it to your liking.

Tip

If you want to practice creating area charts a bit more, create a similar chart, but make it about Average price - choose the Price avg field for your Y field.

Create a bar chart

A Bar chart allows you to compare the same parameter across several categories. For example, we'd like to know the number of bookings in each borough and compare them at a glance. Let's make it happen:

  1. Click AddChart (Wizard).

  2. Select Bookings Dataset from the list.

  3. Click the chart type field on the right of the dataset section and select Bar chart from the drop-down list.

  4. Drag and drop Measures and Dimensions to the appropriate sections of the chart:

    • borough to Y - these are our categories.

    • booking_id_ to X - this is the parameter we want to compare between the categories.

    • borough to Colors - this will paint horizontal bars in different colors to make the data easier to read.

    • booking_id_ to Signatures - this will write the numeric values inside the bars.

    Your chart fields configuration should look like this:

    dashboard-bar-chart-fields

  5. Let's call our chart Bookings by Borough and click Done.

  6. Choose the right location for your chart by dragging it around and resizing it to your liking.

    Your bar chart should look similar to this:

    bar_chart_simple

Tip

If you want to practice creating bar charts, create a similar chart, but make it about ratings rather than bookings - choose the rating_overall field for your X and Signature fields.

Create an indicator

An indicator shows a numeric value or a sum of values in large letters.

  1. Click AddChart (Wizard).

  2. Select Bookings Dataset from the list.

  3. Click the chart type field on the right of the dataset section and select Indicator from the drop-down list.

  4. Drag and drop host_id to the Measure field.

    This will automatically sum up all the values in the field. However, we need to count the number of unique IDs, so we'll have to create a calculated field for this indicator.

  5. Click the icon to the right of the field name in the wizard slot.

  6. In the Field settings window, let's rename the field into hosts count.

  7. Paste the following formula into the code field below:

    COUNT(INT([host_id]))
    
  8. Click Save.

  9. Let's customize the appearance of our indicator:

    1. Hover your cursor over the Measure field and click the icon.

    2. In the popup window, select the S size and choose a different color.

    3. Click Apply.

  10. Click Save in the upper-right corner of the page.

    Your indicator should look similar to this:

    indicator_chart

  11. Let's call our chart Hosts Count Indicator and click Done.

Tip

If you want to practice creating indicators, create a similar chart, but choose the booking_id field for your Measure field and select a different color for the numbers.

Add a selector

Selectors can work as filters for your charts and give you additional analytical flexibility.

  1. Click AddSelector.

  2. Keep the Based on dataset mode selected.

  3. From Dataset, select Bookings dataset.

  4. From Fields, select property_type.

  5. In the Selector type field, keep List.

  6. Under Operation, select Belongs to a set from the dropdown menu.

  7. Toggle Multiple choice on.

  8. In the Title field, enter Property type.

  9. Click Save

Now you can filter your charts by one or more property types. Give it a try.

Tip

If you want to practice creating selectors, create a similar one, but choose Borough under Field.

After working with charts, click Save in the upper-right corner of the page to save the changes to the dashboard.

Below you can see a tab with sample data and layout:

sample-dashboard

Add a sample dashboard to any project

You can add this sample dashboard to any of your projects - click Add demo workbook in the upper-right corner of your All workbooks page.

This concludes the tutorial. There's a lot more you can do with DoubleCloud. Keep exploring!

See also:

Managed Service for ClickHouse®:

Transfer:

Visualization: