Get started with Transfer

DoubleCloud Transfer is a no-code data integration solution that you can use to extract, load, and transform data. You can use Transfer to migrate data to a cloud database, replicate your data, or optimize it for real-time analytics.

This tutorial guides you through the steps of configuring a data transfer from an Amazon S3 bucket to a Managed ClickHouse® cluster.

Before you begin

  1. Log in to the DoubleCloud console, or sign up if you don’t have an account yet.

    Note

    If you’re a new DoubleCloud user, this tutorial won’t cost you anything: you can use the trial period credits to test the platform, including creating fully operational clusters and transfers.

Step 1. Create a Managed ClickHouse® cluster

  1. Go to the Clusters page in the console and click Create cluster at the top right.

  2. Select ClickHouse.

    Tip

    The cluster creation page contains various options that let you configure the cluster for your needs. If you’re only testing DoubleCloud, keep the default settings: they create a fully functional cluster with a minimal resource configuration.

  3. Under basic settings, enter a cluster name, such as clickhouse-dev.

  4. Click Submit.

    Creating a cluster usually takes five to seven minutes depending on the cloud provider and region. When the cluster is ready, its status changes from Creating to Alive.

Step 2. Connect to the cluster and create a database

To connect to the cluster, you can use WebSQL — a DoubleCloud service that provides a full-fledged SQL editor in your browser tab. You can also connect using the ClickHouse client or any other tool of your choice.

  1. After the cluster status has changed to Alive, select it in the cluster list.

  2. Click WebSQL at the top right.

  3. In WebSQL, click on any database in the connection manager on the left to open the query editor.

  4. To create a database, enter the following query and click Execute:

    CREATE DATABASE IF NOT EXISTS data_from_s3 ON CLUSTER default
    
  5. Make sure that the database has been created:

    SHOW DATABASES
    
    ┌─name───────────────┐
    │ INFORMATION_SCHEMA │
    │ _system            │
    │ data_from_s3       │  // your database
    │ default            │
    │ information_schema │
    │ system             │
    └────────────────────┘
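If you prefer to run the same statements from a script instead of WebSQL, the following is a minimal sketch. It assumes the cluster exposes the standard ClickHouse HTTPS interface on port 8443. The host name and password are placeholders, not values from this tutorial, and the function names are illustrative; take the real connection details from your cluster's page in the console.

```python
import urllib.request

# Placeholders (assumptions): replace with your cluster's connection details.
HOST = "your-cluster-host.example.com"
USER = "admin"
PASSWORD = "your-password"


def build_request(sql: str, host: str = HOST, port: int = 8443) -> urllib.request.Request:
    """Build a POST request for the ClickHouse HTTP interface.

    The SQL statement travels in the request body; credentials go into the
    X-ClickHouse-User and X-ClickHouse-Key headers.
    """
    return urllib.request.Request(
        url=f"https://{host}:{port}/",
        data=sql.encode("utf-8"),
        headers={"X-ClickHouse-User": USER, "X-ClickHouse-Key": PASSWORD},
        method="POST",
    )


def run(sql: str) -> str:
    """Send the statement and return the raw text response."""
    with urllib.request.urlopen(build_request(sql), timeout=30) as response:
        return response.read().decode("utf-8")


def main() -> None:
    """Create the tutorial database, then list all databases."""
    run("CREATE DATABASE IF NOT EXISTS data_from_s3 ON CLUSTER default")
    print(run("SHOW DATABASES"))

# Call main() once the placeholders point at a real cluster.
```

Nothing is sent until you call main(), so you can import the sketch and inspect the request that build_request() produces before connecting to a cluster.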
    

Step 3. Create a source endpoint

A source endpoint is a transfer component that connects to a remote storage and fetches data from it.

To create a source endpoint:

  1. Go to the Transfer page in the console and click Create → Source endpoint at the top right.

  2. In Source type, select Object storage.

  3. Under Basic settings, enter an endpoint name, such as parquet-source-dev, and (optionally) a description.

  4. Under Endpoint parameters, specify the following settings:

    1. Bucket: doublecloud-docs.

      This is a public AWS S3 bucket, so you can leave all other settings under S3: Bucket Connection configuration blank.

    2. Path pattern: data-sets/hits.parquet.

    3. Data format: Parquet.

    4. Schema: {}. Transfer will automatically infer the schema.

    5. Table: hits.

  5. (Optional) Test the connection.

    About connection testing
    1. Click Test connection.

    2. Select the runtime type you want Transfer to use for connecting to the database.

      • Dedicated: Transfer connects to the database using a specified internal or external network.

      • Serverless: Transfer connects to the database available from the internet using an automatically chosen network.

      Runtime compatibility warning

      Don't use endpoints with different runtime types in the same transfer — this will cause the transfer to fail.

    3. If you selected the dedicated runtime, select the network in the dropdown.

    4. Click Test connection.

    Testing the connection may take a few minutes.

  6. Click Submit.
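To see how the Bucket and Path pattern values combine into an object location, here is a minimal sketch that builds the virtual-hosted-style S3 URL for the example file. It assumes the bucket is reachable through the global s3.amazonaws.com endpoint and allows anonymous reads, which this tutorial implies but doesn't spell out. The function name is illustrative.

```python
from urllib.parse import quote

BUCKET = "doublecloud-docs"            # Bucket field in the endpoint form
OBJECT_KEY = "data-sets/hits.parquet"  # Path pattern field (a single object here)


def s3_object_url(bucket: str, key: str) -> str:
    """Return the virtual-hosted-style HTTPS URL for an S3 object."""
    # quote() percent-encodes unsafe characters but keeps "/" separators intact.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


print(s3_object_url(BUCKET, OBJECT_KEY))
# prints https://doublecloud-docs.s3.amazonaws.com/data-sets/hits.parquet
```

Opening the resulting URL in a browser or with a download tool is a quick way to confirm that the object exists before you test the endpoint connection.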

Step 4. Create a target endpoint

A target endpoint is a transfer component that writes the transferred data to a database.

To create a ClickHouse® target endpoint:

  1. Go to the Transfer page in the console and click Create → Target endpoint at the top right.

  2. In Target type, select ClickHouse.

  3. Under Basic settings, enter an endpoint name, such as clickhouse-target, and (optionally) a description.

  4. In Connection settings → Connection type, select Managed cluster.

  5. In Managed cluster, select clickhouse-dev from the dropdown list.

  6. In Authentication, select Default to connect to the cluster as the admin user.

  7. In Database, enter data_from_s3 — the name of the database you created earlier.

  8. Leave all the other fields blank or with their default values.

  9. Click Submit.

Step 5. Create and activate a transfer

Now that you have two endpoints, you need to create a transfer that passes data between them. To do that:

  1. Go to the Transfer page in the console and click Create → Transfer at the top right.

  2. Under Endpoints:

    1. In Source, select parquet-source-dev.

    2. In Target, select clickhouse-target.

  3. Under Basic settings, enter a transfer name, such as transfer-dev.

  4. In Transfer type, select Snapshot.

  5. Leave all the other settings blank or with their default values.

  6. Click Submit.

  7. Next to the transfer, click Activate.

  8. Wait until your transfer status changes to Done.

Step 6. Query the data in the ClickHouse® cluster

To access the data that’s been transferred to the ClickHouse® database, take the following steps:

  1. Go to the Clusters page in the console, select your cluster, and click WebSQL at the top right.

  2. In WebSQL, click on any database in the connection manager on the left to open the query editor.

  3. Select entries from the table using the following query. The table name in the data_from_s3 database corresponds to the name of the source dataset.

    SELECT * FROM data_from_s3.hits LIMIT 100
    

    The query output should look as follows:

    ┌─Browser─┬─Cookie_Enabled─┬─Date───────┬─Gender─┬─Hit_ID─┬─Region_ID─┬─Technology───────────┬─Time_Spent─────────┬─Traffic_Source─┐
    │ Firefox │              0 │ 2016-01-10 │ Male   │  67112 │       184 │ PC (Mac)             │ 388.93975903614455 │ Direct         │
    │ Chrome  │              0 │ 2016-03-12 │ Female │  54562 │        51 │ Smartphone (I0S)     │ 325.20392156862744 │ Search engine  │ 
    │ Chrome  │              1 │ 2016-03-18 │ Female │  63833 │        51 │ Smartphone (I0S)     │ 316.09774436090225 │ Search engine  │ 
    │ Firefox │              1 │ 2016-03-24 │ Male   │  43941 │        51 │ PC (Windows)         │  263.7365269461078 │ Search engine  │ 
    │ Safari  │              0 │ 2016-03-30 │ Female │  38583 │        51 │ Smartphone (Android) │  363.8421052631579 │ Internal       │
    ...
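As a follow-up check, you can aggregate the transferred rows rather than list them. The sketch below is illustrative: it holds a query over the Browser column shown in the output above and parses a response in ClickHouse's JSONEachRow format. The sample response text is made up to demonstrate the parsing step and is not real output from the dataset.

```python
import json

# Counts hits per browser; Browser is one of the columns shown in the output above.
QUERY = (
    "SELECT Browser, count() AS hits "
    "FROM data_from_s3.hits "
    "GROUP BY Browser ORDER BY hits DESC "
    "FORMAT JSONEachRow"
)


def parse_json_each_row(text: str) -> list[dict]:
    """Parse JSONEachRow output: one JSON object per non-empty line."""
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    for row in rows:
        # 64-bit integers may arrive quoted as strings, so normalize the count.
        row["hits"] = int(row["hits"])
    return rows


# Made-up sample response, only to demonstrate the parsing step.
SAMPLE = '{"Browser":"Chrome","hits":"2"}\n{"Browser":"Firefox","hits":"2"}\n'

for row in parse_json_each_row(SAMPLE):
    print(row["Browser"], row["hits"])
```

You can send QUERY with any ClickHouse client, or paste it into WebSQL without the FORMAT clause to see the result as a table.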
    

What’s next

Now that you've learned how to transfer an example dataset, continue exploring the DoubleCloud platform or configure Transfer to replicate and transform your data.
