DoubleCloud Transfer is a no-code data integration solution that you can use to extract, load, and transform data.
You can use Transfer to migrate data to a cloud database,
replicate your data,
or optimize it for real-time analytics.
This tutorial guides you through the steps of configuring a data transfer
from an Amazon S3 bucket to a Managed ClickHouse® cluster.
If you’re a new DoubleCloud user,
this tutorial won’t cost you anything:
you can use the trial period credits to test the platform,
including creating fully operational clusters and transfers.
Step 1. Create a Managed ClickHouse® cluster
Go to the Clusters page
in the console
and click Create cluster at the top right.
Select ClickHouse.
Tip
The cluster creation page contains various options that allow you to configure the cluster for your needs.
If you’re just testing DoubleCloud now, you can go with the default settings
that will create a fully functional cluster with minimal resource configuration.
Under basic settings, enter a cluster name, such as clickhouse-dev.
Click Submit.
Creating a cluster usually takes five to seven minutes depending on the cloud provider and region.
When the cluster is ready, its status changes from Creating to Alive.
Step 2. Connect to the cluster and create a database
To connect to the cluster,
you can use WebSQL — a DoubleCloud service that provides a full-fledged SQL editor in your browser tab.
You can also connect using the ClickHouse client or any
other tool of your choice.
After the cluster status has changed to Alive, select it in the cluster list.
Click WebSQL at the top right.
In WebSQL, click on any database in the connection manager on the left to open the query editor.
To create a database, enter the following query and click Execute:
CREATE DATABASE IF NOT EXISTS data_from_s3 ON CLUSTER default
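To confirm that the database exists before moving on, you can run a quick check in the same query editor. This is a minimal sketch rather than part of the original procedure. It assumes only the database name data_from_s3 used above and relies on ClickHouse's built-in system.databases table.

```sql
-- Returns one row if the database from the previous query was created.
SELECT name
FROM system.databases
WHERE name = 'data_from_s3'
```

An empty result suggests that the CREATE DATABASE statement didn't complete successfully.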
Step 3. Create a source endpoint
A source endpoint is a transfer component that connects to remote storage and fetches data from it.
To create a source endpoint:
Go to the Transfer
page in the console and click Create → Source endpoint at the top right.
In Source type, select Object storage.
Under Basic settings, enter an endpoint name, such as parquet-source-dev, and (optionally) a description.
Under Endpoint parameters, specify the following settings:
Bucket: doublecloud-docs.
This is a public AWS S3 bucket,
so you can leave all the other settings under S3: Bucket Connection configuration blank.
Path pattern: data-sets/hits.parquet.
Data format: Parquet.
Schema: {}.
Transfer will automatically infer the schema.
Table: hits.
(Optional) Test the connection.
About connection testing
Click Test connection.
Select the runtime type you want Transfer to use for connecting to the database.
Dedicated:
Transfer connects to the database using a specified
internal or
external network.
Serverless:
Transfer connects to the database available from the internet using an automatically chosen network.
Runtime compatibility warning
Don't use endpoints with different runtime types
in the same transfer — this will cause the transfer to fail.
If you selected the dedicated runtime,
select the network in the dropdown.
Click Test connection.
Testing the connection may take a few minutes.
Click Submit.
Step 4. Create a target endpoint
A target endpoint is a transfer component that writes the transferred data to a database.
To create a ClickHouse® target endpoint:
Go to the Transfer page in the console
and click Create → Target endpoint at the top right.
In Target type, select ClickHouse.
Under Basic settings, enter an endpoint name, such as clickhouse-target, and (optionally) a description.
In Connection settings → Connection type, select Managed cluster.
In Managed cluster, select clickhouse-dev from the dropdown list.
In Authentication, select Default to connect to the cluster as the admin user.
In Database, enter data_from_s3 — the name of the database you created earlier.
Leave all the other fields blank or with their default values.
Click Submit.
Step 5. Create and activate a transfer
Now that you have two endpoints,
you need to create a transfer that passes data between them.
To do that:
Go to the Transfer
page in the console and click Create → Transfer at the top right.
Under Endpoints:
In Source, select parquet-source-dev.
In Target, select clickhouse-target.
Under Basic settings, enter a transfer name, such as transfer-dev.
In Transfer type, select Snapshot.
Leave all the other settings blank or with their default values.
Click Submit.
Next to the transfer, open the menu and click Activate.
Wait until your transfer status changes to Done.
Step 6. Query the data in the ClickHouse® cluster
To access the data that’s been transferred to the ClickHouse® database, take the following steps:
Go to the Clusters page
in the console
and select your cluster.
In WebSQL, click on any database in the connection manager on the left to open the query editor.
Select entries from the table using the following query.
The table name in the data_from_s3 database corresponds to the name of the source dataset.
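A minimal sketch of such queries follows. It assumes the table is named hits (the value entered in the Table field of the source endpoint) and that you run the statements one at a time:

```sql
-- Preview a few rows of the transferred data.
SELECT *
FROM data_from_s3.hits
LIMIT 20;

-- Count the rows that the snapshot loaded.
SELECT count()
FROM data_from_s3.hits;

-- Inspect the column names and types that Transfer inferred from the Parquet file.
DESCRIBE TABLE data_from_s3.hits;
```

If the first query reports that the table doesn't exist, check that the transfer status is Done and that the target endpoint points to the data_from_s3 database.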
Now that you've learned how to transfer an example dataset,
continue exploring the DoubleCloud platform
or configure Transfer to replicate and transform your data.