Transfer is a tool that allows you to copy or replicate data between databases and stream processing services by creating endpoints and configuring transfers between them.
This tutorial transfers a CSV file from an Amazon S3 bucket to a Managed ClickHouse® cluster.
Before you start
Create a service account:
Go to the Service accounts tab of the Members page in the console. You'll see the following dialog:
Name your service account.
From the drop-down menu, select the Admin user role, since this tutorial needs both read and write access.
Click Submit. You'll see your new service account appear on the list.
Issue an API key for your service account:
Go to the Service accounts tab of the Members page in the console.
Open the information page of the service account for which you want to create an API key.
Under API keys, click Create key to create your account's first Secret key. You'll see the following dialog:
Click Download file with keys. You'll use it to authenticate API requests.
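As an illustration, the helper below reads the downloaded key file so its contents can be used to authenticate API requests. It assumes the file is JSON; the field names inside your file may differ, so check the actual contents before relying on any of them.

```python
import json

def load_api_key(path: str) -> dict:
    # Read the key file downloaded from the console (assumed to be JSON).
    with open(path) as f:
        return json.load(f)
```

You would then pass the relevant fields from the returned dictionary to your API client when authenticating.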
This tutorial shows how to use a CLI client with Docker and with clickhouse-client on DEB-based and RPM-based Linux distributions. You can also use other tools of your choice.
Good work. We've now created an endpoint that will receive the data and write it to your ClickHouse® database. All that's left is a service that connects both endpoints and transfers the data.
Create a transfer
This is the service that activates the transfer process through the data pipeline. It will connect your endpoints and ensure the integrity of the data.
API
Python
Let's create a transfer using the TransferService Create method with the following parameters:
source_id - the endpoint ID for the source endpoint.
To find the endpoint ID, get a list of endpoints in the project.
target_id - the endpoint ID for the target endpoint.
name - the transfer name, transfer-quickstart.
project_id - the ID of the project in which you create a transfer. You can get this value on your project's information page.
Now, let's activate it using the TransferService Activate method and pass the transfer ID in the transfer_id request parameter.
To find the transfer ID, get a list of transfers in the project.
import doublecloud
from doublecloud.transfer.v1.transfer_pb2 import TransferType
from doublecloud.transfer.v1.transfer_service_pb2 import ActivateTransferRequest
from doublecloud.transfer.v1.transfer_service_pb2_grpc import TransferServiceStub
def activate_transfer(svc, transfer_id: str):
    # Pass the transfer ID of the transfer you want to activate.
    return svc.Activate(ActivateTransferRequest(transfer_id=transfer_id))
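As a rough sketch of the Create call's inputs, the helper below assembles the parameters listed above into keyword arguments that could be passed along the lines of CreateTransferRequest(**kwargs). The field names here are assumptions based on the parameter list, not the SDK's verified message definition, so check them against the SDK before use.

```python
def build_transfer_kwargs(project_id: str, source_id: str, target_id: str,
                          name: str = "transfer-quickstart") -> dict:
    # Assemble the parameters described above for TransferService Create.
    # Field names are assumptions -- verify them against the SDK messages.
    return {
        "project_id": project_id,
        "source_id": source_id,
        "target_id": target_id,
        "name": name,
    }
```

Keeping the parameters in one place like this makes it easy to reuse them when you later look the transfer up in the project's transfer list.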
Query the data in the Managed ClickHouse® cluster
Check the data transferred to your ClickHouse® database:
Open the terminal you used to create the database or connect to your cluster once again as we did above:
Docker
Native clickhouse-client
docker run --network host --rm -it clickhouse/<Native interface connection string>
The complete Docker command structure
docker run --network host --rm -it \
clickhouse/clickhouse-client \
--host <FQDN of your cluster> \
--secure \
--user <cluster user name> \
--password <cluster user password> \
--port 9440
<Native interface connection string>
Send the following query to check if your data exists in the cluster. The name of the table in the db_for_s3 database corresponds to the name of the source dataset (hits).
SELECT * FROM db_for_s3.hits
The terminal readout should display the following data:
Nice work! You've transferred all the data from a remote source and replicated it with complete integrity in your own ClickHouse® database. Now let's put this data to work.