Migrate a standalone ClickHouse® installation to DoubleCloud using Transfer
To perform your cluster migration using Transfer, you need to complete the following steps:

- Create a source endpoint
- Create a target endpoint
- Create a transfer
- Perform post-migration activities
Before you start
- Create a Managed Service for ClickHouse® cluster.
Create a source endpoint
For each database you want to migrate from your standalone cluster, create a source endpoint as follows:
- Go to the Transfer page.

- Click Create → Source endpoint.

- In Source type, select ClickHouse.

- Under Basic settings:

  - Enter the Name of the endpoint.
  - (optional) Enter the Description of the endpoint.

- In Connection settings → Connection type, select On-premise.

- Under Shards:

  - Click + Shard and enter the shard ID.
  - Under Hosts, click + Host and enter the domain name (FQDN) or IP address of the host.
- In HTTP Port, enter the port for HTTP interface connections or leave the default value of 8443.

  Tip

  - Optional fields use default values unless you specify them.
  - Recording of complex types (array, tuple, etc.) is supported.
- In Native port, enter the port for clickhouse-client connections or leave the default value of 9440.

- Enable SSL if you need to secure your connection.

- To encrypt data transmission, upload a .pem file with certificates under CA Certificate.

- In User and Password, enter the user credentials to connect to the database.

- In Database, enter the name of the database you want to transfer data from.

- In Table filter → Included tables, add all the tables you want to migrate from the database.

  There are two points to keep in mind when creating a list of tables to migrate:

  - Transfer can't migrate materialized views.
  - If you want to migrate tables created with the CollapsingMergeTree engine, ensure that no merges happen on the table during the transfer.

- Click Submit.
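To see which source tables need this special handling before you build the table list, you can run a query like the following on the self-managed cluster. This is a sketch: the engine list covers the two cases called out above, and you may want to extend it for your setup.

```sql
-- List objects that need attention before migration:
-- materialized views (can't be migrated by Transfer) and
-- CollapsingMergeTree tables (avoid merges on them during the transfer).
SELECT database, name, engine
FROM system.tables
WHERE lower(database) NOT IN ('system', 'information_schema')
  AND engine IN ('MaterializedView', 'CollapsingMergeTree')
```

Exclude any materialized views this query returns from Included tables; you'll recreate them manually after the migration.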
Create a target endpoint
Create a single target endpoint for all the databases you want to migrate:
- On the Transfer page, click Create → Target endpoint.

- In Target type, select ClickHouse.

- Under Basic settings:

  - Enter the Name of the endpoint.
  - (optional) Enter the Description of the endpoint.

- In Connection settings → Connection type, select Managed cluster.

- In Managed cluster, select the cluster you want to migrate your data to.

- In Authentication, select Default to connect to the cluster as the admin user.

- In Database, enter the name of the database you want to transfer your data to.

- Keep the rest of the settings empty or with their default values and click Submit.
Create a transfer
Create a transfer for each database you want to migrate:
- On the Transfer page, click Create transfer.

- Under Endpoints:

  - In Source, select the endpoint to the self-managed database you want to migrate.
  - In Target, select the endpoint to the DoubleCloud Managed Service for ClickHouse® cluster.

- Under Basic settings:

  - Enter the transfer Name.
  - (optional) Enter the transfer Description.

- Under Transfer parameters:

  - If you need a periodic migration of data, under Periodic snapshot, toggle Enabled and select a period or provide a cron expression.
  - Add one or more Incremental tables. With incremental tables, you can transfer only the data that has changed. Each of these tables has the following fields:

    - Schema corresponds to the database schema (as in PostgreSQL), name, or dataset. Refer to the Airbyte destination namespace configuration.
    - Table is the table in the source with which to compare your target table.
    - Key column is the name of a column whose value indicates whether the table should be incremented. A common example of a cursor column is one with timestamps. Refer to the Airbyte Incremental Sync - Append documentation.
    - Initial value (optional) defines a value in the Key column from which to start tracking changes. For example, you can specify a date as the Initial value. In this case, the service transfers only the rows with a date value greater than the initial value.
- Under Parallel snapshot settings, configure the number of workers and threads per worker.

  High-load transfer configuration

  In high-load production cases, consider adding workers if you need to transfer more than 100 GB of data in a snapshot. Adding a worker doubles the transfer's performance.

- Select the runtime environment:

  Serverless

  This runtime has no configurable settings. It defines all its parameters automatically.

  Dedicated

  - Select the VM size. In most cases, Medium is enough for responsive snapshotting.

  - Choose the logging level:

    - DEBUG: information that can be useful for debugging.
    - INFO (default): confirmation messages and information about events that shouldn't cause errors in the transfer.
    - WARN: events that can cause errors in the transfer.

  - Under Dedicated settings, select the settings type:

    - Automatic doesn't require any further settings.
    - Under Manual, select your network from the dropdown list.

- Click Submit.
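When choosing a Key column for an incremental table, it can help to confirm on the source that the candidate column behaves like a monotonically growing cursor. A sketch, in which mydb.events and updated_at are placeholder names:

```sql
-- Inspect a candidate cursor column for incremental transfers.
-- mydb.events and updated_at are placeholder names.
SELECT
    min(updated_at) AS earliest,
    max(updated_at) AS latest,
    count() AS total_rows
FROM mydb.events
```

If the latest value keeps advancing as new rows arrive, the column is a reasonable cursor, and a recent value of it can serve as the Initial value to skip rows that were already migrated.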
Activate the transfer

- Open the Transfer page.
- To the right of the transfer name, click Activate.
Post-migration

After migrating the data, you need to recreate all materialized views manually, adding the ON CLUSTER default clause to each CREATE statement. To retrieve the original definitions, run the following query at the source:

```sql
SELECT create_table_query
FROM system.tables
WHERE lower(database) NOT IN ('system', 'information_schema')
  AND engine IN ('MaterializedView', 'View', 'Dictionary')
```
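For illustration, a definition retrieved this way might be replayed on the managed cluster as follows. The database, view, table, and engine here are placeholders; default is the cluster name to use with ON CLUSTER in Managed Service for ClickHouse®.

```sql
-- Recreate a materialized view on the target cluster.
-- mydb, top_events_mv, and events are placeholder names;
-- ON CLUSTER default runs the DDL on every host of the managed cluster.
CREATE MATERIALIZED VIEW mydb.top_events_mv ON CLUSTER default
ENGINE = SummingMergeTree
ORDER BY event_type
AS
SELECT event_type, count() AS events
FROM mydb.events
GROUP BY event_type
```

Repeat this for each view, dictionary, and regular view returned by the query above.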