Migrate a standalone ClickHouse® installation to DoubleCloud Managed Service for ClickHouse® using Transfer
To perform your cluster migration using Transfer, you need to complete the following:
- Create a source endpoint
- Create a target endpoint
- Create a transfer
- Perform post-migration activities
Before you start
- Create a Managed Service for ClickHouse® cluster.
Create a source endpoint
For each database you want to migrate from your standalone cluster, create a source endpoint as follows:
- Go to the Transfer service page.
- Select the Endpoints tab, click Create endpoint → Source.
- Under Source type, choose ClickHouse.
- Under Basic settings:
  - Enter the Name of the endpoint.
  - (optional) Enter the Description of the endpoint.
- Under Connection settings → Connection type, select Custom installation.
- Under Shards:
  - Click Add Shard.
  - Specify the shard identifier under Shard ID.
  - Under Hosts, click Add Host and enter the domain name (FQDN) or IP address of the host in the textbox.
- Specify the HTTP Port for HTTP interface connections.
  Tip
  - Optional fields use their default values if you don't specify them.
  - Recording of complex types (array, tuple, etc.) is supported.
- Specify the Native port for connections with clickhouse-client.
- Click SSL to secure your connection.
- To encrypt the data transmission, click Upload file under CA Certificate.
- Specify the Database name from which to migrate the data.
- Specify the User for this database.
- Enter the Password for the database user.
- Under Table filter → Included tables, specify all the tables from the database you want to migrate.
  There are two points to keep in mind when creating the list of tables to migrate (see the example queries after this list):
  - Transfer can't migrate materialized views.
  - If you want to migrate tables created with the CollapsingMergeTree engine, make sure that no merges happen on these tables during the transfer.
- Click Submit.
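For example, the following queries, run on the source cluster, show how you might find materialized views (which Transfer can't migrate) and pause merges on a CollapsingMergeTree table for the duration of the transfer. The db1 database and events table are placeholder names:

-- List materialized views in the source database; exclude them from Included tables
SELECT name
FROM system.tables
WHERE database = 'db1' AND engine = 'MaterializedView';

-- Pause background merges on a CollapsingMergeTree table while the transfer runs
SYSTEM STOP MERGES db1.events;

-- Resume merges once the transfer has finished
SYSTEM START MERGES db1.events;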
Create a target endpoint
Create a single target endpoint for all the databases you want to migrate:
- On the Transfer page, select the Endpoints tab, click Create endpoint → Target.
- Under Target type, choose ClickHouse.
- Under Basic settings:
  - Enter the Name of the endpoint.
  - (optional) Enter the Description of the endpoint.
- Under Connection settings → Connection type, select Managed cluster.
- Under Managed cluster, select the DoubleCloud cluster to which you want to migrate your data.
- Specify the Database name to which you want to migrate data.
- Specify the User for this database (you can check that the database and user exist with the queries after these steps).
- Enter the Password for the database user.
- Keep the rest of the settings empty or with their default values and click Submit.
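If you want to confirm that the target database and user exist on the managed cluster before submitting, you can connect to the cluster and run queries like the following. The db1 database and transfer_user names are placeholders for your own values:

-- Check that the target database is present on the managed cluster
SHOW DATABASES;

-- Check that the user exists and see which privileges it has
SHOW GRANTS FOR transfer_user;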
Create a transfer
Create a transfer for each database you want to migrate:
- On the Transfer page, click Create transfer.
- Under Endpoints:
  - In Source, select the endpoint to the self-managed database you want to migrate.
  - In Target, select the endpoint to the DoubleCloud Managed Service for ClickHouse® cluster.
- Under Basic settings:
  - Enter the transfer Name.
  - (optional) Enter the transfer Description.
- Under Transfer parameters:
  - If you need a periodic migration of data, under Periodic snapshot, toggle Enabled and select a period or provide a cron expression.
  - Add one or more Incremental tables. With incremental tables, you can transfer only the data that has changed (see the conceptual example after these steps). Each of these tables has the following fields:
    - Schema corresponds to the database schema (as in PostgreSQL), name, or dataset. Refer to the Airbyte - destination namespace configuration.
    - Table is the table in the source with which to compare your target table.
    - Key column is the name of a column whose values indicate whether the table should be incremented. A common example of a cursor column is a column with timestamps. Refer to the Airbyte - Incremental Sync - Append documentation.
    - Initial value (optional) defines a value in the Key column from which to start tracking changes. For example, you can specify a date as the Initial value. In this case, the service transfers only the rows with a date value greater than the Initial value.
- Under Parallel snapshot settings, configure the number of workers and threads per worker.
  High-load transfer configuration
  In high-load production cases, consider adding workers if you need to snapshot more than 100 GB of data. Adding a worker doubles the transfer's performance.
- Select the runtime environment:
  - Serverless: this runtime has no configurable settings. It defines all its parameters automatically.
  - Dedicated:
    - Select the VM size. In most cases, Medium is enough for responsive snapshotting.
    - Choose the logging level:
      - DEBUG: information that can be useful for debugging.
      - INFO (default): confirmation messages and information about events that shouldn't cause errors in the transfer.
      - WARN: events that can cause errors in the transfer.
    - Under Dedicated settings, select the settings type:
      - Automatic doesn't require any further settings.
      - Under Manual, select your network from the dropdown list.
- Click Submit.
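To illustrate how incremental tables work, the service in effect copies only the rows whose Key column value is greater than the last recorded cursor value (or the Initial value on the first run). The following is only a conceptual sketch of that selection logic, not the exact query Transfer executes; db1.events, updated_at, and the date are placeholder names and values:

-- Rows picked up by an incremental run: everything past the stored cursor
SELECT *
FROM db1.events
WHERE updated_at > '2024-01-01 00:00:00';

-- A good Key column holds a monotonically growing value, such as a timestamp
SELECT max(updated_at) AS current_cursor
FROM db1.events;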
Activate the transfer
- Open the Transfer page.
- To the right of the transfer name, click Activate.
Post-migration
After migrating the data, you need to recreate all materialized views manually. Retrieve their definitions at the source with the query below and add the ON CLUSTER default clause to each statement before running it on the target cluster:
SELECT create_table_query
FROM system.tables
WHERE (lower(database)
NOT IN ('system', 'information_schema'))
AND (engine IN ('MaterializedView', 'View', 'Dictionary'))
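For example, a create_table_query value returned by this query could look like the following once the ON CLUSTER default clause has been added; the database, view, column, and table names here are placeholders:

CREATE MATERIALIZED VIEW db1.daily_totals ON CLUSTER default
ENGINE = SummingMergeTree
ORDER BY day
AS
SELECT
    toDate(event_time) AS day,
    sum(amount) AS total
FROM db1.events
GROUP BY day;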