Migrate a standalone ClickHouse® installation to DoubleCloud using Transfer
To perform your cluster migration using Transfer, you need to complete the following steps:

- Create a source endpoint
- Create a target endpoint
- Create a transfer
- Perform post-migration activities
Before you start
- Create a Managed Service for ClickHouse® cluster.
Create a source endpoint
For each database you want to migrate from your standalone cluster, create a source endpoint as follows:
- Go to the Transfer page.

- Click Create → Source endpoint.

- In Source type, select ClickHouse.

- Under Basic settings:

  - Enter the Name of the endpoint.
  - (optional) Enter the Description of the endpoint.

- In Connection settings → Connection type, select On-premise.

- Under Shards:

  - Click + Shard and enter the shard ID.
  - Under Hosts, click + Host and enter the domain name (FQDN) or IP address of the host.
- In HTTP Port, enter the port for HTTP interface connections or leave the default value of 8443.

  Tip

  - Optional fields use default values unless you specify them.
  - Recording of complex types (array, tuple, etc.) is supported.
- In Native port, enter the port for clickhouse-client connections or leave the default value of 9440.

- Enable SSL if you need to secure your connection.

- To encrypt data transmission, upload a .pem file with certificates under CA Certificate.

- In User and Password, enter the user credentials to connect to the database.

- In Database, enter the name of the database you want to transfer data from.

- In Table filter → Included tables, add all the tables you want to migrate from the database.

  There are two points to keep in mind when creating a list of tables to migrate:

  - Transfer can't migrate materialized views.
  - If you want to migrate tables created with the CollapsingMergeTree engine, ensure that no merges happen on the table during the transfer.

- Click Submit.
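To see which source tables need this special handling before you build the table list, you can run a query like the following on the self-managed cluster. This is a sketch: the engine list covers the two cases called out above, and you may want to extend it for your setup.

```sql
-- List objects that need attention before migration:
-- materialized views (can't be migrated by Transfer) and
-- CollapsingMergeTree tables (avoid merges on them during the transfer).
SELECT database, name, engine
FROM system.tables
WHERE lower(database) NOT IN ('system', 'information_schema')
  AND engine IN ('MaterializedView', 'CollapsingMergeTree')
```

Exclude any materialized views this query returns from Included tables; you'll recreate them manually after the migration.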
Create a target endpoint
Create a single target endpoint for all the databases you want to migrate:
- On the Transfer page, click Create → Target endpoint.

- In Target type, select ClickHouse.

- Under Basic settings:

  - Enter the Name of the endpoint.
  - (optional) Enter the Description of the endpoint.

- In Connection settings → Connection type, select Managed cluster.

- In Managed cluster, select the cluster you want to migrate your data to.

- In Authentication, select Default to connect to the cluster as the admin user.

- In Database, enter the name of the database you want to transfer your data to.

- Keep the rest of the settings empty or with their default values and click Submit.
Create a transfer
Create a transfer for each database you want to migrate:
- On the Transfer page, click Create transfer.

- Under Endpoints:

  - In Source, select the endpoint to the self-managed database you want to migrate.
  - In Target, select the endpoint to the DoubleCloud Managed Service for ClickHouse® cluster.

- Under Basic settings:

  - Enter the transfer Name.
  - (optional) Enter the transfer Description.

- Under Transfer parameters:

  - If you need a periodic migration of data, under Periodic snapshot, toggle Enabled and select a period or provide a cron expression.
  - Add one or more Incremental tables. With incremental tables, you can transfer only the data that has changed. Each of these tables has the following fields:

    - Schema corresponds to the database schema (as in PostgreSQL), name, or dataset. Refer to the Airbyte destination namespace configuration.
    - Table is the table in the source with which to compare your target table.
    - Key column is the name of a column whose value indicates whether the table should be incremented. A common example of a cursor column is one with timestamps. Refer to the Airbyte Incremental Sync - Append documentation.
    - Initial value (optional) defines a value in the Key column from which to start tracking changes. For example, you can specify a date as the Initial value. In this case, the service transfers only the rows with a date value greater than the initial value.
- Under Parallel snapshot settings, configure the number of workers and threads per worker.

  High-load transfer configuration

  In high-load production cases, consider adding workers if you need to transfer more than 100 GB of data in a snapshot. Adding a worker doubles the transfer's performance.

- Select the runtime environment:

  Serverless

  This runtime has no configurable settings. It defines all its parameters automatically.

  Dedicated

  - Select the VM size. In most cases, Medium is enough for responsive snapshotting.

  - Choose the logging level:

    - DEBUG: information that can be useful for debugging.
    - INFO (default): confirmation messages and information about events that shouldn't cause errors in the transfer.
    - WARN: events that can cause errors in the transfer.

  - Under Dedicated settings, select the settings type:

    - Automatic doesn't require any further settings.
    - Under Manual, select your network from the dropdown list.

- Click Submit.
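When choosing a Key column for an incremental table, it can help to confirm on the source that the candidate column behaves like a monotonically growing cursor. A sketch, in which mydb.events and updated_at are placeholder names:

```sql
-- Inspect a candidate cursor column for incremental transfers.
-- mydb.events and updated_at are placeholder names.
SELECT
    min(updated_at) AS earliest,
    max(updated_at) AS latest,
    count() AS total_rows
FROM mydb.events
```

If the latest value keeps advancing as new rows arrive, the column is a reasonable cursor, and a recent value of it can serve as the Initial value to skip rows that were already migrated.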
Activate the transfer

- Open the Transfer page.
- To the right of the transfer name, click Activate.
Post-migration

After migrating the data, you need to recreate all materialized views manually, adding the ON CLUSTER default clause to each CREATE statement. To retrieve the original definitions, run the following query at the source:

```sql
SELECT create_table_query
FROM system.tables
WHERE lower(database) NOT IN ('system', 'information_schema')
  AND engine IN ('MaterializedView', 'View', 'Dictionary')
```
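For illustration, a definition retrieved this way might be replayed on the managed cluster as follows. The database, view, table, and engine here are placeholders; default is the cluster name to use with ON CLUSTER in Managed Service for ClickHouse®.

```sql
-- Recreate a materialized view on the target cluster.
-- mydb, top_events_mv, and events are placeholder names;
-- ON CLUSTER default runs the DDL on every host of the managed cluster.
CREATE MATERIALIZED VIEW mydb.top_events_mv ON CLUSTER default
ENGINE = SummingMergeTree
ORDER BY event_type
AS
SELECT event_type, count() AS events
FROM mydb.events
GROUP BY event_type
```

Repeat this for each view, dictionary, and regular view returned by the query above.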