Migrate a standalone ClickHouse® installation to DoubleCloud using clickhouse-client

This tutorial shows how to migrate your self-managed ClickHouse® cluster to the DoubleCloud Managed Service for ClickHouse®.

Before you start

To migrate your cluster, first install the clickhouse-client :

  1. Install and configure the software:

    1. Connect to the ClickHouse® official DEB repository from your Linux system:

      sudo apt update && sudo apt install -y apt-transport-https ca-certificates dirmngr && \
      sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754 && \
      echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \
      /etc/apt/sources.list.d/clickhouse.list
      
    2. Refresh the package list and install the clickhouse-client :

      sudo apt update && sudo apt install -y clickhouse-client
      
  2. Connect to your ClickHouse® host. You can copy the value of the Native Interface field from the Overview tab on your cluster information page. It has the following structure:

    clickhouse-client --host <FQDN of the host> \
                      --secure \
                      --user <Cluster user name> \
                      --password <Cluster user password> \
                      --port 9440 
    
  1. Install and configure the software:

    1. Connect to a ClickHouse® official RPM repository from your Linux system:

      sudo yum install -y yum-utils
      sudo yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
      
    2. Install the clickhouse-client :

      sudo yum install -y clickhouse-client
      

      Warning

      If you run a RedHat 7-based Linux distribution, including Cent OS 7, Oracle Linux 7 and others, you need to download and install trusted certificates, and manually add the path to them in the clickhouse-client configuration file as follows:

      1. Install the root certificate:

        curl https://letsencrypt.org/certs/isrg-root-x2-cross-signed.pem > \ 
        /etc/pki/ca-trust/source/anchors/isrg-root-x2-cross-signed.pem
        
      2. Install the intermediate certificate:

        curl https://letsencrypt.org/certs/lets-encrypt-r3-cross-signed.pem > \
        /etc/pki/ca-trust/source/anchors/lets-encrypt-r3-cross-signed.pem
        
      3. Update the list of trusted certificates:

        sudo update-ca-trust
        
      4. Locate your clickhouse-client configuration file (by default, you can find it at /etc/clickhouse-client/config.xml) and add the path to the certificates into the <openSSL> section:

        <client> <!-- Used for connection to server's secure tcp port -->
           <loadDefaultCAFile>true</loadDefaultCAFile>
           <cacheSessions>true</cacheSessions>
           <disableProtocols>sslv2,sslv3</disableProtocols>
           <preferServerCiphers>true</preferServerCiphers>
           <caConfig>/etc/ssl/certs/ca-bundle.crt</caConfig>
           <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
           <invalidCertificateHandler>
           <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
           <name>RejectCertificateHandler</name>
           </invalidCertificateHandler>
        </client>
        
  2. Connect to your ClickHouse® host. You can copy the value of the Native Interface field from the Overview tab on your cluster information page. It has the following structure:

    clickhouse-client --host <FQDN of the host> \
                      --secure \
                      --user <Cluster user name> \
                      --password <Cluster user password> \
                      --port 9440 
    

Prepare your self-managed cluster

If your self-managed ClickHouse® cluster has tables created with the CollapsingMergeTree engine, you need to disable automatic merges. This will protect you from possible data corruption on the target:

  1. Disable automatic merges by executing the following query:

    SYSTEM STOP MERGES ON CLUSTER 'default' <database-name>.<table-name>
    
  2. Perform the manual merge:

    OPTIMIZE TABLE <database-name>.<table-name> FINAL DEDUPLICATE
    

    The FINAL statement merges all the table parts into one, and the DEDUPLICATE statement removes repeating rows during the merge to avoid possible data corruption.

Arrive at the cluster configuration in DoubleCloud

When migrating a self-managed ClickHouse® cluster to DoubleCloud which is already sharded, we strongly recommend using a single-shard configuration whenever possible.

With hybrid storage enabled, sharding isn't necessary when the only reason is the amount of data. Hybrid storage can handle tens of terabytes per node with EAS.

Therefore, sharding on DoubleCloud is only needed if a single node can't cover your writes anymore.

In case of read performance instead, increase the number of replicas and upgrade your cluster to a better hardware configuration.

If the primary reason for sharding is to improve the write performance, it's important that you choose a sharding key that fits your use case. To select the best key option for your use case, see the official ClickHouse® documentation .

To handle large amounts of data, consider splitting it into manageable chunks when setting up a multi-shard cluster. Consult the official ClickHouse® documentation on data partitioning

Prepare the DoubleCloud cluster

Recreate the DDL structure of the cluster you want to migrate on the DoubleCloud side:

  1. Connect to your cluster and recreate create a database:

    1. Go to the Clusters page in the console.

    2. Select the name of your cluster to open its information page. By default, you will see the Overview tab.

    3. Under Connection strings, find the Native interface string and click Copy.

    4. Open your terminal and run a command to connect to your cluster:

      <Native interface connection string>
      
    5. After you have connected to the cluster, run the CREATE query for each database you want to recreate on the target cluster:

      CREATE DATABASE IF NOT EXISTS <database_name> ON CLUSTER default
      
  2. Manually recreate at the target database each table you want to migrate. To get the CREATE query for all the tables within a database, run the following on your source ClickHouse® cluster:

    SELECT create_table_query FROM system.tables WHERE database NOT IN ('system', 'information_schema’) AND (engine NOT IN ('MaterializedView', 'View', 'Dictionary'))
    

    You need to add the ON CLUSTER default to every CREATE statement at the DoubleCloud target cluster.

    Migrating materialized views

    You can't directly migrate your materialized views .

    Migrate the data from source and target tables manually, and after the migration, recreate the materialized views on the target system.

  3. Add the IP address of your source cluster to your Managed Service for ClickHouse® allowlist.

Migrate your data

Sharded cluster migration cases

  • If you are using a sharded setup as source, make sure that you only SELECT from distributed tables .

  • If your DoubleCloud setup is also sharded, INSERT into distributed tables to automatically rebalance all the target shards while migrating.

  • If your DoubleCloud isn't sharded, SELECT from distributed tables on the source and INSERT into replicated tables on the target.

To migrate your table, run the following query on the source cluster:

INSERT INTO FUNCTION remoteSecure (
   '<target-clickhouse-host>',
   database_name, 
   table_name,
   '<your-cluster-username>', 
   '<your-cluster-password>'s
)
SELECT *
FROM database_name.table_name;

To get the target-clickhouse-host:

  1. Go to your cluster's information page and open the Hosts tab.

  2. Find the host to which you want to migrate your data and click .

Post-migration

After migrating the data, you need to recreate all materialized views manually (you need to add the ON CLUSTER default at the source):

SELECT create_table_query
FROM system.tables
WHERE (lower(database) 
NOT IN ('system', 'information_schema')) 
AND (engine IN ('MaterializedView', 'View', 'Dictionary'))

See also