ClickHouse® connector

You can use the ClickHouse® connector in both source and target endpoints. In source endpoints, the connector retrieves data from ClickHouse® databases. In target endpoints, it inserts data into ClickHouse® databases.

Source endpoint

To configure a source ClickHouse® endpoint, take the following steps:

  1. In Connection type, select the type of the ClickHouse® cluster you want to connect to.

  2. Configure the connection:

    Managed cluster

    Connect to a ClickHouse® cluster deployed in DoubleCloud.

    1. In Managed Cluster, select the cluster you want to connect to.

    2. In Authentication, select Default if you want to connect as the admin user.

      If you want to connect as a different user, select Password and enter user credentials.

    On-premise

    Connect to an on-premise ClickHouse® installation.

    1. Under Shards:

      1. Click + Shard and enter the shard ID.

      2. Under Hosts, click + Host and enter the domain name (FQDN) or IP address of the host.

    2. In HTTP Port, enter the port for HTTP interface connections or leave the default value of 8443.

      Tip

      • Optional fields have default values if these fields are specified.

      • Recording of complex types (array, tuple, etc.) is supported.

    3. In Native port, enter the port for clickhouse-client connections or leave the default value of 9440.

    4. Enable SSL if you need to secure your connection.

    5. To encrypt data transmission, upload a .pem file with certificates under CA Certificate.

    6. In User and Password, enter the user credentials to connect to the database.

  3. In Database, enter the name of the database you want to transfer data from.

  4. Under Table filter, add tables you want to include or exclude:

    • Included tables: Data is transferred only from these tables.

    • Excluded tables: Data from these tables won’t be transferred.
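
The table filter in the last step can be pictured as a simple include/exclude check. The sketch below is only an illustration: the function name `select_tables` is hypothetical, and it assumes that an empty Included tables list means "all tables", which the steps above don't state explicitly.

```python
def select_tables(all_tables, included=None, excluded=None):
    """Return the tables a transfer would read, given include/exclude lists."""
    included = set(included or [])
    excluded = set(excluded or [])
    selected = []
    for name in all_tables:
        # Assumption: an empty include list means every table is a candidate.
        if included and name not in included:
            continue
        # Excluded tables are always skipped.
        if name in excluded:
            continue
        selected.append(name)
    return selected


tables = ["events", "users", "tmp_import"]
print(select_tables(tables, excluded=["tmp_import"]))
print(select_tables(tables, included=["events"]))
```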

To create a ClickHouse® source endpoint with the API, use the endpoint.ClickhouseSource model.
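
As a rough way to keep track of the on-premise connection settings from the form, they can be grouped in one structure. This is a hypothetical container, not the endpoint.ClickhouseSource model: the class, its field names, and the example host are made up for illustration. Only the default ports (8443 for HTTP, 9440 for native) come from the steps above.

```python
from dataclasses import dataclass


@dataclass
class OnPremiseConnection:
    # Hypothetical container; field names are illustrative, not the API model.
    shards: dict              # shard ID -> list of host FQDNs or IP addresses
    user: str
    password: str
    ssl: bool
    http_port: int = 8443     # default value shown in the form
    native_port: int = 9440   # default value shown in the form

    def http_urls(self):
        """Build one HTTP interface URL per host, across all shards."""
        scheme = "https" if self.ssl else "http"
        return [
            f"{scheme}://{host}:{self.http_port}/"
            for hosts in self.shards.values()
            for host in hosts
        ]


conn = OnPremiseConnection(
    shards={"shard1": ["ch1.example.internal"]},  # placeholder host name
    user="transfer_user",
    password="secret",
    ssl=True,
)
print(conn.http_urls())
```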

ClickHouse® type             Transfer type
Int64                        int64
Int32                        int32
Int16                        int16
Int8                         int8
UInt64                       uint64
UInt32                       uint32
UInt16                       uint16
UInt8                        uint8
(not listed)                 float
Float64                      double
FixedString, String          string
IPv4, IPv6, Enum8, Enum16    utf8
(not listed)                 boolean
Date                         date
DateTime                     datetime
DateTime64                   timestamp
REST...                      any
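
The source-side mapping above can be expressed as a lookup table. The sketch below is an illustration, not the connector's code. It reads the last row as "all remaining types map to any", and it assumes that type parameters such as `FixedString(16)` can be stripped before the lookup. The `float` and `boolean` rows are left out because the table shows no ClickHouse® type for them.

```python
# ClickHouse type -> transfer type, copied from the table above.
SOURCE_TYPE_MAP = {
    "Int64": "int64",
    "Int32": "int32",
    "Int16": "int16",
    "Int8": "int8",
    "UInt64": "uint64",
    "UInt32": "uint32",
    "UInt16": "uint16",
    "UInt8": "uint8",
    "Float64": "double",
    "FixedString": "string",
    "String": "string",
    "IPv4": "utf8",
    "IPv6": "utf8",
    "Enum8": "utf8",
    "Enum16": "utf8",
    "Date": "date",
    "DateTime": "datetime",
    "DateTime64": "timestamp",
}


def transfer_type(clickhouse_type):
    """Look up the transfer type for a ClickHouse column type."""
    # Assumption: parameters such as FixedString(16) or DateTime64(3) are ignored.
    base = clickhouse_type.split("(", 1)[0].strip()
    # Any type missing from the table falls back to "any".
    return SOURCE_TYPE_MAP.get(base, "any")


for column_type in ["Int64", "FixedString(16)", "DateTime64(3)", "UUID"]:
    print(column_type, "->", transfer_type(column_type))
```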

Target endpoint

To configure a target ClickHouse® endpoint, take the following steps:

  1. In Connection type, select the type of the ClickHouse® cluster you want to connect to.

  2. Configure the connection:

    Managed cluster

    Connect to a ClickHouse® cluster deployed in DoubleCloud.

    1. In Managed Cluster, select the cluster you want to connect to.

    2. In Authentication, select Default if you want to connect as the admin user.

      If you want to connect as a different user, select Password and enter user credentials.

    On-premise

    Connect to an on-premise ClickHouse® installation.

    1. Under Shards:

      1. Click + Shard and enter the shard ID.

      2. Under Hosts, click + Host and enter the domain name (FQDN) or IP address of the host.

    2. In HTTP Port, enter the port for HTTP interface connections or leave the default value of 8443.

      Tip

      • Optional fields have default values if these fields are specified.

      • Recording of complex types (array, tuple, etc.) is supported.

    3. In Native port, enter the port for clickhouse-client connections or leave the default value of 9440.

    4. Enable SSL if you need to secure your connection.

    5. To encrypt data transmission, upload a .pem file with certificates under CA Certificate.

    6. In User and Password, enter the user credentials to connect to the database.

  3. In Database, enter the name of the database you want to transfer data to.

  4. Select a Cleanup policy to specify how data in the target database is cleaned up when a transfer is activated, reactivated, or reloaded.

    Cleanup policy reference
    • Disabled: Don’t clean. Select this option if you only perform replication without copying data.

    • Drop (default): Fully delete the tables included in the transfer. Use this option to always transfer the latest version of the table schema to the target database from the source.

    • Truncate: Execute the TRUNCATE command for the target table each time you run a transfer.

  5. In Sharding settings, select how you want to split the data between shards.

    Sharding by column value
    1. Enter the name of a column whose values are used as the row sharding key.

    2. (Optional) If you want to map a column value to a specific shard, click + Mapping. Enter the column value and the shard name.

      By default, the target endpoint uses automatic sharding.

    Sharding by transfer ID
    1. (Optional) If you want to map a transfer ID to a specific shard, click + Mapping. Enter the transfer ID and the shard name.

      By default, the target endpoint uses automatic sharding.

    The No sharding and Uniform random sharding options have no configurable parameters.

  6. (Optional) Configure table renaming:

    1. Under Advanced settings → Rename tables, click + Table.

    2. Enter the source and target table names.

      If the target endpoint already has a table with the same name, the data is written to the existing table.

    Merge multiple tables with table renaming

    You can merge data from several source tables into a single one at your transfer target. To do that, create several Rename table entries with different Source table name values and the same Target table name values.

    The data schemas in the source and target tables must be the same.

  7. In Flush interval, specify a desired value or leave the default of 1 second.
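
To make the "sharding by column value" option from the steps above more concrete, here is a minimal sketch of how a row could be routed to a shard. Explicit mappings are checked first, as the step describes. The fallback is an assumption: the page doesn't say how automatic sharding works, so a stable CRC32 hash of the column value stands in for it. The function and shard names are hypothetical.

```python
import zlib


def pick_shard(key_value, shard_names, mapping=None):
    """Choose a shard for a row based on its sharding-key column value."""
    mapping = mapping or {}
    # Explicit "+ Mapping" entries take priority.
    if key_value in mapping:
        return mapping[key_value]
    # Assumption: automatic sharding is approximated with a stable hash,
    # so the same value always lands on the same shard.
    digest = zlib.crc32(str(key_value).encode("utf-8"))
    return shard_names[digest % len(shard_names)]


shards = ["shard1", "shard2", "shard3"]
explicit = {"eu": "shard1"}  # column value -> shard name
print(pick_shard("eu", shards, explicit))
print(pick_shard("us", shards, explicit))
```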

To create a ClickHouse® target endpoint with the API, use the endpoint.ClickhouseTarget model.
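
The table renaming step, including merging several source tables into one target table, amounts to a many-to-one name mapping. The sketch below illustrates the idea on in-memory rows. The function `merge_by_rename` and the table names are hypothetical, and the sketch doesn't check that the schemas match, which the steps above require.

```python
def merge_by_rename(source_rows, rename_map):
    """Group rows by target table name after applying rename entries."""
    target = {}
    for source_table, rows in source_rows.items():
        # Tables without a rename entry keep their original name.
        target_table = rename_map.get(source_table, source_table)
        target.setdefault(target_table, []).extend(rows)
    return target


source_rows = {
    "events_2023": [{"id": 1}],
    "events_2024": [{"id": 2}],
    "users": [{"id": 7}],
}
# Two entries with different source names and the same target name merge the tables.
rename_map = {"events_2023": "events", "events_2024": "events"}
print(merge_by_rename(source_rows, rename_map))
```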

Transfer type    ClickHouse® type
int64            Int64
int32            Int32
int16            Int16
int8             Int8
uint64           UInt64
uint32           UInt32
uint16           UInt16
uint8            UInt8
float            Float64
double           Float64
string           String
utf8             String
boolean          UInt8
date             Date
datetime         DateTime
timestamp        DateTime64
any              String
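
As an illustration of the target-side mapping above, the sketch below turns a list of transfer-typed columns into ClickHouse® column definitions. It is not the connector's table-creation logic: the function name and the example schema are hypothetical, and only the type pairs come from the table.

```python
# Transfer type -> ClickHouse type, copied from the table above.
TARGET_TYPE_MAP = {
    "int64": "Int64",
    "int32": "Int32",
    "int16": "Int16",
    "int8": "Int8",
    "uint64": "UInt64",
    "uint32": "UInt32",
    "uint16": "UInt16",
    "uint8": "UInt8",
    "float": "Float64",
    "double": "Float64",
    "string": "String",
    "utf8": "String",
    "boolean": "UInt8",
    "date": "Date",
    "datetime": "DateTime",
    "timestamp": "DateTime64",
    "any": "String",
}


def column_definitions(schema):
    """Render (column name, transfer type) pairs as ClickHouse column definitions."""
    parts = []
    for name, transfer_type in schema:
        # Unknown transfer types raise KeyError instead of being guessed.
        parts.append(f"`{name}` {TARGET_TYPE_MAP[transfer_type]}")
    return ", ".join(parts)


schema = [("id", "int64"), ("name", "utf8"), ("is_active", "boolean")]
print(column_definitions(schema))
```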

See also