Manage transfers

Create a transfer

  1. Open the Transfer page in the console.

  2. Click Create transfer.

  3. Under Endpoints:

    1. Select the source endpoint in Source or create a new one.

    2. Select the target endpoint in Target or create a new one.

  4. Under Basic settings:

    1. Enter the transfer Name.

    2. (optional) Enter the transfer Description.

  5. Under Transfer Parameters, select a Transfer type. The list of possible transfer types depends on the type of source and destination endpoints:

    Snapshot

    Makes a one-time transfer of the database snapshot.

    1. Under Snapshot settings → Parallel snapshot settings, configure the transfer performance:

      1. Processes count sets the number of parallel container instances running the transfer. Increasing this number speeds up the transfer. You can specify up to 8 instances.

      2. Threads count sets the number of threads within each instance. You can run up to 10 threads.

      The average download speed of data transfers is between 1 and 10 MB/s.

    2. Add one or more Incremental tables. With incremental tables, you can transfer only the data that has changed. Each of these tables has the following fields:

      • Schema corresponds to the database schema (as in PostgreSQL), name, or dataset.

      • Table in the source to compare with your target table.

      • Key column is the name of a column that contains a value (a cursor) indicating whether to increment the table. A common example of a cursor is a column with timestamps. Refer to the Airbyte Incremental Sync - Append documentation to learn more about cursor definitions and usage examples.

      • Start value (optional) defines a value in the Key column from which to start tracking changes. For example, you can specify a date as the Start value. In this case, the service will transfer only the rows with a date value greater than the start value.
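The cursor logic above can be sketched in a few lines of Python. The `rows` list, the `updated_at` key column, and the function name are hypothetical; the actual selection happens inside the service.

```python
from datetime import date

def incremental_rows(rows, key_column, start_value):
    """Keep only the rows whose cursor value is greater than start_value."""
    return [row for row in rows if row[key_column] > start_value]

rows = [
    {"id": 1, "updated_at": date(2024, 1, 10)},
    {"id": 2, "updated_at": date(2024, 3, 5)},
    {"id": 3, "updated_at": date(2024, 6, 1)},
]

# With a Start value of 2024-02-01, only rows 2 and 3 are transferred.
changed = incremental_rows(rows, "updated_at", date(2024, 2, 1))
```

On the next run, the largest cursor value seen so far becomes the new start value, so each run picks up only the rows changed since the previous one.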

    Periodic snapshot

    Runs snapshots at the specified interval.

    1. Set the Period between transfer runs, from 5 minutes to 24 hours.

    2. Under Snapshot settings → Parallel snapshot settings, configure the transfer performance:

      1. Processes count sets the number of parallel container instances running the transfer. Increasing this number speeds up the transfer. You can specify up to 8 instances.

      2. Threads count sets the number of threads within each instance. You can run up to 10 threads.

      The average download speed of data transfers is between 1 and 10 MB/s.

    3. Add one or more Incremental tables. With incremental tables, you can transfer only the data that has changed. Each of these tables has the following fields:

      • Schema corresponds to the database schema (as in PostgreSQL), name, or dataset. Refer to the Airbyte destination namespace configuration documentation to learn more about namespaces.

      • Table in the source with which to compare your target table.

      • Key column is the name of a column that contains a value (a cursor) indicating whether to increment the table. A common example of a cursor is a column with timestamps. Refer to the Airbyte Incremental Sync - Append documentation to learn more about cursor definitions and usage examples.

      • Initial value (optional) defines a value in the Key column from which to start tracking changes. For example, you can specify a date as the Initial value. In this case, the service will transfer only the rows with a date value greater than the initial value.

    Replication

    Continuously retrieves changes from the source database and applies them to the target database.

    Snapshot and replication

    Transfers the data and keeps the target database in sync with the source database.

    1. Under Snapshot settings → Parallel snapshot settings, configure the transfer performance:

      1. Processes count sets the number of parallel container instances running the transfer. Increasing this number speeds up the transfer. You can specify up to 8 instances.

      2. Threads count sets the number of threads within each instance. You can run up to 10 threads.

      The average download speed of data transfers is between 1 and 10 MB/s.

    2. Add one or more Incremental tables. With incremental tables, you can transfer only the data that has changed. Each of these tables has the following fields:

      • Schema corresponds to the database schema (as in PostgreSQL), name, or dataset.

      • Table in the source to compare with your target table.

      • Key column is the name of a column that contains a value (a cursor) indicating whether to increment the table. A common example of a cursor is a column with timestamps. Refer to the Airbyte Incremental Sync - Append documentation to learn more about cursor definitions and usage examples.

      • Start value (optional) defines a value in the Key column from which to start tracking changes. For example, you can specify a date as the Start value. In this case, the service will transfer only the rows with a date value greater than the start value.

  6. Specify the Runtime environment properties:

    Serverless

    The settings will be adjusted automatically.

    Dedicated
    1. Select the VM performance preset:

      • small - 2 CPUs

      • medium - 4 CPUs

      • large - 8 CPUs

    2. Select the Minimum logging level:

      Level | Description
      TRACE | Detailed diagnostic information
      DEBUG | Debugging information
      INFO | Diagnostic information for statistics
      WARNING | Warning about a non-critical malfunction that needs further investigation
      ERROR | Error report
      FATAL | Possible system failure, service shutdown
    3. Under Settings, choose how to select a network in which to run a virtual machine instance:

      • Automatic

        Allows the service to set the network automatically based on the target resource location.

      • Manual

        Transfer BYOC compatibility

        If you use the BYOC feature for your cluster, select an external network. This will enable Transfer to write data to your BYOC cluster.

        Select a VPC Network ID from the drop-down list.

    For more information about runtimes, see Runtime types in Transfer.

  7. Set up Data transformations if you need to modify a list of tables to transfer.

    Public Preview notice

    This feature is provided as Public Preview and is free of charge.

    After the end of the Public Preview period, the functionality and pricing of this feature will be subject to change.

    For instance, you may not want to transfer a column with passwords. In the Transformer list, click Add transformer to create a set of transformation rules, then add this column's name to the Exclude columns section. Under Transformer [number] → Columns filter, configure transformations for tables and columns:

    • Include columns sets the list of columns to transfer, and Exclude columns sets the list of columns that won't be transferred. Set these column names as regular expressions.

    Each transformer is a separate set of rules, and you can combine different rules within each set.

    For example, you can set a table in Include tables and a column in Exclude columns. In this case, the service will apply the specified Exclude columns only to the included table. If you combine Exclude tables and Include columns, only the specified columns will be transferred from all tables except those specified in the Exclude tables field.
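The include/exclude combination described above can be sketched as follows. The function name and the regular expressions are hypothetical, and the service's real matching logic may differ in details.

```python
import re

def filter_columns(table, columns, include_tables, exclude_columns):
    """Apply the Exclude columns rule only to tables matching Include tables."""
    if re.search(include_tables, table):
        return [c for c in columns if not re.search(exclude_columns, c)]
    return list(columns)

# The password column is dropped from the included table only:
filter_columns("users", ["id", "password"], r"^users$", r"password")   # → ['id']
# Other tables keep all of their columns:
filter_columns("orders", ["id", "password"], r"^users$", r"password")  # → ['id', 'password']
```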

    1. Click + Transformation to add a new transformation layer. You can apply multiple layers to your data.

    2. From the dropdown menu, select the appropriate transformation type:

      Mask secret fields

      This transformation allows you to apply a hash function to specified columns in tables to further protect sensitive data during transfer.

      1. Under Tables, specify the following:

        • Included tables restricts the set of tables to transfer.

        • Excluded tables allows transferring all data except the specified tables.

        Set these table names as regular expressions:

        Collection of regular expression patterns to parse table names
        Pattern | Description | Example
        abc | An explicit series of characters. | test returns the table names containing test.
        . | A character wildcard. Use it to match an expression with defined character positions. | t..t returns test, tent, tart, etc.
        \ | An escape character. Use it to match special characters. | \_ returns the table names containing an underscore.
        ? | The preceding character (or group of characters) is optional. | c?.n returns can, con, in, on, en, etc.
        + | The preceding character (or group of characters) can appear one or more times. | -+ returns the table names containing -, --, ---, etc.
        {n} | The preceding character (or group of characters) must appear exactly n times. | -{2} returns the table names containing --.
        {n,m} | The preceding character (or group of characters) must appear between n and m times. | _{1,3} returns the table names containing _, __, and ___.
        \w | An alphanumeric wildcard. Matches any alphanumeric character; the match is case-sensitive. | \w+ returns the table names containing letters and/or digits.
        \W | A non-alphanumeric wildcard. Matches any non-alphanumeric character; the match is case-sensitive. | \W+ returns the table names containing characters other than letters or digits.
        \d | A digit wildcard. Matches any digit character. | \d+ returns the table names containing digits.
        \D | A non-digit wildcard. Matches any non-digit character. | \D+ returns the table names containing characters other than digits.
        $ | Matches the position after the table name's last character. | metrics$ returns the table names ending with metrics.
        ^ | Matches the position before the table name's first character. This is useful to match database names. | ^monthly_sales returns all the tables from the monthly_sales database.
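The patterns in the table follow standard regular-expression syntax, so you can test a candidate pattern locally before entering it in the console. The sketch below uses Python's `re` module; the engine the service uses internally may differ in edge cases.

```python
import re

# Each check pairs a pattern from the table with a name it should match.
assert re.search(r"t..t", "tart")              # character wildcard
assert re.search(r"metrics$", "site_metrics")  # anchored at the end of the name
assert not re.search(r"metrics$", "metrics_daily")
assert re.search(r"\d+", "q3_2024")            # one or more digits
assert re.search(r"-{2}", "a--b")              # exact repetition count
assert re.search(r"c?.n", "con")               # optional character
```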
      2. Under Column list, click + to add a column name. The masking will be applied to the columns listed in this section.

      3. Under Mask function → Hash → User-defined Salt, specify the salt value to apply to your data.
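As an illustration of what a salted hash mask can look like, here is a minimal Python sketch. The service's exact algorithm and output format are not specified here, so treat this only as a conceptual model.

```python
import hashlib

def mask(value: str, salt: str) -> str:
    """Return a salted SHA-256 digest of a column value (illustrative only)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

# The same value with the same salt always yields the same digest, so masked
# columns stay joinable across tables; changing the salt changes every digest.
digest = mask("4111-1111-1111-1111", salt="my-secret-salt")
```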

      Columns filter

      This transformation allows you to apply filtering to the list of columns to transfer from the data source.

      1. Under Tables, click + Tables and specify the following:

        • Included tables restricts the set of tables to transfer.

        • Excluded tables allows transferring all data except the specified tables.

        Set these table names as regular expressions:

        Collection of regular expression patterns to parse table names
        Pattern | Description | Example
        abc | An explicit series of characters. | test returns the table names containing test.
        . | A character wildcard. Use it to match an expression with defined character positions. | t..t returns test, tent, tart, etc.
        \ | An escape character. Use it to match special characters. | \_ returns the table names containing an underscore.
        ? | The preceding character (or group of characters) is optional. | c?.n returns can, con, in, on, en, etc.
        + | The preceding character (or group of characters) can appear one or more times. | -+ returns the table names containing -, --, ---, etc.
        {n} | The preceding character (or group of characters) must appear exactly n times. | -{2} returns the table names containing --.
        {n,m} | The preceding character (or group of characters) must appear between n and m times. | _{1,3} returns the table names containing _, __, and ___.
        \w | An alphanumeric wildcard. Matches any alphanumeric character; the match is case-sensitive. | \w+ returns the table names containing letters and/or digits.
        \W | A non-alphanumeric wildcard. Matches any non-alphanumeric character; the match is case-sensitive. | \W+ returns the table names containing characters other than letters or digits.
        \d | A digit wildcard. Matches any digit character. | \d+ returns the table names containing digits.
        \D | A non-digit wildcard. Matches any non-digit character. | \D+ returns the table names containing characters other than digits.
        $ | Matches the position after the table name's last character. | metrics$ returns the table names ending with metrics.
        ^ | Matches the position before the table name's first character. This is useful to match database names. | ^monthly_sales returns all the tables from the monthly_sales database.
      2. Under Columns, specify the following:

        • Included columns restricts the set of columns to transfer from the tables specified above.

        • Excluded columns allows transferring all columns except the specified ones.

        Set these column names as regular expressions:

        Collection of regular expression patterns to parse column names
        Pattern | Description | Example
        abc | An explicit series of characters. | test returns the column names containing test.
        . | A character wildcard. Use it to match an expression with defined character positions. | t..t returns test, tent, tart, etc.
        \ | An escape character. Use it to match special characters. | \_ returns the column names containing an underscore.
        ? | The preceding character (or group of characters) is optional. | c?.n returns can, con, in, on, en, etc.
        + | The preceding character (or group of characters) can appear one or more times. | -+ returns the column names containing -, --, ---, etc.
        {n} | The preceding character (or group of characters) must appear exactly n times. | -{2} returns the column names containing --.
        {n,m} | The preceding character (or group of characters) must appear between n and m times. | _{1,3} returns the column names containing _, __, and ___.
        \w | An alphanumeric wildcard. Matches any alphanumeric character; the match is case-sensitive. | \w+ returns the column names containing letters and/or digits.
        \W | A non-alphanumeric wildcard. Matches any non-alphanumeric character; the match is case-sensitive. | \W+ returns the column names containing characters other than letters or digits.
        \d | A digit wildcard. Matches any digit character. | \d+ returns the column names containing digits.
        \D | A non-digit wildcard. Matches any non-digit character. | \D+ returns the column names containing characters other than digits.
        $ | Matches the position after the column name's last character. | metrics$ returns the column names ending with metrics.
        ^ | Matches the position before the column name's first character. | ^monthly_sales returns the column names starting with monthly_sales.
      Rename tables

      This transformation lets you map a table name on the source to a new table name on the target without changing the contents of the transferred table.

      1. Under Tables list to rename, click + Table.

      2. Under Table 1 → Source table name:

        • For PostgreSQL data sources, use the Named schema field to provide the source table schema. Leave empty for the data sources that don't support schema and/or database abstractions.

        • Specify the initial Table name on the source.

      3. Under Target table name:

        • For PostgreSQL data sources, use the Named schema field to provide the target table schema. Leave empty for the data sources that don't support schema and/or database abstractions.

        • Specify the intended Table name on the target.

      Replace primary key

      This transformation allows you to reassign the primary key column on the target table.

      1. Under Tables, click + Tables and specify the following:

        • Included tables restricts the set of tables to transfer.

        • Excluded tables allows transferring all data except the specified tables.

        Set these table names as regular expressions:

        Collection of regular expression patterns to parse table names
        Pattern | Description | Example
        abc | An explicit series of characters. | test returns the table names containing test.
        . | A character wildcard. Use it to match an expression with defined character positions. | t..t returns test, tent, tart, etc.
        \ | An escape character. Use it to match special characters. | \_ returns the table names containing an underscore.
        ? | The preceding character (or group of characters) is optional. | c?.n returns can, con, in, on, en, etc.
        + | The preceding character (or group of characters) can appear one or more times. | -+ returns the table names containing -, --, ---, etc.
        {n} | The preceding character (or group of characters) must appear exactly n times. | -{2} returns the table names containing --.
        {n,m} | The preceding character (or group of characters) must appear between n and m times. | _{1,3} returns the table names containing _, __, and ___.
        \w | An alphanumeric wildcard. Matches any alphanumeric character; the match is case-sensitive. | \w+ returns the table names containing letters and/or digits.
        \W | A non-alphanumeric wildcard. Matches any non-alphanumeric character; the match is case-sensitive. | \W+ returns the table names containing characters other than letters or digits.
        \d | A digit wildcard. Matches any digit character. | \d+ returns the table names containing digits.
        \D | A non-digit wildcard. Matches any non-digit character. | \D+ returns the table names containing characters other than digits.
        $ | Matches the position after the table name's last character. | metrics$ returns the table names ending with metrics.
        ^ | Matches the position before the table name's first character. This is useful to match database names. | ^monthly_sales returns all the tables from the monthly_sales database.
      2. Under Key columns names, specify the pairs of columns to replace, separated by commas, as follows:

        <column name at the source> <column name at the target>, 
        

        Warning

        Assigning two or more primary keys per table makes these tables incompatible with ClickHouse®.
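A quick way to sanity-check the pair syntax is to parse it the way the service presumably does. The helper below is hypothetical and only mirrors the `<source> <target>,` format shown above.

```python
def parse_key_pairs(spec: str):
    """Split '<source> <target>, <source> <target>' into (source, target) pairs."""
    pairs = []
    for chunk in spec.split(","):
        chunk = chunk.strip()
        if chunk:  # tolerate the trailing comma shown in the syntax above
            source, target = chunk.split()
            pairs.append((source, target))
    return pairs

# → [('legacy_id', 'id'), ('created_ts', 'created_at')]
parse_key_pairs("legacy_id id, created_ts created_at,")
```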

      Convert values to string

      This transformation allows you to convert a certain data column in a specified table to a string.

      1. Under Tables, click + Tables and specify the following:

        • Included tables restricts the set of tables to transfer.

        • Excluded tables allows transferring all data except the specified tables.

        Set these table names as regular expressions:

        Collection of regular expression patterns to parse table names
        Pattern | Description | Example
        abc | An explicit series of characters. | test returns the table names containing test.
        . | A character wildcard. Use it to match an expression with defined character positions. | t..t returns test, tent, tart, etc.
        \ | An escape character. Use it to match special characters. | \_ returns the table names containing an underscore.
        ? | The preceding character (or group of characters) is optional. | c?.n returns can, con, in, on, en, etc.
        + | The preceding character (or group of characters) can appear one or more times. | -+ returns the table names containing -, --, ---, etc.
        {n} | The preceding character (or group of characters) must appear exactly n times. | -{2} returns the table names containing --.
        {n,m} | The preceding character (or group of characters) must appear between n and m times. | _{1,3} returns the table names containing _, __, and ___.
        \w | An alphanumeric wildcard. Matches any alphanumeric character; the match is case-sensitive. | \w+ returns the table names containing letters and/or digits.
        \W | A non-alphanumeric wildcard. Matches any non-alphanumeric character; the match is case-sensitive. | \W+ returns the table names containing characters other than letters or digits.
        \d | A digit wildcard. Matches any digit character. | \d+ returns the table names containing digits.
        \D | A non-digit wildcard. Matches any non-digit character. | \D+ returns the table names containing characters other than digits.
        $ | Matches the position after the table name's last character. | metrics$ returns the table names ending with metrics.
        ^ | Matches the position before the table name's first character. This is useful to match database names. | ^monthly_sales returns all the tables from the monthly_sales database.
      2. Under Columns, specify the following:

        • Included columns restricts the set of columns to transfer from the tables specified above.

        • Excluded columns allows transferring all columns except the specified ones.

        Set these column names as regular expressions:

        Collection of regular expression patterns to parse column names
        Pattern | Description | Example
        abc | An explicit series of characters. | test returns the column names containing test.
        . | A character wildcard. Use it to match an expression with defined character positions. | t..t returns test, tent, tart, etc.
        \ | An escape character. Use it to match special characters. | \_ returns the column names containing an underscore.
        ? | The preceding character (or group of characters) is optional. | c?.n returns can, con, in, on, en, etc.
        + | The preceding character (or group of characters) can appear one or more times. | -+ returns the column names containing -, --, ---, etc.
        {n} | The preceding character (or group of characters) must appear exactly n times. | -{2} returns the column names containing --.
        {n,m} | The preceding character (or group of characters) must appear between n and m times. | _{1,3} returns the column names containing _, __, and ___.
        \w | An alphanumeric wildcard. Matches any alphanumeric character; the match is case-sensitive. | \w+ returns the column names containing letters and/or digits.
        \W | A non-alphanumeric wildcard. Matches any non-alphanumeric character; the match is case-sensitive. | \W+ returns the column names containing characters other than letters or digits.
        \d | A digit wildcard. Matches any digit character. | \d+ returns the column names containing digits.
        \D | A non-digit wildcard. Matches any non-digit character. | \D+ returns the column names containing characters other than digits.
        $ | Matches the position after the column name's last character. | metrics$ returns the column names ending with metrics.
        ^ | Matches the position before the column name's first character. | ^monthly_sales returns the column names starting with monthly_sales.
      3. Under Key columns names, click + to add the name of a column containing the primary key.

      4. Under Non-key column names, click + to add a column without a primary key.

      The conversion will be applied to the columns listed in both sections.

      Convert data to raw JSON

      This transformation lets you convert a given data column in a specified table to raw JSON.

      1. Under Tables, specify the following:

        • Included tables restricts the set of tables to transfer.

        • Excluded tables allows transferring all data except the specified tables.

        Set these table names as regular expressions:

        Collection of regular expression patterns to parse table names
        Pattern | Description | Example
        abc | An explicit series of characters. | test returns the table names containing test.
        . | A character wildcard. Use it to match an expression with defined character positions. | t..t returns test, tent, tart, etc.
        \ | An escape character. Use it to match special characters. | \_ returns the table names containing an underscore.
        ? | The preceding character (or group of characters) is optional. | c?.n returns can, con, in, on, en, etc.
        + | The preceding character (or group of characters) can appear one or more times. | -+ returns the table names containing -, --, ---, etc.
        {n} | The preceding character (or group of characters) must appear exactly n times. | -{2} returns the table names containing --.
        {n,m} | The preceding character (or group of characters) must appear between n and m times. | _{1,3} returns the table names containing _, __, and ___.
        \w | An alphanumeric wildcard. Matches any alphanumeric character; the match is case-sensitive. | \w+ returns the table names containing letters and/or digits.
        \W | A non-alphanumeric wildcard. Matches any non-alphanumeric character; the match is case-sensitive. | \W+ returns the table names containing characters other than letters or digits.
        \d | A digit wildcard. Matches any digit character. | \d+ returns the table names containing digits.
        \D | A non-digit wildcard. Matches any non-digit character. | \D+ returns the table names containing characters other than digits.
        $ | Matches the position after the table name's last character. | metrics$ returns the table names ending with metrics.
        ^ | Matches the position before the table name's first character. This is useful to match database names. | ^monthly_sales returns all the tables from the monthly_sales database.
      2. Under Columns, specify the following:

        • Included columns restricts the set of columns to transfer from the tables specified above.

        • Excluded columns allows transferring all columns except the specified ones.

        Set these column names as regular expressions:

        Collection of regular expression patterns to parse column names
        Pattern | Description | Example
        abc | An explicit series of characters. | test returns the column names containing test.
        . | A character wildcard. Use it to match an expression with defined character positions. | t..t returns test, tent, tart, etc.
        \ | An escape character. Use it to match special characters. | \_ returns the column names containing an underscore.
        ? | The preceding character (or group of characters) is optional. | c?.n returns can, con, in, on, en, etc.
        + | The preceding character (or group of characters) can appear one or more times. | -+ returns the column names containing -, --, ---, etc.
        {n} | The preceding character (or group of characters) must appear exactly n times. | -{2} returns the column names containing --.
        {n,m} | The preceding character (or group of characters) must appear between n and m times. | _{1,3} returns the column names containing _, __, and ___.
        \w | An alphanumeric wildcard. Matches any alphanumeric character; the match is case-sensitive. | \w+ returns the column names containing letters and/or digits.
        \W | A non-alphanumeric wildcard. Matches any non-alphanumeric character; the match is case-sensitive. | \W+ returns the column names containing characters other than letters or digits.
        \d | A digit wildcard. Matches any digit character. | \d+ returns the column names containing digits.
        \D | A non-digit wildcard. Matches any non-digit character. | \D+ returns the column names containing characters other than digits.
        $ | Matches the position after the column name's last character. | metrics$ returns the column names ending with metrics.
        ^ | Matches the position before the column name's first character. | ^monthly_sales returns the column names starting with monthly_sales.
      3. Under Key columns names, click + to add the name of a column containing the primary key.

      4. Under Non-key column names, click + to add a column without a primary key.

      The conversion will be applied to the columns listed in both sections.
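Conceptually, the transformation keeps the key columns as-is and packs the remaining listed columns into a single JSON document. The sketch below is an assumption about the output shape (the `data` field name is made up), not the service's exact schema.

```python
import json

def to_raw_json(row, key_columns, non_key_columns):
    """Keep key columns as-is; pack the listed non-key columns into one JSON string."""
    doc = {c: row[c] for c in key_columns}
    # The "data" field name is a hypothetical placeholder for the raw JSON column.
    doc["data"] = json.dumps({c: row[c] for c in non_key_columns})
    return doc

to_raw_json({"id": 7, "name": "Anna", "age": 31}, ["id"], ["name", "age"])
```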

      Sharding

      This transformation allows you to distribute the tables between multiple shards on the ClickHouse® data destination.

      1. Under Tables, click + Tables and specify the following:

        • Included tables restricts the set of tables to transfer.

        • Excluded tables allows transferring all data except the specified tables.

        Set these table names as regular expressions:

        Collection of regular expression patterns to parse table names:

        | Pattern | Description | Example |
        |---------|-------------|---------|
        | abc | An explicit series of characters. | test returns the table names containing test. |
        | . | A character wildcard. Use it to match an expression with defined character positions. | t..t returns test, tent, tart, etc. |
        | \ | An escape character. Use it to match special characters. | \_ returns the table names containing an underscore. |
        | ? | Marks a character (or a group of characters) as optional. | c?.n returns can, con, in, on, en, etc. |
        | + | Marks a character (or a group of characters) as appearing one or more times. | -+ returns the table names containing -, --, ---, etc. |
        | {n} | Requires a character (or a group of characters) to appear exactly n times. | -{2} returns the table names containing --. |
        | {n,m} | Requires a character (or a group of characters) to appear between n and m times. | _{1,3} returns the table names containing _, __, and ___. |
        | \w | An alphanumeric wildcard. Matches any alphanumeric character. The match is case-sensitive. | \w+ returns the table names containing letters and/or digits. |
        | \W | A non-alphanumeric wildcard. Matches any non-alphanumeric character. The match is case-sensitive. | \W+ returns the table names containing characters other than letters or digits. |
        | \d | A digit wildcard. Matches any digit character. The match is case-sensitive. | \d+ returns the table names containing digits. |
        | \D | A non-digit wildcard. Matches any non-digit character. The match is case-sensitive. | \D+ returns the table names containing characters other than digits. |
        | $ | Matches the position after the table name's last character. | metrics$ returns the table names ending with metrics. |
        | ^ | Matches the position before the table name's first character. Useful to define database names. | ^monthly_sales returns all the tables from the monthly_sales database. |
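As an illustration, the include and exclude patterns above behave like standard regular expressions applied to table names. A minimal Python sketch (an assumption for illustration only; the service's exact matching details, such as the qualified-name format, may differ):

```python
import re

# Hypothetical table names in a "schema.table" format
tables = ["monthly_sales.orders", "monthly_sales.metrics", "test_data.events"]

# Included tables: everything from the monthly_sales database
included = [t for t in tables if re.search(r"^monthly_sales", t)]

# Excluded tables: anything ending with "metrics"
result = [t for t in included if not re.search(r"metrics$", t)]
print(result)  # ['monthly_sales.orders']
```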
      2. Under Columns, click + Columns and specify the following:

        • Included columns restricts the set of columns to transfer from the tables specified above.

        • Excluded columns allows you to transfer all columns except the specified ones.

      3. Enter the Count of shards between which you want to distribute the table data.
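A common way to picture sharded distribution is hashing a key and mapping it to a shard index. The sketch below is illustrative only; the service's actual shard-assignment algorithm isn't specified here:

```python
import zlib

SHARD_COUNT = 3  # corresponds to the "Count of shards" value

def shard_for(key: str) -> int:
    # Deterministic hash of a sharding key, mapped to a shard index
    return zlib.crc32(key.encode("utf-8")) % SHARD_COUNT

keys = ["user_1", "user_2", "user_3", "user_4"]
assignment = {k: shard_for(k) for k in keys}
```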

      Convert CDC data to raw JSON with history

      This transformation allows you to convert the change data capture (CDC) data to raw JSON with history.

      1. Under Tables, click + Tables and specify the following:

        • Included tables restricts the set of tables to transfer.

        • Excluded tables allows you to transfer all data except the specified tables.

        Set these table names as regular expressions:

        Collection of regular expression patterns to parse table names:

        | Pattern | Description | Example |
        |---------|-------------|---------|
        | abc | An explicit series of characters. | test returns the table names containing test. |
        | . | A character wildcard. Use it to match an expression with defined character positions. | t..t returns test, tent, tart, etc. |
        | \ | An escape character. Use it to match special characters. | \_ returns the table names containing an underscore. |
        | ? | Marks a character (or a group of characters) as optional. | c?.n returns can, con, in, on, en, etc. |
        | + | Marks a character (or a group of characters) as appearing one or more times. | -+ returns the table names containing -, --, ---, etc. |
        | {n} | Requires a character (or a group of characters) to appear exactly n times. | -{2} returns the table names containing --. |
        | {n,m} | Requires a character (or a group of characters) to appear between n and m times. | _{1,3} returns the table names containing _, __, and ___. |
        | \w | An alphanumeric wildcard. Matches any alphanumeric character. The match is case-sensitive. | \w+ returns the table names containing letters and/or digits. |
        | \W | A non-alphanumeric wildcard. Matches any non-alphanumeric character. The match is case-sensitive. | \W+ returns the table names containing characters other than letters or digits. |
        | \d | A digit wildcard. Matches any digit character. The match is case-sensitive. | \d+ returns the table names containing digits. |
        | \D | A non-digit wildcard. Matches any non-digit character. The match is case-sensitive. | \D+ returns the table names containing characters other than digits. |
        | $ | Matches the position after the table name's last character. | metrics$ returns the table names ending with metrics. |
        | ^ | Matches the position before the table name's first character. Useful to define database names. | ^monthly_sales returns all the tables from the monthly_sales database. |
      2. Under Key column names, click + to add the name of a column containing the primary key.

      3. Under Non-key column names, click + to add the name of a column that isn't part of the primary key.

      The conversion will be applied to the columns listed in both sections.

      SQL

      This transformer accepts the ClickHouse® SQL dialect and allows you to perform SQL-like in-memory data transformations. The solution is based on ClickHouse® Local.

      The source table inside ClickHouse® Local is named table, and its structure mimics the source table structure.

      Since each source change item (row) contains extra metadata, the service must match source and target data. Therefore, each row must have a key defined, and all keys must be unique within every batch. To achieve this, the service calls a collapse function.

      If the source keys can't be matched with the transformed data, the affected rows are marked as containing an error.

      When writing an SQL query, you must preserve the original key columns:
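The collapse step can be pictured as keeping only the latest change item per key within a batch, so that every key is unique before the transformation runs. A simplified sketch (the actual collapse function is internal to the service):

```python
# Each change item carries a key and a payload; later items win.
batch = [
    {"key": 1, "value": "a"},
    {"key": 2, "value": "b"},
    {"key": 1, "value": "c"},  # a later change for key 1
]

collapsed = {}
for row in batch:
    collapsed[row["key"]] = row  # last write per key wins

unique_rows = list(collapsed.values())
print(unique_rows)
```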

      SELECT
         parseDateTime32BestEffort(JSONExtractString(CloudTrailEvent, 'eventTime')) AS eventTime,
         JSONExtractString(CloudTrailEvent, 'http_request.user_agent') AS http_useragent,    
         JSONExtractString(CloudTrailEvent, 'errorMessage') AS error_message,
         JSONExtractString(CloudTrailEvent, 'errorCode') AS error_kind, 
         JSONExtractString(CloudTrailEvent, 'sourceIPAddress') AS network_client_ip,
         JSONExtractString(CloudTrailEvent, 'eventVersion') AS eventVersion,
         JSONExtractString(CloudTrailEvent, 'eventSource') AS eventSource,
         JSONExtractString(CloudTrailEvent, 'eventName') AS eventName,
         JSONExtractString(CloudTrailEvent, 'awsRegion') AS awsRegion,
         JSONExtractString(CloudTrailEvent, 'sourceIPAddress') AS sourceIPAddress,
         JSONExtractString(CloudTrailEvent, 'userAgent') AS userAgent,
         JSONExtractString(CloudTrailEvent, 'requestID') AS requestID,
         JSONExtractString(CloudTrailEvent, 'eventID') AS eventID,
         JSONExtractBool(CloudTrailEvent, 'readOnly') AS readOnly,
         JSONExtractString(CloudTrailEvent, 'eventType') AS eventType,
         JSONExtractBool(CloudTrailEvent, 'managementEvent') AS managementEvent,
         JSONExtractString(CloudTrailEvent, 'recipientAccountId') AS recipientAccountId,
         JSONExtractString(CloudTrailEvent, 'eventCategory') AS eventCategory,
         JSONExtractString(CloudTrailEvent, 'aws_account') AS account,
      
         JSONExtractString(CloudTrailEvent, 'userIdentity.type') AS userIdentity_type,
         JSONExtractString(CloudTrailEvent, 'userIdentity.principalId') AS userIdentity_principalId,
         JSONExtractString(CloudTrailEvent, 'userIdentity.arn') AS userIdentity_arn,
         JSONExtractString(CloudTrailEvent, 'userIdentity.accountId') AS userIdentity_accountId,
         JSONExtractString(CloudTrailEvent, 'userIdentity.accessKeyId') AS userIdentity_accessKeyId,
         
         JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.type') AS sessionIssuer_type,
         JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.principalId') AS sessionIssuer_principalId,
         JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.arn') AS sessionIssuer_arn,
         JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.accountId') AS sessionIssuer_accountId,
         JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.userName') AS sessionIssuer_userName,
         
         JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.webIdFederationData.federatedProvider') AS federatedProvider,
         
         JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.attributes.creationDate') AS attributes_creationDate,
         JSONExtractBool(CloudTrailEvent, 'userIdentity.sessionContext.attributes.mfaAuthenticated') AS attributes_mfaAuthenticated,
         
         JSONExtractString(CloudTrailEvent, 'requestParameters.commandId') AS requestParameters_commandId,
         JSONExtractString(CloudTrailEvent, 'requestParameters.instanceId') AS requestParameters_instanceId,
         
         JSONExtractString(CloudTrailEvent, 'tlsDetails.tlsVersion') AS tlsDetails_tlsVersion,
         JSONExtractString(CloudTrailEvent, 'tlsDetails.cipherSuite') AS tlsDetails_cipherSuite,
         JSONExtractString(CloudTrailEvent, 'tlsDetails.clientProvidedHostHeader') AS tlsDetails_clientProvidedHostHeader
      FROM table
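To illustrate what the JSONExtractString calls in the query above do, here is a rough Python equivalent for one row. This is a sketch only: it assumes dotted paths are resolved by walking nested JSON objects and that missing fields yield an empty string, mirroring ClickHouse®'s default-value behavior:

```python
import json

def json_extract_string(doc: str, path: str) -> str:
    # Walk a dotted path through a JSON document, returning "" on a miss
    node = json.loads(doc)
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return ""
        node = node[part]
    return node if isinstance(node, str) else ""

event = json.dumps({
    "eventName": "RunInstances",
    "userIdentity": {"type": "IAMUser"},
})
print(json_extract_string(event, "eventName"))          # RunInstances
print(json_extract_string(event, "userIdentity.type"))  # IAMUser
```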
      
      dbt

      Apply your dbt project to the snapshot of the data transferred to ClickHouse®.

      For more information on how to apply dbt transformation to your data, see Transform your data in a Managed ClickHouse® cluster with dbt.

      1. Specify the address of the Git repository containing your dbt project. It must start with https://. The root directory of the repository must contain a dbt_project.yml file.

      2. Under Git branch, specify the branch or tag of the Git repository containing your dbt project.

      3. Provide the dbt profile name. The profile will be created automatically using the settings of the destination endpoint. The name must match the profile property in the dbt_project.yml file.

      4. From the dropdown list, select the Operation for your dbt project to perform. For more information, see the official dbt documentation.

    Deleting a transformation layer

    To delete a transformation layer, click Delete to the right of the transformation type dropdown menu.

  8. Click Submit.

You can create a transfer using the Transfer and Transfer endpoint resources of the DoubleCloud Terraform provider.

Tip

If you haven't used Terraform before, refer to Create DoubleCloud resources with Terraform for more detailed instructions.

Example provider and resource configuration:

# main.tf

terraform {
  required_providers {
    doublecloud = {
      source    = "registry.terraform.io/doublecloud/doublecloud"
    }
  }
}

provider "doublecloud" {
  authorized_key = file("authorized_key.json")
}

# Create a source endpoint resource
resource "doublecloud_transfer_endpoint" "example-source-endpoint" {
  name       = "example-source-endpoint"
  project_id = DOUBLECLOUD_PROJECT_ID                         # Replace with your project ID
  settings {
    ...                                                       # Add source endpoint configuration
  }
}

# Create a target endpoint resource
resource "doublecloud_transfer_endpoint" "example-target-endpoint" {
  name       = "example-target-endpoint"
  project_id = DOUBLECLOUD_PROJECT_ID                         # Replace with your project ID
  settings {
    ...                                                       # Add target endpoint configuration
  }
}

# Create a transfer resource
resource "doublecloud_transfer" "example-transfer" {
  name       = "example-transfer"
  project_id = DOUBLECLOUD_PROJECT_ID                          # Replace with your project ID
  source     = doublecloud_transfer_endpoint.example-source-endpoint.id
  target     = doublecloud_transfer_endpoint.example-target-endpoint.id
  type       = "SNAPSHOT_ONLY"
  activated = false
  transformation = {
    transformers = [
      {
        dbt = {
          git_repository_link = "https://github.com/doublecloud/tests-clickhouse-dbt.git"
          profile_name = "my_clickhouse_profile"
          operation = "run"
        }
      },
    ]
  }
}

To learn how to get the authorized_key.json file, refer to Create an API key. You can find the DoubleCloud project ID on the project settings page.

Tip

For a full list of available parameters, refer to the Transfer resource and Transfer endpoint resource schemas.

To create a transfer, use the TransferService create method. Pass the following parameters to create a working transfer:

  • source_id - the endpoint ID for the source endpoint.

    To find the endpoint ID, get a list of endpoints in the project.

  • target_id - the endpoint ID for the target endpoint.

  • name - the transfer name. Must be unique within the project.

  • project_id - the ID of the project in which you create a transfer. You can get this value on your project's information page.

  • type - specify your transfer type using the doublecloud.transfer.v1.TransferType model.


View this example on GitHub

from doublecloud.transfer.v1.transfer_pb2 import TransferType
from doublecloud.transfer.v1.transfer_service_pb2 import CreateTransferRequest


def create_transfer(svc, project_id: str, name: str, src_id: str, dst_id: str):
   return svc.Create(
      CreateTransferRequest(
            source_id=src_id, target_id=dst_id, name=name, project_id=project_id, type=TransferType.SNAPSHOT_ONLY
      )
   )

For more in-depth examples, check out the DoubleCloud API Python SDK repository.

func createTransfer(ctx context.Context, dc *dc.SDK, flags *cmdFlags, src string, dst string) (*operation.Operation, error) {
   op, err := dc.WrapOperation(dc.Transfer().Transfer().Create(ctx, &transfer.CreateTransferRequest{
      SourceId:  src,
      TargetId:  dst,
      Name:      *flags.name,
      ProjectId: *flags.projectID,
      Type:      transfer.TransferType_SNAPSHOT_ONLY,
   }))
   if err != nil {
      return op, err
   }
   err = op.Wait(ctx)
   return op, err
}

For more in-depth examples, check out the DoubleCloud API Go SDK repository.

Activate a transfer

  1. Open the Transfer page in the console.

  2. Click Activate.

To activate a transfer:

  1. Change the activated parameter in the transfer resource configuration to true:

    resource "doublecloud_transfer" "example-transfer" {
      name      = "example-transfer"
      ...
      activated = true
      ...
    }
    
  2. Update the transfer resource:

    terraform apply
    

Use the TransferService activate method and pass the transfer ID in the transfer_id request parameter.

To find the transfer ID, get a list of transfers in the project.


View this example on GitHub

from doublecloud.transfer.v1.transfer_service_pb2 import ActivateTransferRequest

def activate_transfer(svc, transfer_id: str):
   return svc.Activate(ActivateTransferRequest(transfer_id=transfer_id))

func activateTransfer(ctx context.Context, dc *dc.SDK, transferId string) (*operation.Operation, error) {
   op, err := dc.WrapOperation(dc.Transfer().Transfer().Activate(ctx, &transfer.ActivateTransferRequest{
      TransferId: transferId,
   }))
   if err != nil {
      return op, err
   }
   err = op.Wait(ctx)
   return op, err
}

For more in-depth examples, check out the DoubleCloud API Go SDK repository.

Deactivate a transfer

  1. Open the Transfer page in the console.

  2. Click the icon near the transfer name and select Deactivate.

To deactivate a transfer:

  1. Change the activated parameter in the transfer resource configuration to false:

    resource "doublecloud_transfer" "example-transfer" {
      name = "example-transfer"
      ...
      activated = false
      ...
    }
    
  2. Update the transfer resource:

    terraform apply
    

Use the TransferService deactivate method and pass the transfer ID in the transfer_id request parameter.

To find the transfer ID, get a list of transfers in the project.


View this example on GitHub

from doublecloud.transfer.v1.transfer_service_pb2 import DeactivateTransferRequest

def deactivate_transfer(svc, transfer_id: str):
   return svc.Deactivate(DeactivateTransferRequest(transfer_id=transfer_id))

func deactivateTransfer(ctx context.Context, dc *dc.SDK, transferId string) (*operation.Operation, error) {
   op, err := dc.WrapOperation(dc.Transfer().Transfer().Deactivate(ctx, &transfer.DeactivateTransferRequest{
      TransferId: transferId,
   }))
   if err != nil {
      return op, err
   }
   err = op.Wait(ctx)
   return op, err
}

For more in-depth examples, check out the DoubleCloud API Go SDK repository.

Monitor transfer status

You can monitor the health status of your transfer using the Status history timeline on the Overview page:


The timeline shows the status snapshots of your transfer as colored vertical bars. Currently, there are two possible status messages:

  • OK - the transfer is working normally.
  • ERROR - something is wrong with your transfer. Get further information from the logs.

In the top-right corner of the timeline, you can select the scale of the timeline: one hour (1h), one day (1d), or one week (1w). You can see the current time as a vertical red marker on the timeline.

To navigate the timeline:

  • Use the arrow buttons to move the time scale by the selected interval.
  • Use your mouse cursor to drag the highlighted interval along the timeline above the colored bars.

On the right of the status display, you can see the information card with the current transfer statistics:

  • The number of bytes read.
  • The number of data rows read. This parameter is used to calculate the transfer's resource consumption.

Transfer data from the AWS networks

Transfer requires access to the resources in AWS that are usually protected by a set of tools. Refer to the following guide to see how to peer networks and configure DoubleCloud and AWS to perform a successful transfer: Transfer data from a PostgreSQL RDS instance to ClickHouse®.

Update a transfer

  1. Open the Transfer page in the console.

  2. Select the transfer to update.

  3. Click Edit at the top-right of the page.

  4. Edit Basic and Transfer settings (you can't change the Source and Target endpoints at this point).

  5. Click Submit.

To update a transfer:

  1. Change the desired parameters in the transfer resource configuration:

    resource "doublecloud_transfer" "example-transfer" {
      name = "paid-traffic-data"
      ...
      transformation = {
        transformers = [
          {
            dbt = {
              git_repository_link = "https://github.com/doublecloud/tests-clickhouse-dbt.git"
              profile_name = "my_clickhouse_profile"
              operation = "run"
            }
          },
        ]
      }
    }
    
  2. Update the transfer resource:

    terraform apply
    

Use the TransferService update method and pass the transfer ID in the transfer_id request parameter.

To find the transfer ID, get a list of transfers in the project.

You can change the following parameters:

  • description - a new description for the transfer.

  • name - a new transfer name. It must be unique within the project.

Delete a transfer

  1. Open the Transfer page in the console.

  2. Select the transfer to delete.

  3. Click Delete at the top-right of the page.

  4. Confirm deletion.

  5. Click Delete.

To delete a transfer:

  1. Comment out or delete the transfer resource from the Terraform configuration files.

  2. Apply the configuration:

    terraform apply
    

Use the TransferService delete method and pass the transfer ID in the transfer_id request parameter.

To find the transfer ID, get a list of transfers in the project.


View this example on GitHub

from doublecloud.transfer.v1.transfer_service_pb2 import DeleteTransferRequest


def delete_transfer(svc, transfer_id: str):
   return svc.Delete(DeleteTransferRequest(transfer_id=transfer_id))

For more in-depth examples, check out the DoubleCloud API Python SDK repository.

func deleteTransfer(ctx context.Context, dc *dc.SDK, transferId string) (*operation.Operation, error) {
   op, err := dc.WrapOperation(dc.Transfer().Transfer().Delete(ctx, &transfer.DeleteTransferRequest{
      TransferId: transferId,
   }))
   if err != nil {
      return op, err
   }
   err = op.Wait(ctx)
   return op, err
}

For more in-depth examples, check out the DoubleCloud API Go SDK repository.