Create and manage transfers
Create a transfer
- Open the Transfer page.
- Click Create → Transfer.
- Under Endpoints:
   - In Source, select an existing source endpoint or create a new one.
   - In Target, select an existing target endpoint or create a new one.
- Under Basic settings, enter a transfer name and (optionally) a description.
Under Transfer Parameters, select a Transfer type. Available transfer types depend on the type of source and destination endpoints:
Snapshot
Makes a one-time transfer of the database snapshot.
- Under Snapshot settings → Parallel snapshot settings, configure the transfer performance:
   - Processes count sets the number of parallel container instances that run the transfer. Increasing this number speeds up the transfer. You can specify up to 8 instances.
   - Threads count specifies the number of processes within each container. You can run up to 10 processes.
   The average download speed of data transfers is between 1 and 10 MB/s.
- Add one or more Incremental tables. With incremental tables, you can transfer only the data that has changed. Each of these tables has the following fields (see the sketch after this list):
   - Schema corresponds to the database schema (as in PostgreSQL), name, or dataset.
   - Table in the source to compare with your target table.
   - Key column is the name of a column that contains a value (a cursor) indicating whether to increment the table. A common example of a cursor is a column with timestamps. Refer to the Airbyte Incremental Sync - Append documentation.
   - Start value (optional) defines a value in the Key column starting from which to track changes. For example, you can specify a date as the Start value: the service will then transfer only the rows with a date value greater than the start value.
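To make the cursor semantics concrete, here is a minimal Python sketch of the selection logic described above. The table, the updated_at key column, and the dates are hypothetical; Transfer's actual implementation may differ.

```python
# Conceptual sketch of incremental selection (hypothetical names and values).
from datetime import datetime

rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 10)},
    {"id": 2, "updated_at": datetime(2024, 3, 5)},
]

start_value = datetime(2024, 2, 1)  # the configured Start value

# Only rows whose cursor exceeds the current start value are transferred;
# after the run, the cursor advances to the maximum value seen.
increment = [r for r in rows if r["updated_at"] > start_value]
new_start_value = max(r["updated_at"] for r in increment)

print(increment)        # [{'id': 2, ...}] — only the changed row
print(new_start_value)  # 2024-03-05 00:00:00
```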
Periodic snapshot
Runs snapshots at the specified interval.
- Set the Period between transfer runs, from 5 minutes to 24 hours.
- Under Snapshot settings → Parallel snapshot settings, configure the transfer performance:
   - Processes count sets the number of parallel container instances that run the transfer. Increasing this number speeds up the transfer. You can specify up to 8 instances.
   - Threads count specifies the number of processes within each container. You can run up to 10 processes.
   The average download speed of data transfers is between 1 and 10 MB/s.
- Add one or more Incremental tables. With incremental tables, you can transfer only the data that has changed. Each of these tables has the following fields:
   - Schema corresponds to the database schema (as in PostgreSQL), name, or dataset. Refer to the Airbyte destination namespace configuration documentation.
   - Table in the source to compare with your target table.
   - Key column is the name of a column that contains a value (a cursor) indicating whether to increment the table. A common example of a cursor is a column with timestamps. Refer to the Airbyte Incremental Sync - Append documentation.
   - Initial value (optional) defines a value in the Key column starting from which to track changes. For example, you can specify a date as the Initial value: the service will then transfer only the rows with a date value greater than the initial value.
Replication
Continuously retrieves changes from the source database and applies them to the target database.
Snapshot and replication
Transfers the data and keeps the target database in sync with the source database.
- Under Snapshot settings → Parallel snapshot settings, configure the transfer performance:
   - Processes count sets the number of parallel container instances that run the transfer. Increasing this number speeds up the transfer. You can specify up to 8 instances.
   - Threads count specifies the number of processes within each container. You can run up to 10 processes.
   The average download speed of data transfers is between 1 and 10 MB/s.
- Add one or more Incremental tables. With incremental tables, you can transfer only the data that has changed. Each of these tables has the following fields:
   - Schema corresponds to the database schema (as in PostgreSQL), name, or dataset.
   - Table in the source to compare with your target table.
   - Key column is the name of a column that contains a value (a cursor) indicating whether to increment the table. A common example of a cursor is a column with timestamps. Refer to the Airbyte Incremental Sync - Append documentation.
   - Start value (optional) defines a value in the Key column starting from which to track changes. For example, you can specify a date as the Start value: the service will then transfer only the rows with a date value greater than the start value.
Specify the Runtime environment properties:
Serverless
The settings will be adjusted automatically.
Dedicated
- Select the VM performance preset:
   - Tiny: 2 vCPUs (burstable instance).
   Tiny instances are ideal for scenarios where a transfer is typically idle. They have 20% baseline performance, and the ability to burst to the maximum performance isn't guaranteed.
   - Small: 2 vCPUs
   - Medium: 4 vCPUs
   - Large: 8 vCPUs
- Select the Minimum logging level:

   | Level | Description |
   |---|---|
   | TRACE | Detailed diagnostics information |
   | DEBUG | Debugging information |
   | INFO | Diagnostics information for statistics |
   | WARNING | Warning about a non-critical malfunction; investigate further |
   | ERROR | Error report |
   | FATAL | Possible system failure, service shutdown |
- Under Settings, choose how to select a network in which to run a virtual machine instance:
   - Automatic allows the service to set the network automatically based on the target resource location.
   - Manual

     Transfer BYOC compatibility: if you use the BYOC feature for your cluster, select an external network. This enables Transfer to write data to your BYOC cluster.

     Select a VPC Network ID from the drop-down list.

Read more about runtimes in Runtime types in Transfer.
- Set up Data transformations if you need to modify the list of tables to transfer.

  Public Preview notice
  This feature is provided as Public Preview. After the end of the Public Preview period, the functionality and pricing of this feature will be subject to change.

  For example, suppose you don't want to transfer a column with passwords. In Transformer list, click Add transformer to create a set of transformation rules, and add this column's name to the Exclude columns section. In Transformer [number] → Columns filter, configure transformations for tables and columns: Include columns sets the list of columns to transfer, and Exclude columns lists the columns that won't be transferred. Set these names as regular expressions.

  Each transformer is a separate set of rules, and you can combine different rules within each set. For example, you can set a table in Include tables and a column in Exclude columns. In this case, the service will ignore the specified Exclude columns only for the included table. If you combine Exclude tables and Include columns, only the specified columns will be transferred from all tables except those specified in the Exclude tables field. The sketch below illustrates this combination logic.
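A minimal Python sketch of how include/exclude rules combine within one transformer; the table and column names, patterns, and helper function here are illustrative, not Transfer's actual implementation.

```python
# Illustrative logic for combining filter rules within one transformer.
import re

include_tables = [r"^sales$"]
exclude_columns = [r"^password$"]

def column_transferred(table: str, column: str) -> bool:
    # Exclude columns apply only to tables matched by Include tables.
    if any(re.search(p, table) for p in include_tables):
        return not any(re.search(p, column) for p in exclude_columns)
    return True  # tables outside the include list keep all columns

print(column_transferred("sales", "password"))  # False: excluded in the included table
print(column_transferred("users", "password"))  # True: the rule doesn't apply here
```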
- Click + Transformation to add a new transformation layer. You can apply multiple layers to your data.
- From the dropdown menu, select the appropriate transformation type:
Mask secret fields
This transformation allows you to apply a hash function to specified columns in tables to further protect sensitive data during transfer.
- Under Tables, specify the following:
   - Included tables restricts the set of tables to transfer.
   - Excluded tables allows transferring all data except the specified tables.
   Set these table names as regular expressions using the following patterns:
   | Pattern | Description | Example |
   |---|---|---|
   | `abc` | An explicit series of characters. | `test` returns the table names containing `test`. |
   | `.` | A character wildcard. Use it to match an expression with defined character positions. | `t..t` returns `test`, `tent`, `tart`, etc. |
   | `\` | An escape character. Use it to match special characters. | `\_` returns the table names containing an underscore. |
   | `?` | Use it to express that a character (or a group of characters) is optional. | `c?.n` returns `can`, `con`, `in`, `on`, `en`, etc. |
   | `+` | Use it to express that a character (or a group of characters) can appear one or more times. | `-+` returns the table names containing `-`, `--`, `---`, etc. |
   | `{n}` | Use it to express that a character (or a group of characters) must appear exactly `n` times. | `-{2}` returns the table names containing `--`. |
   | `{n,m}` | Use it to express that a character (or a group of characters) must appear between `n` and `m` times. | `_{1,3}` returns the table names containing `_`, `__`, and `___`. |
   | `\w` | An alphanumeric wildcard. Use it to match any alphanumeric characters. The match pattern is case-sensitive. | `\w+` returns the table names containing letters and/or digits. |
   | `\W` | A non-alphanumeric wildcard. Use it to match any non-alphanumeric character. The match pattern is case-sensitive. | `\W+` returns the table names containing characters other than letters or digits. |
   | `\d` | A digit wildcard. Use it to match any digit characters. The match pattern is case-sensitive. | `\d+` returns the table names containing digits. |
   | `\D` | A non-digit wildcard. Use it to match any non-digit characters. The match pattern is case-sensitive. | `\D+` returns the table names containing any characters other than digits. |
   | `$` | Use it to match the position after the table name's last character. | `metrics$` returns the table names ending with `metrics`. |
   | `^` | Use it to match the position before the table name's first character. This position is useful to define database names. | `^monthly_sales` returns all the tables from the `monthly_sales` database. |
- Under Column list, click + to add a column name. The masking will be applied to the columns listed in this section.
- Under Mask function → Hash → User-defined Salt, specify the salt value for the hash function (see the conceptual sketch below).
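For intuition, masking with a user-defined salt works along these lines. This hashlib sketch is conceptual: the exact hash function Transfer applies isn't specified here, and the sample value and salt are hypothetical.

```python
# Conceptual illustration of salted hash masking (not Transfer's exact function).
import hashlib

def mask(value: str, salt: str) -> str:
    # The salt is mixed into the input so identical values masked with
    # different salts produce different hashes.
    return hashlib.sha256((salt + value).encode()).hexdigest()

print(mask("4111 1111 1111 1111", salt="my-user-defined-salt"))
```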
Columns filter
This transformation allows you to apply filtering to the list of columns to transfer from the data source.
- Under Tables, click + Tables and specify the following:
   - Included tables restricts the set of tables to transfer.
   - Excluded tables allows transferring all data except the specified tables.
   Set these table names as regular expressions (see the regular expression patterns table above).
- Under Columns, specify the following:
   - Included columns restricts the set of columns to transfer from the tables specified above.
   - Excluded columns allows transferring all columns except the specified ones.
   Set these column names as regular expressions (see the regular expression patterns table above).
Rename tables
This transformation lets you map the table name on the source to a new table name on the target without changing the contents of the transferred table.
- Under Tables list to rename, click + Table.
- Under Table 1 → Source table name:
   - For PostgreSQL data sources, use the Named schema field to provide the source table schema. Leave it empty for data sources that don't support schema and/or database abstractions.
   - Specify the initial Table name on the source.
- Under Target table name:
   - For PostgreSQL data sources, use the Named schema field to provide the target table schema. Leave it empty for data sources that don't support schema and/or database abstractions.
   - Specify the intended Table name on the target.
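Conceptually, the transformation is a (schema, table) → (schema, table) lookup applied to table names in flight. Here is an illustrative sketch with hypothetical names:

```python
# Illustrative rename mapping:
# (source schema, source table) -> (target schema, target table).
rename_map = {
    ("public", "orders_raw"): ("analytics", "orders"),
}

def target_name(schema: str, table: str) -> tuple[str, str]:
    # Unmapped tables keep their original names; contents are never changed.
    return rename_map.get((schema, table), (schema, table))

print(target_name("public", "orders_raw"))  # ('analytics', 'orders')
print(target_name("public", "users"))       # ('public', 'users')
```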
Replace primary key
This transformation allows you to reassign the primary key column on the target table.
- Under Tables, click + Tables and specify the following:
   - Included tables restricts the set of tables to transfer.
   - Excluded tables allows transferring all data except the specified tables.
   Set these table names as regular expressions (see the regular expression patterns table above).
- Under Key columns names, specify the pairs of columns to replace, separated by commas, in the following format: `<column name at the source> <column name at the target>,`

Warning
Assigning two or more primary keys per table makes these tables incompatible with ClickHouse®.
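To illustrate the expected format of the Key columns names value, this sketch parses a hypothetical pair list (the column names are invented for the example):

```python
# Parsing the "<source column> <target column>," pair format (hypothetical values).
value = "legacy_id id, legacy_ts ts,"

pairs = [tuple(p.split()) for p in value.split(",") if p.strip()]
print(pairs)  # [('legacy_id', 'id'), ('legacy_ts', 'ts')]
```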
Convert values to string
This transformation allows you to convert a certain data column in a specified table to a string.
- Under Tables, click + Tables and specify the following:
   - Included tables restricts the set of tables to transfer.
   - Excluded tables allows transferring all data except the specified tables.
   Set these table names as regular expressions (see the regular expression patterns table above).
- Under Columns, specify the following:
   - Included columns restricts the set of columns to transfer from the tables specified above.
   - Excluded columns allows transferring all columns except the specified ones.
   Set these column names as regular expressions (see the regular expression patterns table above).
- Under Key columns names, click + to add the name of a column containing the primary key.
- Under Non-key column names, click + to add a column without a primary key.
The conversion will be applied to the columns listed in both sections.
Convert data to raw JSON
This transformation lets you convert a given data column in a specified table to raw JSON.
- Under Tables, specify the following:
   - Included tables restricts the set of tables to transfer.
   - Excluded tables allows transferring all data except the specified tables.
   Set these table names as regular expressions (see the regular expression patterns table above).
- Under Columns, specify the following:
   - Included columns restricts the set of columns to transfer from the tables specified above.
   - Excluded columns allows transferring all columns except the specified ones.
   Set these column names as regular expressions (see the regular expression patterns table above).
- Under Key columns names, click + to add the name of a column containing the primary key.
- Under Non-key column names, click + to add a column without a primary key.
The conversion will be applied to the columns listed in both sections.
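As an intuition for the raw JSON conversion: key columns stay as regular columns while the remaining columns are serialized into a JSON document. A conceptual sketch with hypothetical row and column names (the exact output layout Transfer produces may differ):

```python
# Conceptual sketch: key columns stay as-is, non-key columns become one raw
# JSON document.
import json

row = {"id": 42, "name": "Alice", "balance": 10.5}
key_columns = ["id"]

converted = {c: row[c] for c in key_columns}
converted["doc"] = json.dumps({c: v for c, v in row.items() if c not in key_columns})

print(converted)  # {'id': 42, 'doc': '{"name": "Alice", "balance": 10.5}'}
```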
Sharding
This transformation allows you to distribute the tables between multiple shards on the ClickHouse® data destination.
- Under Tables, click + Tables and specify the following:
   - Included tables restricts the set of tables to transfer.
   - Excluded tables allows transferring all data except the specified tables.
   Set these table names as regular expressions (see the regular expression patterns table above).
- Under Columns, click + Columns and specify the following:
   - Included columns restricts the set of columns to transfer from the tables specified above.
   - Excluded columns allows transferring all columns except the specified ones.
- Enter the Count of shards between which you want to distribute the table data.
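A common way to distribute rows over N shards is to hash a key and take it modulo the shard count. The sketch below illustrates the idea; it doesn't claim to reproduce Transfer's exact sharding function, and the keys are hypothetical.

```python
# Illustrative sharding: route each row to one of `shard_count` shards by
# hashing a key.
import zlib

shard_count = 4

def shard_for(key: str) -> int:
    # crc32 gives a stable, deterministic hash across runs.
    return zlib.crc32(key.encode()) % shard_count

for user_id in ["u-1", "u-2", "u-3"]:
    print(user_id, "->", shard_for(user_id))
```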
Convert CDC data to raw JSON with history
This transformation allows you to convert change data capture (CDC) data to raw JSON while preserving the change history.
- Under Tables, click + Tables and specify the following:
   - Included tables restricts the set of tables to transfer.
   - Excluded tables allows transferring all data except the specified tables.
   Set these table names as regular expressions (see the regular expression patterns table above).
- Under Key columns names, click + to add the name of a column containing the primary key.
- Under Non-key column names, click + to add a column without a primary key.
The conversion will be applied to the columns listed in both sections.
SQL
This transformer accepts the ClickHouse® SQL dialect and allows you to perform SQL-like in-memory data transformations. The solution is based on ClickHouse® Local.
The source table inside ClickHouse® Local is named `table`, and its structure mimics the source table structure. Since each source change item (row) contains extra metadata, the service must match source and target data, so each row must have a key defined, and all keys should be unique in every batch. To do this, a collapse function is called. If the source keys can't be matched with the transformed data, such rows are marked as containing an error.
When writing an SQL query, you must preserve the original key columns:
SELECT
    parseDateTime32BestEffort(JSONExtractString(CloudTrailEvent, 'eventTime')) AS eventTime,
    JSONExtractString(CloudTrailEvent, 'http_request.user_agent') AS http_useragent,
    JSONExtractString(CloudTrailEvent, 'errorMessage') AS error_message,
    JSONExtractString(CloudTrailEvent, 'errorCode') AS error_kind,
    JSONExtractString(CloudTrailEvent, 'sourceIPAddress') AS network_client_ip,
    JSONExtractString(CloudTrailEvent, 'eventVersion') AS eventVersion,
    JSONExtractString(CloudTrailEvent, 'eventSource') AS eventSource,
    JSONExtractString(CloudTrailEvent, 'eventName') AS eventName,
    JSONExtractString(CloudTrailEvent, 'awsRegion') AS awsRegion,
    JSONExtractString(CloudTrailEvent, 'sourceIPAddress') AS sourceIPAddress,
    JSONExtractString(CloudTrailEvent, 'userAgent') AS userAgent,
    JSONExtractString(CloudTrailEvent, 'requestID') AS requestID,
    JSONExtractString(CloudTrailEvent, 'eventID') AS eventID,
    JSONExtractBool(CloudTrailEvent, 'readOnly') AS readOnly,
    JSONExtractString(CloudTrailEvent, 'eventType') AS eventType,
    JSONExtractBool(CloudTrailEvent, 'managementEvent') AS managementEvent,
    JSONExtractString(CloudTrailEvent, 'recipientAccountId') AS recipientAccountId,
    JSONExtractString(CloudTrailEvent, 'eventCategory') AS eventCategory,
    JSONExtractString(CloudTrailEvent, 'aws_account') AS account,
    JSONExtractString(CloudTrailEvent, 'userIdentity.type') AS userIdentity_type,
    JSONExtractString(CloudTrailEvent, 'userIdentity.principalId') AS userIdentity_principalId,
    JSONExtractString(CloudTrailEvent, 'userIdentity.arn') AS userIdentity_arn,
    JSONExtractString(CloudTrailEvent, 'userIdentity.accountId') AS userIdentity_accountId,
    JSONExtractString(CloudTrailEvent, 'userIdentity.accessKeyId') AS userIdentity_accessKeyId,
    JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.type') AS sessionIssuer_type,
    JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.principalId') AS sessionIssuer_principalId,
    JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.arn') AS sessionIssuer_arn,
    JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.accountId') AS sessionIssuer_accountId,
    JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.userName') AS sessionIssuer_userName,
    JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.webIdFederationData.federatedProvider') AS federatedProvider,
    JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.attributes.creationDate') AS attributes_creationDate,
    JSONExtractBool(CloudTrailEvent, 'userIdentity.sessionContext.attributes.mfaAuthenticated') AS attributes_mfaAuthenticated,
    JSONExtractString(CloudTrailEvent, 'requestParameters.commandId') AS requestParameters_commandId,
    JSONExtractString(CloudTrailEvent, 'requestParameters.instanceId') AS requestParameters_instanceId,
    JSONExtractString(CloudTrailEvent, 'tlsDetails.tlsVersion') AS tlsDetails_tlsVersion,
    JSONExtractString(CloudTrailEvent, 'tlsDetails.cipherSuite') AS tlsDetails_cipherSuite,
    JSONExtractString(CloudTrailEvent, 'tlsDetails.clientProvidedHostHeader') AS tlsDetails_clientProvidedHostHeader
FROM table
dbt
Apply your dbt project to the snapshot of the data transferred to ClickHouse®.
For more information on how to apply dbt transformation to your data, see Transform your data in a Managed ClickHouse® cluster with dbt.
- Specify the address of the Git repository containing your dbt project. It must start with `https://`. The root directory of the repository must contain a `dbt_project.yml` file.
- Under Git branch, specify the branch or tag of the Git repository containing your dbt project.
- Provide the dbt profile name, which will be created automatically using the settings of the destination endpoint. The name must match the `profile` property in the `dbt_project.yml` file.
- From the dropdown list, select the Operation for your dbt project to perform. For more information, see the official dbt documentation.
Deleting a transformation layer
To delete a transformation layer, click the delete icon next to it.
- Click Submit.
You can create a transfer with Terraform using the Transfer resource.
Tip
If you haven’t used Terraform before, refer to Create DoubleCloud resources with Terraform for more detailed instructions.
Example provider and resource configuration:
# main.tf
terraform {
required_providers {
doublecloud = {
source = "registry.terraform.io/doublecloud/doublecloud"
}
}
}
provider "doublecloud" {
authorized_key = file("authorized_key.json")
}
# Create a source endpoint resource
resource "doublecloud_transfer_endpoint" "example-source-endpoint" {
name = "example-source-endpoint"
project_id = DOUBLECLOUD_PROJECT_ID # Replace with your project ID
settings {
... # Add source endpoint configuration
}
}
# Create a target endpoint resource
resource "doublecloud_transfer_endpoint" "example-target-endpoint" {
name = "example-target-endpoint"
project_id = DOUBLECLOUD_PROJECT_ID # Replace with your project ID
settings {
... # Add target endpoint configuration
}
}
# Create a transfer resource
resource "doublecloud_transfer" "example-transfer" {
name = "example-transfer"
project_id = DOUBLECLOUD_PROJECT_ID # Replace with your project ID
source = doublecloud_transfer_endpoint.example-source-endpoint.id
target = doublecloud_transfer_endpoint.example-target-endpoint.id
type = "SNAPSHOT_ONLY"
activated = false
transformation = {
transformers = [
{
dbt = {
git_repository_link = "https://github.com/doublecloud/tests-clickhouse-dbt.git"
profile_name = "my_clickhouse_profile"
operation = "run"
}
},
]
}
}
To learn how to get the `authorized_key.json` file, refer to Create an API key.
You can find the DoubleCloud project ID on the project settings page.
Tip
For a full list of available parameters, refer to the Transfer resource documentation.
To create a transfer, use the TransferService create method.
Pass the following parameters to create a working transfer:
- `source_id`: ID of the source endpoint.
  Tip: to find the endpoint ID, get a list of endpoints in the project.
- `target_id`: ID of the target endpoint.
- `name`: Transfer name. Must be unique within the project.
- `project_id`: ID of the project where you want to create a transfer. You can get this value on your project's information page.
- `type`: Transfer type according to the doublecloud.transfer.v1.TransferType model.
from doublecloud.transfer.v1.transfer_pb2 import TransferType
from doublecloud.transfer.v1.transfer_service_pb2 import CreateTransferRequest
def create_transfer(svc, project_id: str, name: str, src_id: str, dst_id: str):
return svc.Create(
CreateTransferRequest(
source_id=src_id, target_id=dst_id, name=name, project_id=project_id, type=TransferType.SNAPSHOT_ONLY
)
)
For more in-depth examples, check out the DoubleCloud API Python SDK repository.
func createTransfer(ctx context.Context, dc *dc.SDK, flags *cmdFlags, src string, dst string) (*operation.Operation, error) {
op, err := dc.WrapOperation(dc.Transfer().Transfer().Create(ctx, &transfer.CreateTransferRequest{
SourceId: src,
TargetId: dst,
Name: *flags.name,
ProjectId: *flags.projectID,
Type: transfer.TransferType_SNAPSHOT_ONLY,
}))
if err != nil {
return op, err
}
err = op.Wait(ctx)
return op, err
}
For more in-depth examples, check out the DoubleCloud API Go SDK repository.
Activate a transfer
- Open the Transfer page.
- Next to the transfer name, click Activate.
To activate a transfer:
- Change the `activated` parameter in the transfer resource configuration to `true`:

  resource "doublecloud_transfer" "example-transfer" {
    name = "example-transfer"
    ...
    activated = true
    ...
  }

- Update the transfer resource:

  terraform apply
Use the TransferService activate method and pass the transfer ID in the `transfer_id` request parameter.
Tip
To find the transfer ID, get a list of transfers in the project.
from doublecloud.transfer.v1.transfer_service_pb2 import ActivateTransferRequest
def activate_transfer(svc, transfer_id: str):
return svc.Activate(ActivateTransferRequest(transfer_id=transfer_id))
func activateTransfer(ctx context.Context, dc *dc.SDK, transferId string) (*operation.Operation, error) {
op, err := dc.WrapOperation(dc.Transfer().Transfer().Activate(ctx, &transfer.ActivateTransferRequest{
TransferId: transferId,
}))
if err != nil {
return op, err
}
err = op.Wait(ctx)
return op, err
}
For more in-depth examples, check out the DoubleCloud API Go SDK repository.
Deactivate a transfer
- Open the Transfer page.
- Next to the transfer name, click Deactivate.
To deactivate a transfer:
- Change the `activated` parameter in the transfer resource configuration to `false`:

  resource "doublecloud_transfer" "example-transfer" {
    name = "example-transfer"
    ...
    activated = false
    ...
  }

- Update the transfer resource:

  terraform apply
Use the TransferService deactivate method and pass the transfer ID in the `transfer_id` request parameter.
Tip
To find the transfer ID, get a list of transfers in the project.
from doublecloud.transfer.v1.transfer_service_pb2 import DeactivateTransferRequest
def deactivate_transfer(svc, transfer_id: str):
return svc.Deactivate(DeactivateTransferRequest(transfer_id=transfer_id))
func deactivateTransfer(ctx context.Context, dc *dc.SDK, transferId string) (*operation.Operation, error) {
op, err := dc.WrapOperation(dc.Transfer().Transfer().Deactivate(ctx, &transfer.DeactivateTransferRequest{
TransferId: transferId,
}))
if err != nil {
return op, err
}
err = op.Wait(ctx)
return op, err
}
For more in-depth examples, check out the DoubleCloud API Go SDK repository.
Monitor transfer status
You can monitor the health status of your transfer using the Status history timeline on the Overview page:
The timeline shows the status snapshots of your transfer as colored vertical bars. Currently, there are two possible status messages:
- OK: The transfer is running normally.
- ERROR: Something’s wrong with the transfer. For more information, refer to the transfer logs.
In the top-right corner of the timeline, you can select the scale: one hour (1h), one day (1d), or one week (1w). You can see the current time as a vertical red marker on the timeline.
To navigate the timeline:
- Use the navigation buttons to move the time scale by the selected interval.
- Drag the highlighted interval across the timeline located above the colored bars.
On the right of the status display, you can see the information card with the current transfer statistics:
- The number of bytes read.
- The number of data rows read. This parameter is used to calculate the transfer's resource consumption.
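You can also check a transfer's state programmatically. The sketch below assumes the Python SDK's TransferService exposes a Get method analogous to the Create and Activate calls shown elsewhere on this page; verify the exact request and response types in the SDK repository before relying on it.

```python
# Assumption: TransferService has a Get method mirroring Create/Activate;
# GetTransferRequest and the status field are not confirmed by this article.
from doublecloud.transfer.v1.transfer_service_pb2 import GetTransferRequest

def get_transfer_status(svc, transfer_id: str):
    transfer = svc.Get(GetTransferRequest(transfer_id=transfer_id))
    return transfer.status  # a status-like field, if the API exposes one
```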
Transfer data from AWS networks
Transfer requires access to the resources in AWS that are usually protected by a set of tools. Refer to the following guide to see how to peer networks and configure DoubleCloud and AWS to perform a successful transfer: Transfer data from a PostgreSQL RDS instance to ClickHouse®.
Update a transfer
- Open the Transfer page.
- Select the transfer you want to update.
- Click Edit at the top right of the page.
- Edit Basic and Transfer settings.
- Click Submit.
To update a transfer:
- Change the desired parameters in the transfer resource configuration:

  resource "doublecloud_transfer" "example-transfer" {
    name = "paid-traffic-data"
    ...
    transformation = {
      transformers = [
        {
          dbt = {
            git_repository_link = "https://github.com/doublecloud/tests-clickhouse-dbt.git"
            profile_name = "my_clickhouse_profile"
            operation = "run"
          }
        },
      ]
    }
  }

- Update the transfer resource:

  terraform apply
Use the TransferService update method and pass the transfer ID in the `transfer_id` request parameter.
Tip
To find the transfer ID, get a list of transfers in the project.
You can change the following parameters:
- `description`: New transfer description.
- `name`: New transfer name. Must be unique within the project.
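By analogy with the other SDK calls in this article, an update might look like the sketch below. The UpdateTransferRequest field names are assumed from the parameter list above and are not confirmed here; verify the exact message definition in the SDK repository.

```python
# Assumption: UpdateTransferRequest carries transfer_id plus the mutable
# fields listed above; check the SDK repository for the exact message.
from doublecloud.transfer.v1.transfer_service_pb2 import UpdateTransferRequest

def update_transfer(svc, transfer_id: str, name: str, description: str):
    return svc.Update(
        UpdateTransferRequest(transfer_id=transfer_id, name=name, description=description)
    )
```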
Delete a transfer
- Open the Transfer page.
- Select the transfer you want to delete.
- Click Delete at the top right of the page.
- In the dialog that opens, confirm deletion and click Delete.
To delete a transfer:
- Comment out or delete the transfer resource from the Terraform configuration files.
- Apply the configuration:

  terraform apply
Use the TransferService delete method and pass the transfer ID in the `transfer_id` request parameter.
Tip
To find the transfer ID, get a list of transfers in the project.
from doublecloud.transfer.v1.transfer_service_pb2 import DeleteTransferRequest
def delete_transfer(svc, transfer_id: str):
return svc.Delete(DeleteTransferRequest(transfer_id=transfer_id))
For more in-depth examples, check out the DoubleCloud API Python SDK repository.
func deleteTransfer(ctx context.Context, dc *dc.SDK, transferId string) (*operation.Operation, error) {
op, err := dc.WrapOperation(dc.Transfer().Transfer().Delete(ctx, &transfer.DeleteTransferRequest{
TransferId: transferId,
}))
if err != nil {
return op, err
}
err = op.Wait(ctx)
return op, err
}
For more in-depth examples, check out the DoubleCloud API Go SDK repository.