Transformations in Transfer

DoubleCloud Transfer can apply transformations to the transferred data on the fly.

You can configure transformations when creating or editing a transfer. To do that, click + Transformation on the transfer configuration page and provide the desired settings.

The list of available transformations in each transfer depends on the source and target types.

SQL

Performs SQL-like in-memory data transformations based on a provided query in the ClickHouse® SQL dialect. The transformer is based on ClickHouse® Local — an isolated ClickHouse® database engine.

The source table inside ClickHouse® Local is named table. The ClickHouse® table structure mimics the source table structure.

Because each source change item (row) contains extra metadata, source and target data must match. Therefore, each row must have a key defined. All these keys must be unique in every batch. For that, a collapse function is called.

If Transfer can't match source keys with the transformed data, it marks such rows as containing an error.

SQL query example

When writing an SQL query, you must preserve original key-columns:

SELECT
   parseDateTime32BestEffortJSONExtractString(CloudTrailEvent, 'eventTime')) AS eventTime,
   JSONExtractString(CloudTrailEvent, 'http_request.user_agent') AS http_useragent,    
   JSONExtractString(CloudTrailEvent, 'errorMessage') AS error_message,
   JSONExtractString(CloudTrailEvent, 'errorCode') AS error_kind, 
   JSONExtractString(CloudTrailEvent, 'sourceIPAddress') AS network_client_ip,
   JSONExtractString(CloudTrailEvent, 'eventVersion') AS eventVersion,
   JSONExtractString(CloudTrailEvent, 'eventSource') AS eventSource,
   JSONExtractString(CloudTrailEvent, 'eventName') AS eventName,
   JSONExtractString(CloudTrailEvent, 'awsRegion') AS awsRegion,
   JSONExtractString(CloudTrailEvent, 'sourceIPAddress') AS sourceIPAddress,
   JSONExtractString(CloudTrailEvent, 'userAgent') AS userAgent,
   JSONExtractString(CloudTrailEvent, 'requestID') AS requestID,
   JSONExtractString(CloudTrailEvent, 'eventID') AS eventID,
   JSONExtractBool(CloudTrailEvent, 'readOnly') AS readOnly,
   JSONExtractString(CloudTrailEvent, 'eventType') AS eventType,
   JSONExtractBool(CloudTrailEvent, 'managementEvent') AS managementEvent,
   JSONExtractString(CloudTrailEvent, 'recipientAccountId') AS recipientAccountId,
   JSONExtractString(CloudTrailEvent, 'eventCategory') AS eventCategory,
   JSONExtractString(CloudTrailEvent, 'aws_account') AS account,

   JSONExtractString(CloudTrailEvent, 'userIdentity.type') AS userIdentity_type,
   JSONExtractString(CloudTrailEvent, 'userIdentity.principalId') AS userIdentity_principalId,
   JSONExtractString(CloudTrailEvent, 'userIdentity.arn') AS userIdentity_arn,
   JSONExtractString(CloudTrailEvent, 'userIdentity.accountId') AS userIdentity_accountId,
   JSONExtractString(CloudTrailEvent, 'userIdentity.accessKeyId') AS userIdentity_accessKeyId,

   JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.type') AS sessionIssuer_type,
   JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.principalId') AS sessionIssuer_principalId,
   JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.arn') AS sessionIssuer_arn,
   JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.accountId') AS sessionIssuer_accountId,
   JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.userName') AS sessionIssuer_userName,

   JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.webIdFederationData.federatedProvider') AS federatedProvider,

   JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.attributes.creationDate') AS attributes_creationDate,
   JSONExtractBool(CloudTrailEvent, 'userIdentity.sessionContext.attributes.mfaAuthenticated') AS attributes_mfaAuthenticated,

   JSONExtractString(CloudTrailEvent, 'requestParameters.commandId') AS requestParameters_commandId,
   JSONExtractString(CloudTrailEvent, 'requestParameters.instanceId') AS requestParameters_instanceId,

   JSONExtractString(CloudTrailEvent, 'tlsDetails.tlsVersion') AS tlsDetails_tlsVersion,
   JSONExtractString(CloudTrailEvent, 'tlsDetails.cipherSuite') AS tlsDetails_cipherSuite,
   JSONExtractString(CloudTrailEvent, 'tlsDetails.clientProvidedHostHeader') AS tlsDetails_clientProvidedHostHeader
FROM table

Mask secret fields

Applies a hash function to specified columns in tables to further protect sensitive data during transfer.

Columns filter

Applies filtering to the list of columns transferred from the data source.

Rename tables

Associates a table name on the source with a new table name on the target without changing the contents of the transferred table.

Replace primary key

Reassigns the primary key column on the target table.

Convert values to string

Converts a certain data column in a specified table to a string.

Convert data to raw JSON

Converts a certain data column in a specified table to raw JSON.

Sharding

Distributes tables between multiple shards on the ClickHouse® data destination.

Convert CDC data to raw JSON with history

Converts the change data capture (CDC) data to raw JSON with history.

dbt

Applies a dbt project to the snapshot of the data transferred to ClickHouse®.

For more information on how to apply dbt transformation to data, refer to Transform your data in a Managed ClickHouse® cluster with dbt.

See also