Transformations in Transfer
DoubleCloud Transfer can apply transformations of the to your data. Below you can find descriptions for every transformation type:
-
Mask secret fields
This transformation allows you to apply a hash function to specified columns in tables to further protect sensitive data during transfer.
-
Columns filter
This transformation allows you to apply filtering to the list of columns to transfer from the data source.
-
Rename tables
This transformation gives you a capability to associate the table name on the source with a new table name on the target without changing the contents of the transferred table.
-
Replace primary key
This transformation allows you to reassign the primary key column on the target table.
-
Convert values to string
This transformation allows you to convert a certain data column in a specified table to a string.
-
Convert data to raw JSON
This transformation gives you a capability to convert a certain data column in a specified table to a raw JSON.
-
Sharding
This transformation allows you to distribute the tables between multiple shards on the ClickHouse® data destination.
-
Convert CDC data to raw JSON with history
This transformation allows you to convert the change data capture (CDC) data
-
SQL
This transformer accepts ClickHouse® SQL dialect and allows you to produce SQL-like in-memory data transformation. The solution is based on ClickHouse® Local
The source table inside ClickHouse® Local is named
table
, the ClickHouse® table structure mimics the source table structure.Since each source change item (row) contains extra metadata, we must match source and target data. Therefore, each row must have a key defined. All these keys should be unique in every batch. Do do this, we call a collapse function.
If we can't match source keys with transformed data, we mark such row as containing an error.
SQL query exampleWhen writing an SQL query, you must preserve original key-columns:
SELECT parseDateTime32BestEffortJSONExtractString(CloudTrailEvent, 'eventTime')) AS eventTime, JSONExtractString(CloudTrailEvent, 'http_request.user_agent') AS http_useragent, JSONExtractString(CloudTrailEvent, 'errorMessage') AS error_message, JSONExtractString(CloudTrailEvent, 'errorCode') AS error_kind, JSONExtractString(CloudTrailEvent, 'sourceIPAddress') AS network_client_ip, JSONExtractString(CloudTrailEvent, 'eventVersion') AS eventVersion, JSONExtractString(CloudTrailEvent, 'eventSource') AS eventSource, JSONExtractString(CloudTrailEvent, 'eventName') AS eventName, JSONExtractString(CloudTrailEvent, 'awsRegion') AS awsRegion, JSONExtractString(CloudTrailEvent, 'sourceIPAddress') AS sourceIPAddress, JSONExtractString(CloudTrailEvent, 'userAgent') AS userAgent, JSONExtractString(CloudTrailEvent, 'requestID') AS requestID, JSONExtractString(CloudTrailEvent, 'eventID') AS eventID, JSONExtractBool(CloudTrailEvent, 'readOnly') AS readOnly, JSONExtractString(CloudTrailEvent, 'eventType') AS eventType, JSONExtractBool(CloudTrailEvent, 'managementEvent') AS managementEvent, JSONExtractString(CloudTrailEvent, 'recipientAccountId') AS recipientAccountId, JSONExtractString(CloudTrailEvent, 'eventCategory') AS eventCategory, JSONExtractString(CloudTrailEvent, 'aws_account') AS account, JSONExtractString(CloudTrailEvent, 'userIdentity.type') AS userIdentity_type, JSONExtractString(CloudTrailEvent, 'userIdentity.principalId') AS userIdentity_principalId, JSONExtractString(CloudTrailEvent, 'userIdentity.arn') AS userIdentity_arn, JSONExtractString(CloudTrailEvent, 'userIdentity.accountId') AS userIdentity_accountId, JSONExtractString(CloudTrailEvent, 'userIdentity.accessKeyId') AS userIdentity_accessKeyId, JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.type') AS sessionIssuer_type, JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.principalId') AS sessionIssuer_principalId, JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.arn') AS sessionIssuer_arn, JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.accountId') AS sessionIssuer_accountId, JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.userName') AS sessionIssuer_userName, JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.webIdFederationData.federatedProvider') AS federatedProvider, JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.attributes.creationDate') AS attributes_creationDate, JSONExtractBool(CloudTrailEvent, 'userIdentity.sessionContext.attributes.mfaAuthenticated') AS attributes_mfaAuthenticated, JSONExtractString(CloudTrailEvent, 'requestParameters.commandId') AS requestParameters_commandId, JSONExtractString(CloudTrailEvent, 'requestParameters.instanceId') AS requestParameters_instanceId, JSONExtractString(CloudTrailEvent, 'tlsDetails.tlsVersion') AS tlsDetails_tlsVersion, JSONExtractString(CloudTrailEvent, 'tlsDetails.cipherSuite') AS tlsDetails_cipherSuite, JSONExtractString(CloudTrailEvent, 'tlsDetails.clientProvidedHostHeader') AS tlsDetails_clientProvidedHostHeader FROM table
-
dbt
Apply your dbt project to the snapshot of the data transferred to ClickHouse®.
For more information on how to apply dbt transformation to your data, see Transform your data in a Managed ClickHouse® cluster with dbt.