Transformations in Transfer

DoubleCloud Transfer can apply transformations of the to your data. Below you can find descriptions for every transformation type:

  • Mask secret fields

    This transformation allows you to apply a hash function to specified columns in tables to further protect sensitive data during transfer.

  • Columns filter

    This transformation allows you to apply filtering to the list of columns to transfer from the data source.

  • Rename tables

    This transformation gives you a capability to associate the table name on the source with a new table name on the target without changing the contents of the transferred table.

  • Replace primary key

    This transformation allows you to reassign the primary key column on the target table.

  • Convert values to string

    This transformation allows you to convert a certain data column in a specified table to a string.

  • Convert data to raw JSON

    This transformation gives you a capability to convert a certain data column in a specified table to a raw JSON.

  • Sharding

    This transformation allows you to distribute the tables between multiple shards on the ClickHouse® data destination.

  • Convert CDC data to raw JSON with history

    This transformation allows you to convert the change data capture (CDC) data to raw JSON with history.

  • SQL

    This transformer accepts ClickHouse® SQL dialect and allows you to produce SQL-like in-memory data transformation. The solution is based on ClickHouse® Local

    The source table inside ClickHouse® Local is named table, the ClickHouse® table structure mimics the source table structure.

    Since each source change item (row) contains extra metadata, we must match source and target data. Therefore, each row must have a key defined. All these keys should be unique in every batch. Do do this, we call a collapse function.

    If we can't match source keys with transformed data, we mark such row as containing an error.

    SQL query example

    When writing an SQL query, you must preserve original key-columns:

    SELECT
       parseDateTime32BestEffortJSONExtractString(CloudTrailEvent, 'eventTime')) AS eventTime,
       JSONExtractString(CloudTrailEvent, 'http_request.user_agent') AS http_useragent,    
       JSONExtractString(CloudTrailEvent, 'errorMessage') AS error_message,
       JSONExtractString(CloudTrailEvent, 'errorCode') AS error_kind, 
       JSONExtractString(CloudTrailEvent, 'sourceIPAddress') AS network_client_ip,
       JSONExtractString(CloudTrailEvent, 'eventVersion') AS eventVersion,
       JSONExtractString(CloudTrailEvent, 'eventSource') AS eventSource,
       JSONExtractString(CloudTrailEvent, 'eventName') AS eventName,
       JSONExtractString(CloudTrailEvent, 'awsRegion') AS awsRegion,
       JSONExtractString(CloudTrailEvent, 'sourceIPAddress') AS sourceIPAddress,
       JSONExtractString(CloudTrailEvent, 'userAgent') AS userAgent,
       JSONExtractString(CloudTrailEvent, 'requestID') AS requestID,
       JSONExtractString(CloudTrailEvent, 'eventID') AS eventID,
       JSONExtractBool(CloudTrailEvent, 'readOnly') AS readOnly,
       JSONExtractString(CloudTrailEvent, 'eventType') AS eventType,
       JSONExtractBool(CloudTrailEvent, 'managementEvent') AS managementEvent,
       JSONExtractString(CloudTrailEvent, 'recipientAccountId') AS recipientAccountId,
       JSONExtractString(CloudTrailEvent, 'eventCategory') AS eventCategory,
       JSONExtractString(CloudTrailEvent, 'aws_account') AS account,
    
       JSONExtractString(CloudTrailEvent, 'userIdentity.type') AS userIdentity_type,
       JSONExtractString(CloudTrailEvent, 'userIdentity.principalId') AS userIdentity_principalId,
       JSONExtractString(CloudTrailEvent, 'userIdentity.arn') AS userIdentity_arn,
       JSONExtractString(CloudTrailEvent, 'userIdentity.accountId') AS userIdentity_accountId,
       JSONExtractString(CloudTrailEvent, 'userIdentity.accessKeyId') AS userIdentity_accessKeyId,
       
       JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.type') AS sessionIssuer_type,
       JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.principalId') AS sessionIssuer_principalId,
       JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.arn') AS sessionIssuer_arn,
       JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.accountId') AS sessionIssuer_accountId,
       JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.sessionIssuer.userName') AS sessionIssuer_userName,
       
       JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.webIdFederationData.federatedProvider') AS federatedProvider,
       
       JSONExtractString(CloudTrailEvent, 'userIdentity.sessionContext.attributes.creationDate') AS attributes_creationDate,
       JSONExtractBool(CloudTrailEvent, 'userIdentity.sessionContext.attributes.mfaAuthenticated') AS attributes_mfaAuthenticated,
       
       JSONExtractString(CloudTrailEvent, 'requestParameters.commandId') AS requestParameters_commandId,
       JSONExtractString(CloudTrailEvent, 'requestParameters.instanceId') AS requestParameters_instanceId,
       
       JSONExtractString(CloudTrailEvent, 'tlsDetails.tlsVersion') AS tlsDetails_tlsVersion,
       JSONExtractString(CloudTrailEvent, 'tlsDetails.cipherSuite') AS tlsDetails_cipherSuite,
       JSONExtractString(CloudTrailEvent, 'tlsDetails.clientProvidedHostHeader') AS tlsDetails_clientProvidedHostHeader
    FROM table
    
  • dbt

    Apply your dbt project to the snapshot of the data transferred to ClickHouse®.

    For more information on how to apply dbt transformation to your data, see Transform your data in a Managed ClickHouse® cluster with dbt.

See also