Transfer data from a PostgreSQL RDS instance to ClickHouse®
This use case covers a scenario when you have a PostgreSQL database and want to use DoubleCloud to migrate data from it to a Managed ClickHouse® cluster and visualize it.
This tutorial shows how to peer your DoubleCloud and AWS networks. Most sections describe AWS-specific actions, such as configuring security groups, route tables, and the RDS instance. For demonstration purposes, we'll use a default VPC for an AWS region. The DoubleCloud network will contain a Managed ClickHouse® cluster, and the AWS one will have a PostgreSQL RDS instance from which we'll transfer data.
The process scheme is the following:
Tip
This tutorial shows the use of PostgreSQL in RDS. DoubleCloud also allows you to transfer data from Amazon Aurora.
You'll see how to perform these actions by following the steps below:
Prepare your Managed ClickHouse® cluster
Create a Managed ClickHouse® cluster
-
Go to the Clusters overview
-
Click Create cluster in the upper-right corner of the page.
-
Select ClickHouse®.
-
Choose a provider and a region.
-
Under Resources:
-
Select the
s1-c2-m4
preset for CPU, RAM capacity, and storage space to create a cluster with minimal configuration.
Understand your ClickHouse® resource preset
A resource preset has the following structure:
<CPU platform> - C<number of CPU cores> - M<number of gigabytes of RAM>
There are three available CPU platforms:
- g - ARM Graviton
- i - Intel (x86)
- s - AMD (x86)
For example, the
i1-c2-m8
preset means that it's an Intel platform 2-core CPU with 8 gigabytes of RAM. You can see the availability of CPU platforms across our Managed Service for ClickHouse® areas and regions.
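As an illustration of the naming convention above, plain shell string operations can decompose a preset name. This is a sketch only; preset names come from the DoubleCloud console, and the helper variables here are hypothetical:

```shell
# Decompose a resource preset name such as "s1-c2-m4" (illustrative only).
preset="s1-c2-m4"
platform="${preset%%-*}"                      # "s1": AMD (x86) platform, generation 1
cores="${preset#*-c}"; cores="${cores%%-*}"   # number of CPU cores
ram_gb="${preset##*-m}"                       # gigabytes of RAM
echo "platform=${platform} cores=${cores} ram_gb=${ram_gb}"
```

Running this prints `platform=s1 cores=2 ram_gb=4` - an AMD (x86) machine with 2 CPU cores and 4 GB of RAM.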
-
-
Choose the number of replicas. Let's keep it as is with a single replica.
-
Select the number of shards. Keep a single shard.
Understand shards and replicas
Shards refer to the servers that contain different parts of the data (to read all the data, you must access all the shards). Replicas are duplicating servers (to read all the data, you can access the data on any of the replicas).
-
-
Under Basic settings:
-
Enter the cluster Name, in this scenario -
tutorial-cluster
. -
From the Version drop-down list, select the ClickHouse® version the Managed ClickHouse® cluster will use. For most clusters, we recommend using the latest version.
-
-
Under Advanced:
-
Under Maintenance settings, select the scheduling type:
-
Arbitrary to delegate maintenance window selection to DoubleCloud. Usually, your cluster will perform the maintenance procedure at the earliest available time slot.
Warning
We recommend against using this scheduling type with single-host clusters, as maintenance can make your cluster unavailable at an unpredictable time.
-
By schedule to set the weekday and time (UTC) when DoubleCloud may perform maintenance on your cluster.
-
-
Under Networking → VPC, specify in which DoubleCloud VPC to locate your cluster. Use the
default
value in the previously selected region if you don't need to create this cluster in a specific network. -
Select the allocation for the ClickHouse Keeper service -
embedded
or
dedicated
. We recommend using dedicated hosts for high-load production clusters. Dedicated ClickHouse Keeper hosts ensure that your production cluster's performance remains unaffected under heavy loads - they don't use its CPU or memory.
ClickHouse Keeper host location is irreversible
After creating the cluster, you won't be able to change the ClickHouse Keeper deployment type.
For dedicated ClickHouse Keeper hosts, select the appropriate resource preset. Please note that this resource preset will apply to all three hosts and will be billed accordingly.
-
Specify or adjust your cluster's DBMS settings. For more information, see the Settings reference.
-
-
Click Submit.
Your cluster will appear with the Creating
status on the Clusters page in the console. Setting everything up may take some time. When the cluster is ready, it changes its state to Alive
.
Click on the cluster to open its information page:
Tip
The DoubleCloud service creates the superuser admin
and its password automatically. You can find both the User and the Password in the Overview tab on the cluster information page.
To create users for other roles, see Manage ClickHouse® users
Connect to your cluster and create a database
-
Open your terminal.
-
(Optional) Start Docker
sudo service docker start
-
Pull the clickhouse-client
docker pull clickhouse/clickhouse-client
-
Open your terminal.
-
Connect to the ClickHouse® official DEB repository
sudo apt update && sudo apt install -y apt-transport-https ca-certificates dirmngr && \
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754 && \
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \
/etc/apt/sources.list.d/clickhouse.list
-
Refresh the package list and install the clickhouse-client
sudo apt update && sudo apt install -y clickhouse-client
-
Open your terminal.
-
Connect to a ClickHouse® official RPM repository
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
-
Install the clickhouse-client
sudo yum install -y clickhouse-client
Warning
If you run a RedHat 7-based Linux distribution, including CentOS 7, Oracle Linux 7, and others, you need to download and install trusted certificates, and manually add the path to them in the clickhouse-client configuration file as follows:
-
Install the
root
certificate:
curl https://letsencrypt.org/certs/isrg-root-x2-cross-signed.pem > \
/etc/pki/ca-trust/source/anchors/isrg-root-x2-cross-signed.pem
-
Install the
intermediate
certificate:
curl https://letsencrypt.org/certs/lets-encrypt-r3-cross-signed.pem > \
/etc/pki/ca-trust/source/anchors/lets-encrypt-r3-cross-signed.pem
-
Update the list of trusted certificates:
sudo update-ca-trust
-
Locate your clickhouse-client configuration file (by default, you can find it at
/etc/clickhouse-client/config.xml
) and add the path to the certificates into the <openSSL> section:
<client>
    <!-- Used for connection to server's secure tcp port -->
    <loadDefaultCAFile>true</loadDefaultCAFile>
    <cacheSessions>true</cacheSessions>
    <disableProtocols>sslv2,sslv3</disableProtocols>
    <preferServerCiphers>true</preferServerCiphers>
    <caConfig>/etc/ssl/certs/ca-bundle.crt</caConfig>
    <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
    <invalidCertificateHandler>
        <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
        <name>RejectCertificateHandler</name>
    </invalidCertificateHandler>
</client>
After you have installed the clickhouse-client, connect to your cluster:
-
Open your terminal and run the following command:
Docker
docker run --network host --rm -it clickhouse/<Native interface connection string>
The complete Docker command structure
docker run --network host --rm -it \
clickhouse/clickhouse-client \
--host <FQDN of your cluster> \
--secure \
--user <cluster user name> \
--password <cluster user password> \
--port 9440
Native clickhouse-client
<Native interface connection string>
-
After you have connected to the cluster, run the CREATE DATABASE query:
CREATE DATABASE IF NOT EXISTS db_for_postgres
After that, you can check the result by executing the following command:
SHOW DATABASES
The output will contain the name of your recently created database:
┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ _system            │
│ default            │
│ db_for_postgres    │
│ information_schema │
│ system             │
└────────────────────┘
Inspect the AWS console
-
Open the AWS console
-
Select the
US East (N. Virginia) us-east-1
in the header: -
Go to the AWS VPC service page
-
From the list of options on the left, click Your VPCs under Virtual private cloud. This section lists the default VPC that we will use.
A default VPC has already attached and configured subnets, a security group, and a route table. In this scenario, we only need to configure the route table, and we leave other resources with their default configuration options.
We'll place the RDS instance from which to transfer the data in the default VPC in the US East (N. Virginia) us-east-1
region.
Peer AWS and DoubleCloud networks
Create peering connection in DoubleCloud
-
Go to the console.
-
In the list of services, select VPC.
-
On the Connections tab, click Create connection.
-
Specify the AWS account ID you want to connect to. You can find this numeric ID in the AWS console
-
Specify the AWS VPC ID. In the AWS console, find your default VPC and copy its ID. It looks as follows:
vpc-xxxxxxxxxxxxxxxxx
. -
Provide your AWS IPv4 CIDR. You can find it to the right of the VPC ID in the AWS console.
-
Leave the AWS IPv6 CIDR empty - we don't need it for this scenario.
-
Select your VPC Region -
US East (N. Virginia) us-east-1
. -
From the drop-down menu, select your DoubleCloud Network to peer with.
-
Click Submit.
After you've created a connection, enable it on the AWS side. Note that it will take some time before the request appears on the AWS side.
Confirm peering on the AWS side
-
Go to the AWS VPC service page
-
Go to the Virtual private cloud → Peering connections section in the menu on the left:
-
Click your VPC Peering connection ID and choose Actions → Accept Request.
-
In the confirmation dialog, choose Accept request.
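If you prefer the AWS CLI over the console, the acceptance step can be sketched as follows. This is an illustrative sketch: the `pcx-…` ID is a placeholder for your actual peering connection ID.

```shell
# Accept the pending VPC peering request (placeholder connection ID).
aws ec2 accept-vpc-peering-connection \
    --region us-east-1 \
    --vpc-peering-connection-id pcx-0123456789abcdef0
```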
At this point, you've successfully peered your AWS and DoubleCloud networks. Now it's time to add a route to the peered VPC.
Configure a route table
Route tables
We'll use the default route table and add a route to the DoubleCloud VPC.
-
Go to the AWS VPC service page
-
Open the Virtual private cloud → Route Tables section from the menu on the left:
-
Select a route table associated with your VPC by the VPC ID.
-
In the Routes tab, click Edit routes.
-
Click Add route and specify its properties:
-
Destination - your DoubleCloud network's managed IPv4 address range in CIDR notation:
-
Target is the ID of a peering connection in DoubleCloud. Click on this field, select Peering connection and then select your connection ID.
-
-
Click Save changes.
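The route configuration above can also be scripted with the AWS CLI. This is a sketch with placeholder values: substitute your route table ID, your DoubleCloud network's CIDR, and your peering connection ID.

```shell
# Add a route that sends traffic destined for the DoubleCloud network
# through the peering connection (IDs and CIDR are placeholders).
aws ec2 create-route \
    --region us-east-1 \
    --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 10.0.0.0/16 \
    --vpc-peering-connection-id pcx-0123456789abcdef0
```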
Now you've fully connected your AWS and DoubleCloud networks. The traffic flow is also configured and you can proceed to creating an RDS instance in AWS to transfer data from.
Create a PostgreSQL RDS instance in a peered network
Create an RDS instance
For this tutorial, you need a PostgreSQL database available from external networks.
Warning
For production scenarios, be sure to create a dedicated user with replication rights and use it instead of the default postgres master user.
-
Go to the AWS RDS service page
N. Virginia: us-east-1
. -
Click Create database →
Create database
: -
Choose a database creation method -
Easy create
. -
Select the configuration properties:
-
Engine type - PostgreSQL
-
DB instance size -
Free tier
-
DB instance identifier - enter the name for your instance, for example
dc-tutorial
. -
Master username - a name of the master user. You can leave the default value -
postgres
. -
Check the Auto generate a password box. You'll see it in the credential details after the instance is created. This is the only time you will be able to view this password. Alternatively, you can specify it manually at this stage.
Tip
If you expand the View default settings for Easy create section, you'll see that the instance will be created in the default VPC for the selected region. The instance is also located in the default subnet and is associated with the default security group.
-
-
Click Create database and wait for a few minutes.
The RDS instance is created and running, but it also requires you to modify certain settings.
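The console steps above can alternatively be scripted with the AWS CLI. This is a sketch, not an exact equivalent of Easy create: the instance class, storage size, and password are illustrative values.

```shell
# Create a PostgreSQL RDS instance comparable to the Easy create settings above.
# Instance class, storage, and password below are illustrative placeholders.
aws rds create-db-instance \
    --region us-east-1 \
    --db-instance-identifier dc-tutorial \
    --engine postgres \
    --db-instance-class db.t3.micro \
    --master-username postgres \
    --master-user-password '<your password>' \
    --allocated-storage 20
```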
Create a new parameter group for the database
The Transfer service requires you to set two parameters in the basic scenario to enable logical replication: rds.logical_replication and wal_level.
-
Click Parameter groups in the left navigation menu:
-
Click Create parameter group.
-
On the Parameter group details, specify the settings:
-
Parameter group family - select the value that corresponds to your PostgreSQL version. In this tutorial, it's
postgres13
. -
Type -
DB parameter group
. -
Group name -
tutorial-group
. -
Description -
Parameters for the DC tutorial
.
-
-
Click Create.
When you've created the group, modify it:
-
Click on the group name in the groups list -
tutorial-group
. -
Use the search box to find the
rds.logical_replication
parameter. -
Select it and click Edit parameters.
-
Expand the drop-down list in the Values column and select
1
. -
Click Save changes.
As soon as you do it, the
wal_level
parameter, which is also required, will be automatically set to
logical
.
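The same parameter group setup can be sketched with the AWS CLI, using the names from this tutorial. Assumption: the group is applied with the pending-reboot method, matching the reboot performed later in this section.

```shell
# Create the parameter group and enable logical replication
# (CLI equivalent of the console steps above).
aws rds create-db-parameter-group \
    --db-parameter-group-name tutorial-group \
    --db-parameter-group-family postgres13 \
    --description "Parameters for the DC tutorial"

aws rds modify-db-parameter-group \
    --db-parameter-group-name tutorial-group \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"
```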
When you have created the new parameter group, apply it to your RDS instance:
-
Click Databases in the navigation menu on the left:
-
On the databases overview page, select your database -
dc-tutorial
. -
Click Modify in the top right corner.
-
Expand the lowermost Additional configuration section.
-
Under Database options → DB parameter group, select your parameter group -
tutorial-group
. -
Click Continue.
-
Under Scheduling of modifications, select
Apply immediately
. -
Click Modify DB instance.
-
At this point, you've modified the parameters, but they're pending reboot. Do the following:
-
On the databases overview page, select your database.
-
Click Actions → Reboot:
-
Click Confirm to reboot your database and apply the parameter group change.
Wait a couple of minutes until the database status changes to
Available
.
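The attach-and-reboot sequence above has a CLI equivalent. A sketch, using the instance and group names from this tutorial:

```shell
# Attach the parameter group to the instance, then reboot
# so the pending parameters take effect.
aws rds modify-db-instance \
    --db-instance-identifier dc-tutorial \
    --db-parameter-group-name tutorial-group \
    --apply-immediately

aws rds reboot-db-instance \
    --db-instance-identifier dc-tutorial
```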
Connect to the RDS instance
Now it's time to connect to your instance. In this tutorial, we use CLI tools, but you can use any tool of your choice.
-
On the databases overview page, select your database -
dc-tutorial
. -
Make your instance accessible from external networks. It isn't accessible by default.
-
Click Modify.
-
Under Connectivity, expand the Additional configuration section, and then select
Publicly accessible
. -
Click Continue.
-
Under Scheduling of modifications, select
Apply immediately
. -
Click Modify DB instance.
-
-
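The public-access modification above can also be made via the AWS CLI. A sketch, assuming the dc-tutorial instance from this tutorial:

```shell
# Make the instance reachable from outside the VPC
# (CLI equivalent of the console steps above).
aws rds modify-db-instance \
    --db-instance-identifier dc-tutorial \
    --publicly-accessible \
    --apply-immediately
```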
Install and use the PostgreSQL client:
Docker
DEB-based
-
Open your terminal.
-
(Optional) Start Docker
sudo service docker start
-
Pull the postgresql-client
docker pull jbergknoff/postgresql-client
-
Connect to your database. Use the following properties:
-
The Endpoint field value from the database overview page as an address. It should start with your RDS instance name (
dc-tutorial
) and end with.rds.amazonaws.com
. -
The 5432 port that is opened by default.
-
Your master user name (
postgres
by default). -
The
-W
flag makes the client prompt you for the password before sending a request to the server.
docker run -it --rm jbergknoff/postgresql-client \
postgresql://<username>:<Password>@<host address>:5432
-
-
Open your terminal.
-
Install the PostgreSQL client with the following command:
sudo apt install postgresql-client
-
Connect to your RDS instance. Use the following properties:
-
The Endpoint field value from the database overview page as an address. It should start with your RDS instance name (
dc-tutorial
) and end with.rds.amazonaws.com
. -
The 5432 port that's opened by default.
-
Your master user name (
postgres
by default). -
The
-W
flag makes the client prompt you for the password before sending a request to the server.
psql -h <host address> -p 5432 -U <username> -W
-
As soon as you see the following prompt in your terminal, the database is ready for operation:
postgres=>
-
-
Check if all the required parameters are configured correctly. Connect to your database and send the following query:
SELECT name,setting FROM pg_settings WHERE name IN ('wal_level','rds.logical_replication');
You should see the following output:
          name           | setting
-------------------------+---------
 rds.logical_replication | on
 wal_level               | logical
Create a table and upload data
-
Create a table with a structure similar to the source dataset's. It must have a primary key; in this example, it's the
order_id
field. Note that you create this table in PostgreSQL, so no ClickHouse®-specific clauses apply here:
CREATE TABLE sales_info (
    region text,
    country text,
    item_type text,
    order_date date,
    order_id int PRIMARY KEY,
    units_sold int,
    unit_price float(2),
    unit_cost float(2),
    total_revenue float(2),
    total_cost float(2),
    total_profit float(2)
);
-
Finish the current terminal session by typing
\q
or open a new terminal instance. -
Run the following command to fetch the data from our S3 bucket and copy it to the previously created table:
Docker
curl https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/sales_info.csv \
| docker run -i --rm jbergknoff/postgresql-client \
postgresql://<username>:<Password>@<host address>:5432 \
-c "\copy sales_info FROM STDIN WITH (DELIMITER ';', FORMAT CSV, header);"
DEB-based
curl https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/sales_info.csv \
| psql -h <host address> -p 5432 -U <username> -W \
-c "\copy sales_info FROM STDIN WITH (DELIMITER ';', FORMAT CSV, header);"
-
Send the
SELECT
query to check if the data is imported successfully:
Docker
docker run -it --rm jbergknoff/postgresql-client \
postgresql://<username>:<Password>@<host address>:5432 \
-c "SELECT * FROM sales_info LIMIT 5;"
DEB-based
psql -h <host address> -p 5432 -U <username> -W -c "SELECT * FROM sales_info LIMIT 5;"
You should see the following output:
REGION │ COUNTRY │ ITEM_TYPE │ ORDER_DATE │ ORDER_ID │ UNITS_SOLD │ UNIT_PRICE │ UNIT_COST │ TOTAL_REVENUE │ TOTAL_COST │ TOTAL_PROFIT
-----------------------------------+---------------------+-----------+------------+-----------+------------+------------+-----------+---------------+------------+-------------
Central America and the Caribbean │ Antigua and Barbuda │ Baby Food │ 2013-12-20 │ 957081544 │ 552 │ 255.28 │ 159.42 │ 140914.56 │ 87999.84 │ 52914.72
Central America and the Caribbean │ Panama │ Snacks │ 2010-07-05 │ 301644504 │ 2167 │ 152.58 │ 97.44 │ 330640.88 │ 211152.48 │ 119488.38
Europe │ Czech Republic │ Beverages │ 2011-09-12 │ 478051030 │ 4778 │ 47.45 │ 31.79 │ 226716.1 │ 151892.62 │ 74823.48
Asia │ South Korea │ Cereal │ 2010-05-13 │ 892599952 │ 9016 │ 205.7 │ 117.11 │ 1.8545912e+06 │ 1.0558638e+06 │ 798727.44
Asia │ Sri Lanka │ Snacks │ 2015-07-20 │ 571902596 │ 7542 │ 152.58 │ 97.44 │ 1.1507584e+06 │ 734892.5 │ 415865.88
-
Press
q
to exit the table view mode, and then type
\q
to terminate the
psql
session.
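Before copying the whole file, you can sanity-check the dataset's row format locally. The sketch below uses a literal sample row (taken from the output above) and verifies that total_profit equals total_revenue minus total_cost - columns 9, 10, and 11 of the semicolon-delimited CSV:

```shell
# Check one semicolon-delimited sample row: column 11 (total_profit)
# should equal column 9 (total_revenue) minus column 10 (total_cost).
row='Central America and the Caribbean;Antigua and Barbuda;Baby Food;2013-12-20;957081544;552;255.28;159.42;140914.56;87999.84;52914.72'
echo "$row" | awk -F';' '{
    d = $9 - $10 - $11
    if (d < 0.01 && d > -0.01) print "ok"; else print "mismatch"
}'
```

For this sample row the check prints `ok`, since 140914.56 - 87999.84 = 52914.72.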
Transfer the data
Now it's time to set up the tools to get the data from a remote source and transfer it to your database in the Managed ClickHouse® cluster.
Tip
Before you start configuring Transfer, make sure you've performed all the actions from the previous steps:
-
You have a Managed ClickHouse® cluster with the
db_for_postgres
database. -
You have a peering connection between your AWS and DoubleCloud networks.
-
Check the Connections page
-
In the AWS console, check the settings of your peering connection
us-east-1
region. -
Also check the route in your route table
-
-
Check the settings of your RDS instance. Pay attention to the instance parameters.
-
On your instance's overview page, go to the Configuration tab
If you still have a default parameter group and it's in the Pending reboot status, reboot your instance.
- Ensure that the
rds.logical_replication
setting in yourtutorial-group
parameter group is enabled.
-
-
After making sure that all the resources are created and settings have the correct values, proceed to the next step.
Create a source endpoint
-
Go to the Transfer
-
Select Endpoints tab, click Create endpoint → Source.
-
Select
PostgreSQL
as the Source type. -
Under Basic settings:
- Enter the Name of the endpoint:
rds-postgres
.
-
Specify endpoint parameters under Endpoint settings:
-
Under Connection settings → On-premise, specify the connection options:
-
IP or FQDN of the host. Paste the host address from the database overview page. It should start with your database name (
dc-tutorial
) and end with.rds.amazonaws.com
. -
Database port -
5432
. -
CA certificate - upload the certificate as text or upload a file. You can obtain the certificate for your region on the corresponding AWS documentation page
-
-
Enter the Database name. We use the default database
postgres
in this tutorial.
How to list your RDS instance databases
-
Connect to your RDS instance as shown in the previous sections.
-
Type
\l
. You should see the following output:
                                 List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 rdsadmin  | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | rdsadmin=CTc/rdsadmin+
           |          |          |             |             | rdstopmgr=Tc/rdsadmin
 sales     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/rdsadmin          +
           |          |          |             |             | rdsadmin=CTc/rdsadmin
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
-
-
Specify the Username of the user with the replication role. In this scenario, we've created the master user
postgres
. -
Under Password, enter this user's password.
The filled-out form looks as follows:
-
-
Optionally, you can test your source endpoint:
After configuring your endpoint, click Test. You'll see an endpoint test dialog:
You can use two runtime types for connection testing - Dedicated and Serverless.
Runtime compatibility warning
Don't use endpoints with different runtime types in the same transfer - this will cause it to fail.
Dedicated
The Transfer service uses this runtime to connect to your data source via an internal or external network.
This runtime is useful when you need to use a specific network - it may be an external network or an internal one with a peer connection.
To run the test with this runtime:
-
Under Runtime, select Dedicated.
-
From the drop-down list, select the network to use to connect to your data source.
-
Click Test connection.
After completing the test procedure, you'll see a list of the endpoint's data sources. For Apache Kafka® endpoints, you'll also see data samples for each data source.
Serverless
The Transfer service uses this runtime to connect to your data sources available from the internet via an automatically defined network.
Use this runtime to test an endpoint to a data source located outside isolated networks.
To run the test with this runtime:
-
Under Runtime, select Serverless.
-
Click Test.
After completing the test procedure, you'll see a list of the endpoint's data sources. For Apache Kafka® endpoints, you'll also see data samples for each data source.
Warning
Please be patient - testing may take up to several minutes.
-
-
-
Click Submit.
The source endpoint is ready to go. Now we need to create a target endpoint that will receive the data from the remote source.
Create a target endpoint
-
Go to the Transfer
-
Select the Endpoints tab, click Create endpoint → Target.
-
Select ClickHouse® as the Target type.
-
Under Basic settings:
-
Enter the Name of the endpoint:
target-for-postgres
-
(optional) Enter a Description of the endpoint.
-
-
Specify endpoint parameters under Endpoint settings:
-
Select connection type. This tutorial transfers data to the Managed cluster.
-
Specify the connection properties:
-
Under Managed cluster, select your cluster name (
tutorial-cluster
) from the drop-down list. -
Specify the User of the database:
admin
. -
Enter the Password of the database user.
-
Specify the Database name you want to transfer the data to:
db_for_postgres
.
-
-
Under Cleanup policy, select
Drop
.
The filled-out form looks as follows:
-
-
Leave all the other fields blank or with their default values.
-
Click Submit.
Good work. We've created an endpoint that will receive and write the data to your Managed ClickHouse® database. All we need now is a tool that will connect both endpoints and transfer the data.
Create and activate a transfer
-
Go to the Transfer
-
Click Create transfer.
-
Under Endpoints:
-
Select
rds-postgres
from the Source drop-down menu. -
Select
target-for-postgres
from the Target.
-
-
Under Basic settings:
-
Enter the transfer Name:
postgres-to-clickhouse
. -
(optional) Enter the transfer Description.
-
-
Under Transfer settings, select the Transfer type. In this use case, we choose
Snapshot and replication
. -
Under Runtime, select
Dedicated
. This runtime type is designed to transfer data from peered networks. When you select this type, DoubleCloud allocates a dedicated virtual machine for your transfer.
Under Dedicated, set the virtual machine properties:
-
VM performance -
small
. It allocates 2 CPUs to your virtual machine. -
Minimum logging level -
Warn
. -
Settings -
Automatic
. In this scenario, we created the Managed ClickHouse® cluster in the same region as the RDS instance. When you select the
Automatic
settings here, the DoubleCloud service creates an EC2 instance in the same region as the target cluster -
US East (N. Virginia) us-east-1
.
-
-
Leave all the other fields blank or with their default values.
-
Click Submit.
-
After you've created the transfer, click Activate.
-
Wait until your transfer status changes to
Done
.
Now the service has transferred the data, and you can go to your Managed ClickHouse® cluster to check the result of the transfer.
Query the data in the Managed ClickHouse® cluster
-
Open your terminal.
-
Send a query with the Native interface string from your cluster's Overview tab and type the following command:
Docker
docker run --network host --rm -it clickhouse/<Native interface connection string> --query="SHOW TABLES FROM db_for_postgres"
The complete Docker command structure
docker run --network host --rm -it \
clickhouse/clickhouse-client \
--host <FQDN of your cluster> \
--secure \
--user <your cluster username> \
--password <Cluster user password> \
--port 9440 \
--query="SHOW TABLES FROM db_for_postgres"
Native clickhouse-client
<Native interface connection string> --query="SHOW TABLES FROM db_for_postgres"
The complete structure of the Native interface connection string
clickhouse-client --host <FQDN of the host> \
--secure \
--user <Cluster user name> \
--password <Cluster user password> \
--port 9440
The output will contain the
sales_info
table from the PostgreSQL RDS database. It has the
public
prefix from the PostgreSQL schema.
┌─name──────────────┐
│ public_sales_info │
└───────────────────┘
-
Let's send a
SELECT
query to thepublic_sales_info
table:
Docker
docker run --network host --rm -it clickhouse/<Native interface connection string> --query="SELECT * FROM db_for_postgres.public_sales_info LIMIT 20"
The complete Docker command structure
docker run --network host --rm -it \
clickhouse/clickhouse-client \
--host <FQDN of your cluster> \
--secure \
--user <your cluster username> \
--password <Cluster user password> \
--port 9440
Native clickhouse-client
<Native interface connection string> --query="SELECT * FROM db_for_postgres.public_sales_info LIMIT 20"
The complete structure of the Native interface connection string
clickhouse-client --host <FQDN of the host> \
--secure \
--user <Cluster user name> \
--password <Cluster user password> \
--port 9440
This query will return the same data as we previously had in the PostgreSQL database with additional technical columns:
┌─order_id──┬─region────────────────────────────┬─country────┬─item_type──┬─order_date───────────┬─units_sold─┬─unit_price─────────┬──────────unit_cost─┬─total_revenue─┬─────total_cost─┬────total_profit─┬─data_transfer_commit_time─┬─data_transfer_delete_time─┐
│ 100640618 │ Europe │ Norway │ Cereal │ 2014-10-08T00:00:00Z │ 650 │ 205.6999969482422 │ 117.11000061035156 │ 133705 │ 76121.5 │ 57583.5 │ 1656595919141050000 │ 0 │
│ 101315677 │ Sub-Saharan Africa │ Mali │ Vegetables │ 2013-11-07T00:00:00Z │ 415 │ 154.05999755859375 │ 90.93000030517578 │ 63934.8984375 │ 37735.94921875 │ 126198.94921875 │ 1656595919141050000 │ 0 │
│ 103258454 │ Middle East and North Africa │ Azerbaijan │ Baby Food │ 2017-06-28T00:00:00Z │ 5326 │ 255.27999877929688 │ 159.4199981689453 │ 1359621.25 │ 849070.9375 │ 510550.375 │ 1656595919141050000 │ 0 │
│ 103281629 │ Sub-Saharan Africa │ Botswana │ Vegetables │ 2013-10-19T00:00:00Z │ 2184 │ 154.05999755859375 │ 90.93000030517578 │ 336467.03125 │ 198591.125 │ 137875.921875 │ 1656595919141050000 │ 0 │
│ 103832662 │ Europe │ Croatia │ │ │ 5247 │ 255.27999877929688 │ 159.4199981689453 │ 1339454.125 │ 836476.75 │ 502977.40625 │ 1656595919141050000 │ 0 │
│ 104650199 │ Sub-Saharan Africa │ Seychelles │ Cereal │ 2014-08-20T00:00:00Z │ 7174 │ 205.6999969482422 │ 117.11000061035156 │ 1475691.75 │ 840147.125 │ 635544.6875 │ 1656595919141050000 │ 0 │
│ 105630385 │ Central America and the Caribbean │ Saint Ki
Great job! You have successfully transferred data from an RDS instance to the Managed ClickHouse® cluster. See the links below to find more info about each service we used in this scenario.