This use case covers a scenario when you have a PostgreSQL database and want to use DoubleCloud to migrate data from it to a Managed ClickHouse® cluster and visualize it.
This tutorial shows how to peer your DoubleCloud and AWS networks. Most sections describe AWS-specific actions such as the configuration of security groups, route tables, and the RDS instance. For demonstration purposes, we'll use a default VPC for an AWS region. The DoubleCloud network will contain a Managed ClickHouse® cluster and the AWS one will have an PostgreSQL RDS instance from where we'll transfer data.
The process scheme is the following:
Tip
This tutorial shows the use of PostgreSQL in RDS. DoubleCloud also allows you to transfer data from Amazon Aurora in a similar way.
From the list of options on the left, click Your VPCs under Virtual private cloud. This section lists the default VPC that we will use.
A default VPC has already attached and configured subnets, a security group, and a route table.
For this tutorial, you only need to configure the route table.
Leave other resources with their default configuration options.
You'll place the RDS instance in the default VPC in the US East (N. Virginia) us-east-1 region.
Step 3. Peer AWS and DoubleCloud networks
Step 3.1. Create a peering connection in DoubleCloud
Open the Virtual private cloud → Route Tables section from the menu on the left:
Select a route table associated with the subnet you want to access clusters from.
In the Routes tab, click Edit routes.
Click Add route and specify its properties:
Destination to your DoubleCloud connection as a Managed IPv4 address in the CIDR notation:
Target is the ID of a peering connection in DoubleCloud. Click on this field, select Peering connection and then select your connection ID.
Click Save changes.
Now you've fully connected your AWS and DoubleCloud networks. The traffic flow is also configured and you can proceed to creating an RDS instance in AWS to transfer data from.
Step 4. Create a PostgreSQL RDS instance in a peered network
Step 4.1. Create an RDS instance
For this tutorial, you need a PostgreSQL database available from external networks.
Warning
Be sure to create a user with replication rights in real scenarios
since it's necessary for the data transfer.
Note that we don't create a user in this scenario and transfer data under the root user postgres instead.
Go to the AWS RDS service page .
Make sure that you are still in the right region - N. Virginia: us-east-1.
Click Create database → Create database:
Under Choose a database creation method, select Easy create.
Specify the following configuration properties:
Engine type: PostgreSQL
DB instance size: Free tier
DB instance identifier: enter the name for your instance, such as dc-tutorial.
Master username: enter the master username.
You can leave the default value of postgres.
Check Auto generate a password.
You'll see the password in the credential details after the instance is created.
This is the only time when you can view this password.
You can also enter your own password instead of generating a new one.
Tip
If you expand the View default settings for Easy create section, you'll see that the instance will be created in the default VPC for the selected region. The instance is also located in the default subnet and is associated with the default security group.
Click Create database and wait until it's created.
It usually takes a few minutes.
The RDS instance is created and running, but it also requires you to modify certain settings.
Step 4.2. Create a new parameter group for the database
Transfer requires you to set two parameters in the basic scenario to enable the logical replication.
These parameters are rds.logical_replication
and wal_level .
To modify these parameters, create a separate parameter group:
In the left navigation menu, click Parameter groups:
Click Create parameter group.
On the Parameter group details, specify the settings:
Parameter group family: select the value that corresponds to your PostgreSQL version.
In this tutorial, it's postgres13.
Type: DB parameter group
Group name: tutorial-group.
Description: Parameters for the DoubleCloud tutorial.
Click Create.
After you've created the group, modify it:
Click on the group name in the groups list - tutorial-group.
Use the search box to find the rds.logical_replication parameter.
Select it and click Edit parameters.
Expand the drop-down list in the Values column and select 1.
Click Save changes.
As soon as you do it, the wal_level parameter, that's also required, will be automatically set to logical.
When you have created the new parameter group, apply it to your RDS instance:
Click Databases in the navigation menu on the left:
Connect to your database. Use the following properties:
The Endpoint field value from the database overview page as an address. It should start with your RDS instance name (dc-tutorial) and end with .rds.amazonaws.com.
The 5432 port that is opened by default.
Your master user name (postgres by default).
The -W flag allows you to specify the password before sending a request to the server.
docker run -it --rm jbergknoff/postgresql-client \
postgresql://<username>:<Password>@<host address>:5432
Open your terminal.
Install the PostgreSQL client with the following command:
sudo apt install postgresql-client
Connect to your RDS instance. Use the following properties:
The Endpoint field value from the database overview page as an address. It should start with your RDS instance name (dc-tutorial) and end with .rds.amazonaws.com.
The 5432 port that's opened by default.
Your master user name (postgres by default).
The -W flag allows you to specify the password before sending a request to the server.
psql -h <host address> -p 5432 -U <username> -W
As soon as you see the following prompt in your terminal, the database is ready for operation:
postgres=>
Check if all the required parameters are configured correctly.
Connect to your database and execute the following query:
SELECT name,setting FROM pg_settings WHERE name IN ('wal_level','rds.logical_replication');
name | setting
------------------------+---------
rds.logical_replication | on
wal_level | logical
Step 4.3. Create a table and upload data
Create a table with the structure similar to the source dataset's structure. It must have a primary key, in this example it's the order_id field.
CREATETABLE sales_info ON CLUSTER default (
region text,
country text,
item_type text,
order_date date,
order_id intPRIMARY KEY,
units_sold int,
unit_price float(2),
unit_cost float(2),
total_revenue float(2),
total_cost float(2),
total_profit float(2)
);
Finish the current terminal session by typing \q or open a new terminal instance.
Run the following command to fetch the data from our S3 bucket and copy it to the previously created table:
Docker
DEB-based
curl https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/sales_info.csv \
| docker run -i --rm jbergknoff/postgresql-client \
postgresql://<username>:<Password>@<host address>:5432 \
-c "\copy sales_info FROM STDIN WITH (DELIMITER ';', FORMAT CSV, header);"
curl https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/sales_info.csv \
| psql -h <host address> -p 5432 -U <username> -W \
-c "\copy sales_info FROM STDIN WITH (DELIMITER ';', FORMAT CSV, header);"
Send the SELECT query to check if the data is imported successfully:
Docker
DEB-based
docker run -it --rm jbergknoff/postgresql-client \
postgresql://<username>:<Password>@<host address>:5432 \
-c "SELECT * FROM sales_info LIMIT 5;"
Specify endpoint parameters under Endpoint settings:
Under Connection settings → On-premise, specify the connection options:
IP or FQDN of the host. Paste the host address from the database overview page. It should start with your database name (dc-tutorial) and end with .rds.amazonaws.com.
Database port - 5432.
CA certificate - upload the certificate as text or upload a file. You can obtain the certificate for your region on the corresponding AWS documentation page .
Enter the Database name. We use the default database Postgres in this tutorial.
How to list your RDS instance databases
Connect to your RDS instance as shown in the previous sections.
Select the Endpoints tab, click Create endpoint → Target.
Select ClickHouse® as the Target type.
Under Basic settings:
Enter the Name of the endpoint: target-for-postgres
(optional) Enter a Description of the endpoint.
Specify endpoint parameters under Endpoint settings:
Select connection type. This tutorial transfers data to the Managed cluster.
Specify the connection properties:
Under Managed cluster, select your cluster name (tutorial-cluster) from the drop-down list.
Specify the User of the database: admin.
Enter the Password of the database user.
Specify the Database name you want to transfer the data to: db_for_postgres.
Under Cleanup policy, select Drop.
The filled-out form looks as follows:
Leave all the other fields blank or with their default values.
Click Submit.
Good work. We've created an endpoint that will receive and write the data to your Managed ClickHouse® database. All we need now is a tool that will connect both endpoints and transfer the data.
Enter a transfer name, such as postgres-to-clickhouse.
(Optional) Enter a transfer description.
Under Transfer type, select Snapshot and replication.
Under Runtime environment, select Dedicated.
Tip
This runtime type is designed to transfer from
peered networks.
When you select this type, DoubleCloud allocates a dedicated virtual machine for your transfer.
Under Dedicated, set the virtual machine properties:
VM size: small.
It allocates 2 CPUs to your virtual machine.
Settings: Automatic.
In this tutorial, you created the Managed ClickHouse® cluster in the same region as the RDS instance.
When you select the automatic settings,
DoubleCloud creates an EC2 instance in the same region as the target cluster — US East (N. Virginia) us-east-1.
Leave all the other fields blank or with their default values.
Click Submit.
After you've created a transfer,
click
→ Activate.
Wait until your transfer status changes to Done.
Now that Transfer has replicated the data, and you can connect to your Managed ClickHouse® cluster and run queries.
Step 6. Query data in the Managed ClickHouse® cluster
Open the terminal.
Send a query with the Native interface string from your cluster's Overview tab and type the following command:
Docker
Native clickhouse-client
docker run --network host --rm -it
clickhouse/<Native interface connection string>
--query="SHOW TABLES FROM db_for_postgres"
The complete Docker command structure
docker run --network host --rm -it \
clickhouse/clickhouse-client \
--host <FQDN of your cluster> \
--secure \
--user <your cluster username> \
--password <Cluster user password> \
--port 9440
--query="SHOW TABLES FROM db_for_postgres"
<Native interface connection string>
--query="SHOW TABLES FROM db_for_postgres"
The complete structure of the Native interface connection string
clickhouse-client --host <FQDN of the host> \
--secure \
--user <Cluster user name> \
--password <Cluster user password> \
--port 9440
The output will contain the sales_info table from the PostgreSQL RDS database. It has the public prefix from the PostgreSQL schema.
Great job! You have successfully transferred data from and RDS instance to the Managed ClickHouse® cluster. See the links below to find more info about each used service we used in this scenario.