Replicate data from MongoDB to ClickHouse®
- Before you start
- Step 1. Enable access to the MongoDB cluster
- Step 2. Create a Managed ClickHouse® cluster
- Step 3. Create a ClickHouse® database
- Step 4. Create a source MongoDB endpoint
- Step 5. Create a target ClickHouse® endpoint
- Step 6. Create a transfer
- Step 7. Activate the transfer and query transferred data
- What’s next
MongoDB
To enhance the performance of MongoDB, you can add ClickHouse® to the mix and offload analytical queries to the database that’s designed to be fast and efficient. By replicating data from MongoDB to ClickHouse® and running analytical queries there, you can benefit from the flexibility of MongoDB and the speed of ClickHouse® — all of this at a lower cost.
This tutorial guides you through replicating data from MongoDB to ClickHouse® with DoubleCloud Transfer — a real-time data replication tool. You will learn how to create a ClickHouse® cluster, set up a data pipeline, apply data transformations, and run queries.
Before you start
-
If you haven’t already, create a DoubleCloud account
-
If you don’t have a MongoDB instance with data that you can use for this tutorial:
Step 1. Enable access to the MongoDB cluster
Before DoubleCloud Transfer can connect to your MongoDB database, configure networks connection rules and create a database user.
-
Create a database user
Tip
Take note of the username and password. You may not be able to access the password later.
-
Configure a network connection
3.77.1.232/32 3.74.181.206/32 3.125.212.122/32 3.77.29.32/32 3.68.105.66/32 52.59.249.9/32 18.184.232.195/32 3.76.190.60/32
Tip
If you’re just testing the integration and your MongoDB cluster contains only sample data, you can allow access from any IP address by adding
0.0.0.0/0
as an access list entry and marking the entry as temporary in the dialog.Don’t do that if you’re connecting to a production database with real data.
-
Get a connection string
Step 2. Create a Managed ClickHouse® cluster
-
Go to the Clusters
-
Click Create cluster at the top right and select ClickHouse.
Tip
For this tutorial, you don’t need to modify cluster settings on this page. If you proceed with the default settings, you get a fully functional ClickHouse® cluster that you can use for testing and development.
When creating a production cluster, refer to Create a Managed ClickHouse® cluster for a list of ways to create and configure a cluster.
-
Under Basic settings, enter a cluster name, such as
clickhouse-dev
. -
Click Submit.
Creating a cluster takes around five minutes depending on the provider, region, and settings.
Step 3. Create a ClickHouse® database
-
After the cluster status changes from Creating to Alive, select it in the cluster list.
-
Click WebSQL at the top right.
-
In WebSQL, click on any database in the connection manager on the left to open the query editor.
-
Create a new database named
listings
using the following command:CREATE DATABASE IF NOT EXISTS listings ON CLUSTER default;
-
Check that the database has been created:
SHOW DATABASES
┌─name───────────────┐ │ INFORMATION_SCHEMA │ │ _system │ │ default │ │ listings │ // your database │ information_schema │ │ system │ └────────────────────┘
Step 4. Create a source MongoDB endpoint
To create a source MongoDB endpoint:
-
Go to the Transfer
-
In Source type, select MongoDB.
-
Enter an endpoint name, such as
mongo-source-dev
. -
In connection type select SRV.
-
Fill out the connection details using the SRV connection string you copied earlier. The connection string has the following format:
mongodb+srv://<username>:<password>@<hostname>/<database>
-
Under Collection filter, add
listings
in Included collections. -
Click Submit.
Step 5. Create a target ClickHouse® endpoint
To create a target endpoint:
-
On the Transfer page, click Create → Target endpoint.
-
In Target type, select ClickHouse.
-
Enter an endpoint name, such as
clickhouse-target-dev
. -
In Connection type, select Managed cluster and select the cluster that you created from the dropdown.
You can leave the default values in other fields.
-
Click Submit.
Step 6. Create a transfer
Now that your endpoints are ready, you can create a transfer.
-
On the Transfer page, click Create → Transfer at the top right.
-
Under Endpoints, select the endpoints you just created — mongodb-source-dev and clickhouse-target-dev as the source and target respectively.
-
Under Basic settings, enter a transfer name, such as
transfer-dev
. -
In Transfer type, select Snapshot and replication.
This transfer type transfers the current state of the source database, and then keeps all the added data in sync.
-
Leave the default preselected options in Snapshot settings and Runtime environment
-
In List of objects for transfer, click + Object and enter the name of the collection you want to transfer —
sample_airbnb.listingsAndReviews
. -
Click + Transformation.
-
In the Transformer block that appeared, select SQL in the dropdown.
-
In the SQL field, enter a query that transforms and flattens BSON documents to allow them to be added to ClickHouse® columns. For example, to extract several fields from documents in the sample Airbnb listing collection, use the following query:
SELECT _id, JSON_VALUE(assumeNotNull(document), '$.listing_url') AS listing_url, JSON_VALUE(assumeNotNull(document), '$.name') AS name, JSON_VALUE(assumeNotNull(document), '$.property_type') AS property_type, JSONExtractRaw(assumeNotNull(document), 'amenities') AS amenities, JSON_VALUE(assumeNotNull(document), '$.price') AS price, JSON_VALUE(assumeNotNull(document), '$.address.country') AS country, JSON_VALUE(assumeNotNull(document), '$.address.location.coordinates[0]') AS lat, JSON_VALUE(assumeNotNull(document), '$.address.location.coordinates[1]') AS lng FROM table
Tip
This query processes data on the fly using ClickHouse® JSON functions, such as
JSON_VALUE
orJSONExtractRaw
. To access nested values in MongoDB documents, use the dot notation. Learn more about using the SQL transformer- Click Submit.
Step 7. Activate the transfer and query transferred data
Now that you transfer is fully configured, you can use it to replicate data from MongoDB to ClickHouse®.
-
On the transfer page, click Activate at the top right.
After the transfer is activated, it fetches data from the source database, applies transformations to the data, and uploads it to the target database.
The initial activation may take a few minutes. When some data is successfully replicated, you can see green bars in the Status history section:
-
Go to the Clusters
-
Select the cluster you created earlier.
-
Click WebSQL at the top right.
-
In the Connection manager on the left, select the listings database.
-
In the query editor, run the following statement:
SELECT * FROM sample_airbnb_listingsAndReviews LIMIT 100;
-
The query output is displayed in the Results section: