MongoDB
is a NoSQL document-oriented database management system
that can efficiently store large amounts of unstructured or semi-structured data.
It’s used in various applications because of its versatility and scalability,
yet it may not be an ideal choice in every case.
For example, running analytical queries in
MongoDB may require adding computational resources
and can take 50+ times longer compared to OLAP databases like ClickHouse®.
To enhance the performance of MongoDB,
you can add ClickHouse® to the mix and offload analytical queries to the database that’s designed to be fast and efficient.
By replicating data from MongoDB to ClickHouse® and running analytical queries there,
you can benefit from the flexibility of MongoDB and the speed of ClickHouse® — all of this at a lower cost.
This tutorial guides you through replicating data from MongoDB to ClickHouse®
with DoubleCloud Transfer — a real-time data replication tool.
You will learn how to create a ClickHouse® cluster, set up a data pipeline, apply data transformations, and run queries.
Before DoubleCloud Transfer can connect to your MongoDB database,
configure networks connection rules and create a database user.
Create a database user
in your cluster.
Select the Username and password as the authentication method.
Make sure this user has sufficient permissions to read data in the cluster.
Tip
Take note of the username and password. You may not be able to access the password later.
If you’re just testing the integration and your MongoDB cluster contains only sample data,
you can allow access from any IP address
by adding 0.0.0.0/0 as an access list entry
and marking the entry as temporary in the dialog.
Don’t do that if you’re connecting to a production database with real data.
Get a connection string .
Select Drivers as the connection option and copy the string from the dialog.
You will need this string when configuring the source endpoint later in this tutorial.
Click Create cluster at the top right and select ClickHouse.
Tip
For this tutorial, you don’t need to modify cluster settings on this page.
If you proceed with the default settings,
you get a fully functional ClickHouse® cluster that you can use for testing and development.
Under Collection filter, add listings in Included collections.
Click Submit.
Step 5. Create a target ClickHouse® endpoint
To create a target endpoint:
On the Transfer page, click Create → Target endpoint.
In Target type, select ClickHouse.
Enter an endpoint name, such as clickhouse-target-dev.
In Connection type,
select Managed cluster and select the cluster that you created from the dropdown.
You can leave the default values in other fields.
Click Submit.
Step 6. Create a transfer
Now that your endpoints are ready, you can create a transfer.
On the Transfer page, click Create → Transfer at the top right.
Under Endpoints,
select the endpoints you just created —
mongodb-source-dev and clickhouse-target-dev as the source and target respectively.
Under Basic settings,
enter a transfer name, such as transfer-dev.
In Transfer type,
select Snapshot and replication.
This transfer type transfers the current state of the source database,
and then keeps all the added data in sync.
Leave the default preselected options in Snapshot settings and Runtime environment
In List of objects for transfer,
click + Object and enter the name of the collection you want to transfer — sample_airbnb.listingsAndReviews.
Click + Transformation.
In the Transformer block that appeared,
select SQL in the dropdown.
In the SQL field,
enter a query that transforms and flattens BSON documents to allow them to be added to ClickHouse® columns.
For example, to extract several fields from documents in the sample Airbnb listing collection,
use the following query:
SELECT _id,
JSON_VALUE(assumeNotNull(document), '$.listing_url') AS listing_url,
JSON_VALUE(assumeNotNull(document), '$.name') AS name,
JSON_VALUE(assumeNotNull(document), '$.property_type') AS property_type,
JSONExtractRaw(assumeNotNull(document), 'amenities') AS amenities,
JSON_VALUE(assumeNotNull(document), '$.price') AS price,
JSON_VALUE(assumeNotNull(document), '$.address.country') AS country,
JSON_VALUE(assumeNotNull(document), '$.address.location.coordinates[0]') AS lat,
JSON_VALUE(assumeNotNull(document), '$.address.location.coordinates[1]') AS lng
FROMtable
Tip
This query processes data on the fly using ClickHouse® JSON functions, such as JSON_VALUE or JSONExtractRaw.
To access nested values in MongoDB documents, use the dot notation.
Learn more about using the SQL transformer
Click Submit.
Step 7. Activate the transfer and query transferred data
Now that you transfer is fully configured, you can use it to replicate data from MongoDB to ClickHouse®.
On the transfer page, click Activate at the top right.
After the transfer is activated,
it fetches data from the source database,
applies transformations to the data, and uploads it to the target database.
The initial activation may take a few minutes.
When some data is successfully replicated, you can see green bars in the Status history section: