Replicate data from MongoDB to ClickHouse®

MongoDB is a NoSQL document-oriented database management system that can efficiently store large amounts of unstructured or semi-structured data. It’s used in various applications because of its versatility and scalability, yet it may not be an ideal choice in every case. For example, running analytical queries in MongoDB may require adding computational resources and can take 50+ times longer compared to OLAP databases like ClickHouse®.

To enhance the performance of MongoDB, you can add ClickHouse® to the mix and offload analytical queries to the database that’s designed to be fast and efficient. By replicating data from MongoDB to ClickHouse® and running analytical queries there, you can benefit from the flexibility of MongoDB and the speed of ClickHouse® — all of this at a lower cost.

This tutorial guides you through replicating data from MongoDB to ClickHouse® with DoubleCloud Transfer — a real-time data replication tool. You will learn how to create a ClickHouse® cluster, set up a data pipeline, apply data transformations, and run queries.

Before you start

  1. If you haven’t already, create a DoubleCloud account .

  2. If you don’t have a MongoDB instance with data that you can use for this tutorial:

    1. Create a MongoDB Atlas cluster .

    2. Load a sample AirBnB listings dataset into it.

Step 1. Enable access to the MongoDB cluster

Before DoubleCloud Transfer can connect to your MongoDB database, configure networks connection rules and create a database user.

  1. Create a database user in your cluster. Select the Username and password as the authentication method. Make sure this user has sufficient permissions to read data in the cluster.

    Tip

    Take note of the username and password. You may not be able to access the password later.

    Screenshot showing database access configuration in MongoDB Atlas settings

  2. Configure a network connection and add Transfer’s IP addresses:

    3.77.1.232/32
    3.74.181.206/32
    3.125.212.122/32
    3.77.29.32/32
    3.68.105.66/32
    52.59.249.9/32
    18.184.232.195/32
    3.76.190.60/32
    

    Tip

    If you’re just testing the integration and your MongoDB cluster contains only sample data, you can allow access from any IP address by adding 0.0.0.0/0 as an access list entry and marking the entry as temporary in the dialog.

    Don’t do that if you’re connecting to a production database with real data.

    Screenshot showing network access configuration in MongoDB Atlas settings

  3. Get a connection string . Select Drivers as the connection option and copy the string from the dialog. You will need this string when configuring the source endpoint later in this tutorial.

Step 2. Create a Managed ClickHouse® cluster

  1. Go to the Clusters page in the console.

  2. Click Create cluster at the top right and select ClickHouse.

    Tip

    For this tutorial, you don’t need to modify cluster settings on this page. If you proceed with the default settings, you get a fully functional ClickHouse® cluster that you can use for testing and development.

    When creating a production cluster, refer to Create a Managed ClickHouse® cluster for a list of ways to create and configure a cluster.

  3. Under Basic settings, enter a cluster name, such as clickhouse-dev.

  4. Click Submit.

    Creating a cluster takes around five minutes depending on the provider, region, and settings.

Step 3. Create a ClickHouse® database

  1. After the cluster status changes from Creating to Alive, select it in the cluster list.

  2. Click WebSQL at the top right.

  3. In WebSQL, click on any database in the connection manager on the left to open the query editor.

  4. Create a new database named listings using the following command:

    CREATE DATABASE IF NOT EXISTS listings ON CLUSTER default;
    
  5. Check that the database has been created:

    SHOW DATABASES
    
    ┌─name───────────────┐
    │ INFORMATION_SCHEMA │
    │ _system            │
    │ default            │
    │ listings           │  // your database
    │ information_schema │
    │ system             │
    └────────────────────┘
    

Step 4. Create a source MongoDB endpoint

To create a source MongoDB endpoint:

  1. Go to the Transfer page in the console and click CreateSource endpoint at the top right.

  2. In Source type, select MongoDB.

  3. Enter an endpoint name, such as mongo-source-dev.

  4. In connection type select SRV.

  5. Fill out the connection details using the SRV connection string you copied earlier. The connection string has the following format:

    mongodb+srv://<username>:<password>@<hostname>/<database>
    
  6. Under Collection filter, add listings in Included collections.

  7. Click Submit.

Step 5. Create a target ClickHouse® endpoint

To create a target endpoint:

  1. On the Transfer page, click CreateTarget endpoint.

  2. In Target type, select ClickHouse.

  3. Enter an endpoint name, such as clickhouse-target-dev.

  4. In Connection type, select Managed cluster and select the cluster that you created from the dropdown.

    You can leave the default values in other fields.

  5. Click Submit.

Step 6. Create a transfer

Now that your endpoints are ready, you can create a transfer.

  1. On the Transfer page, click CreateTransfer at the top right.

  2. Under Endpoints, select the endpoints you just created — mongodb-source-dev and clickhouse-target-dev as the source and target respectively.

  3. Under Basic settings, enter a transfer name, such as transfer-dev.

  4. In Transfer type, select Snapshot and replication.

    This transfer type transfers the current state of the source database, and then keeps all the added data in sync.

  5. Leave the default preselected options in Snapshot settings and Runtime environment

  6. In List of objects for transfer, click + Object and enter the name of the collection you want to transfer — sample_airbnb.listingsAndReviews.

  7. Click + Transformation.

  8. In the Transformer block that appeared, select SQL in the dropdown.

  9. In the SQL field, enter a query that transforms and flattens BSON documents to allow them to be added to ClickHouse® columns. For example, to extract several fields from documents in the sample Airbnb listing collection, use the following query:

    SELECT _id,
           JSON_VALUE(assumeNotNull(document), '$.listing_url') AS listing_url,
           JSON_VALUE(assumeNotNull(document), '$.name') AS name,
           JSON_VALUE(assumeNotNull(document), '$.property_type') AS property_type,
           JSONExtractRaw(assumeNotNull(document), 'amenities') AS amenities,
           JSON_VALUE(assumeNotNull(document), '$.price') AS price,
           JSON_VALUE(assumeNotNull(document), '$.address.country') AS country,
           JSON_VALUE(assumeNotNull(document), '$.address.location.coordinates[0]') AS lat,
           JSON_VALUE(assumeNotNull(document), '$.address.location.coordinates[1]') AS lng
    FROM table
    

    Tip

    This query processes data on the fly using ClickHouse® JSON functions, such as JSON_VALUE or JSONExtractRaw. To access nested values in MongoDB documents, use the dot notation. Learn more about using the SQL transformer

    1. Click Submit.

Step 7. Activate the transfer and query transferred data

Now that you transfer is fully configured, you can use it to replicate data from MongoDB to ClickHouse®.

  1. On the transfer page, click Activate at the top right.

    After the transfer is activated, it fetches data from the source database, applies transformations to the data, and uploads it to the target database.

    The initial activation may take a few minutes. When some data is successfully replicated, you can see green bars in the Status history section:

    Screenshot showing a transfer page with green bars that indicate successful data transfer

  2. Go to the Clusters page in the console.

  3. Select the cluster you created earlier.

  4. Click WebSQL at the top right.

  5. In the Connection manager on the left, select the listings database.

  6. In the query editor, run the following statement:

    SELECT * FROM sample_airbnb_listingsAndReviews LIMIT 100;
    
  7. The query output is displayed in the Results section:

    Screenshot showing WebSQL with a query and results

What’s next