Create DoubleCloud resources with Terraform

Terraform is an infrastructure-as-code tool that allows you to provision and manage cloud resources with declarative configuration files. With DoubleCloud, you can use Terraform to manage ClickHouse®, Apache Kafka®, and Airflow® clusters, data transfers, endpoints, and network connections.

In this tutorial, you learn how to create resources with the DoubleCloud Terraform provider. The examples give you a starting point for managing DoubleCloud services as code.

Before you start

1. If you haven't already, install Terraform on your local machine.
2. Select a service account to use with Terraform or create a new one. Make sure this account has Editor permissions for the services you want to use.
3. Create an API key for that account and download the file with the keys.

Step 1. Configure the DoubleCloud Terraform provider

When creating resources with Terraform, you first describe their parameters in Terraform configuration files. After that, you run Terraform, and it provisions the resources on your behalf.

Create a new directory for your project and navigate to it:

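```
mkdir doublecloud-terraform && cd doublecloud-terraform
```

Move the key file that you downloaded to this directory. Make sure its name matches the path in the provider section of the configuration (authorized_key.json in this tutorial).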

Create a new Terraform configuration file named main.tf and add the following code:

```
# main.tf

terraform {
  required_providers {
    doublecloud = {
      source = "registry.terraform.io/doublecloud/doublecloud"
    }
  }
}

provider "doublecloud" {
  authorized_key = file("authorized_key.json")
}
```

The terraform section instructs Terraform to use the DoubleCloud provider and download it from the official Terraform Registry. The provider section specifies where to look for the API keys.
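Optionally, you can pin the provider version so that a later terraform init doesn't pick up a release with breaking changes. The sketch below uses Terraform's standard version constraint syntax; the 0.1.24 value is only an assumption taken from the sample terraform init output in Step 3, so check the Terraform Registry for the version you actually want.

```hcl
# main.tf

terraform {
  required_providers {
    doublecloud = {
      source  = "registry.terraform.io/doublecloud/doublecloud"
      # Assumed version: accepts 0.1.24 and later 0.1.x releases, but not 0.2.0
      version = "~> 0.1.24"
    }
  }
}
```

The ~> operator allows newer patch releases such as 0.1.25 but stops before the next minor version, 0.2.0.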

Step 2. Add resource configuration

In the same main.tf file, add the configuration of the resources you want to create.

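The examples below cover the following resources: a ClickHouse® cluster, an Apache Kafka® cluster, an Apache Airflow® cluster, transfer endpoints, and a transfer. Add only the ones you need.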

Tip

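When you create a Managed ClickHouse® cluster, it needs a network. In this example, the doublecloud_network data source looks up an existing network by name, and the cluster references it through network_id. Terraform derives the creation order from such references, so the order of blocks in the file doesn't matter, but placing the network first, as in this example, keeps the configuration easy to read.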

```
# main.tf

...
data "doublecloud_network" "default" {
  name       = "NETWORK_NAME"              # Replace with the name of the network you want to use
  project_id = "DOUBLECLOUD_PROJECT_ID"    # Replace with your project ID
}

resource "doublecloud_clickhouse_cluster" "example-clickhouse" {
  project_id = "DOUBLECLOUD_PROJECT_ID"    # Replace with your project ID
  name       = "example-clickhouse"
  region_id  = "eu-central-1"
  cloud_type = "aws"
  network_id = data.doublecloud_network.default.id

  resources {
    clickhouse {
      resource_preset_id = "s2-c2-m4"
      disk_size          = 34359738368    # 32 GiB
      replica_count      = 1
    }
  }

  config {
    log_level       = "LOG_LEVEL_TRACE"
    max_connections = 120
  }

  access {
    data_services    = ["transfer"]
    ipv4_cidr_blocks = [
      {
        value       = "10.0.0.0/24"
        description = "Office in Berlin"
      }
    ]
  }
}
```

You can find the project ID on the project settings page. For a full list of available parameters, refer to the ClickHouse cluster resource schema.
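Instead of pasting your project ID into every resource, you can declare it once as an input variable. This is a minimal sketch; the variable name project_id is a hypothetical choice and isn't required by the provider.

```hcl
# Hypothetical input variable holding the DoubleCloud project ID
variable "project_id" {
  type        = string
  description = "DoubleCloud project ID from the project settings page"
}

# In each resource, replace the DOUBLECLOUD_PROJECT_ID placeholder with:
#   project_id = var.project_id
```

Terraform then prompts for the value when you run plan or apply, or reads it from a terraform.tfvars file or the TF_VAR_project_id environment variable.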

Tip

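When you create a Managed Apache Kafka® cluster, it needs a network. The cluster references the doublecloud_network data source through network_id, so Terraform resolves the dependency regardless of block order; placing the network first, as in this example, keeps the configuration easy to read. If you've already declared this data source for another cluster, reuse it instead of adding it again: Terraform rejects two data blocks with the same type and name.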

```
# main.tf

...
data "doublecloud_network" "default" {
  name       = "NETWORK_NAME"              # Replace with the name of the network you want to use
  project_id = "DOUBLECLOUD_PROJECT_ID"    # Replace with your project ID
}

resource "doublecloud_kafka_cluster" "example-kafka" {
  project_id = "DOUBLECLOUD_PROJECT_ID"    # Replace with your project ID
  name       = "example-kafka"
  region_id  = "eu-central-1"
  cloud_type = "aws"
  network_id = data.doublecloud_network.default.id

  resources {
    kafka {
      resource_preset_id = "s2-c2-m4"
      disk_size          = 34359738368    # 32 GiB
      broker_count       = 1
      zone_count         = 1
    }
  }

  schema_registry {
    enabled = false
  }

  access {
    data_services    = ["transfer"]
    ipv4_cidr_blocks = [
      {
        value       = "10.0.0.0/24"
        description = "Office in Berlin"
      }
    ]
  }
}
```

You can find the project ID on the project settings page. For a full list of available parameters, refer to the Kafka cluster resource schema.

Tip

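When you create a Managed Apache Airflow® cluster, it needs a network. The cluster references the doublecloud_network data source through network_id, so Terraform resolves the dependency regardless of block order; placing the network first, as in this example, keeps the configuration easy to read. If you've already declared this data source for another cluster, reuse it instead of adding it again: Terraform rejects two data blocks with the same type and name.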

```
# main.tf

...
data "doublecloud_network" "default" {
  name       = "NETWORK_NAME"              # Replace with the name of the network you want to use
  project_id = "DOUBLECLOUD_PROJECT_ID"    # Replace with your project ID
}

resource "doublecloud_airflow_cluster" "example-airflow" {
  project_id = "DOUBLECLOUD_PROJECT_ID"    # Replace with your project ID
  name       = "example-airflow"
  region_id  = "eu-central-1"
  cloud_type = "aws"
  network_id = data.doublecloud_network.default.id

  resources {
    airflow {
      max_worker_count   = 1
      min_worker_count   = 1
      environment_flavor = "dev_test"
      worker_concurrency = 16
      worker_disk_size   = 10
      worker_preset      = "small"
    }
  }

  config {
    version_id = "2.10.0"
    sync_config {
      repo_url  = "https://github.com/apache/airflow"
      branch    = "main"
      dags_path = "airflow/example_dags"
    }
  }

  access {
    data_services    = ["transfer"]
    ipv4_cidr_blocks = [
      {
        value       = "10.0.0.0/24"
        description = "Office in Berlin"
      }
    ]
  }
}
```

You can find the project ID on the project settings page. For a full list of available parameters, refer to the Airflow® cluster resource schema.
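All three cluster examples repeat the same region and cloud settings. If you add more than one cluster, a locals block can keep these values in one place. The local value names below are illustrative assumptions; the values themselves are the ones used in the examples.

```hcl
# Hypothetical local values shared by the cluster examples in this tutorial
locals {
  region_id  = "eu-central-1"
  cloud_type = "aws"
}

# In each cluster resource, reference the shared values:
#   region_id  = local.region_id
#   cloud_type = local.cloud_type
```

Changing the region then means editing one line instead of every cluster block.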

An endpoint defines how Transfer connects to a data store. A source endpoint connects to the store that Transfer reads data from, such as an S3 bucket, and a target endpoint connects to the database that Transfer writes the data to.

```
# main.tf

...
# Source endpoint resource
resource "doublecloud_transfer_endpoint" "example-s3-source" {
  name       = "example-s3-source"
  project_id = "DOUBLECLOUD_PROJECT_ID"    # Replace with your project ID
  settings {
    s3_source {
      dataset = "hits"
      format {
        parquet {
          buffer_size = 10000
        }
      }
      path_pattern = "data-sets/hits.parquet"
      provider {
        bucket = "doublecloud-docs"
      }
      schema {}
    }
  }
}

# Target endpoint resource
resource "doublecloud_transfer_endpoint" "example-clickhouse-target" {
  name       = "example-clickhouse-target"
  project_id = "DOUBLECLOUD_PROJECT_ID"    # Replace with your project ID
  settings {
    clickhouse_target {
      clickhouse_cleanup_policy = "DROP"
      connection {
        address {
          # Reference the ClickHouse cluster from the ClickHouse® cluster example
          cluster_id = doublecloud_clickhouse_cluster.example-clickhouse.id
        }
        database = "default"
        password = "CLICKHOUSE_PASSWORD"   # Replace with the ClickHouse user password
        user     = "admin"
      }
    }
  }
}
```

You can find the project ID on the project settings page. For a full list of available parameters, refer to the Transfer endpoint resource schema.

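A transfer requires a source and a target endpoint. Because the transfer references the endpoints by ID, Terraform creates them before the transfer wherever they appear in the file. Placing the endpoint configuration before the transfer, as in this example, keeps the file easy to follow.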

```
# main.tf

...
# Source endpoint resource
resource "doublecloud_transfer_endpoint" "example-s3-source" {
  ...
}

# Target endpoint resource
resource "doublecloud_transfer_endpoint" "example-clickhouse-target" {
  ...
}

# Transfer resource
resource "doublecloud_transfer" "example-transfer" {
  name       = "example-transfer"
  project_id = "DOUBLECLOUD_PROJECT_ID"    # Replace with your project ID
  source     = doublecloud_transfer_endpoint.example-s3-source.id
  target     = doublecloud_transfer_endpoint.example-clickhouse-target.id
  type       = "SNAPSHOT_ONLY"
  activated  = false
  transformation = {
    transformers = [
      {
        dbt = {
          git_repository_link = "https://github.com/doublecloud/tests-clickhouse-dbt.git"
          profile_name        = "my_clickhouse_profile"
          operation           = "run"
        }
      },
    ]
  }
}
```

You can find the project ID on the project settings page. For a full list of available parameters, refer to the Transfer resource schema.
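To print the IDs of the created resources after terraform apply, you can declare output values. The output names below are hypothetical, and the sketch assumes that both the example-clickhouse cluster and the example-transfer transfer are in your configuration; remove any output whose resource you didn't add.

```hcl
# Hypothetical outputs: Terraform prints these values after a successful apply
output "clickhouse_cluster_id" {
  value = doublecloud_clickhouse_cluster.example-clickhouse.id
}

output "transfer_id" {
  value = doublecloud_transfer.example-transfer.id
}
```

You can print the values again at any time with terraform output.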

Tip

In this tutorial, you place the Terraform and resource configuration in a single .tf file. For a more complex project, consider using variables and splitting the configuration into several files, such as main.tf, resource.tf, and so on.

It's also good practice to move sensitive information, such as passwords, to variables.
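For example, the ClickHouse password in the target endpoint can come from a sensitive input variable rather than a literal string. The variable name clickhouse_password is a hypothetical choice for this sketch.

```hcl
# Hypothetical variable for the ClickHouse user password used by the target endpoint
variable "clickhouse_password" {
  type      = string
  sensitive = true # Terraform redacts the value in plan and apply output
}

# In the target endpoint, reference the variable instead of a literal value:
#   password = var.clickhouse_password
```

You can then supply the value through the TF_VAR_clickhouse_password environment variable or a .tfvars file kept out of version control. Terraform still stores sensitive values in plain text in its state file, so protect the state as well.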

Step 3. Initialize Terraform and validate the configuration

Initialize Terraform by running the following command in the project directory. It downloads the provider and creates the .terraform directory.

```
terraform init
```

```
Initializing the backend...

Initializing provider plugins...
- Finding latest version of doublecloud/doublecloud...
- Installing doublecloud/doublecloud v0.1.24...
...

Terraform has been successfully initialized!
```

(Optional) Preview the changes. The terraform plan command checks the configuration and lists the resources that Terraform will create, without creating them. To check only the syntax, you can run terraform validate.

```
terraform plan
```

```
Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  + create
Terraform will perform the following actions:
...
Plan: 4 to add, 0 to change, 0 to destroy.
```

Step 4. Apply the configuration and create resources

Apply the configuration and create resources:
```
terraform apply
```
When prompted, type yes and press Enter. Terraform will provision your resources, which may take some time.
```
...
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
```

Step 5. (Optional) Clean up

When you no longer need the resources you created, you can delete them using Terraform to avoid incurring additional costs. To do that:

1. In the project's configuration file, comment out or remove the resources you want to delete.
2. Apply the configuration:
```
terraform apply
```
When prompted, type yes and press Enter. Terraform will delete the resources you commented out or removed. This may take some time.
```
...
Apply complete! Resources: 0 added, 0 changed, 4 destroyed.
```

Alert

To remove all the resources described in the project's configuration file, you can also run the terraform destroy command. However, it's rarely used in production environments, because it deletes every resource that the configuration manages.
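If you want Terraform to refuse to delete a critical resource, such as a production cluster, you can add Terraform's standard lifecycle meta-argument prevent_destroy to its resource block. The sketch below shows it on the example ClickHouse cluster; the other arguments stay as in Step 2.

```hcl
resource "doublecloud_clickhouse_cluster" "example-clickhouse" {
  # Other arguments as in the ClickHouse® cluster example in Step 2

  lifecycle {
    # Any plan that would destroy this cluster fails with an error
    prevent_destroy = true
  }
}
```

With this setting, terraform destroy and any change that would replace the cluster fail instead of deleting it. The protection only applies while the block stays in the configuration: if you delete the whole resource block, the lifecycle setting goes with it.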