Create an Apache Airflow® cluster

This tutorial guides you through creating a Managed Apache Airflow® cluster in DoubleCloud.

Step 1. Configure resources

  1. Go to the Clusters page and click Create cluster at the top right.

  2. Select Airflow.

  3. Choose a provider and a region. You can create Managed Apache Airflow® clusters on AWS in any of the available regions. By default, DoubleCloud preselects the region nearest to you.

  4. In Environment configuration, select the configuration that best fits your needs. It defines the ratio of webservers, schedulers, and triggerers.

  5. Under Worker node resources, select a preset with the amount of CPU, RAM, and SSD storage suitable for your workload.

  6. In Min worker nodes and Max worker nodes, specify the lower and upper limits of workers for autoscaling. The cluster will automatically adjust the number of workers depending on the load.

  7. In Concurrency, specify how many task instances a DAG can run at the same time. Any task instance above this limit is queued.

  8. Under Basic settings, in Name, enter a cluster name, such as airflow-dev.

  9. In Version, select the Airflow® version for the cluster. Unless you need a specific version, select the latest one.

  10. If your DAGs are stored in a Git repository, configure a connection under DAG Code Repository:

    How to configure a connection
    1. In Repository URL, DAG path, and Branch, enter the details of your Git repository with DAGs.

    2. If the repository is private, enter the credentials in Username and Password/token. For GitHub, use a personal access token.

    3. To make sure the connection details are correct, click Check connection.
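
The worker settings above interact in a simple way: task instances run until the Concurrency limit is reached, and any further instances wait in a queue until a slot frees up. The Python sketch below is purely illustrative of that rule (the `WorkerPoolSketch` class is invented for this example; it is not part of Airflow or DoubleCloud):

```python
# Illustrative sketch of the Concurrency setting: task instances beyond
# the limit are queued instead of running. Not actual Airflow internals.
from collections import deque

class WorkerPoolSketch:
    def __init__(self, concurrency: int):
        self.concurrency = concurrency   # corresponds to the Concurrency setting
        self.running = []                # currently running task instances
        self.queued = deque()            # instances above the limit wait here

    def submit(self, task_id: str) -> str:
        if len(self.running) < self.concurrency:
            self.running.append(task_id)
            return "running"
        self.queued.append(task_id)
        return "queued"

    def complete(self, task_id: str) -> None:
        self.running.remove(task_id)
        if self.queued:                  # a queued instance takes the freed slot
            self.running.append(self.queued.popleft())

pool = WorkerPoolSketch(concurrency=2)
states = [pool.submit(t) for t in ["a", "b", "c"]]
# with concurrency=2, "a" and "b" run while "c" is queued
```

When a running instance completes, the oldest queued instance takes its slot, so raising Concurrency trades memory and CPU pressure on the workers for less queueing.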

Step 2. Configure advanced settings

  1. In Maintenance settings, select whether you want DoubleCloud to perform maintenance at an arbitrary time or by schedule. If you select By schedule, choose the day and time (UTC).

  2. Under Networking → VPC, select the network where you want to create the cluster.

    If you don’t need to place the cluster in a specific network, leave the preselected default option. You can create a new network on the VPC page in the console.

    Airflow® clusters in BYOC networks

    You can create Managed Apache Airflow® clusters only in BYOC networks created after September 19, 2024.

  3. (Optional) In Allowlist, configure the IP addresses from which the Airflow® cluster can be accessed. To do that, click Edit and add or remove addresses; both single IP addresses and CIDR blocks are accepted. When you're done, click Save in the dialog.

  4. In the Summary block on the right, review the resources to be created and their price.

  5. Click Submit.

    When the cluster is ready, its status changes from Creating to Alive.
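
An allowlist entry matches either one exact address or every address in a CIDR block. To sanity-check locally which clients a planned allowlist would admit, you can use Python's standard ipaddress module (a sketch only; DoubleCloud evaluates the allowlist server-side, and `is_allowed` is a hypothetical helper):

```python
# Check whether a client IP falls inside any allowlist entry.
# Entries may be single addresses ("203.0.113.5") or CIDR blocks ("10.0.0.0/24").
import ipaddress

def is_allowed(client_ip: str, allowlist: list[str]) -> bool:
    ip = ipaddress.ip_address(client_ip)
    for entry in allowlist:
        # A bare address parses as a /32 (or /128) network;
        # strict=False tolerates host bits set in a block.
        if ip in ipaddress.ip_network(entry, strict=False):
            return True
    return False

allowlist = ["10.0.0.0/24", "203.0.113.5"]
print(is_allowed("10.0.0.42", allowlist))   # True: inside the CIDR block
print(is_allowed("192.0.2.1", allowlist))   # False: matches no entry
```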

You can create a Managed Apache Airflow® cluster using the DoubleCloud Terraform provider.

Tip

If you haven't used Terraform before, refer to Create DoubleCloud resources with Terraform for more detailed instructions.

Example provider and resource configuration:

# main.tf

terraform {
  required_providers {
    doublecloud = {
      source    = "registry.terraform.io/doublecloud/doublecloud"
    }
  }
}

provider "doublecloud" {
  authorized_key = file("authorized_key.json")
}

data "doublecloud_network" "default" {
  name       = "NETWORK_NAME"            # Replace with the name of the network you want to use
  project_id = "DOUBLECLOUD_PROJECT_ID"  # Replace with your project ID
}

resource "doublecloud_airflow_cluster" "example-airflow" {
  project_id = "DOUBLECLOUD_PROJECT_ID"  # Replace with your project ID
  name       = "example-airflow"
  region_id  = "eu-central-1"
  cloud_type = "aws"
  network_id = data.doublecloud_network.default.id

  resources {
    airflow {
      max_worker_count   = 1
      min_worker_count   = 1
      environment_flavor = "dev_test"
      worker_concurrency = 16
      worker_disk_size   = 10
      worker_preset      = "small"
    }
  }

  config {
    version_id = "2.10.0"
    sync_config {
      repo_url  = "https://github.com/apache/airflow"
      branch    = "main"
      dags_path = "airflow/example_dags"
    }
  }

  access {
    data_services    = ["transfer"]
    ipv4_cidr_blocks = [
      {
        value       = "10.0.0.0/24"
        description = "Office in Berlin"
      }
    ]
  }
}

To learn how to get the authorized_key.json file, refer to Create an API key. You can find the DoubleCloud project ID on the project settings page.

Tip

This example contains the minimum set of parameters required to create a functional example cluster. When you create your production cluster, use a configuration suitable for your needs. For a full list of available parameters, refer to the DoubleCloud Airflow® cluster resource schema.

To create a Managed Apache Airflow® cluster, use the ClusterService create method.

The following parameters are required to create a functional cluster:

  • project_id: ID of your project. You can get the ID on your project’s information page.

  • cloud_type: aws.

  • region_id: AWS region to create the cluster in.

  • name: Name of your cluster. It must be unique within the project.

  • resources: Specify the following settings from the doublecloud.airflow.v1.ClusterResources model:

    • environment_flavor: Environment configuration.

    • max_worker_count: Maximum number of workers.

    • min_worker_count: Minimum number of workers.

    • worker_concurrency: Worker concurrency.

    • worker_disk_size: Worker disk size (GiB).

    • worker_preset: Worker resource preset.
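
Before sending the request, it can help to assemble the parameters listed above and catch obvious mistakes early. The sketch below only builds a plain dictionary mirroring that list; it is not the DoubleCloud SDK, and `build_create_request` is a hypothetical helper:

```python
# Assemble and sanity-check the required parameters for a ClusterService
# create call. Plain-data sketch, not the DoubleCloud SDK; field names
# mirror the list above.

REQUIRED_RESOURCE_FIELDS = {
    "environment_flavor", "max_worker_count", "min_worker_count",
    "worker_concurrency", "worker_disk_size", "worker_preset",
}

def build_create_request(project_id: str, region_id: str, name: str,
                         resources: dict) -> dict:
    missing = REQUIRED_RESOURCE_FIELDS - resources.keys()
    if missing:
        raise ValueError(f"missing resource fields: {sorted(missing)}")
    if resources["min_worker_count"] > resources["max_worker_count"]:
        raise ValueError("min_worker_count must not exceed max_worker_count")
    return {
        "project_id": project_id,
        "cloud_type": "aws",   # the only supported value per the list above
        "region_id": region_id,
        "name": name,          # must be unique within the project
        "resources": {"airflow": resources},
    }

request = build_create_request(
    project_id="<your-project-id>",
    region_id="eu-central-1",
    name="example-airflow",
    resources={
        "environment_flavor": "dev_test",
        "max_worker_count": 1,
        "min_worker_count": 1,
        "worker_concurrency": 16,
        "worker_disk_size": 10,   # GiB
        "worker_preset": "small",
    },
)
```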

See also