Create an Apache Airflow® cluster
This tutorial guides you through creating a Managed Apache Airflow® cluster in DoubleCloud.
Step 1. Configure resources
- Go to the Clusters page.
- Select Airflow.
- Choose a provider and a region. You can create Managed Apache Airflow® clusters on AWS in any of the available regions. By default, DoubleCloud preselects the region nearest to you.
- In Environment configuration, select the configuration that best fits your needs. It defines the ratio of web servers, schedulers, and triggerers.
- Under Worker node resources, select a preset with the amount of CPU, RAM, and SSD storage suitable for your workload.
- In Min worker nodes and Max worker nodes, specify the lower and upper limits of workers for autoscaling. The cluster automatically adjusts the number of workers depending on the load.
- In Concurrency, specify how many running task instances a DAG can have. Any task instances above this number are queued.
- Under Basic settings, in Name, enter a cluster name, such as `airflow-dev`.
- In Version, select the Airflow® version for the cluster. Unless you need a specific version, select the latest one.
- If your DAGs are stored in a Git repository, configure a connection under DAG Code Repository:
  - In Repository URL, DAG path, and Branch, enter the details of your Git repository with DAGs.
  - If the repository is private, enter the credentials in Username and Password/token. For GitHub, use a personal access token.
  - To make sure the connection details are correct, click Check connection.
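The interplay between the autoscaling bounds and concurrency above can be pictured as clamping the worker count to the configured range. The sketch below is an illustration only, not DoubleCloud's actual scaling logic; the demand model (queued tasks divided into per-worker task slots) is an assumption for the example.

```python
import math

def desired_workers(queued_tasks: int, worker_concurrency: int,
                    min_workers: int, max_workers: int) -> int:
    """Return a worker count that covers the queued tasks,
    clamped to the [min_workers, max_workers] autoscaling range.
    Simplified illustration, not DoubleCloud's actual algorithm."""
    needed = math.ceil(queued_tasks / worker_concurrency) if queued_tasks else 0
    return max(min_workers, min(max_workers, needed))

print(desired_workers(40, 16, 1, 5))   # 40 tasks / 16 slots -> 3 workers
print(desired_workers(200, 16, 1, 5))  # demand exceeds the cap -> 5 workers
```

However the real scaler measures load, the configured minimum and maximum always bound the result, which is why idle clusters still keep Min worker nodes running.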
Step 2. Configure advanced settings
- In Maintenance settings, select whether you want DoubleCloud to perform maintenance at an arbitrary time or by schedule. If you select By schedule, select the day and time (UTC).
- Under Networking → VPC, select the network where you want to create the cluster. If you don't need to place the cluster in a specific network, leave the preselected default option. You can create a new network on the VPC page.

  Airflow® clusters in BYOC networks

  You can create Managed Apache Airflow® clusters only in BYOC networks created after September 19, 2024.

- (Optional) In Allowlist, configure the IP addresses from which the Airflow® cluster can be accessed. To do that, click Edit and add or remove IP addresses. You can use both single addresses and CIDR blocks. When you're done, click Save in the dialog.
- In the Summary block on the right, review the resources to be created and their price.
- Click Submit.

When the cluster is ready, its status changes from Creating to Alive.
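Since the Allowlist step accepts both single addresses and CIDR blocks, it can help to check locally which client addresses a block actually covers before saving. The standard-library sketch below is unrelated to DoubleCloud's own validation; the allowlist entries are made-up examples.

```python
import ipaddress

# Hypothetical allowlist mixing a single address and a CIDR block,
# mirroring the kinds of entries the Allowlist dialog accepts.
allowlist = ["203.0.113.10", "10.0.0.0/24"]

def is_allowed(client_ip: str) -> bool:
    """Check whether client_ip matches any allowlist entry.
    A bare address like 203.0.113.10 is treated as a /32 network."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(entry, strict=False)
               for entry in allowlist)

print(is_allowed("10.0.0.42"))  # True: inside 10.0.0.0/24
print(is_allowed("192.0.2.1"))  # False: not in any entry
```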
You can create a Managed Apache Airflow® cluster using the DoubleCloud Terraform provider.

Tip

If you haven't used Terraform before, refer to Create DoubleCloud resources with Terraform for more detailed instructions.

Example provider and resource configuration:
```hcl
# main.tf
terraform {
  required_providers {
    doublecloud = {
      source = "registry.terraform.io/doublecloud/doublecloud"
    }
  }
}

provider "doublecloud" {
  authorized_key = file("authorized_key.json")
}

data "doublecloud_network" "default" {
  name       = NETWORK_NAME           # Replace with the name of the network you want to use
  project_id = DOUBLECLOUD_PROJECT_ID # Replace with your project ID
}

resource "doublecloud_airflow_cluster" "example-airflow" {
  project_id = DOUBLECLOUD_PROJECT_ID # Replace with your project ID
  name       = "example-airflow"
  region_id  = "eu-central-1"
  cloud_type = "aws"
  network_id = data.doublecloud_network.default.id

  resources {
    airflow {
      max_worker_count   = 1
      min_worker_count   = 1
      environment_flavor = "dev_test"
      worker_concurrency = 16
      worker_disk_size   = 10
      worker_preset      = "small"
    }
  }

  config {
    version_id = "2.10.0"
    sync_config {
      repo_url  = "https://github.com/apache/airflow"
      branch    = "main"
      dags_path = "airflow/example_dags"
    }
  }

  access {
    data_services = ["transfer"]
    ipv4_cidr_blocks = [
      {
        value       = "10.0.0.0/24"
        description = "Office in Berlin"
      }
    ]
  }
}
```
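With the configuration above saved as main.tf and the Terraform CLI installed, the standard Terraform workflow applies it (the commands below assume you run them from the directory containing main.tf and authorized_key.json):

```shell
# Initialize the working directory and download the DoubleCloud provider
terraform init

# Preview the resources Terraform will create
terraform plan

# Create the cluster (confirm when prompted)
terraform apply
```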
To learn how to get the `authorized_key.json` file, refer to Create an API key. You can find the DoubleCloud project ID on the project settings page.

Tip

This example contains a minimum set of parameters required to create a functional example cluster. When you create your production cluster, make sure to use a configuration suitable for your needs. For a full list of available parameters, refer to the DoubleCloud Airflow® cluster resource schema.

To create a Managed Apache Airflow® cluster, use the ClusterService create method.
The following parameters are required to create a functional cluster:

- `project_id`: ID of your project. You can get the ID on your project's information page.
- `cloud_type`: `aws`.
- `region_id`: AWS region to create the cluster in.
- `name`: Name of your cluster. It must be unique within the project.
- `resources`: Specify the following settings from the doublecloud.airflow.v1.ClusterResources model:
  - `environment_flavor`: Environment configuration.
  - `max_worker_count`: Maximum number of workers.
  - `min_worker_count`: Minimum number of workers.
  - `worker_concurrency`: Worker concurrency.
  - `worker_disk_size`: Worker disk size (GiB).
  - `worker_preset`: Worker resource preset.
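Assembled as JSON, a minimal request body for the create call could look like the sketch below. Field names are taken from the list above; the nesting of the worker settings under an `airflow` key mirrors the Terraform example and is an assumption, the project ID is a placeholder, and the actual API/SDK call itself is omitted.

```python
import json

# Minimal request body for ClusterService.create, assembled from the
# required parameters listed above. Values mirror the Terraform example.
request_body = {
    "project_id": "example-project-id",  # placeholder: your project ID
    "cloud_type": "aws",
    "region_id": "eu-central-1",
    "name": "example-airflow",
    "resources": {
        "airflow": {  # assumed nesting, mirroring the Terraform resource
            "environment_flavor": "dev_test",
            "max_worker_count": 1,
            "min_worker_count": 1,
            "worker_concurrency": 16,
            "worker_disk_size": 10,
            "worker_preset": "small",
        }
    },
}

print(json.dumps(request_body, indent=2))
```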