Getting started with Managed Service for Apache Airflow

To get started with the service, complete the steps below.

Before you start

Your primary tool for interacting with DoubleCloud is the console. Set up access to it before moving on.

  1. Go to the console.

  2. Log in to DoubleCloud if you already have an account, or create one if you are opening the console for the first time.

Create a cluster

  1. Go to the Clusters page in the console.

  2. Click Create cluster in the upper-right corner of the page.

  3. Select Airflow.

  4. Choose a provider and a region.

  5. Under Cluster configuration, select the ratio of web servers, schedulers, and workers that best fits your use case.

  6. Under Worker node resources:

    1. Select the small preset for CPU, RAM capacity, and storage space to create a cluster with minimal configuration.

    2. Choose the minimum and maximum numbers of workers for autoscaling. The service adjusts the number of workers within this range depending on the current load.

    3. Set the Concurrency limit. It defines the maximum number of task instances a DAG can run at the same time; any tasks beyond this limit are queued.

  7. Under Basic settings:

    1. Enter the cluster Name, for example, tutorial-cluster.

    2. From the Version drop-down list, select the Apache Airflow® version your cluster will use. For most clusters, we recommend using the latest version.

  8. Under DAG Code Repository:

    1. Provide the Repository URL where your DAGs are stored.

    2. Under DAG Path, specify the path to the folder with the DAGs in the repository.

    3. Specify the Branch from which to pull production-ready DAG versions.

    4. Provide the Username and Password/token for your repository.

      If you use GitHub as your DAG code repository, follow GitHub's official instructions to create a personal access token.

  9. Under Advanced settings:

    1. Under Networking, in the VPC field, select the network where you want to create the cluster.

      If you don’t need to place the cluster in a specific network, leave the preselected default option.

    2. (Optional) Add to the allowlist the IP addresses or CIDR blocks that are allowed to access your cluster.

    3. Under Maintenance settings, choose between arbitrary and scheduled maintenance:

      If you select Arbitrary, DoubleCloud selects the maintenance window automatically. Usually, maintenance takes place at the earliest available time slot.

      Warning

      If your cluster has only one host, arbitrary maintenance can make it unavailable at a random time.

      To perform maintenance on a specific date and time, select By schedule and specify the day and time (UTC) when you want the cluster maintenance to be performed.

  10. Click Create cluster.

Your cluster will appear with the Creating status on the Clusters page in the console. Setting everything up may take some time. When the cluster is ready, its status changes to Alive.
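To picture how the Concurrency limit set under Worker node resources queues work, here is a simplified Python model. It assumes tasks are dispatched first-come, first-served and ignores Airflow pools, priority weights, and worker capacity, so it illustrates the queueing idea only and is not how the Airflow scheduler or DoubleCloud actually schedules tasks. `simulate_concurrency` is a hypothetical helper.

```python
import heapq


def simulate_concurrency(durations, limit):
    """Return the start time of each task when at most `limit` run at once.

    Simplified model: tasks are dispatched in list order, and a task waits
    in the queue until one of the `limit` running slots frees up.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Min-heap of the times at which each running slot becomes free.
    slots = [0] * min(limit, len(durations))
    heapq.heapify(slots)
    starts = []
    for duration in durations:
        free_at = heapq.heappop(slots)  # earliest slot to become free
        starts.append(free_at)
        heapq.heappush(slots, free_at + duration)
    return starts


# Four tasks of 5 time units each with a limit of 2:
# two start immediately, the other two wait in the queue.
print(simulate_concurrency([5, 5, 5, 5], limit=2))  # [0, 0, 5, 5]
```

Raising the limit lets more tasks start at time 0, at the cost of more simultaneous load on the workers.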
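Before filling in DAG Path, you may want to check which files in that folder Airflow is likely to parse. This sketch mirrors Airflow's default DAG discovery "safe mode" heuristic, under which only Python files whose contents include both `airflow` and `dag` (case-insensitive) are parsed. It assumes your cluster keeps that default, ignores `.airflowignore` rules and zipped DAGs, and `likely_dag_files` is a hypothetical helper, not part of Airflow or DoubleCloud.

```python
from pathlib import Path


def likely_dag_files(dag_dir):
    """List .py files under dag_dir that pass Airflow's default safe-mode check.

    Mirrors the heuristic: a file is parsed only if its lowercased bytes contain
    both b"airflow" and b"dag". .airflowignore rules and zip files are ignored.
    """
    root = Path(dag_dir)
    matches = []
    for path in sorted(root.rglob("*.py")):
        content = path.read_bytes().lower()
        if b"airflow" in content and b"dag" in content:
            # Report paths relative to the DAG folder, with forward slashes.
            matches.append(path.relative_to(root).as_posix())
    return matches
```

Run it against the DAG Path folder of a local clone of your repository, for example `likely_dag_files("dags")`; helper modules that never mention both words are skipped, which is usually what you want.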
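The allowlist under Advanced settings accepts single IP addresses and CIDR blocks. Assuming standard CIDR matching (the real check happens on DoubleCloud's side), this sketch uses Python's standard `ipaddress` module to test whether a client address would be covered by a set of entries; `is_allowed` is a hypothetical name.

```python
import ipaddress


def is_allowed(client_ip, allowlist):
    """Return True if client_ip falls inside any allowlist entry.

    Entries may be single addresses ("203.0.113.7") or CIDR blocks
    ("198.51.100.0/24"); a bare address is treated as a /32 (or /128 for IPv6).
    """
    address = ipaddress.ip_address(client_ip)
    for entry in allowlist:
        # strict=False tolerates entries such as "198.51.100.5/24" with host bits set.
        network = ipaddress.ip_network(entry, strict=False)
        # Membership is always False when IPv4 and IPv6 are mixed.
        if address in network:
            return True
    return False


print(is_allowed("198.51.100.42", ["203.0.113.7", "198.51.100.0/24"]))  # True
```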
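Because the By schedule maintenance option takes the day and time in UTC, the day itself can shift when you convert from your local time zone. This sketch converts a timezone-aware local datetime into the UTC weekday and hour to enter in the console. `utc_maintenance_slot` is a hypothetical helper, and the fixed UTC+02:00 offset in the example stands in for your own zone; for daylight-saving-aware zones, `zoneinfo.ZoneInfo` can supply the `tzinfo` instead.

```python
from datetime import datetime, timedelta, timezone

# Index matches datetime.weekday(): Monday == 0 ... Sunday == 6.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def utc_maintenance_slot(local_time):
    """Convert a timezone-aware local datetime to a UTC (weekday, hour) pair."""
    if local_time.tzinfo is None or local_time.utcoffset() is None:
        raise ValueError("local_time must be timezone-aware")
    utc_time = local_time.astimezone(timezone.utc)
    return WEEKDAYS[utc_time.weekday()], utc_time.hour


# Monday 01:00 at UTC+02:00 is still Sunday in UTC.
local = datetime(2024, 1, 1, 1, 0, tzinfo=timezone(timedelta(hours=2)))
print(utc_maintenance_slot(local))  # ('Sunday', 23)
```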

Log in to the Airflow® web interface

  1. Open your new cluster's Overview page and click the link under Webserver.

  2. In the login form, specify the User and the Password listed on your cluster's Overview page.

Now you can run DAGs and create workflows from your Airflow® cluster's web interface.