Getting started with Managed Service for Apache Airflow

To get started with the service, set up access to the console, create a cluster, and log in to Airflow®.

Before you start

Your primary tool for interacting with DoubleCloud is the console. Set up access to it before moving on.

  1. Go to the console.

  2. Log in to DoubleCloud if you already have an account, or create one if you are opening the console for the first time.

Create a cluster

  1. Go to the Clusters page in the console.

  2. Click Create cluster in the upper-right corner of the page.

  3. Select Airflow.

  4. Choose a provider and a region.

  5. Under Cluster configuration, select the ratio of web servers, schedulers, and workers that best fits your use case.

  6. Under Worker node resources:

    1. Select the small preset for CPU, RAM capacity, and storage space to create a cluster with minimal configuration.

    2. Choose the minimum and maximum numbers of workers for autoscaling. Autoscaling adjusts the cluster's capacity to the current load.

    3. Set the Concurrency limit. It defines the maximum number of task instances a DAG can run at the same time. Tasks beyond this limit are queued.

  7. Under Basic settings:

    1. Enter the cluster Name. In this scenario, use tutorial-cluster.

    2. From the Version drop-down list, select the Apache Airflow® version your cluster will use. For most clusters, we recommend using the latest version.

  8. Under DAG Code Repository:

    1. Provide the Repository URL where your DAGs are stored.

    2. Specify the path to the folder with the DAGs in the repository under DAG Path.

    3. Specify the Branch from which the cluster pulls production-ready DAG versions.

    4. Provide the Username and Password/token for your repository.

      Token creation instructions

      If you use GitHub as your DAG code repository, follow GitHub's official instructions to create a personal access token.

  9. Under Advanced settings:

    1. Under Networking → VPC, specify the DoubleCloud VPC in which to locate your cluster. Use the default value for the previously selected region if you don't need to create this cluster in a specific network.

    2. (Optional) Add the IP addresses or CIDRs allowed to access your cluster to the allowlist.

    3. Under Maintenance settings, select the scheduling type:

      • Arbitrary to delegate maintenance window selection to DoubleCloud. Usually, maintenance is performed on your cluster at the earliest available time slot.

        Warning

        We don't recommend using this scheduling type with single-host clusters, as it can make your cluster unavailable at unpredictable times.

      • By schedule to set the weekday and time (UTC) when DoubleCloud may perform maintenance on your cluster.

  10. Click Create cluster.

Your cluster will appear with the Creating status on the Clusters page in the console. Setting everything up may take some time. When the cluster is ready, its status changes to Alive.
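If you plan to fill in the allowlist under Advanced settings, it can help to check the entries locally first. The following is a minimal sketch that uses only Python's standard ipaddress module. The check_allowlist helper and the sample addresses are hypothetical and are not part of DoubleCloud. Whether the console accepts the normalized form shown here is an assumption.

```python
import ipaddress


def check_allowlist(entries):
    """Split entries into (valid, invalid) lists for review before entering them."""
    valid, invalid = [], []
    for entry in entries:
        try:
            # strict=False tolerates host bits, so 10.0.0.5/24 becomes 10.0.0.0/24.
            # A bare address such as 203.0.113.7 is treated as a /32 network.
            network = ipaddress.ip_network(entry.strip(), strict=False)
            valid.append(str(network))
        except ValueError:
            # Raised for anything that is not an IPv4/IPv6 address or CIDR.
            invalid.append(entry)
    return valid, invalid


# Sample entries (documentation address ranges, purely illustrative).
valid, invalid = check_allowlist(
    ["203.0.113.7", "198.51.100.0/24", "10.0.0.5/24", "not-an-ip"]
)
print("valid:", valid)
print("invalid:", invalid)
```

Entries that end up in the invalid list are worth fixing before you add them to the allowlist.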

Log in to your Airflow®

  1. Open your new cluster's Overview page and click the link under Webserver.

  2. In the login form, specify the User and the Password listed on your cluster's Overview page.

Now you can run DAGs and create workflows from your Airflow® cluster's web interface.
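When your DAGs run, the Concurrency limit chosen during cluster creation caps how many task instances of a DAG run at once. The rest wait in the queue. The toy model below only illustrates that rule. It is not how the Airflow scheduler is implemented, and the split_by_concurrency_limit function and task names are hypothetical.

```python
def split_by_concurrency_limit(ready_tasks, concurrency_limit):
    """Return (running, queued): at most concurrency_limit tasks run, the rest wait."""
    if concurrency_limit < 1:
        raise ValueError("concurrency_limit must be at least 1")
    running = ready_tasks[:concurrency_limit]
    queued = ready_tasks[concurrency_limit:]
    return running, queued


# Five task instances of one DAG are ready, but the limit allows only three at a time.
tasks = [f"task_{i}" for i in range(1, 6)]
running, queued = split_by_concurrency_limit(tasks, concurrency_limit=3)
print("running:", running)
print("queued:", queued)
```

In this model, a queued task starts only after a running one finishes and frees a slot.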