Apache Airflow® glossary

This glossary defines the concepts and terms essential for understanding and working with Airflow®.

Connection

A connection is an object that stores login credentials and other information Airflow® needs to connect to an external service and exchange data with it. Every connection has a unique conn_id value that you can pass to hooks and operators when a connection is required.
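
As a minimal sketch, you can resolve a connection by its conn_id in code; the conn_id "my_postgres" here is a hypothetical example and must already be defined in Airflow®:

```python
from airflow.hooks.base import BaseHook

# Look up an existing connection by its conn_id ("my_postgres" is hypothetical)
conn = BaseHook.get_connection("my_postgres")
print(conn.host, conn.port, conn.login)  # endpoint and credentials stored on the connection
```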

Custom image

A container image built for a specific workflow. A custom Airflow® image can extend the base image with custom plugins, packages, or libraries tailored for your needs.

For example, with custom container images, users can deploy specific versions of Airflow® or dependencies not included in the default configuration. Custom images give you full control over the environment, but you have to maintain and update packages and dependencies yourself. Learn how to use a custom image in Managed Airflow®

DAG

A Directed Acyclic Graph, or DAG, is a collection of tasks in Airflow® organized to reflect their dependencies and relationships. DAGs are used to define and manage workflows. The word acyclic means that the graph contains no loops: a task can never depend, directly or indirectly, on itself, so execution must ultimately end. Learn more about DAGs
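
As a sketch, assuming Airflow® 2.4 or later (earlier versions spell the schedule parameter schedule_interval), a DAG with two dependent tasks might look like this; the dag_id and commands are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="example_etl", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")

    extract >> load  # load runs only after extract succeeds; dependencies form no cycles
```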

DAG run

A DAG run is a single execution of the workflow that a DAG defines. Each run is identified by a run_id and a logical date, and is created either by the scheduler according to the DAG's schedule or by a manual trigger.
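
Tasks can read the identifiers of the current DAG run through Jinja templating. A minimal sketch with a hypothetical DAG:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="show_dag_run", start_date=datetime(2024, 1, 1), schedule=None):
    BashOperator(
        task_id="show_run",
        # run_id and ds (the logical date) identify the current DAG run
        bash_command="echo run_id={{ run_id }} logical_date={{ ds }}",
    )
```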

Dagbag

A dagbag (the DagBag class) is a collection of DAGs parsed out of a folder tree. It allows Airflow® to keep DAGs in memory instead of constantly rereading and reparsing them from files.
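
A minimal sketch of inspecting a dagbag in code; the folder path is a hypothetical example:

```python
from airflow.models import DagBag

dagbag = DagBag(dag_folder="/opt/airflow/dags")  # parse DAG files under this folder
print(list(dagbag.dags))       # dag_id -> DAG mapping of everything parsed
print(dagbag.import_errors)    # files that failed to parse, with tracebacks
```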

Executor

An executor is a mechanism that handles task execution. Executors are divided into local and remote depending on where tasks run: local executors run tasks inside the scheduler process, while remote executors typically distribute tasks to a pool of remote workers.
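
The active executor is set in the [core] section of airflow.cfg (or via the AIRFLOW__CORE__EXECUTOR environment variable) and can be inspected in code, as a quick sketch:

```python
from airflow.configuration import conf

# Read which executor this installation is configured to use,
# e.g. "LocalExecutor" or "CeleryExecutor"
print(conf.get("core", "executor"))
```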

Hook

A hook is a high-level interface to an external system that lets Airflow® communicate with it without writing low-level code against the system's API. Hooks obtain credentials from connections, so you typically only pass them a conn_id.
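
For example, a sketch using PostgresHook, assuming the Postgres provider package is installed and a connection with the hypothetical conn_id "my_postgres" exists:

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook

hook = PostgresHook(postgres_conn_id="my_postgres")  # credentials come from the connection
records = hook.get_records("SELECT 1;")              # run a query without client boilerplate
```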

Image

An image is a container image with a specific version of Airflow®, OS, and Python. When deploying a Managed Apache Airflow® cluster on DoubleCloud, you can use either the default image with only the necessary packages, or a custom image that you build for your specific workflow.

Operator

An operator is a predefined template for a task that you can use declaratively inside a DAG. Core operators include BashOperator for running bash commands, PythonOperator for executing Python functions, and EmailOperator for sending emails.
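
A sketch of these three core operators inside a hypothetical DAG; EmailOperator additionally requires SMTP to be configured:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.email import EmailOperator
from airflow.operators.python import PythonOperator

with DAG(dag_id="operator_demo", start_date=datetime(2024, 1, 1), schedule=None):
    run_script = BashOperator(task_id="run_script", bash_command="echo hello")
    transform = PythonOperator(task_id="transform", python_callable=lambda: print("done"))
    notify = EmailOperator(
        task_id="notify",
        to="team@example.com",  # hypothetical address
        subject="Pipeline finished",
        html_content="All tasks succeeded.",
    )
```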

Scheduler

The scheduler is a process that runs in the background and constantly monitors DAGs and tasks. When a task is due to run, the scheduler triggers it and hands it to the executor, which distributes work across workers. Learn more about schedulers
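
The scheduler creates DAG runs based on each DAG's schedule. As a sketch, this hypothetical DAG tells the scheduler to trigger one run per day and not to backfill missed intervals:

```python
from datetime import datetime

from airflow import DAG

with DAG(
    dag_id="daily_report",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",      # the scheduler creates one DAG run per day
    catchup=False,          # skip runs for intervals that are already in the past
):
    ...
```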

Sensor

A sensor is a special type of operator designed to wait for something to occur, such as a file landing in a directory or an external job finishing. A sensor checks its condition at a regular interval until the condition is met or the sensor times out.
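
For example, a sketch with the built-in FileSensor, waiting for a file to land at a hypothetical path and re-checking every 60 seconds:

```python
from datetime import datetime

from airflow import DAG
from airflow.sensors.filesystem import FileSensor

with DAG(dag_id="wait_for_file", start_date=datetime(2024, 1, 1), schedule=None):
    wait = FileSensor(
        task_id="wait_for_input",
        filepath="/data/input.csv",  # hypothetical path to watch
        poke_interval=60,            # seconds between checks
        timeout=60 * 60,             # give up after an hour
    )
```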

Task

A task is a unit of execution in Airflow®. Tasks declare dependencies on one another: the tasks a given task depends on are its upstream tasks, and the tasks that depend on it are its downstream tasks.
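
A sketch of declaring upstream and downstream relationships; EmptyOperator (Airflow® 2.3+) stands in for real work, and all identifiers are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(dag_id="dependency_demo", start_date=datetime(2024, 1, 1), schedule=None):
    extract = EmptyOperator(task_id="extract")
    transform = EmptyOperator(task_id="transform")
    load = EmptyOperator(task_id="load")

    extract.set_downstream(transform)  # extract is upstream of transform
    load.set_upstream(transform)       # load is downstream of transform
```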