Apache Airflow® Core Components
An Apache Airflow® cluster comprises the following essential components:
- Scheduler: This component triggers scheduled workflows and submits tasks to the executor for execution.
- Executor: The executor handles the execution of tasks.
- Webserver: The webserver provides a user-friendly interface for inspecting, triggering, and debugging the behavior of DAGs and tasks.
- A Git repository containing DAG files, which are read by the scheduler and executor (a minimal example follows this list).
- A metadata database used by the scheduler, executor, and webserver to store state information.
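To make the DAG-files component concrete, here is a minimal sketch of a DAG file the scheduler could parse. It uses the Airflow 2.x context-manager style; the dag_id, schedule, and bash command are illustrative assumptions, not part of any particular deployment.

```python
# A minimal sketch of a DAG file, assuming the Airflow 2.x API.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_hello",           # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # Airflow >= 2.4; older versions use schedule_interval
    catchup=False,                    # skip backfilling runs before "now"
) as dag:
    # A single task; the scheduler hands it to the executor when it is due.
    say_hello = BashOperator(task_id="say_hello", bash_command="echo hello")
```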
Scheduler
The Airflow® scheduler continuously monitors all tasks and DAGs. It triggers task instances once their dependencies are satisfied.
Once per minute by default, the scheduler collects DAG parsing results and checks whether any active tasks can be triggered.
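As an illustration of dependency-driven triggering, the sketch below chains two tasks: the scheduler only queues transform after extract's task instance has succeeded. The DAG and task names are hypothetical.

```python
# Sketch: dependency-driven scheduling (hypothetical DAG and task names).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="example_dependencies",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=lambda: print("extracting"))
    transform = PythonOperator(task_id="transform", python_callable=lambda: print("transforming"))

    # The >> operator declares the dependency: transform is triggered
    # only once extract's task instance completes successfully.
    extract >> transform
```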
For more information, see the official Apache Airflow® documentation.
Executor
There are two types of executors in Airflow®:
- Local Executors: These run tasks locally, inside the scheduler process.
- Remote Executors: These run tasks remotely, using a pool of workers; the choice of executor is made in configuration, as sketched below.
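As a sketch of how that choice is made, the excerpt below shows the executor option in airflow.cfg (the same setting can also be supplied via the AIRFLOW__CORE__EXECUTOR environment variable). The executor names are standard Airflow ones, but treat the snippet as illustrative rather than a complete configuration.

```
[core]
# Local executor: tasks run in subprocesses of the scheduler.
# For remote execution, CeleryExecutor or KubernetesExecutor hand
# tasks to a pool of workers instead.
executor = LocalExecutor
```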
For more information, see the official Airflow® documentation.
Webserver
The Airflow® Webserver offers a web-based user interface for setting up and managing workflows. The key capabilities of the Airflow® Webserver include:
- Workflow Visualization: The Webserver enables you to visualize the DAGs of your workflows.
- Workflow Management: You can trigger, pause, and delete DAGs through the web interface. The DAGs themselves, including their tasks and task relationships, are defined in Python code read by the scheduler; they cannot be created or edited in the UI.
- Task Monitoring: The Webserver displays the status of each task within a DAG. It provides information such as the task's current state (running, success, failure, etc.), start and end times, and task logs; this information is also available programmatically, as sketched after this list.
- Event Log: The Airflow® Webserver maintains an event log that records various events related to workflows, tasks, and their execution.
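For monitoring outside the browser, the Webserver also serves Airflow's stable REST API. The sketch below lists recent runs of a hypothetical DAG; the host, credentials, and dag_id are assumptions, and the deployment must have API authentication (e.g. basic auth) enabled.

```python
# Sketch: listing recent DAG runs via the Webserver's stable REST API.
# The host, credentials, and dag_id below are illustrative assumptions.
import requests

resp = requests.get(
    "http://localhost:8080/api/v1/dags/example_hello/dagRuns",
    auth=("admin", "admin"),  # hypothetical basic-auth user
    params={"limit": 5, "order_by": "-execution_date"},
)
resp.raise_for_status()

for run in resp.json()["dag_runs"]:
    # Each entry reports, among other fields, the run id and its state.
    print(run["dag_run_id"], run["state"])
```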