Catalog Details
CATEGORY: workloads
CREATED BY:
UPDATED AT: May 17, 2024
VERSION: 1.0
What this pattern does:
Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.

Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.

Airflow works best with workflows that are mostly static and slowly changing. When the DAG structure is similar from one run to the next, it clarifies the unit of work and continuity. Other similar projects include Luigi, Oozie and Azkaban.

Airflow is commonly used to process data, but has the opinion that tasks should ideally be idempotent (i.e., results of the task will be the same, and will not create duplicated data in a destination system), and should not pass large quantities of data from one task to the next (though tasks can pass metadata using Airflow's XCom feature). For high-volume, data-intensive tasks, a best practice is to delegate to external services specializing in that type of work. Airflow is not a streaming solution, but it is often used to process real-time data, pulling data off streams in batches.

Principles:
- Dynamic: Airflow pipelines are configuration as code (Python), allowing for dynamic pipeline generation. This allows for writing code that instantiates pipelines dynamically.
- Extensible: Easily define your own operators, executors and extend the library so that it fits the level of abstraction that suits your environment.
- Elegant: Airflow pipelines are lean and explicit. Parameterizing your scripts is built into the core of Airflow using the powerful Jinja templating engine.
- Scalable: Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers.
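
The description above makes two concrete points: tasks run in dependency order as a DAG, and each task should be idempotent. The following is a minimal sketch of both ideas using only the Python standard library. It does not use Airflow's API. The task names, the `daily_totals` table, and the in-memory SQLite database standing in for a destination system are all assumptions made for illustration.

```python
import sqlite3
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical destination system: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")

def create_table():
    # IF NOT EXISTS makes this task safe to run more than once.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_totals "
        "(day TEXT PRIMARY KEY, total INTEGER)"
    )

def load_totals():
    # The row is keyed on day, so a re-run replaces it instead of
    # appending a duplicate. This is what makes the task idempotent.
    conn.execute(
        "INSERT OR REPLACE INTO daily_totals (day, total) VALUES (?, ?)",
        ("2024-05-17", 7),
    )
    conn.commit()

def report():
    # Read-only task: counts the rows in the destination table.
    return conn.execute("SELECT COUNT(*) FROM daily_totals").fetchone()[0]

# Illustrative task registry (names are assumptions, not Airflow operators).
TASKS = {"create_table": create_table, "load_totals": load_totals, "report": report}

# Each task maps to the set of tasks that must finish before it starts.
UPSTREAM = {
    "create_table": set(),
    "load_totals": {"create_table"},
    "report": {"load_totals"},
}

def run_pipeline():
    # Resolve an execution order that respects every dependency edge.
    order = list(TopologicalSorter(UPSTREAM).static_order())
    for name in order:
        TASKS[name]()
    return order

run_pipeline()
run_pipeline()  # a second run leaves the destination table unchanged
```

In a real deployment the scheduler and workers take the place of `run_pipeline`, and the dependency graph is declared in a DAG file rather than a plain dictionary. The point of the sketch is only that re-running a task keyed on its input produces the same destination state.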
Caveats and Considerations:
Fill in your own Postgres username, password, host, port, and other connection settings so that Airflow works with your database. Pass them as environment variables, or create a Secret for the password and a ConfigMap for the host and port.
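
As a rough sketch of the environment-variable approach, the helper below assembles a SQLAlchemy-style Postgres connection string from individual settings. The variable names (`POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_DB`), the defaults, and the function name are assumptions for illustration. The pattern itself may expect different names, so check its manifests before relying on these.

```python
import os
from urllib.parse import quote

def build_postgres_uri(env=None):
    """Build a Postgres connection URI from environment-style settings.

    All variable names here are hypothetical, not taken from the pattern.
    """
    env = os.environ if env is None else env
    user = env["POSTGRES_USER"]          # required
    password = env["POSTGRES_PASSWORD"]  # required; typically from a Secret
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DB", "airflow")
    # Percent-encode credentials so characters such as "@" or ":" in a
    # password do not break URI parsing.
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:"
        f"{quote(password, safe='')}@{host}:{port}/{database}"
    )

# Example with explicit settings instead of the real environment.
example_uri = build_postgres_uri({
    "POSTGRES_USER": "airflow",
    "POSTGRES_PASSWORD": "p@ss:word",
    "POSTGRES_HOST": "db",
    "POSTGRES_PORT": "5433",
    "POSTGRES_DB": "meta",
})
```

Airflow 2.3 and later read the metadata database connection from the `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN` environment variable (older releases use `AIRFLOW__CORE__SQL_ALCHEMY_CONN`), so a string built this way could be supplied there. Whether this pattern wires the value through that variable is not stated in the catalog entry.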
Compatibility: