This is a template repository for the DAGs, scripts, and other resources associated with an Apache Airflow project, within the context of the Airflow-Platform module and the Saint Louis Regional Data Alliance Airflow Project Management Framework (outlined below). This repository is designed to serve as a template; create a separate repository for each Airflow project.
The Saint Louis Regional Data Alliance's Airflow projects are managed across multiple GitHub repositories. Each repository is linked to and described below; see the individual repositories for more information about their place in our ecosystem.
Airflow-Platform sets up and configures our Airflow cluster.
Airflow-Admin Tools is deployed into the Airflow/dags folder of each server in the cluster and provides basic DAGs that facilitate administration of the cluster.
Airflow-Infrastructure is a template repository for creating new Airflow projects. It spins up the non-Airflow AWS resources needed to do data integration work.
Airflow-Workflows is a template repository for the DAGs, scripts, and other resources associated with a single Airflow-based ELT project.
- Fork Airflow-Admin Tools.
- Modify your version of clone_and_link.py to point at your copy of projects.csv.
- Clone or fork Airflow-Platform and follow the provided instructions for setting up your Airflow cluster. Point the GitHub variables in the .tfvars files to your fork of Airflow-Admin Tools.
- For each Airflow project you would like to manage separately, create a separate copy of Airflow-Infrastructure and run it to spin up an S3 bucket and a PostgreSQL database to serve as the ELT target.
- For each Airflow project you would like to manage separately, create a separate copy of Airflow-Workflows to manage the workflows and scripts associated with the project. Add this repository to your copy of projects.csv.
- Run the ImportDags DAG from your Airflow instance to pull in and update all of your projects.
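The steps above register each project repository in projects.csv so the admin tooling can pull it into the cluster. The real schema of projects.csv is defined in Airflow-Admin Tools; the sketch below is only a hypothetical illustration, assuming a two-column format (project name, repository URL), of how a script like clone_and_link.py might read the registry:

```python
import csv
import io

# Hypothetical projects.csv contents -- the actual column names and
# schema are defined in the Airflow-Admin Tools repository.
PROJECTS_CSV = """project_name,repo_url
example-project,https://github.com/your-org/Airflow-Workflows-example.git
"""

def load_projects(csv_text):
    """Parse the project registry into a list of (name, url) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["project_name"], row["repo_url"]) for row in reader]

# Each registered repository would then be cloned into the cluster's
# DAGs folder by the admin tooling (shown here as a dry run).
for name, url in load_projects(PROJECTS_CSV):
    print(f"Would clone {url} into dags/{name}")
```

Adding a row per project and re-running the ImportDags DAG is what keeps the cluster's DAGs folder in sync with your forks.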