An extension of RedB to Integrate the City of St. Louis's Parcel API and Generate Vacancy Estimates

stlrda/VacantParcel-Workflow

Airflow-Workflows

This is a template repository for the DAGs, scripts, and other resources associated with an Apache Airflow project, within the context of the Airflow-Platform module and the Saint Louis Regional Data Alliance Airflow Project Management Framework (outlined below). This module is designed to serve as a template; create a separate repository for each Airflow project.

How STLRDA Manages Airflow Projects

The Saint Louis Regional Data Alliance's Airflow projects are managed using multiple GitHub repositories. Each repository is described below; see the individual repositories for more information about their place in our ecosystem.

Airflow-Platform sets up and configures our Airflow cluster.

Airflow-Admin Tools is placed in the Airflow/dags folder of each server in the cluster and provides basic DAGs that facilitate the administration of the cluster.

Airflow-Infrastructure is a template repository for creating new Airflow projects. It spins up the non-Airflow AWS resources needed to do data integration work.

Airflow-Workflows is a template repository for the DAGs, scripts, and other resources associated with a single Airflow-based ELT project.

Replicating the STLRDA Workflow

  1. Fork Airflow-Admin Tools.
  2. Modify your version of clone_and_link.py to look at your copy of projects.csv.
  3. Clone or fork Airflow-Platform and follow the provided instructions for setting up your Airflow cluster. Point the GitHub variables in the .tfvars files to your fork of Airflow-Admin Tools.
  4. For each Airflow project you would like to manage separately, create a separate copy of Airflow-Infrastructure and run it to spin up an S3 bucket and PostgreSQL database to serve as the ELT target.
  5. For each Airflow project you would like to manage separately, create a separate copy of Airflow-Workflows to manage the DAGs and scripts associated with the project. Add this repository to your copy of projects.csv.
  6. Run the ImportDags DAG from your Airflow instance to pull in and update all your projects.
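
To make steps 2, 5, and 6 more concrete, here is a minimal Python sketch of how a list of projects might be turned into clone commands. It is not the actual clone_and_link.py or ImportDags code. The column names `name` and `repo_url`, the function `plan_project_clones`, and the `/opt/airflow/dags` path are all assumptions made for illustration; check your copy of projects.csv for its real layout.

```python
import csv
import io
from pathlib import PurePosixPath


def plan_project_clones(projects_csv_text, dags_dir):
    """Build one `git clone` command per project row.

    Assumes a hypothetical projects.csv layout with `name` and
    `repo_url` columns; the real file may be organized differently.
    """
    commands = []
    reader = csv.DictReader(io.StringIO(projects_csv_text))
    for row in reader:
        name = (row.get("name") or "").strip()
        repo_url = (row.get("repo_url") or "").strip()
        if not name or not repo_url:
            continue  # skip incomplete rows rather than guessing
        # Each project gets its own folder under the DAGs directory.
        target = PurePosixPath(dags_dir) / name
        commands.append(["git", "clone", repo_url, str(target)])
    return commands


# Illustrative input only: one project row in the assumed layout.
sample_csv = (
    "name,repo_url\n"
    "VacantParcel-Workflow,https://github.com/stlrda/VacantParcel-Workflow.git\n"
)

for command in plan_project_clones(sample_csv, "/opt/airflow/dags"):
    print(" ".join(command))
```

The sketch only builds the commands instead of running them, so the plan can be inspected before anything is cloned onto a server.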
