Author: @nicseltzer, @jigglepuff, @mrpetrocket
Status: In-Development
This project aims to be the jumping-off point for the OpenSTL Extract, Transform, and Load (ETL) pipeline.
The notes below assume that you've forked the repo to your account and have cloned that fork locally.
Note: This project has only been built on macOS and Linux so far. Submit PRs with instructions for building and running on Windows (or whatever you use).
If you don't have Python 3 installed, you'll need to download it from the Python website for your particular operating system.
There are a lot of resources for doing this online, but I recommend the following (especially if you're using VSCode):
- From your locally cloned repo, run
python3 -m venv .venv
This creates a .venv directory which contains all of the pieces for your local Python project.
- That's it. You've created a venv.
- Using your favorite terminal or the terminal built into VSCode (which should pick up .venv automatically), run
source ./.venv/bin/activate
This sets up your project in isolation from your global package manifest.
- Next, we need to get the dependencies for the project. You can do this by running
./make.py
If you need to add dependencies, you can add them with pip as you normally would. Just make sure to run ./package.py before committing back to the repo.
- If you are a Windows or RedHat user (any system without apt as the package manager frontend), you might have to install mdbtools manually.
- Visit the official MDBTools site for manual installation instructions.
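If you want to confirm that the setup steps above took effect, a small check like the following can help. This is a minimal sketch and not part of the repo: the function names `in_virtualenv` and `find_tool` are made up for illustration, and it assumes `mdb-export` (one of the command-line tools that ships with mdbtools) is the binary worth looking for.

```python
import shutil
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


def find_tool(name: str):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # Assumption: mdb-export is the mdbtools binary we care about.
    print("mdb-export found at:", find_tool("mdb-export"))
```

Run it after activating the venv; if the tool path prints as None, mdbtools probably still needs to be installed manually.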
If you're running the application locally on your PC, use the following instructions. This will create a local database.
- Run
python3 ./app.py
Options:
--db dev: Use the local database. This is the default if --db is not specified.
--db prod: Use the production database.
--local-sources: Use local files instead of downloading from the internet. This drastically reduces startup time while testing.
- Download from internet, commit to local database
python3 ./app.py
- Use local sources, and commit to local database.
python3 ./app.py --local-sources file1.mdb file2.dbf
- Download from internet, commit to production database.
python3 ./app.py --db prod
- When you're finished, run
deactivate
to leave the virtual environment.
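For reference, the command-line options described above could be parsed along these lines. This is only a sketch of one way to do it with argparse, not the actual contents of app.py. In particular, it assumes that --local-sources accepts zero or more file names, based on the example invocation with file1.mdb and file2.dbf; `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented --db and --local-sources options."""
    parser = argparse.ArgumentParser(prog="app.py")
    parser.add_argument(
        "--db",
        choices=["dev", "prod"],
        default="dev",  # documented default when --db is not given
        help="database to commit to",
    )
    # Assumption: the flag takes zero or more local file names.
    parser.add_argument(
        "--local-sources",
        nargs="*",
        default=None,
        metavar="FILE",
        help="use local files instead of downloading from the internet",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs the same everywhere.
    args = build_parser().parse_args(["--local-sources", "file1.mdb", "file2.dbf"])
    print(args.db, args.local_sources)
```

With this layout, leaving out --local-sources yields None (download from the internet), while passing it yields a list of file names.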
If you're running the application against the production database, use the following instructions. For development, you most likely will not need this.
- Get database credentials from @jigglepuff
- Run
python3 ./config.py
- Enter hostname:
dbopenstl.johnkramlich.com
- Enter database name:
openstl
- Enter username: (Ask project lead for credentials)
- Enter password: (Ask project lead for credentials)
- Run
python3 ./app.py --db prod
Author: @nicseltzer
Status: Alpha
This script is run at a configurable interval and is responsible for fetching data from configured remote sources.
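To give a feel for what fetching from configured sources might look like, here is a minimal sketch using only the standard library. It is not the module's actual code: `download`, `fetch_all`, and `run_forever` are hypothetical names, and looping with a sleep is just one assumed way to get a configurable interval (a scheduler such as cron would work equally well).

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def download(url: str) -> bytes:
    """Fetch one URL and return the raw response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_all(
    urls: Iterable[str], fetch: Callable[[str], bytes] = download
) -> Dict[str, bytes]:
    """Fetch every URL, skipping (and reporting) the ones that fail."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as error:  # urllib.error.URLError is an OSError subclass
            print(f"could not fetch {url}: {error}")
    return results


def run_forever(urls: Iterable[str], interval_seconds: float) -> None:
    """Fetch all sources, wait for the configured interval, and repeat."""
    urls = list(urls)
    while True:
        fetch_all(urls)
        time.sleep(interval_seconds)
```

Passing the fetch function as a parameter keeps the loop testable without touching the network.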
Author: @nicseltzer
Status: Alpha
This module is responsible for classifying fetched binary data. The application will hand this data off to the Extractor module.
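As an illustration of classifying fetched binary data, the sketch below labels a payload as a zip archive, plain text, or unknown. It is an assumption about how classification could work, not a description of this module: the `classify` function and its three labels are invented for the example.

```python
import io
import zipfile


def classify(data: bytes) -> str:
    """Return a coarse label for a payload: 'zip', 'text', or 'unknown'."""
    # is_zipfile accepts a file-like object and checks for zip structure.
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    try:
        # Assumption: anything that decodes as UTF-8 is treated as text (e.g. CSV).
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "text"
```

A fuller version would likely also look inside the archive to decide which extractor should receive each member.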
Author: @jigglepuff
Status: In-Development
This module is responsible for taking data of a given format and extracting it to an agreed-upon, uniform format.
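As a rough idea of what extracting to a uniform format could mean, the sketch below turns CSV bytes into a list of dictionaries. The choice of a list of dictionaries as the uniform format is an assumption made for illustration, and `extract_csv` is a hypothetical name. Formats such as .mdb or .dbf would need their own readers.

```python
import csv
import io
from typing import Dict, List


def extract_csv(data: bytes, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Parse CSV bytes into a list of row dictionaries keyed by header name."""
    text = data.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can handle them.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [dict(row) for row in reader]
```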
Author: @mrpetrocket
Status: In-Development
This module will mold the data into a usable state.
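One plausible kind of transformation is tidying field names and values so later stages see consistent records. The sketch below is only an example of that idea: `clean_record` and its rules (snake_case keys, trimmed values, empty strings turned into None) are assumptions, not the project's agreed behavior.

```python
from typing import Dict, Optional


def clean_record(record: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Normalize keys to snake_case and trim values, mapping blanks to None."""
    cleaned = {}
    for key, value in record.items():
        name = key.strip().lower().replace(" ", "_")
        text = value.strip() if value is not None else ""
        # Empty strings carry no information, so store them as None.
        cleaned[name] = text or None
    return cleaned
```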
Author: @jigglepuff
Status: In-Development
This module is responsible for pushing the transformed data into a persistent datastore.
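To illustrate pushing records into a persistent datastore, the sketch below uses SQLite from the standard library as a stand-in. The real project database may be a different engine, and the `parcels` table, its columns, and the `load_records` function are all hypothetical.

```python
import sqlite3
from typing import Dict, Iterable


def load_records(
    connection: sqlite3.Connection, records: Iterable[Dict[str, str]]
) -> int:
    """Upsert records into a hypothetical 'parcels' table; return its row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS parcels (id TEXT PRIMARY KEY, owner TEXT)"
    )
    rows = [(record["id"], record["owner"]) for record in records]
    # Used as a context manager, the connection commits on success
    # and rolls back if an exception is raised.
    with connection:
        connection.executemany(
            "INSERT OR REPLACE INTO parcels (id, owner) VALUES (?, ?)", rows
        )
    return connection.execute("SELECT COUNT(*) FROM parcels").fetchone()[0]
```

Using INSERT OR REPLACE keeps repeated runs idempotent: reloading the same source updates existing rows instead of duplicating them.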
https://www.stlouis-mo.gov/data/upload/data-files/prcl_shape.zip
https://www.stlouis-mo.gov/data/upload/data-files/prcl.zip
https://www.stlouis-mo.gov/data/upload/data-files/par.zip
https://www.stlouis-mo.gov/data/upload/data-files/lra_public.zip
https://www.stlouis-mo.gov/data/upload/data-files/bldginsp.zip
https://www.stlouis-mo.gov/data/upload/data-files/prmbdo.zip
https://www.stlouis-mo.gov/data/upload/data-files/forestry-maintenance-properties.csv