EPIC: Formalize ingestion process #136
Some thoughts on the ingestion pipeline:

Current process:

Contributing new datasets:
- Scientists contact the developers to determine the correct display options for the dataset (eg: colour swatches, legend stops, rescaling factor, etc).
- A dataset metadata json file gets added to the API's codebase and the API gets deployed.
- Every 24hrs, the dataset metadata generator lambda reads through all the dataset metadata json files, searches the S3 folder of each dataset for available dates, and writes them to a json file in S3 (a rough sketch of this step follows at the end of this section).
- The API then reads from this JSON file to display the dates available for each dataset.

Contributing new data for existing datasets:
- Most datasets are not regularly updated at this point (eg:

Goal:
The goal of this ingest pipeline is to minimize the manual steps needed when ingesting data during initial and recurring deliveries.

Unknowns:
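As a rough illustration of the 24hr metadata generator step described above, here is a minimal sketch. The bucket name, the `s3_prefix` field, the date pattern, and the output key are all assumptions for illustration, not taken from the actual codebase.

```python
# Hypothetical sketch of the daily dataset metadata generator lambda.
# Bucket name, metadata layout, and date pattern are assumptions.
import json
import re

import boto3

BUCKET = "example-dataset-bucket"           # assumed bucket name
DATE_RE = re.compile(r"\d{4}_\d{2}_\d{2}")  # assumed date pattern in object keys

s3 = boto3.client("s3")


def handler(event, context):
    # Load the dataset metadata json files (here bundled as a single file).
    with open("datasets.json") as f:
        datasets = json.load(f)

    available = {}
    for dataset_id, meta in datasets.items():
        # Search the dataset's S3 folder and collect the dates found in its keys.
        dates = set()
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=BUCKET, Prefix=meta["s3_prefix"]):
            for obj in page.get("Contents", []):
                match = DATE_RE.search(obj["Key"])
                if match:
                    dates.add(match.group(0))
        available[dataset_id] = sorted(dates)

    # Write the per-dataset available dates to S3 for the API to read.
    s3.put_object(
        Bucket=BUCKET,
        Key="dataset-metadata.json",
        Body=json.dumps(available).encode("utf-8"),
    )
```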
Lowest complexity / 1st iteration implementation of the ingestion pipeline:

1. Delivering the data: Scientists use the AWS CLI to copy datafiles (
2. Triggering the ingest: S3 Lambda trigger runs on
3. Ingest script: The S3 trigger lambda function executes the following steps (see the sketch below; the lambda function is packaged with GDAL and rasterio using the

Potential improvements:
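A minimal sketch of the S3-triggered ingest handler from step 3, assuming the event is a standard S3 put notification and that validation is limited to opening the file with rasterio and checking its CRS and nodata value; the actual steps the lambda runs are not reconstructed here.

```python
# Hypothetical sketch of the S3-triggered ingest handler (step 3 above).
# The key layout and the validation steps are assumptions.
import os

import boto3
import rasterio

s3 = boto3.client("s3")


def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Download the delivered file into the lambda's scratch space.
        local_path = os.path.join("/tmp", os.path.basename(key))
        s3.download_file(bucket, key, local_path)

        # Basic validation with rasterio before accepting the file:
        # it must open, and carry a CRS and a nodata value.
        with rasterio.open(local_path) as src:
            if src.crs is None:
                raise ValueError(f"{key} has no CRS")
            if src.nodata is None:
                raise ValueError(f"{key} has no nodata value")

        # Further steps (eg: COG conversion, moving the file to the dataset's
        # folder, refreshing the available-dates JSON) would go here.
```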
Goal: put together an ADR describing the ingestion pipeline
Related to #37
This includes (in order of complexity to implement):
- nodata value (see the sketch below)
- nodata value
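As a minimal sketch for the nodata items above, assuming the delivered files are plain GeoTIFFs and that -9999 is an acceptable fill value (both assumptions, not taken from the issue):

```python
# Hypothetical helper for the nodata items above: assign a nodata value to a
# delivered GeoTIFF that lacks one. The default fill value is an assumption.
import rasterio


def ensure_nodata(path: str, nodata: float = -9999.0) -> None:
    # Open in update mode and set the nodata value only if it is missing.
    with rasterio.open(path, "r+") as src:
        if src.nodata is None:
            src.nodata = nodata
```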