
SWOOP

This module defines the resources required for SWOOP, the STAC Workflow Open Orchestration Platform.

Installation

Please run the following steps at the top level of the filmdrop-k8s-tf-modules project.

For recommended VM settings and other Kubernetes guidance, please check the Operations Guide.

The commands below must be run from the top-level directory of the filmdrop-k8s-tf-modules project.

  1. First, update local.tfvars or create your own .tfvars:
  • To enable swoop-api and its dependent services, you will need to enable at least the following in your tfvars:
deploy_swoop_api          = true
deploy_swoop_caboose      = true
deploy_swoop_conductor    = true
deploy_db_migration       = true
deploy_argo_workflows     = true
deploy_postgres           = true
deploy_db_init            = true
deploy_minio              = true
deploy_workflow_config    = true
  • If you would like to automatically expose the swoop-api, minio and postgres ports in your local environment, you can enable the ingress-nginx module that has been provided for this purpose. To enable it, make sure to update local.tfvars or your own .tfvars with the following:
deploy_ingress_nginx      = true
  • Lastly, if you do decide to use the ingress-nginx load balancer to expose your application, you can control which local port each service port is forwarded to via the nginx_extra_values variable in local.tfvars or your own .tfvars:
nginx_extra_values = {
  "tcp.<LOCAL_MACHINE_PORT>" = "<NAMESPACE>/<SERVICE_NAME>:<SERVICE_PORT>"
}
  • For swoop-api, minio and postgres, the default nginx_extra_values configuration looks like:
nginx_extra_values = {
  "tcp.8000"  = "swoop/swoop-api:8000"
  "tcp.9000"  = "io/minio:9000"
  "tcp.9001"  = "io/minio:9001"
  "tcp.5432"  = "db/postgres:5432"
}
  2. Next, initialize terraform:
terraform init
  3. Validate that the terraform resources are valid. If your terraform is valid, the validate command will respond with "Success! The configuration is valid."
terraform validate
  4. Run a terraform plan with your .tfvars file. The plan will give you a summary of all the changes terraform will perform prior to deploying any change:
terraform plan -var-file=local.tfvars
  5. Deploy the changes by applying the terraform plan. You will be asked to confirm the changes and must respond with "yes". A quick verification sketch follows below.
terraform apply -var-file=local.tfvars
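
After the apply succeeds, you can sanity-check the deployment from the terminal (a minimal sketch, assuming kubectl is pointed at the same cluster; exact pod names will differ):

kubectl get pods -n swoop   # swoop-api, swoop-caboose, swoop-conductor and the Argo workflow controller
kubectl get pods -n db      # postgres plus the db init and migration jobs
kubectl get pods -n io      # minio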

Connecting to SWOOP API

Connecting with Ingress Nginx

If you decided to enable the ingress-nginx module, then you do not need to do anything else to expose your service ports. You should be able to reach your services via localhost without port-forwarding. For example:

swoop-api:8000 -> localhost:8000
minio:9000 -> localhost:9000
minio:9001 -> localhost:9001
postgres:5432 -> localhost:5432

Connecting without Ingress Nginx

Once the stack has been deployed, you should see at least three deployments: postgres, minio and swoop-api.

SWOOP Deployment



In order to start using these services, you will need to port-forward postgres to localhost port 5432, minio to localhost ports 9000 and 9001, and swoop-api to localhost port 8000.

Via Rancher Desktop:

Port forwarding SWOOP



or via terminal:

kubectl port-forward -n swoop svc/swoop-api 8000:8000 &
kubectl port-forward -n db svc/postgres 5432:5432 &
kubectl port-forward -n io svc/minio 9000:9000 &
kubectl port-forward -n io svc/minio 9001:9001 &

Now, if you reach the swoop-api at http://localhost:8000/, you should see a sample response:

$ curl http://localhost:8000/

{"title":"Example processing server","description":"Example server implementing the OGC API - Processes 1.0 Standard","links":[{"href":"http://localhost:8000/conformance","rel":"http://www.opengis.net/def/rel/ogc/1.0/conformance","type":"application/json"}]}%



API tests with Database

To test the API endpoints that make use of data in the postgres database, you will need to load data into the postgres state database, or use swoop-db to apply the schema migrations and load test fixtures.

If you want database sample data to test the API, run the following swoop-db commands on the postgres pod to apply the migrations and load the fixtures:

kubectl exec -it --namespace=db svc/postgres  -- /bin/sh -c "swoop-db up"
kubectl exec -it --namespace=db svc/postgres  -- /bin/sh -c "swoop-db load-fixture base_01"
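
Optionally, you can confirm that the migrations were applied by querying the schema version table directly (a sketch, assuming psql is available in the postgres pod and that you connect with the swoop role credentials configured for your deployment):

# shows the current migration version and when it was applied
kubectl exec -it --namespace=db svc/postgres -- psql -U swoop -d swoop -c "select * from swoop.schema_version;"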

After loading the database, you should be able to see the jobs in the swoop api jobs endpoint http://localhost:8000/jobs/:

$ curl http://localhost:8000/jobs/

{"jobs":[{"processID":"action_1","type":"process","jobID":"0187c88d-a9e0-788c-adcb-c0b951f8be91","status":"successful","created":"2023-04-28T15:49:00+00:00","started":"2023-04-28T15:49:02+00:00","finished":"2023-04-28T15:49:03+00:00","updated":"2023-04-28T15:49:03+00:00","links":[{"href":"http://localhost:8000/","rel":"root","type":"application/json"},{"href":"http://localhost:8000/jobs/0187c88d-a9e0-788c-adcb-c0b951f8be91","rel":"self","type":"application/json"},{"href":"http://localhost:8000/jobs/0187c88d-a9e0-788c-adcb-c0b951f8be91/results","rel":"results","type":"application/json"},{"href":"http://localhost:8000/jobs/0187c88d-a9e0-788c-adcb-c0b951f8be91/inputs","rel":"inputs","type":"application/json"},{"href":"http://localhost:8000/processes/action_1","rel":"process","type":"application/json"},{"href":"http://localhost:8000/payloadCacheEntries/ade69fe7-1d7d-572e-9f36-7242cc2aca77","rel":"cache","type":"application/json"}]},{"processID":"action_2","type":"process","jobID":"0187c88d-a9e0-757e-aa36-2fbb6c834cb5","status":"accepted","created":"2023-04-28T15:49:00+00:00","started":null,"finished":null,"updated":"2023-04-28T15:49:00+00:00","links":[{"href":"http://localhost:8000/","rel":"root","type":"application/json"},{"href":"http://localhost:8000/jobs/0187c88d-a9e0-757e-aa36-2fbb6c834cb5","rel":"self","type":"application/json"},{"href":"http://localhost:8000/jobs/0187c88d-a9e0-757e-aa36-2fbb6c834cb5/results","rel":"results","type":"application/json"},{"href":"http://localhost:8000/jobs/0187c88d-a9e0-757e-aa36-2fbb6c834cb5/inputs","rel":"inputs","type":"application/json"},{"href":"http://localhost:8000/processes/action_2","rel":"process","type":"application/json"},{"href":"http://localhost:8000/payloadCacheEntries/ade69fe7-1d7d-572e-9f36-7242cc2aca77","rel":"cache","type":"application/json"}]}],"links":[{"href":"http://localhost:8000/","rel":"root","type":"application/json"},{"href":"http://localhost:8000/jobs/","rel":"self","type":"application/json"}]}%

API tests with Object Storage

In order to load data into MinIO, follow these steps:

First, install the MinIO client by running:

brew install minio/stable/mc

Then set the MinIO alias, finding the ACCESS_KEY and SECRET_KEY by querying the Kubernetes secret:

export MINIO_ACCESS_KEY=`kubectl get secrets -n io minio-secret-credentials --template={{.data.access_key_id}} | base64 --decode`
export MINIO_SECRET_KEY=`kubectl get secrets -n io minio-secret-credentials --template={{.data.secret_access_key}} | base64 --decode`
mc alias set swoopminio http://127.0.0.1:9000 $MINIO_ACCESS_KEY $MINIO_SECRET_KEY

Test the MinIO connection by running:

$ mc admin info swoopminio

●  127.0.0.1:9000
   Uptime: 23 minutes
   Version: 2023-06-02T23:17:26Z
   Network: 1/1 OK
   Drives: 1/1 OK
   Pool: 1

Pools:
   1st, Erasure sets: 1, Drives per erasure set: 1

0 B Used, 1 Bucket, 0 Objects
1 drive online, 0 drives offline

Load data into MinIO by running

First clone the https://github.com/Element84/swoop repository locally, and then run the following from the top level of your local swoop clone:

$ mc cp --recursive tests/fixtures/io/base_01/ swoopminio/swoop/executions/2595f2da-81a6-423c-84db-935e6791046e/

...fixtures/io/base_01/output.json: 181 B / 181 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.67 KiB/s 0s
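
You can also confirm the upload from the terminal (assuming the same bucket layout as above):

mc ls --recursive swoopminio/swoop/executions/2595f2da-81a6-423c-84db-935e6791046e/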

View your data on MinIO by opening your browser at http://localhost:9001/ and logging into MinIO.

Retrieve username by running:

helm get values minio -n io -a -o json | jq -r .minio.service.accessKeyId | base64 --decode

Retrieve password by running:

helm get values minio -n io -a -o json | jq -r .minio.service.secretAccessKey | base64 --decode

Open the MinIO dashboard by pointing your browser at http://localhost:9001/ and logging into MinIO using the credentials above:

SWOOP MinIO



Test API with MinIO by running

$ curl http://localhost:8000/jobs/2595f2da-81a6-423c-84db-935e6791046e/inputs

{"process_id":"2595f2da-81a6-423c-84db-935e6791046e","payload":"test_input"}%
$ curl http://localhost:8000/jobs/2595f2da-81a6-423c-84db-935e6791046e/results

{"process_id":"2595f2da-81a6-423c-84db-935e6791046e","payload":"test_output"}%



Validating SWOOP Caboose installation

Check the logs of the swoop-caboose pod and verify that your workers have started via:

$ kubectl logs -n swoop svc/swoop-caboose

time="2023-07-31T21:26:13Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true
2023/07/31 21:26:13 starting worker 0
2023/07/31 21:26:13 starting worker 1
2023/07/31 21:26:13 starting worker 2



Running an Argo workflow

For full documentation on Argo Workflows, please visit the Argo Workflows Official Documentation.

For a full list of customization values for the argo-workflows helm chart, visit Argo Workflows ArtifactHub.

Prerequisites for running the Argo Workflows example

You will need an AWS account, with an S3 bucket and an IAM user with read/write access to that bucket.

  1. Go to your account in AWS, and create an S3 bucket for your assets, for example:

Argo Assets Bucket



  2. Create an IAM user with a read/write policy to your bucket.

Argo IAM User

Argo IAM User Permissions
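
If you prefer the AWS CLI over the console, a minimal sketch of the same setup (the user name and policy name below are placeholders, and the inline policy is scoped to just this bucket):

# create the assets bucket
aws s3 mb s3://<REPLACE_WITH_ASSETS_S3_BUCKET_NAME>

# create an IAM user and attach an inline read/write policy for the bucket
aws iam create-user --user-name argo-assets-user
aws iam put-user-policy --user-name argo-assets-user --policy-name argo-assets-rw \
  --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["s3:GetObject","s3:PutObject","s3:ListBucket"],"Resource":["arn:aws:s3:::<REPLACE_WITH_ASSETS_S3_BUCKET_NAME>","arn:aws:s3:::<REPLACE_WITH_ASSETS_S3_BUCKET_NAME>/*"]}]}'

# create access keys for the user (you will need these later for the secret.yaml step)
aws iam create-access-key --user-name argo-assets-user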



Installation of Argo Workflows

Argo Workflows has been included as part of the swoop-bundle, as a dependency of swoop-caboose. To deploy Argo Workflows, follow these steps:

The commands below must be run from the top-level directory of the filmdrop-k8s-tf-modules project.

  1. First, update local.tfvars or create your own .tfvars:
  • To enable swoop-api and its dependent services, you will need to enable at least the following in your tfvars:
deploy_swoop_api          = true
deploy_swoop_caboose      = true
deploy_swoop_conductor    = true
deploy_db_migration       = true
deploy_argo_workflows     = true
deploy_postgres           = true
deploy_db_init            = true
deploy_minio              = true
deploy_workflow_config    = true
  2. Next, initialize terraform:
terraform init
  3. Validate that the terraform resources are valid. If your terraform is valid, the validate command will respond with "Success! The configuration is valid."
terraform validate
  4. Run a terraform plan with your .tfvars file. The plan will give you a summary of all the changes terraform will perform prior to deploying any change:
terraform plan -var-file=local.tfvars
  5. Deploy the changes by applying the terraform plan. You will be asked to confirm the changes and must respond with "yes".
terraform apply -var-file=local.tfvars

Running Argo Workflows Copy Assets Stac Task

The publish-stac-task example will run an argo workflow which copies specified assets from the source STAC Item(s), uploads them to S3, and updates the Item asset hrefs to point to the new location.

Prior to running the Argo Workflows example, make sure to port-forward minio to localhost ports 9000 and 9001.

Via Rancher Desktop:

Port forwarding Minio



or via terminal:

kubectl port-forward -n io svc/minio 9000:9000 &
kubectl port-forward -n io svc/minio 9001:9001 &

First, install the MinIO client by running:

brew install minio/stable/mc

Then set the MinIO alias, finding the ACCESS_KEY and SECRET_KEY by querying the Kubernetes secret:

export MINIO_ACCESS_KEY=`kubectl get secrets -n io minio-secret-credentials --template={{.data.access_key_id}} | base64 --decode`
export MINIO_SECRET_KEY=`kubectl get secrets -n io minio-secret-credentials --template={{.data.secret_access_key}} | base64 --decode`
mc alias set swoopminio http://127.0.0.1:9000 $MINIO_ACCESS_KEY $MINIO_SECRET_KEY

Run the publish-stac-task argo workflow example

First clone the publish-stac-task repository.

After cloning the publish-stac-task repository, modify payload_workflow.json and replace <REPLACE_WITH_ASSETS_S3_BUCKET_NAME> with the bucket name created in the Prerequisites section.

Create a public local MinIO bucket and copy in the payload_workflow.json file (after replacing the <REPLACE_WITH_ASSETS_S3_BUCKET_NAME> placeholder). To do this via the MinIO client:

$ mc mb swoopminio/payloadtest

Bucket created successfully `swoopminio/payloadtest`.
$ mc anonymous set public swoopminio/payloadtest

Access permission for `swoopminio/payloadtest` is set to `public`
$ mc cp ./payload_workflow.json swoopminio/payloadtest/

...opy-assets-stac-task/payload_workflow.json: 4.58 KiB / 4.58 KiB ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 127.61 KiB/s 0s

Now modify secret.yaml and replace <REPLACE_WITH_BASE64_AWS_ACCESS_KEY>, <REPLACE_WITH_BASE64_AWS_SECRET_ACCESS_KEY> and <REPLACE_WITH_BASE64_AWS_REGION> with the base64-encoded versions of the AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY and AWS_REGION of the IAM user and bucket created in the Prerequisites section.
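
A quick way to produce those base64 values from a shell (a sketch; the values shown are placeholders, and -n avoids encoding a trailing newline):

echo -n "<YOUR_AWS_ACCESS_KEY>" | base64
echo -n "<YOUR_AWS_SECRET_ACCESS_KEY>" | base64
echo -n "<YOUR_AWS_REGION>" | base64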

Create the kubernetes secret by executing:

kubectl apply -n swoop -f ./secret.yaml

Then replace the <REPLACE_WITH_MINIO_HOST>:<REPLACE_WITH_MINIO_PORT> placeholders inside workflow_copyassets_no_template.yaml and workflow_copyassets_with_template.yaml. If you're running this example against the minio created by the terraform stack, you should be able to replace <REPLACE_WITH_MINIO_HOST>:<REPLACE_WITH_MINIO_PORT> with minio.io:9000.
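
For example, the replacement can be done with sed (a sketch; on macOS use sed -i '' instead of sed -i):

sed -i 's|<REPLACE_WITH_MINIO_HOST>:<REPLACE_WITH_MINIO_PORT>|minio.io:9000|g' \
  workflow_copyassets_no_template.yaml workflow_copyassets_with_template.yaml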

Finally, run and watch the argo workflow task via:

argo submit -n swoop --watch ./workflow_copyassets_no_template.yaml

You should be able to see your workflow pod succeed in the terminal:

Argo Workflow Success



And you should be able to see your asset S3 bucket populated with a thumbnail image:

aws s3 ls s3://<REPLACE_WITH_ASSETS_S3_BUCKET_NAME>/data/naip/tx_m_2609719_se_14_060_20201217/

2023-08-17 10:00:08       9776 thumbnail.jpg

Similarly, you should be able to run the workflow_copyassets_with_template.yaml workflow in the publish-stac-task repository by following these steps:

  1. First, create the workflow template with the following command:
kubectl apply -n swoop -f ./workflow-template.yaml
  2. Submit the argo workflow via:
argo submit -n swoop --watch ./workflow_copyassets_with_template.yaml



Notes:

  • When utilizing the argo workflow installation provided via the swoop-bundle, you should be able to run argo workflows in the following manner:
argo submit -n swoop --watch <FULL PATH TO THE SAMPLE WORKFLOW YAML FILE>
  • There is a service account created to support the required argo permissions: serviceAccountName: argo
  • You should not expect the argo server or archive logs to be functional with the default argo installation. To enable those settings, please see the Argo Workflows documentation linked above.



Running SWOOP Conductor with Mirror Workflow

This module will deploy SWOOP Conductor onto a Kubernetes cluster.

Installation

The commands below must be run from the top-level directory of the filmdrop-k8s-tf-modules project.

  1. First, update local.tfvars or create your own .tfvars:
  • To enable swoop-api and its dependent services, you will need to enable at least the following in your tfvars:
deploy_swoop_api          = true
deploy_swoop_caboose      = true
deploy_swoop_conductor    = true
deploy_db_migration       = true
deploy_argo_workflows     = true
deploy_postgres           = true
deploy_db_init            = true
deploy_minio              = true
deploy_workflow_config    = true

aws_access_key = "<REPLACE_WITH_BASE64_AWS_ACCESS_KEY>"
aws_secret_access_key = "<REPLACE_WITH_BASE64_AWS_SECRET_ACCESS_KEY>"
aws_region = "<REPLACE_WITH_BASE64_AWS_REGION>"
aws_session_token = "<REPLACE_WITH_BASE64_AWS_SESSION_TOKEN>"
swoop_workflow_output_s3_bucket = "<REPLACE_WITH_SWOOP_WORKFLOW_OUTPUT_S3_BUCKET_NAME>"
  2. Next, initialize terraform:
terraform init
  3. Validate that the terraform resources are valid. If your terraform is valid, the validate command will respond with "Success! The configuration is valid."
terraform validate
  4. Run a terraform plan with your .tfvars file. The plan will give you a summary of all the changes terraform will perform prior to deploying any change:
terraform plan -var-file=local.tfvars
  5. Deploy the changes by applying the terraform plan. You will be asked to confirm the changes and must respond with "yes".
terraform apply -var-file=local.tfvars

After the terraform apply succeeds, you can run:

kubectl get workflowtemplate -n swoop
kubectl get configmap -n swoop

to see the workflow templates and SWOOP configmap that were deployed.

Testing swoop-conductor and running mirror-workflow

First port-forward swoop-api to your local port 8000:

kubectl port-forward -n swoop svc/swoop-api 8000:8000 &

Then clone the publish-stac-task repository locally, and run the following from the top level of your local publish-stac-task clone:

python3 swoop_api_payload_test.py

The script will ask you for 2 inputs:

Enter SWOOP API Host (localhost:8000): localhost:8000
Enter STAC ASSETS BUCKET NAME: <REPLACE_WITH_ASSETS_S3_BUCKET_NAME>

Replace <REPLACE_WITH_ASSETS_S3_BUCKET_NAME> with an S3 bucket name. The AWS credentials that you configured in the Installation section above should have read and write access to this bucket.

You should see a status of "accepted" in the response:

$ python3 swoop_api_payload_test.py

Enter SWOOP API Host (localhost:8000): localhost:8000
Enter STAC ASSETS BUCKET NAME: REDACTED_BUCKET_NAME
***** INPUTS SUMMARY *****
SWOOP API Host: localhost:8000
STAC ASSETS BUCKET NAME: REDACTED_BUCKET_NAME
POST URL: http://localhost:8000/processes/mirror/execution
POST Input payload:
{"inputs": {"payload": {"id": "test", "type": "FeatureCollection", "features": [{"id": "tx_m_2609719_se_14_060_20201217", "bbox": [-97.690252, 26.622563, -97.622203, 26.689923], "type": "Feature", "links": [{"rel": "collection", "type": "application/json", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/naip"}, {"rel": "parent", "type": "application/json", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/naip"}, {"rel": "root", "type": "application/json", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/"}, {"rel": "self", "type": "application/geo+json", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/naip/items/tx_m_2609719_se_14_060_20201217"}, {"rel": "preview", "href": "https://planetarycomputer.microsoft.com/api/data/v1/item/map?collection=naip&item=tx_m_2609719_se_14_060_20201217", "title": "Map of item", "type": "text/html"}], "assets": {"image": {"href": "https://naipeuwest.blob.core.windows.net/naip/v002/tx/2020/tx_060cm_2020/26097/m_2609719_se_14_060_20201217.tif", "type": "image/tiff; application=geotiff; profile=cloud-optimized", "roles": ["data"], "title": "RGBIR COG tile", "eo:bands": [{"name": "Red", "common_name": "red"}, {"name": "Green", "common_name": "green"}, {"name": "Blue", "common_name": "blue"}, {"name": "NIR", "common_name": "nir", "description": "near-infrared"}]}, "thumbnail": {"href": "https://naipeuwest.blob.core.windows.net/naip/v002/tx/2020/tx_060cm_2020/26097/m_2609719_se_14_060_20201217.200.jpg", "type": "image/jpeg", "roles": ["thumbnail"], "title": "Thumbnail"}}, "geometry": {"type": "Polygon", "coordinates": [[[-97.623004, 26.622563], [-97.622203, 26.689286], [-97.68949, 26.689923], [-97.690252, 26.623198], [-97.623004, 26.622563]]]}, "collection": "naip", "properties": {"gsd": 0.6, "datetime": "2020-12-17T00:00:00Z", "naip:year": "2020", "proj:bbox": [630384, 2945370, 637080, 2952762], "proj:epsg": 26914, "naip:state": "tx", "proj:shape": [12320, 11160], "proj:transform": [0.6, 0, 630384, 0, -0.6, 2952762, 0, 0, 1]}, "stac_extensions": ["https://stac-extensions.github.io/eo/v1.0.0/schema.json", "https://stac-extensions.github.io/projection/v1.0.0/schema.json"], "stac_version": "1.0.0"}], "process": [{"description": "string", "workflow": "mirror", "upload_options": {"path_template": "s3://REDACTED_BUCKET_NAME/data/${collection}/${id}/", "collections": {"naip": "*"}, "public_assets": [], "s3_urls": false}, "tasks": {"copy-assets": {"assets": ["thumbnail"], "drop_assets": ["image"]}, "publish": {"public": false, "stac_validate": true}}}]}}, "response": "document"}
**************************
**** RESPONSE SUMMARY ****
Response from POST to http://localhost:8000/processes/mirror/execution:
 {"processID":"mirror","type":"process","jobID":"018a528f-a4f8-7094-821a-1b8eef82e290","status":"accepted","message":null,"created":"2023-09-01T21:04:20.216530+00:00","started":null,"finished":null,"updated":"2023-09-01T21:04:20.216530+00:00","links":[{"href":"http://localhost:8000/","rel":"root","type":"application/json"},{"href":"http://localhost:8000/jobs/018a528f-a4f8-7094-821a-1b8eef82e290","rel":"self","type":"application/json"},{"href":"http://localhost:8000/jobs/018a528f-a4f8-7094-821a-1b8eef82e290/results","rel":"results","type":"application/json"},{"href":"http://localhost:8000/jobs/018a528f-a4f8-7094-821a-1b8eef82e290/inputs","rel":"inputs","type":"application/json"},{"href":"http://localhost:8000/processes/mirror","rel":"process","type":"application/json"},{"href":"http://localhost:8000/payloadCacheEntries/f7ceceb1-5871-54c9-837d-d24380cb252b","rel":"cache","type":"application/json"}]}
**************************

You should now see a 018a528f-a4f8-7094-821a-1b8eef82e290-copy-task-someid pod matching the jobID 018a528f-a4f8-7094-821a-1b8eef82e290 when you run:

$ kubectl get pods

NAME                                                              READY   STATUS      RESTARTS   AGE
migration-job-8-hpsnq                                             0/1     Completed   0          7m29s
wait-for-migration-job-8-g7nd7                                    0/1     Completed   0          7m29s
swoop-bundle-argo-workflows-workflow-controller-6654d9947-nsb7c   1/1     Running     0          7m29s
swoop-api-7b45d969fc-h6vj4                                        1/1     Running     0          7m29s
wait-for-swoop-api-8-6hb6b                                        0/1     Completed   0          7m29s
swoop-caboose-7fb689f8b8-d8ncs                                    1/1     Running     0          7m29s
swoop-conductor-65b98d6d8-4klz4                                   1/1     Running     0          7m29s
wait-for-swoop-caboose-8-px4bd                                    0/1     Completed   0          7m29s
018a528f-a4f8-7094-821a-1b8eef82e290-copy-task-3015836743         0/2     Completed   0          102s
018a528f-a4f8-7094-821a-1b8eef82e290-publish-2564547885           0/2     Completed   0          38s

In a couple of minutes, the job will complete, and the pod will no longer appear as running in kubectl get pods.
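
If you want to follow the workflow pods while they run, one simple option (assuming the workflow pods run in the swoop namespace, as above) is:

kubectl get pods -n swoop --watch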

Now, if you run the same swoop-api test with the same parameters, you will see the results from the cache with a status of "successful":

$ python3 swoop_api_payload_test.py

Enter SWOOP API Host (localhost:8000): localhost:8000
Enter STAC ASSETS BUCKET NAME: REDACTED_BUCKET_NAME
***** INPUTS SUMMARY *****
SWOOP API Host: localhost:8000
STAC ASSETS BUCKET NAME: REDACTED_BUCKET_NAME
POST URL: http://localhost:8000/processes/mirror/execution
POST Input payload:
{"inputs": {"payload": {"id": "test", "type": "FeatureCollection", "features": [{"id": "tx_m_2609719_se_14_060_20201217", "bbox": [-97.690252, 26.622563, -97.622203, 26.689923], "type": "Feature", "links": [{"rel": "collection", "type": "application/json", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/naip"}, {"rel": "parent", "type": "application/json", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/naip"}, {"rel": "root", "type": "application/json", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/"}, {"rel": "self", "type": "application/geo+json", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/naip/items/tx_m_2609719_se_14_060_20201217"}, {"rel": "preview", "href": "https://planetarycomputer.microsoft.com/api/data/v1/item/map?collection=naip&item=tx_m_2609719_se_14_060_20201217", "title": "Map of item", "type": "text/html"}], "assets": {"image": {"href": "https://naipeuwest.blob.core.windows.net/naip/v002/tx/2020/tx_060cm_2020/26097/m_2609719_se_14_060_20201217.tif", "type": "image/tiff; application=geotiff; profile=cloud-optimized", "roles": ["data"], "title": "RGBIR COG tile", "eo:bands": [{"name": "Red", "common_name": "red"}, {"name": "Green", "common_name": "green"}, {"name": "Blue", "common_name": "blue"}, {"name": "NIR", "common_name": "nir", "description": "near-infrared"}]}, "thumbnail": {"href": "https://naipeuwest.blob.core.windows.net/naip/v002/tx/2020/tx_060cm_2020/26097/m_2609719_se_14_060_20201217.200.jpg", "type": "image/jpeg", "roles": ["thumbnail"], "title": "Thumbnail"}}, "geometry": {"type": "Polygon", "coordinates": [[[-97.623004, 26.622563], [-97.622203, 26.689286], [-97.68949, 26.689923], [-97.690252, 26.623198], [-97.623004, 26.622563]]]}, "collection": "naip", "properties": {"gsd": 0.6, "datetime": "2020-12-17T00:00:00Z", "naip:year": "2020", "proj:bbox": [630384, 2945370, 637080, 2952762], "proj:epsg": 26914, "naip:state": "tx", "proj:shape": [12320, 11160], "proj:transform": [0.6, 0, 630384, 0, -0.6, 2952762, 0, 0, 1]}, "stac_extensions": ["https://stac-extensions.github.io/eo/v1.0.0/schema.json", "https://stac-extensions.github.io/projection/v1.0.0/schema.json"], "stac_version": "1.0.0"}], "process": [{"description": "string", "workflow": "mirror", "upload_options": {"path_template": "s3://REDACTED_BUCKET_NAME/data/${collection}/${id}/", "collections": {"naip": "*"}, "public_assets": [], "s3_urls": false}, "tasks": {"copy-assets": {"assets": ["thumbnail"], "drop_assets": ["image"]}, "publish": {"public": false, "stac_validate": true}}}]}}, "response": "document"}
**************************
**** RESPONSE SUMMARY ****
Response from POST to http://localhost:8000/processes/mirror/execution:
 {"processID":"mirror","type":"process","jobID":"018a528f-a4f8-7094-821a-1b8eef82e290","status":"successful","created":"2023-09-01T21:04:20.216530+00:00","started":"2023-09-01T21:04:22+00:00","finished":"2023-09-01T21:06:05+00:00","updated":"2023-09-01T21:06:05+00:00","links":[{"href":"http://localhost:8000/","rel":"root","type":"application/json"},{"href":"http://localhost:8000/jobs/018a528f-a4f8-7094-821a-1b8eef82e290","rel":"self","type":"application/json"},{"href":"http://localhost:8000/jobs/018a528f-a4f8-7094-821a-1b8eef82e290/results","rel":"results","type":"application/json"},{"href":"http://localhost:8000/jobs/018a528f-a4f8-7094-821a-1b8eef82e290/inputs","rel":"inputs","type":"application/json"},{"href":"http://localhost:8000/processes/mirror","rel":"process","type":"application/json"},{"href":"http://localhost:8000/payloadCacheEntries/f7ceceb1-5871-54c9-837d-d24380cb252b","rel":"cache","type":"application/json"}]}
**************************

Lastly, you should be able to see the output on MinIO.

First port-forward MinIO port 9000 via:

kubectl port-forward -n io svc/minio 9000:9000 &

Then connect to the MinIO client via:

export MINIO_ACCESS_KEY=`kubectl get secrets -n io minio-secret-credentials --template={{.data.access_key_id}} | base64 --decode`
export MINIO_SECRET_KEY=`kubectl get secrets -n io minio-secret-credentials --template={{.data.secret_access_key}} | base64 --decode`
mc alias set swoopminio http://127.0.0.1:9000 $MINIO_ACCESS_KEY $MINIO_SECRET_KEY

Check that you have an output file:

$ mc ls swoopminio/swoop/executions/018a528f-a4f8-7094-821a-1b8eef82e290/

[2023-08-29 13:14:57 EDT] 2.5KiB STANDARD input.json
[2023-08-29 13:15:50 EDT] 2.1KiB STANDARD output.json
[2023-08-29 13:16:00 EDT] 5.0KiB STANDARD workflow.json

Copy the output file locally:

$ mc cp swoopminio/swoop/executions/018a528f-a4f8-7094-821a-1b8eef82e290/output.json .
...528f-a4f8-7094-821a-1b8eef82e290/output.json: 2.34 KiB / 2.34 KiB ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 73.18 KiB/s 0s

Check the contents of the output.json:

$ cat output.json

{"type": "FeatureCollection", "features": [{"type": "Feature", "stac_version": "1.0.0", "id": "tx_m_2609719_se_14_060_20201217", "properties": {"gsd": 0.6, "datetime": "2020-12-17T00:00:00Z", "naip:year": "2020", "proj:bbox": [630384, 2945370, 637080, 2952762], "proj:epsg": 26914, "naip:state": "tx", "proj:shape": [12320, 11160], "proj:transform": [0.6, 0, 630384, 0, -0.6, 2952762, 0, 0, 1], "processing:software": {"publish": "0.1.0"}, "created": "2023-09-01T20:48:19.861207+00:00", "updated": "2023-09-01T21:05:42.233712+00:00"}, "geometry": {"type": "Polygon", "coordinates": [[[-97.623004, 26.622563], [-97.622203, 26.689286], [-97.68949, 26.689923], [-97.690252, 26.623198], [-97.623004, 26.622563]]]}, "links": [{"rel": "self", "href": "s3://REDACTED_BUCKET_NAME/data/naip/tx_m_2609719_se_14_060_20201217/tx_m_2609719_se_14_060_20201217.json", "type": "application/json"}, {"rel": "canonical", "href": "s3://REDACTED_BUCKET_NAME/data/naip/tx_m_2609719_se_14_060_20201217/tx_m_2609719_se_14_060_20201217.json", "type": "application/json"}, {"rel": "collection", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/naip", "type": "application/json"}, {"rel": "parent", "href": "https://planetarycomputer.microsoft.com/api/stac/v1/collections/naip", "type": "application/json"}, {"rel": "preview", "href": "https://planetarycomputer.microsoft.com/api/data/v1/item/map?collection=naip&item=tx_m_2609719_se_14_060_20201217", "type": "text/html", "title": "Map of item"}], "assets": {"thumbnail": {"href": "https://REDACTED_BUCKET_NAME.s3.us-west-2.amazonaws.com/data/naip/tx_m_2609719_se_14_060_20201217/thumbnail.jpg", "type": "image/jpeg", "title": "Thumbnail", "roles": ["thumbnail"]}}, "bbox": [-97.690252, 26.622563, -97.622203, 26.689923], "stac_extensions": ["https://stac-extensions.github.io/processing/v1.1.0/schema.json", "https://stac-extensions.github.io/projection/v1.0.0/schema.json", "https://stac-extensions.github.io/eo/v1.0.0/schema.json"], "collection": "naip"}], "process": {"description": "string", "tasks": {"copy-assets": {"assets": ["thumbnail"], "drop_assets": ["image"]}, "publish": {"public": false, "stac_validate": true}}, "upload_options": {"path_template": "s3://REDACTED_BUCKET_NAME/data/${collection}/${id}/", "collections": {"naip": "*"}, "s3_urls": false, "public_assets": []}, "workflow": "mirror"}}%



Database Migrations on K8s

The Filmdrop K8s Terraform modules can be used to perform schema migrations on Postgres database pods in K8s.

Background

In order to apply database migrations in K8s, we first need an existing state database within which to apply migrations. The SWOOP infrastructure contains multiple components - SWOOP API, Caboose, and Conductor - that also need to access this database with their own roles, each of which has certain privileges to perform actions on the database. To create a database with the appropriate set of roles before migrations are applied, two separate jobs have been created: a database initialization job and a database migration job.

The purpose of the database initialization job is to create the set of appropriate roles and a database named swoop, where migrations will later be applied by the migration job. The initialization job also creates an 'owner' role named swoop that will own all of the objects created in the database, along with a 'read/write' role, and application roles for each individual SWOOP component (SWOOP API, SWOOP Caboose, and SWOOP Conductor) to access the database. The application roles are members of the SWOOP read/write role. The initialization job does not create any tables within the swoop schema; these are created during the migrations. However, in order for the owner role to own the objects created during the migrations, the initialization job sets the owner role as the owner of the database.

After the database is initialized, migrations can be performed; this is the purpose of the migration job: to migrate or roll back the swoop database to an appropriate database version. To prevent existing active connections from any of the application roles from interfering with the migration process, an optional no_wait parameter can be specified for the migration job that, when set to false, will wait for all active connections from any of the application roles to close before applying any migrations. Once the migrations are complete, the swoop schema within the swoop database will contain all of the tables for the migration version applied, and the swoop.schema_version table will show the current migration version of the database and the time at which it was applied.

After deploying to K8s, the pod for each SWOOP application component (api, caboose, and conductor) will have an initContainer that waits for the migrations to be applied before reaching a completed state, after which the pod gets deployed onto the cluster.
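
Once these jobs have run, you can confirm the resulting state from the command line (a sketch, assuming psql is available in the postgres pod and you connect with the swoop role created by the initialization job):

# list the tables created in the swoop schema by the migrations
kubectl exec -it --namespace=db svc/postgres -- psql -U swoop -d swoop -c "\dt swoop.*"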

Deployment

In order to apply migrations via the Terraform modules, three flags need to be enabled at a minimum. The local.tfvars file should look like:

deploy_linkerd            = false
deploy_ingress_nginx      = false
deploy_grafana_prometheus = false
deploy_loki               = false
deploy_promtail           = false
deploy_argo_workflows     = false
deploy_titiler            = false
deploy_stacfastapi        = false
deploy_swoop_api          = false
deploy_swoop_caboose      = false
deploy_db_migration       = true
deploy_postgres           = true
deploy_db_init            = true
deploy_minio              = false
deploy_swoop_conductor    = false
deploy_workflow_config    = false

If only database initialization is desired (with no migrations), then the deploy_db_migration flag should be set to false.

deploy_linkerd            = false
deploy_ingress_nginx      = false
deploy_grafana_prometheus = false
deploy_loki               = false
deploy_promtail           = false
deploy_argo_workflows     = false
deploy_titiler            = false
deploy_stacfastapi        = false
deploy_swoop_api          = false
deploy_swoop_caboose      = false
deploy_db_migration       = false
deploy_postgres           = true
deploy_db_init            = true
deploy_minio              = false
deploy_swoop_conductor    = false
deploy_workflow_config    = false

Make sure you have a K8s cluster running and your Kubeconfig file contains the proper credentials to authenticate to that cluster.

Then, do:

  1. Initialize Terraform:
terraform init
  2. Validate that the Terraform resources are valid. If your Terraform is valid, the validate command will respond with "Success! The configuration is valid."
terraform validate
  3. Run a terraform plan. The terraform plan will give you a summary of all the changes terraform will perform prior to deploying any change.
terraform plan -var-file=local.tfvars
  4. Deploy the changes by applying the terraform plan. You will be asked to confirm the changes and must respond with "yes".
terraform apply -var-file=local.tfvars

This will deploy, in order, the postgres database, the database initialization job, and the database migration job (if enabled). These resources ultimately get deployed through their helm charts, provided in the filmdrop-k8s-helm-charts repository.

After deploying, you will see a db namespace and a swoop namespace.

Namespaces deployed by db init/migrations


The db namespace will contain the database initialization pod and the swoop namespace will contain the migration job pod.

Pods in the db namespace


Pods in the swoop namespace


The logs from each of these pods should show messages coming from their respective scripts. For example:

Logs for db initialization pod


Logs for db migration pod
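
If you prefer the terminal, the same information is available via kubectl (the pod names below are placeholders; use the names from kubectl get pods in your cluster):

kubectl get pods -n db
kubectl get pods -n swoop
kubectl logs -n db <db-init-pod-name>
kubectl logs -n swoop <db-migration-pod-name>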


After deploying to K8s, you can port-forward the postgres service to a localhost port in Rancher Desktop as follows:

Port-forward postgres service
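
or via terminal:

kubectl port-forward -n db svc/postgres 5432:5432 &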


and connect to the database with pgAdmin using the swoop role:

Connect to database


You will see that there is a database named swoop containing all tables, and that all roles that were created by the database initialization script appear under Login/Group Roles. All of the objects in the swoop schema are owned by the swoop role, since that role was used to apply the migrations.

If you open the swoop.schema_version table, you will see a record in the table with the migration version number (8, in this example) and the time at which it was applied:

Swoop schema_version table


Customization of deployment

The above procedure will create the database and run the migrations using default values for all parameters. However, the Filmdrop K8s Terraform modules allow a lot of customization for any deployment. Customization is performed by setting variable values in Terraform. Variables have a precedence that depends on where they are defined.

Any variable that is assigned a value in local.tfvars should be defined within the inputs.tf file in the root directory. A variable can have a default value that gets used if one is not explicitly provided in local.tfvars; this default value can be specified in inputs.tf.

This repository is set up using the concept of a profile, a template combining multiple modules that configure an environment stack. A module is a unique collection of resources that gets deployed as one unit onto K8s, and this repository contains separate modules for database, I/O, ingress, STAC, etc. (see the modules directory). There is only one profile, named core, used in this repo, and the configuration values for variables are passed into the profile via the local.tf file, also in the root directory. The core profile, located at ./profiles/core/profile.tf, deploys all of the individual Terraform modules using the values passed in from local.tf.

The --var-file option in terraform apply overrides the value of any variable that is defined within the modules. If a value for a variable has been assigned within local.tfvars, that value gets used for that variable, overriding any values defined within the core profile itself or any individual module used by the core profile. If, however, a variable is not contained within the local.tfvars file and/or not passed in to the core profile in local.tf, it gets the default value assigned to it in the profile itself at ./profiles/core/profile.tf.
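
As a quick illustration of this precedence (a sketch: Terraform processes -var-file and -var options in the order they appear on the command line, so the later -var wins for deploy_minio here, overriding whatever local.tfvars sets):

terraform plan -var-file=local.tfvars -var="deploy_minio=false"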

Roles and permissions

The database initialization script creates, in order, an owner role (named swoop), a SWOOP read/write role (named swoop_readwrite), and three application roles (named swoop_api, swoop_caboose, and swoop_conductor) for each of the SWOOP components to access the database. All roles except the SWOOP read/write role are user roles, meaning that they have login privileges into the database as users. The swoop_readwrite role is a group role, which contains the three application roles as its members. Being a group role, any privileges assigned to the swoop_readwrite role are automatically assigned to its member roles, as privileges are inherited by member roles from parent group roles by default in Postgres. The swoop_readwrite role is given connection privileges on the swoop database and has read/write privileges on objects within the swoop schema.

During the migration job, however, connect privileges are initially revoked from the swoop_readwrite role if the NO_WAIT environment variable is set to false, so that all active connections from any of the application roles to the database are closed before any migrations are applied. Connect privileges are granted back to the swoop_readwrite role at the end of the migration job, regardless of whether or not the job waited for any active connections from the application roles to close, in order to maintain a consistent state of the database.
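
To inspect these roles and their memberships yourself, a minimal sketch (assuming psql access to the postgres pod as in the earlier examples):

# list the swoop roles and whether they can log in
kubectl exec -it --namespace=db svc/postgres -- psql -U swoop -d swoop -c "select rolname, rolcanlogin from pg_roles where rolname like 'swoop%';"

# list the members of the swoop_readwrite group role
kubectl exec -it --namespace=db svc/postgres -- psql -U swoop -d swoop -c "select m.rolname as member from pg_auth_members am join pg_roles g on g.oid = am.roleid join pg_roles m on m.oid = am.member where g.rolname = 'swoop_readwrite';"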



Uninstall swoop-api

To uninstall the release, do terraform destroy -var-file=local.tfvars.