Add collection validation #53

Merged
merged 1 commit into from
Sep 20, 2023
Conversation

gadomski
Contributor

Includes:

  • Python dependencies file and instructions
  • CI
  • Actual fixes

N.B. there were a strange amount of numbers-as-strings in the bboxes, so I'd guess there's something funny going on in the data generation step where the inputs are being stringified? I'll try to track down where this stuff comes from but I'm sure someone already knows :-P.

I also don't know if fixing the collections here in this repo is the first/only step -- I'm assuming these changes need to be sync-d with a DB somewhere?
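
For reference, the numbers-as-strings problem in the bboxes can be detected and repaired with a few lines of standard-library Python. This is only a minimal sketch, not the check this PR adds: the helper names `find_string_bbox_values` and `coerce_bboxes` are hypothetical, and it assumes the usual STAC collection layout where bboxes live under `extent.spatial.bbox`.

```python
import json
from typing import Any, Dict, List, Tuple


def find_string_bbox_values(collection: Dict[str, Any]) -> List[Tuple[int, int, str]]:
    """Return (bbox_index, position, value) for every bbox entry stored as a string."""
    bboxes = collection.get("extent", {}).get("spatial", {}).get("bbox", [])
    return [
        (i, j, value)
        for i, bbox in enumerate(bboxes)
        for j, value in enumerate(bbox)
        if isinstance(value, str)
    ]


def coerce_bboxes(collection: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the collection with string bbox values converted to floats."""
    fixed = json.loads(json.dumps(collection))  # cheap deep copy of JSON-like data
    spatial = fixed.get("extent", {}).get("spatial", {})
    if "bbox" in spatial:
        spatial["bbox"] = [
            # Only strings are converted; existing numbers are left untouched.
            [float(v) if isinstance(v, str) else v for v in bbox]
            for bbox in spatial["bbox"]
        ]
    return fixed


# Hypothetical collection fragment with stringified coordinates.
example = {"extent": {"spatial": {"bbox": [["-180", "-90", 180, "90.0"]]}}}
problems = find_string_bbox_values(example)
repaired = coerce_bboxes(example)
```

A check like `find_string_bbox_values` could run in CI to fail on stringified coordinates, while `coerce_bboxes` is the kind of one-off repair applied to the collection files themselves.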

@gadomski gadomski self-assigned this Sep 19, 2023
@j08lue
Contributor

j08lue commented Sep 20, 2023

there were a strange amount of numbers-as-strings in the bboxes

That sounds familiar - I think we had that with EPSG codes, too, before and it had to do with serialization/deserialization, but @anayeaye would know better...

@gadomski
Contributor Author

I think we had that with EPSG codes, too

The EPSG issue also appeared here: #57.

@anayeaye
Contributor

@gadomski we have had an assortment of invalid number formats and I think we determined that an ingest queueing DDB JSON serialization step was the root cause, which was fixed downstream here. In some cases updating the stac-extension version in items led to more graceful handling of numeric formats (need to find the ref where @jsignell worked on a pystac fix).

We've gone back and forth between updating and using the fixed construct from eoapi-cdk (which is behind our pgstac version) and adding the fix to our veda-stac-ingestor for now. @ividito did we end up making a change in veda-stac-ingestor?
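
To illustrate how a DDB JSON serialization step can turn numbers into strings, here is a small sketch. It is an assumption about the mechanism, not the actual ingestor code: it assumes the numbers arrive as `decimal.Decimal` values (which is how boto3 returns DynamoDB numbers) and that they are then serialized with a catch-all `default=str`. The `decimal_to_number` helper is hypothetical.

```python
import json
from decimal import Decimal

# Stand-in for a record read back from DynamoDB, where numbers are Decimals.
record = {"bbox": [Decimal("-180"), Decimal("-90"), Decimal("180"), Decimal("90.5")]}

# A catch-all default=str makes the record serializable, but every Decimal
# is written out as a JSON string rather than a JSON number.
stringified = json.loads(json.dumps(record, default=str))


def decimal_to_number(value):
    """Convert Decimals to int or float so they serialize as JSON numbers."""
    if isinstance(value, Decimal):
        if value == value.to_integral_value():
            return int(value)
        return float(value)
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# With a Decimal-aware default, the numeric types survive the round trip.
numeric = json.loads(json.dumps(record, default=decimal_to_number))
```

Here `stringified["bbox"]` contains strings while `numeric["bbox"]` contains numbers, which matches the symptom seen in the collection files.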

@ividito
Collaborator

ividito commented Sep 20, 2023

@ividito did we end up making a change in veda-stac-ingestor?

Nope, it made it into our different ingest-api experiments but never got applied to our live ingestor.

I also don't know if fixing the collections here in this repo is the first/only step -- I'm assuming these changes need to be sync-d with a DB somewhere?

I think these fixes reflect the various changes made in the ingestor at different points in time. I'll double check a few of them, but the version in the DB should be accurate already. Changing the files here will only make sure that future ingests of the same data use the correct input format.

@gadomski
Contributor Author

need to find the ref where @jsignell worked on a pystac fix

The issue is stac-utils/pystac#1044, but there isn't a fix, and we never opened the tracking issue mentioned in stac-utils/pystac#1044 (comment). There are some larger issues around extensions (stac-utils/pystac#1051, stac-utils/pystac#448) and serialization (stac-utils/pystac#1092) in pystac, so a correction probably won't come from there anytime soon.

@gadomski gadomski merged commit 10519f1 into main Sep 20, 2023
1 check passed
@gadomski gadomski deleted the validate branch September 20, 2023 17:45