Investigate STAC catalog option #187

Open
huard opened this issue Nov 12, 2020 · 3 comments
Comments

@huard
Contributor

huard commented Nov 12, 2020

STAC could be used server side as a catalog server. If I understand correctly, there is an ESM extension in development by the Pangeo community.

Ideally, THREDDS would have a STAC endpoint serving STAC + ESM catalogs. This is, however, unlikely to happen in the near to medium term. I suspect we'll have to deploy software that converts the THREDDS catalog to a STAC + ESM catalog, then serve the result from a dedicated STAC server. The client could be intake-stac with intake-xarray.

My suggestion for short-term progress would be to:

  • Find or write a THREDDS crawler based on the TDS catalog
  • Write a function to export to an intake-esm catalog (planning to switch to the future STAC + ESM catalog)
  • Demonstrate usage with the intake client

While, in parallel:

  • Contributing to ESM extension schema
  • Installing STAC server
  • Building experience with intake-stac client
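To make the short-term steps a bit more concrete, here is a minimal, dependency-free sketch of the crawl-then-export idea. Everything in it is an assumption for illustration: the sample catalog XML is hand-written rather than taken from a real TDS, the file naming convention (`variable_frequency_source_experiment_member.nc`) and the `example.org` base URL are hypothetical, and the descriptor only follows my understanding of the intake-esm collection layout. A real crawler would more likely rely on siphon than on raw XML parsing.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# XML namespace of THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical facet order encoded in the sample file names below.
FACETS = ["variable", "frequency", "source_id", "experiment_id", "member_id"]

# Hand-written stand-in for the XML a TDS would return (not a real response).
SAMPLE_CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="demo">
  <dataset name="simulations" ID="simulations">
    <dataset name="tas_day_ModelA_ssp245_r1i1p1f1.nc" urlPath="simulations/tas_day_ModelA_ssp245_r1i1p1f1.nc"/>
    <dataset name="pr_day_ModelA_ssp245_r1i1p1f1.nc" urlPath="simulations/pr_day_ModelA_ssp245_r1i1p1f1.nc"/>
  </dataset>
</catalog>"""


def crawl(catalog_xml, opendap_base):
    """Return one row per leaf dataset (an element carrying a urlPath)."""
    rows = []
    for ds in ET.fromstring(catalog_xml).iter("{%s}dataset" % THREDDS_NS):
        url_path = ds.get("urlPath")
        if url_path is None:
            continue  # container dataset, no direct data access
        stem = ds.get("name", "").rsplit(".", 1)[0]
        parts = stem.split("_")
        if len(parts) != len(FACETS):
            continue  # name does not follow the assumed convention
        row = dict(zip(FACETS, parts))
        row["path"] = opendap_base.rstrip("/") + "/" + url_path
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows as the CSV table that the descriptor points to."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FACETS + ["path"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def esm_descriptor(csv_name):
    """Build a minimal JSON descriptor in the style of an intake-esm collection."""
    return {
        "esmcat_version": "0.1.0",
        "id": "thredds-demo",  # hypothetical identifier
        "description": "Sketch of a catalog crawled from THREDDS",
        "catalog_file": csv_name,
        "attributes": [{"column_name": name} for name in FACETS],
        "assets": {"column_name": "path", "format": "netcdf"},
    }


rows = crawl(SAMPLE_CATALOG, "https://example.org/thredds/dodsC")
print(to_csv(rows))
print(json.dumps(esm_descriptor("catalog.csv"), indent=2))
```

Writing the CSV and the JSON descriptor to disk should be enough to try the third step with the intake client. The sketch leaves out aggregation rules, so any grouping of files into datasets would still need to be declared in the descriptor.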

Tools

  • siphon: interact with the THREDDS catalog. It could possibly get a plugin to export a catalog object to a STAC or Intake catalog object.
  • pystac: read/write STAC catalogs
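For a sense of what the conversion target looks like, here is a rough sketch of a single STAC Item describing one NetCDF file, built as a plain dictionary so it runs without extra dependencies. In practice pystac would handle this. The `esm:` property names are placeholders I made up, since the ESM extension schema is still in development, and the `example.org` URL, the identifier and the `1.0.0` version string are assumptions as well.

```python
import json
from datetime import datetime, timezone


def make_stac_item(item_id, href, bbox, start, end, extra_properties=None):
    """Build a minimal STAC Item (a GeoJSON Feature) for one data file."""
    west, south, east, north = bbox
    # Closed polygon ring covering the bounding box.
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    properties = {
        # STAC allows a null datetime when a start/end range is given instead.
        "datetime": None,
        "start_datetime": start.isoformat(),
        "end_datetime": end.isoformat(),
    }
    properties.update(extra_properties or {})
    return {
        "type": "Feature",
        "stac_version": "1.0.0",  # assumed version string
        "id": item_id,
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "bbox": list(bbox),
        "properties": properties,
        "links": [],
        "assets": {
            "data": {
                "href": href,
                "type": "application/x-netcdf",
                "roles": ["data"],
            }
        },
    }


item = make_stac_item(
    item_id="tas_day_ModelA_ssp245_r1i1p1f1",  # hypothetical identifier
    href="https://example.org/thredds/dodsC/simulations/tas_day_ModelA_ssp245_r1i1p1f1.nc",
    bbox=(-180.0, -90.0, 180.0, 90.0),
    start=datetime(2015, 1, 1, tzinfo=timezone.utc),
    end=datetime(2100, 12, 31, tzinfo=timezone.utc),
    # Placeholder facet names, not taken from any published extension.
    extra_properties={"esm:variable": "tas", "esm:experiment_id": "ssp245"},
)
print(json.dumps(item, indent=2))
```

Keeping the facets under a common prefix should make it easier to rename them once the extension schema settles.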


@philipkershaw

Hi David @huard, I found this issue from pangeo-forge/cmip6-pipeline#7. I just wanted to highlight that for the ESGF future architecture work we are planning to ditch THREDDS catalogues for datasets. The new container-based release of ESGF that we are trialling does just this. All dataset information would need to be referenced directly from the ESGF Search API. In the longer run we are looking at community standards for the ESGF search API. An ESM profile of STAC could be a good candidate.

@huard
Contributor Author

huard commented Nov 13, 2020

Hi Phil @philipkershaw!

We are relying on the NcML API to provide aggregated views of the multiple files that make up a single dataset (periods, members, variables), as well as on other OGC APIs (WMS, WCS). Do you know if support for these APIs is on the cards for the new ESGF stack?

I think it would be worthwhile for us to get better acquainted with the new ESGF architecture roadmap, so that we can schedule our own efforts and collaborate. Are there documents we can look into?

@philipkershaw

There would be nothing to stop a data provider in ESGF from generating specific THREDDS catalogues and NcML for aggregations or any other purpose. We have had some experience of this at CEDA with the ESA CCI data we host.

However, the change for the future architecture is that the new publishing system would not generate THREDDS catalogues per dataset as part of the publishing process. We are still some time away from having deployment of the new system across the federation. Planning is still underway for the roadmap to get to this point. The next step is to deploy the new system as a pilot at sites that would like to participate. There is a project board for the new work:

https://github.com/orgs/ESGF/projects/1

...but this is more of a detailed development view than a broader roadmap. There is also the ESGF Future Architecture report, which contains an analysis of the existing system, proposals for changes and a roadmap for implementing them:

https://doi.org/10.5281/zenodo.3928222

I will try and keep you up to date with our plans and would be happy to discuss further if you have questions.
