
Plans to provide an official docker image? #1949

Open
2 tasks done
fmigneault opened this issue Oct 10, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@fmigneault

fmigneault commented Oct 10, 2024

Addressing a Problem?

There is a growing number of applications that pre-process Climate data to perform case-study analyses. These analyses are often part of larger workflow processing chains that need containerization and encapsulation of the dependencies required for each step. Sometimes, analyses try to combine Earth Observation with Climate data, leading to package conflicts. Other times, parts of a workflow processing chain need to be dispatched to different locations to address different resource requirements, platform availability or data-access requirements.

One way to address a combined Earth Observation + Climate data workflow is to use Common Workflow Language (CWL) with OGC API - Processes (e.g.: https://github.com/crim-ca/weaver). This approach is being considered in ongoing work in OGC Testbed-20 for GeoDataCubes, and CWL + OGC API - Processes was also discussed during the recent OGC 2024 Climate Services Code Sprint.

However, whenever a user wants to employ climate indices such as those provided by xclim, they need to set up their own Python environment and manage its dependencies. They also need to figure out how to build docker images and publish them to container registries, which is not an easy feat for everyone. The scientific community would benefit from a pre-built docker image that could be pulled directly and used in a processing workflow.

Potential Solution

Provide an official Dockerfile with all relevant dependencies for climate indices analysis, and publish the images built from it to a public container registry (DockerHub or directly on the xclim GitHub container registry). The docker image would simply use the xclim CLI as its entrypoint, so that it is ready to use directly.
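To make the "CLI as entrypoint" idea concrete, here is a minimal sketch of how a user or workflow engine might invoke such an image. It assumes a hypothetical image reference `ghcr.io/ouranosinc/xclim:latest` (nothing is published yet) whose exec-form ENTRYPOINT is the `xclim` CLI; `docker_run_argv` is a hypothetical helper. The indicator arguments follow the pattern of the existing xclim CLI (`-i`/`-o` global options, then an indicator subcommand) but should be checked against `xclim --help`.

```python
import shlex
from typing import List, Sequence

# Hypothetical image reference: xclim does not publish an official image yet.
XCLIM_IMAGE = "ghcr.io/ouranosinc/xclim:latest"


def docker_run_argv(
    host_dir: str,
    cli_args: Sequence[str],
    image: str = XCLIM_IMAGE,
    container_dir: str = "/data",
) -> List[str]:
    """Build a `docker run` argv for an image whose ENTRYPOINT is assumed to be `xclim`."""
    return [
        "docker", "run", "--rm",
        # Bind-mount the host data directory so the CLI can read inputs and write outputs.
        "-v", f"{host_dir}:{container_dir}",
        image,
        # With an exec-form ENTRYPOINT, arguments after the image name go to the xclim CLI.
        *cli_args,
    ]


if __name__ == "__main__":
    argv = docker_run_argv(
        "/home/user/climate",
        ["-i", "/data/tas.nc", "-o", "/data/tg_mean.nc", "tg_mean", "--freq", "YS"],
    )
    # Print a shell-safe command line; subprocess.run(argv, check=True) would execute it.
    print(shlex.join(argv))
```

Building an argv list rather than a single shell string avoids quoting problems when paths contain spaces, which matters once such commands are generated by a workflow engine.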

Additional context

This will most probably be needed for the OGC Testbed-20 GeoDataCubes effort. Therefore, I want to discuss adding it to xclim itself rather than doing it only on my end.

If such an image is provided, all platforms using Weaver (Ouranos PAVICS, CRIM Hirondelle, University of Toronto RedOak, ClimateData.ca) could potentially share a common xclim docker reference for larger and interoperable processing workflows.
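As an illustration of what "sharing a common xclim docker reference" could look like on the CWL side, the sketch below builds a CWL `CommandLineTool` (as a plain dict) that pins the image through `DockerRequirement.dockerPull`. The image name, the `xclim_cwl_tool` helper and the `output.nc` filename are hypothetical. How a CWL runner combines `baseCommand` with an image ENTRYPOINT varies, so that detail would need checking against Weaver or whichever runner is used.

```python
import json
from typing import Any, Dict

# Hypothetical image reference: xclim does not publish an official image yet.
XCLIM_IMAGE = "ghcr.io/ouranosinc/xclim:latest"


def xclim_cwl_tool(indicator: str, image: str = XCLIM_IMAGE) -> Dict[str, Any]:
    """Describe one xclim indicator run as a CWL CommandLineTool (plain dict)."""
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        # Every platform pulls the same image reference, so the runtime is shared.
        "requirements": {"DockerRequirement": {"dockerPull": image}},
        # Names the CLI explicitly; verify how the runner treats an image ENTRYPOINT.
        "baseCommand": ["xclim"],
        # Click group options (-i, -o) must precede the indicator subcommand,
        # so explicit positions fix the order: -i (1), -o (2), indicator (3).
        "arguments": [
            {"position": 2, "prefix": "-o", "valueFrom": "output.nc"},
            {"position": 3, "valueFrom": indicator},
        ],
        "inputs": {
            "dataset": {"type": "File", "inputBinding": {"position": 1, "prefix": "-i"}},
        },
        "outputs": {
            "output": {"type": "File", "outputBinding": {"glob": "output.nc"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(xclim_cwl_tool("tg_mean"), indent=2))
```

Because the image is referenced by name only, each platform can deploy the same tool description and pull the identical runtime, which is the interoperability point made above.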

Contribution

  • I would be willing/able to open a Pull Request to contribute this feature.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@fmigneault fmigneault added the enhancement New feature or request label Oct 10, 2024
@huard
Collaborator

huard commented Oct 10, 2024

Sounds good !

@aulemahal
Collaborator

Wow! That sounds nice!

The CLI entrypoint has not been maintained with the same attention as the other parts of the package, so if you have any suggestions to improve it, we would be interested, I think!

Personally, I did not like the fact that xclim would manage I/O; it seems quite out of scope. But a good CLI would need to do that, maybe better than how it is currently done. Would it make sense to split it off into a new xclim-cli repo?

@fmigneault
Author

Which kind of I/O management do you foresee? I expect the CLI to receive some URI (local or remote file) and pass it down to the relevant index to compute, similar to the snippet in the README. Would there be other manipulations for other operations?
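A minimal sketch of the "receive some URI (local or remote file)" step, assuming remote inputs are plain http(s) URLs such as THREDDS/OPeNDAP endpoints, which xarray can open by URL, and that local paths are POSIX paths inside the container. `resolve_input` is a hypothetical helper, not part of the current xclim CLI; unknown schemes are rejected rather than guessed.

```python
from typing import Tuple
from urllib.parse import urlparse

# Assumption: remote inputs are http(s) URLs (e.g. THREDDS/OPeNDAP endpoints).
REMOTE_SCHEMES = {"http", "https"}


def resolve_input(uri: str) -> Tuple[str, str]:
    """Return ("remote", url) or ("local", path) for a CLI input argument."""
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    if scheme in REMOTE_SCHEMES:
        # Passed through unchanged so the opener can read it by URL.
        return "remote", uri
    if scheme == "file":
        # file:///data/tas.nc -> /data/tas.nc
        return "local", parsed.path
    if scheme == "":
        # Plain POSIX path, as seen inside a Linux container.
        return "local", uri
    raise ValueError(f"Unsupported URI scheme: {scheme!r}")
```

Either way, the resolved location would then be handed unchanged to the dataset opener and the indicator, which keeps the CLI's own logic close to the README snippet.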

@aulemahal
Copy link
Collaborator

Anything involving xr.open_dataset and dask/chunking is I/O management to me in this context. Setting up a client or configuring workers would also count towards these "out-of-scope" manipulations. We already do those, and the cli module is well isolated from the rest, so maybe my concerns about the module growing and spilling over into the rest of the package are unjustified.
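One way to keep that isolation explicit is sketched below. It assumes the CLI layer owns all opening and chunking, while indicator code only ever receives an already-opened dataset. `open_input` and `run_indicator` are hypothetical names, not xclim API. The injectable `opener` lets the I/O step be swapped, for example for one that configures a dask client, or faked in tests without touching indicator code.

```python
from typing import Any, Callable, Mapping, Optional

Chunks = Optional[Mapping[str, int]]


def open_input(uri: str, chunks: Chunks = None) -> Any:
    """CLI-side I/O: open a dataset, lazily with dask when chunks are given."""
    import xarray as xr  # deferred so only the CLI layer depends on I/O libraries

    return xr.open_dataset(uri, chunks=chunks)


def run_indicator(
    compute: Callable[[Any], Any],
    uri: str,
    chunks: Chunks = None,
    opener: Callable[..., Any] = open_input,
) -> Any:
    """Open the input via the I/O layer, then hand the dataset to a compute function."""
    ds = opener(uri, chunks=chunks)
    return compute(ds)
```

With xclim installed, `compute` could be something like `lambda ds: xclim.atmos.tg_mean(tas=ds.tas, freq="YS")`, so the indicator itself never sees file paths or chunking choices.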

@Zeitsperre
Copy link
Collaborator

@fmigneault

I recently updated the Dockerfile "recipe" used in birdhouse images here: https://github.com/bird-house/cookiecutter-birdhouse/blob/master/%7B%7Bcookiecutter.project_slug%7D%7D/Dockerfile. If you want to use this as a basis for a Pull Request here, feel free.
