Create frozen Conda environments for modules #2193
Comments
See also Paolo's post in #bioconda https://nfcore.slack.com/archives/CM46YC6BZ/p1677007405615889
See also discussion
I believe wave supports conda-lock files now! My issue would be with readability on the environment.yml. I kinda just want to see what exactly we want and not the 100 dependencies.
@emiller88 maybe we need an environment-lock.yml in addition to the environment.yml? I know, another file, but it would serve the different use cases of complete reproducibility vs flexible environment solve. Would get messy with different architectures though...
Maybe a I think it's a trade-off at the end of the day. If you want to be sure about reproducibility, you use the container images. If you want to roll the dice, use conda. It'll get you pretty close 95% of the time.
See where you're coming from, don't completely agree. I should be able to inspect the package complement of a frozen software env without poking about in a Docker image, and in an ideal world I'd like to be able to tweak an env to add something simple without rebuilding the whole thing (though since new thing may have its own deps I appreciate that's not a given).
I think this was in a time before My issue is if they'll get updated and maintained. I think we can automate this now. Bump the
Same as Automation as @edmundmiller says FTW 👍🏻
Made a proof of concept, forgot to link it here though nf-core/modules#5827
Description of feature
Problem
Conda environments are not reproducible over time. The sometimes large dependency trees mean you get a different software stack next week to the one you have today. This is bad for reproducible science.
The often used workaround for this has been to use Docker images, which have the effect of freezing dependency trees, but then if you find yourself rebuilding Docker images (e.g. to patch due to security concerns) you lose those frozen dependencies. Some (e.g. Paolo, I think) would say that really, we should be using Docker as a software delivery mechanism only.
A better way of doing this is to actually record the state of the environment when modules are created, and when the conda dependencies are updated, creating a frozen dependencies file that can be used to create environments when the workflows are run.
Available solutions
pythonspeed has an excellent (if not quite up to date) summary of this.
Essentially there are two ways to go.
conda env export
Create the environments, immediately record their state.
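As a rough illustration of the "create, then immediately record" approach, here is a minimal Python sketch that builds the conda env export call. The function names and the output file name are made up for this example and are not part of nf-core tools. Note that an export like this pins build strings, so the resulting file is generally specific to the platform it was exported on.

```python
import subprocess
from pathlib import Path


def export_command(env_name: str, out_file: Path) -> list[str]:
    """Build the `conda env export` call that records an environment's full state.

    Deliberately omits --no-builds and --from-history: a frozen record
    should keep every resolved package, version and build string.
    """
    return ["conda", "env", "export", "--name", env_name, "--file", str(out_file)]


def freeze_env(env_name: str, out_file: Path) -> None:
    """Run the export. Assumes conda is on PATH and the environment exists."""
    subprocess.run(export_command(env_name, out_file), check=True)
```

Calling freeze_env("fastqc", Path("environment.lock.yml")) right after the environment is created would capture the stack as it was solved that day (both names here are placeholders).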
conda-lock
See https://github.com/conda/conda-lock.
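For comparison, a sketch of how conda-lock could be driven from Python. It only assembles the command lines for the lock and install subcommands; the helper names and the default lockfile name are assumptions for illustration, and actually running the commands requires conda-lock to be installed.

```python
def lock_command(env_file: str, platforms: list[str], lockfile: str = "conda-lock.yml") -> list[str]:
    """Build a `conda-lock lock` call that solves env_file once per platform."""
    cmd = ["conda-lock", "lock", "--file", env_file, "--lockfile", lockfile]
    for platform in platforms:
        # One --platform flag per target architecture, e.g. linux-64, osx-arm64.
        cmd += ["--platform", platform]
    return cmd


def install_command(env_name: str, lockfile: str = "conda-lock.yml") -> list[str]:
    """Build a `conda-lock install` call that creates env_name from the lockfile."""
    return ["conda-lock", "install", "--name", env_name, lockfile]
```

Unlike conda env export, one lockfile can cover several platforms, which speaks to the "different architectures" concern raised in the comments.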
How I imagine the tools commands working
I don't know how we might persuade Nextflow itself to create the environments from lock files at run time. So imagine a different sequence:
nf-core modules conda-lock
- Runs conda-lock, creates lockfiles for all architectures required
nf-core init-locked_envs
- Creates environments from the lockfiles for all the modules of a workflow that have them.
Then, when the workflow is run, the module environments are all recognised as being in place, and off we go. This could work incrementally, such that environments were still created on the fly for modules lacking lock files.
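The incremental behaviour described above could look something like the following sketch. It is purely hypothetical: neither command exists in nf-core tools, and the per-module lockfile name and the function plan_environments are assumptions made for this example. It only decides which modules would get a pre-built environment and which would fall back to on-the-fly creation.

```python
from pathlib import Path

# Assumed name for a per-module lockfile; not an nf-core convention.
LOCKFILE_NAME = "conda-lock.yml"


def plan_environments(modules_dir: Path) -> dict[str, list[str]]:
    """Split modules into those with a lockfile and those without.

    A module is any directory under modules_dir containing an environment.yml.
    """
    plan: dict[str, list[str]] = {"locked": [], "on_the_fly": []}
    for env_file in sorted(modules_dir.rglob("environment.yml")):
        module_dir = env_file.parent
        name = module_dir.relative_to(modules_dir).as_posix()
        if (module_dir / LOCKFILE_NAME).is_file():
            plan["locked"].append(name)  # pre-create from the lockfile
        else:
            plan["on_the_fly"].append(name)  # leave for creation at run time
    return plan
```

Modules listed under "locked" would be built ahead of the run from their lockfiles; everything else keeps today's behaviour.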
Potential problems