Create frozen Conda environments for modules #2193

pinin4fjords · 2023-02-24T15:21:48Z

Description of feature

Problem

Conda environments are not reproducible over time. The sometimes large dependency trees mean you get a different software stack next week to the one you have today. This is bad for reproducible science.

The usual workaround has been to use Docker images, which effectively freeze the dependency tree. But if you find yourself rebuilding those images (e.g. to patch a security issue), you lose the frozen dependencies. Some (e.g. Paolo, I think) would say that we should really be using Docker as a software delivery mechanism only.

A better way is to record the state of the environment when a module is created, and again whenever its conda dependencies are updated, producing a frozen dependencies file that can be used to create environments when workflows are run.

Available solutions

pythonspeed has an excellent (if not quite up to date) summary of this.

Essentially there are two ways to go.

`conda env export`

Create the environments, immediately record their state.

Advantages: no extra software required.
Disadvantages: the export only describes the platform it was run on, so it would be difficult for a developer on a single machine to generate the separate environments required for e.g. macOS and Linux. Maybe it could be done with different machines in CI?
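
A minimal sketch of this option, assuming the environment already exists under a known name. The helper names and the output file name are made up for illustration; `--name` and `--file` are standard `conda env export` options.

```python
import subprocess


def export_command(env_name, out_file):
    """Build the `conda env export` call that records an environment's full state."""
    return ["conda", "env", "export", "--name", env_name, "--file", out_file]


def freeze_environment(env_name, out_file):
    """Run the export. Needs conda on PATH and an existing environment.

    The result only describes the platform this runs on, which is the
    multi-platform limitation noted above.
    """
    subprocess.run(export_command(env_name, out_file), check=True)
```

For example, `freeze_environment("samtools", "environment.frozen.yml")` would write the frozen file next to the module, where both names are hypothetical.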

`conda-lock`

See https://github.com/conda/conda-lock.

Advantages:
- Can make multi-platform lock files.
- Bypasses the conda solver (you're basically just storing a list of URIs to the package archives). That could speed things up significantly.
Disadvantages:
- Requires more software.
- Users would need to install conda-lock to re-create environments at run time.
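
To illustrate the "list of URIs" point: conda's explicit spec format is a plain text file with an `@EXPLICIT` marker followed by one package URL per line, and as far as I know conda-lock can render lock files of that kind. The sketch below parses such a file. The sample content (package builds and hashes) is invented purely for illustration, and `package_urls` is a hypothetical helper.

```python
# Invented sample in the explicit format; the URLs and hashes are not real.
SAMPLE_LOCK = """\
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/zlib-1.2.13-h0000000_0.tar.bz2#0123456789abcdef0123456789abcdef
https://conda.anaconda.org/bioconda/linux-64/samtools-1.16.1-h0000000_0.tar.bz2#fedcba9876543210fedcba9876543210
"""


def package_urls(lock_text):
    """Return the package archive URLs listed after the @EXPLICIT marker."""
    urls = []
    seen_marker = False
    for line in lock_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "@EXPLICIT":
            seen_marker = True
            continue
        if seen_marker:
            # Drop the trailing "#<hash>" fragment, keeping only the URL.
            urls.append(line.split("#", 1)[0])
    return urls
```

Because every entry is already a concrete archive, creating the environment needs no dependency solve, only downloads.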

How I imagine the tools commands working

I don't know how we might persuade Nextflow itself to create the environments from lock files at run time. So imagine a different sequence:

- `nf-core modules conda-lock`: runs conda-lock and creates lock files for all required architectures.
- `nf-core init-locked_envs`: creates environments from the lock files of all the modules in a workflow that have them.

Then, when the workflow is run, the module environments are all recognised as being in place, and off we go. This could work incrementally, such that environments were still created on the fly for modules lacking lock files.
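
A rough sketch of what the two proposed commands might do internally. Neither command exists; the platform list, the `conda-lock.yml` file name, and both helper functions are assumptions. The `--file`, `--platform` and `--lockfile` flags are as I understand the conda-lock CLI and are worth checking against its help output.

```python
from pathlib import Path

# Assumed target platforms; the real list would be a project decision.
PLATFORMS = ["linux-64", "osx-64"]


def lock_command(env_file, platforms, lockfile):
    """Build a conda-lock call producing one multi-platform lock file."""
    cmd = ["conda-lock", "lock", "--file", str(env_file), "--lockfile", str(lockfile)]
    for platform in platforms:
        cmd += ["--platform", platform]
    return cmd


def split_modules(modules_dir):
    """Split module directories into those with and without a lock file.

    Unlocked modules would fall back to on-the-fly environment creation,
    which is the incremental behaviour described above.
    """
    locked, unlocked = [], []
    for env_file in sorted(Path(modules_dir).rglob("environment.yml")):
        module = env_file.parent
        if (module / "conda-lock.yml").exists():
            locked.append(module)
        else:
            unlocked.append(module)
    return locked, unlocked
```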

Potential problems

- Rebuilding lock files when conda packages are bumped.
- CI to ensure the above.
- There may be some overlap with all the new funky Wave stuff.
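
One way a CI check for stale lock files could look, as a sketch only: record a digest of `environment.yml` when the lock file is generated, then fail CI if the current file no longer matches it. Where the digest is stored is left open here and is a hypothetical convention (I believe conda-lock records a content hash in its own lock metadata, but this sketch does not rely on that format).

```python
import hashlib


def env_digest(env_text):
    """SHA-256 digest of the environment.yml content."""
    return hashlib.sha256(env_text.encode("utf-8")).hexdigest()


def lock_is_stale(env_text, recorded_digest):
    """True if environment.yml changed since the lock file was generated."""
    return env_digest(env_text) != recorded_digest
```

A CI job would call `lock_is_stale` for each module and ask for the lock file to be regenerated when it returns `True`.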


pinin4fjords · 2023-02-24T16:07:55Z

See also Paolo's post in #bioconda https://nfcore.slack.com/archives/CM46YC6BZ/p1677007405615889

pinin4fjords · 2023-02-27T09:54:44Z

See also discussion

edmundmiller · 2023-11-21T14:54:37Z

I believe wave supports conda-lock files now!

My issue would be with readability on the environment.yml. I kinda just want to see what exactly we want and not the 100 dependencies.

pinin4fjords · 2023-11-22T09:20:54Z

@emiller88 maybe we need an environment-lock.yml in addition to the environment.yml? I know, another file, but it would serve the different use cases of complete reproducibility vs. a flexible environment solve.

Would get messy with different architectures though...

edmundmiller · 2023-11-22T16:09:30Z

Maybe a .conda directory to keep it cleaner?

I think it's a trade-off at the end of the day.

If you want to be sure about reproducibility, you use the container images.

If you want to roll the dice, use conda. It'll get you pretty close 95% of the time.

pinin4fjords · 2023-11-22T18:40:18Z

See where you're coming from, don't completely agree.

I should be able to inspect the package complement of a frozen software env without poking about in a Docker image, and in an ideal world I'd like to be able to tweak an env to add something simple without rebuilding the whole thing (though since the new thing may have its own deps, I appreciate that's not a given).

edmundmiller · 2024-06-17T12:25:55Z

I think this was in a time before tests/ and everything else in a module's directory. I think having both an environment.yml and an environment.lock.yml isn't ridiculous at this point.

My issue is if they'll get updated and maintained.

I think we can automate this now.

Bump the environment.yml -> Create a lock file -> Pass the lock file to wave

ewels · 2024-06-17T15:41:14Z

> maybe we need an environment-lock.yml in addition to the environment.yml

Same as package.json and package-lock.json for npm. This is what I'd expect for conda lockfiles tbh.

Automation as @edmundmiller says FTW 👍🏻

edmundmiller · 2024-07-16T14:48:55Z

Made a proof of concept, but forgot to link it here: nf-core/modules#5827

pinin4fjords added the enhancement label Feb 24, 2023

mirpedrol added the infrastructure label Mar 7, 2023
