Container cache download #3163

Draft
wants to merge 5 commits into
base: dev


@muffato muffato commented Sep 7, 2024

Fixes #3019 and #3162.

First, #3162. The code has to deal with an out_path (always defined) and sometimes a cache_path too. The implementation was slightly convoluted: it did fresh downloads into either cache_path or out_path (first decision point), then made an extra copy if needed (second decision point). Because of that, the symlinks across container registries were not all created in both locations.
I propose to reverse the logic to make it more straightforward:

  1. Always download into out_path and create its symlinks.
  2. Then, optionally copy to cache_path (and create symlinks there).
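
A minimal sketch of the two-step flow above, not the actual code in `nf_core/pipelines/download.py`. The names `fetch_container`, `symlink_aliases`, `fetch` and `aliases` are hypothetical: `fetch` stands in for whatever performs the download, and `aliases` for the alternative file names that the registry symlinks would use.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Iterable, Optional


def symlink_aliases(image: Path, aliases: Iterable[str]) -> None:
    """Create relative symlinks next to `image`, one per alias name (hypothetical helper)."""
    for alias in aliases:
        link = image.parent / alias
        if link.is_symlink() or link.exists():
            continue  # do not clobber an existing file or link
        os.symlink(image.name, link)


def fetch_container(
    fetch: Callable[[Path], None],
    out_path: Path,
    cache_path: Optional[Path] = None,
    aliases: Iterable[str] = (),
) -> None:
    """Download into out_path first, then optionally mirror into cache_path."""
    aliases = list(aliases)

    # Step 1: always download into out_path and create its symlinks.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    if not out_path.exists():
        fetch(out_path)
    symlink_aliases(out_path, aliases)

    # Step 2: optionally copy to cache_path and create the same symlinks there.
    if cache_path is not None:
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        if not cache_path.exists():
            shutil.copyfile(out_path, cache_path)
        symlink_aliases(cache_path, aliases)
```

With a single decision point (is there a cache_path or not), both locations go through the same symlink step, so neither one can be skipped.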

Then, #3019: I propose to handle the singularity "library" directory this way:

  1. Just like in Nextflow itself, it's considered a read-only location. This means that containers can only be copied from it, not to it, and that we shouldn't even be creating symlinks there.
  2. There is no point in having a --container-library-utilisation parameter for the library because i) remote would be redundant with --container-cache-utilisation's remote mode, and ii) amend is not possible as per the read-only rule, so iii) copy is the only possible mode.
  3. Therefore, the most natural place to use the library is as a source of containers, alongside https downloads and singularity pull. When NXF_SINGULARITY_LIBRARYDIR is set and the container exists in the library, it is copied to the target directories (out_path and possibly cache_path too).
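
As a rough illustration of treating the library as a read-only source, here is a sketch under assumptions: the helper names `find_in_library` and `copy_from_library` are hypothetical, and the image is assumed to sit directly under the library directory under the same file name. Only the NXF_SINGULARITY_LIBRARYDIR variable comes from the description above.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_in_library(image_name: str) -> Optional[Path]:
    """Return the image's path in the library, or None if it is unavailable."""
    library_dir = os.environ.get("NXF_SINGULARITY_LIBRARYDIR")
    if not library_dir:
        return None  # no library configured
    candidate = Path(library_dir) / image_name
    return candidate if candidate.is_file() else None


def copy_from_library(image_name: str, out_path: Path) -> bool:
    """Copy the image from the library to out_path; never write into the library."""
    source = find_in_library(image_name)
    if source is None:
        # The caller would fall back to an https download or singularity pull.
        return False
    out_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, out_path)
    return True
```

The library is only ever opened for reading here: no files or symlinks are created in it, which matches the read-only rule in point 1.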

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated
  • Unit-tests
  • Documentation in README.md

… and are then copied to the cache if needed

This allows getting rid of the confusing `output_path` variable.
@muffato muffato added the download nf-core download label Sep 7, 2024
@muffato muffato self-assigned this Sep 7, 2024
@muffato muffato added the WIP Work in progress label Sep 7, 2024

codecov bot commented Sep 7, 2024

Codecov Report

Attention: Patch coverage is 0% with 16 lines in your changes missing coverage. Please review.

Project coverage is 75.52%. Comparing base (8e47a33) to head (78650af).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| nf_core/pipelines/download.py | 0.00% | 16 Missing ⚠️ |


muffato commented Sep 10, 2024

The code is there, but I still need to add unit tests and documentation.
