Minia assembly is non-deterministic in parallel #381

Open
tetedange13 opened this issue May 11, 2023 · 1 comment
Labels
enhancement Improvement for existing functionality

Comments

@tetedange13

Description of feature

Hi all,


I noticed that running a Minia assembly twice with the exact same parameters produces slightly different assembled genomes
=> This is something known: GATB/minia#25 (comment)

To avoid that, Minia should be run with -nb-cores 1 -nb-glue-partitions 200 (which is not the case in the current Minia nf-core module)
=> See "11 - Reproducibility" section of Minia manual
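
As a rough illustration of the two modes, here is a minimal Python sketch that assembles the Minia argument list. Only `-nb-cores 1 -nb-glue-partitions 200` comes from the manual section cited above; the function name, the `deterministic` switch, the default core count, and the use of `-in` / `-out` for input and output are assumptions made for the example, not the nf-core module's actual code.

```python
from typing import List


def build_minia_args(
    reads: str,
    out_prefix: str,
    deterministic: bool = False,
    cores: int = 4,
) -> List[str]:
    """Build a Minia command line (hypothetical helper, for illustration only)."""
    # Assumption: -in / -out are used for the input reads and output prefix.
    args = ["minia", "-in", reads, "-out", out_prefix]
    if deterministic:
        # Flags quoted in this issue for reproducible output.
        args += ["-nb-cores", "1", "-nb-glue-partitions", "200"]
    else:
        # Default multi-threaded mode: output may differ between runs.
        args += ["-nb-cores", str(cores)]
    return args


if __name__ == "__main__":
    print(" ".join(build_minia_args("reads.fastq.gz", "sample", deterministic=True)))
```

The resulting list could be passed to `subprocess.run`, or the same conditional could be mirrored in the module's argument handling.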

I first thought it would be OK to make deterministic behaviour the default, because Minia is fast and Nextflow should be able to parallelize Minia processes over multiple samples
=> But my tests show a rather significant impact on performance
(I only changed main.nf of the Minia module; maybe something else could be tweaked to maintain performance?)

So instead, it could be useful to add an optional parameter to enable deterministic Minia assembly (could be something like --deterministic-minia)?
=> For cases where one wants to compare assembled genomes
=> With a warning to the user that it will have a huge impact on performance
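
For the "compare assembled genomes" use case, a check along these lines could tell whether two runs really produced the same contigs. This is only a sketch under my own assumptions: it treats two assemblies as equal when they contain the same set of sequences, ignoring contig names, contig order, and strand orientation. All function names are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_contigs(lines: Iterable[str]) -> Iterator[str]:
    """Yield one sequence string per FASTA record, ignoring the headers."""
    seq: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq:
                yield "".join(seq)
                seq = []
        elif line:
            seq.append(line)
    if seq:
        yield "".join(seq)


def canonical(seq: str) -> str:
    """Return the lexicographically smaller of a contig and its reverse complement."""
    seq = seq.upper()
    revcomp = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, revcomp)


def assembly_digest(lines: Iterable[str]) -> str:
    """SHA-256 over the sorted canonical contigs of one assembly."""
    digest = hashlib.sha256()
    for contig in sorted(canonical(c) for c in read_contigs(lines)):
        digest.update(contig.encode("ascii"))
        digest.update(b"\n")
    return digest.hexdigest()


if __name__ == "__main__":
    run_a = [">1", "ACGTAAC", ">2", "TTTT"]
    run_b = [">7", "AAAA", ">9", "GTTACGT"]  # same contigs, reordered and flipped
    print(assembly_digest(run_a) == assembly_digest(run_b))
```

For real files, pass an open file handle (for example `open("run1.contigs.fa")`, a placeholder name) to `assembly_digest`; differing digests then indicate a genuine difference in sequence content rather than a reordering.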


Hope this helps !
Have a nice day,
Felix.

@tetedange13 tetedange13 added the enhancement Improvement for existing functionality label May 11, 2023
@tetedange13

tetedange13 commented May 11, 2023

Edit: the impact on performance (runtime) when using "deterministic Minia" is relative
=> Looking at the execution_report file, with deterministic Minia the assembly step:

  • Takes about as long as the Kraken2 step
  • Still takes far less time than the Bowtie2 alignment step, which is the bottleneck (but this only applies to people running both branches of viralrecon)
