Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging template updates 2.1 #56

Merged
merged 74 commits into from
Aug 25, 2021

Conversation

d4straub
Copy link
Collaborator

@d4straub d4straub commented Aug 2, 2021

This should solve #52 . This PR is about making the pipeline work again, not adding new features.

As a byproduct, #51 is solved as well.

As far as I can see there is no test with appropriate data for running canu, PycoQC, or nanopolish.

PR checklist

  • pull in template update 2.1 → make linting work
  • adapt input samplesheet
  • make input channels work
  • check if modules exist https://github.com/nf-core/bacass/blob/dev/conf/base.config
  • make modules, use conf/base.config & main.nf
  • make test profile work
  • make all tests work
  • update all software to current/working version
  • polish output directories
  • update CHANGELOG
  • update README
  • update output.md
  • update CITATIONS
  • make sure NanoPolish works (not covered by tests)
  • make sure PyroQC works (not covered by tests)
  • make sure Canu works (not covered by tests)
  • update usage.md
  • add full size test data

@d4straub d4straub self-assigned this Aug 2, 2021
@d4straub d4straub added the WIP label Aug 2, 2021
@d4straub d4straub linked an issue Aug 2, 2021 that may be closed by this pull request
@d4straub d4straub added this to the 2.0.0 milestone Aug 2, 2021
@d4straub
Copy link
Collaborator Author

d4straub commented Aug 4, 2021

Unfortunately nf-core modules unicycler & spades seem both to allow only short read assembly, quite disappointing. Might have to use custom modules than. I postpone that.

@d4straub
Copy link
Collaborator Author

d4straub commented Aug 6, 2021

short read processing works.

nf-core modules that I was looking into:

  • unicycler: allows only short read assembly
  • Nanoplot: accepts only .fastq.gz files ( I have .fastq, could be changed) (the sample sheets specifies incompatible "*.fq.gz")
  • minimap2: unclear, nf-core tools did not allow me to install it installed minimap/align
  • kraken2: installed
  • quast: installed
  • prokka: could not make it work when not using valid files in "path proteins" or "path prodigal" installed

@d4straub
Copy link
Collaborator Author

The container of Nanopolish isn't containing parallel and therefore the process fails (parallel: command not found). I test now without parallel (see last commit 545763a).

@d4straub
Copy link
Collaborator Author

d4straub commented Aug 20, 2021

Hi @apeltzer , I would need suggestions/comments.
NanoPolish seems to work, real data is still running, but it passed previously failing steps. So all seems fine right now. Update: NanoPolish works, everything is working.

What I see is still lacking:

  • no data in test_full.config yet, therefore no AWS tests or AWS results. For the sake of testing, I could add there the test_hybrid, would that be fine?
    edit: alternatively I could add Illumina-only data from PRJNA563526 = https://pubmed.ncbi.nlm.nih.gov/32561582/, either both genomes or just one.
  • unused parameters are still appearing, i.e. igenomes, params.fasta, params.genomes. Can I remove those or might they have any value? Right now they are not interfering, but are not used either.
  • I attempted to complete the docs (were before actually incomplete), but some things might have slipped through.

Whats next when this PR is fine? I was thinking to address the most urgent matter(s) in #57 , right now I think its the structural changes. (You might remember, you and Stefan had some trouble with a bacterial assembly, this is solved with this version already, miniasm assembly produces 1 contig in correct size. But that might have not been obvious because all software combinations/options have to be specified individually, I'd recommend that by default all assemblers run based on the provided data, so that one can compare performance of all methods.)

However, it might be best to release the pipeline before those structural changes, because I might need some more weeks/months to implement it?!

@d4straub d4straub removed the WIP label Aug 23, 2021
@d4straub d4straub requested a review from apeltzer August 23, 2021 09:21
@apeltzer
Copy link
Member

  • no data in test_full.config yet, therefore no AWS tests or AWS results. For the sake of testing, I could add there the test_hybrid, would that be fine?

Yes, I think this wasn't fully there in the past when I started working on the workflow (which I also took over from Andreas Wilm in the very beginning to be honest!)

edit: alternatively I could add Illumina-only data from PRJNA563526 = https://pubmed.ncbi.nlm.nih.gov/32561582/, either both genomes or just one.

Both fine for me , large scale tests can be large-scale ;-)

  • unused parameters are still appearing, i.e. igenomes, params.fasta, params.genomes. Can I remove those or might they have any value? Right now they are not interfering, but are not used either.

Drop everything that isn't used, this just confuses people.

  • I attempted to complete the docs (were before actually incomplete), but some things might have slipped through.

I assume this could all benefit from some extra work on that side. Happy to help out here and there on that and check that all is there.

Whats next when this PR is fine? I was thinking to address the most urgent matter(s) in #57 , right now I think its the structural changes. (You might remember, you and Stefan had some trouble with a bacterial assembly, this is solved with this version already, miniasm assembly produces 1 contig in correct size. But that might have not been obvious because all software combinations/options have to be specified individually, I'd recommend that by default all assemblers run based on the provided data, so that one can compare performance of all methods.)

If this is working fine, one could do it. At the moment it was more a choice of 1 option and then running multiple runs with different tools and finally comparing - but could do as you proposed yes, I agree.

However, it might be best to release the pipeline before those structural changes, because I might need some more weeks/months to implement it?!

Yup, a release should help address open points / issues & then moving forward with the restructure is a good idea 👍🏻

@d4straub
Copy link
Collaborator Author

Alright, all done & tested. I tested test_full on our cluster, but not on AWS; but I will start the AWS test using github actions once the PR is merged.

@apeltzer
Copy link
Member

So local runs finished here too - happy to see this on dev and proceed from there 👍🏻

Copy link
Member

@apeltzer apeltzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all good but will do a final review on the dev to masteras this is basically an entire rewrite of the origina DSLv1 pipeline

@apeltzer apeltzer merged commit c343fb9 into nf-core:dev Aug 25, 2021
@d4straub
Copy link
Collaborator Author

Great, thanks!

@d4straub d4straub deleted the merging-template-updates-2.1 branch August 25, 2021 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conversion to DSL2 & update of tools
4 participants