-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dsl2: metagenomics uncollapsed paired end #1098
base: dev
Are you sure you want to change the base?
Conversation
Warning Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.1.2. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation. |
|
dsl-2 metagenomics-pairedend
Ready for review: Basic Overview: We can now retain PE information from unmerged input fastq files (when PE samples going into metagenomics will be separately input into metaphlan, kraken2, and krakenuniq. Malt does not accept PE reads so it is not allowed with the param combo. Also fixed a small error in the parsing of metagenomics input data from bamfiltering --> metagnomics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on visual inspection, it looks good to me!
I would remove the one map-channel manipulation in metagenomics (see comment) as it is not necessary anymore
Also I have not tested anything, so I trust your tests for now :P
subworkflows/local/bamfiltering.nf
Outdated
} | ||
|
||
// Routing for metagenomic screening -> first accounting for paired-end mapping, then merged mapping, then no metagenomics | ||
if ( ( params.run_metagenomics && params.metagenomics_input == 'unmapped' ) && params.preprocessing_skippairmerging ) { | ||
ch_fastq_for_metagenomics = CAT_FASTQ_UNMAPPED.out.reads | ||
ch_fastq_for_metagenomics = CAT_FASTQ_UNMAPPED.out.reads.mix(ch_paired_fastq_for_cat) | ||
} else if ( ( params.run_metagenomics && ( params.metagenomics_input == 'mapped' || params.metagenomics_input == 'all' ) ) && params.preprocessing_skippairmerging ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if input now is 'all', the unmapped reads are not taken into account.
I think that needs one more if-condition, with something like
if metagenomics_input == 'all':
CAT_FASTQ_MAPPED.out.reads.mix(CAT_FASTQ_UNMAPPED.out.reads).mix(ch_paired_fastq_for_cat)
but I might miss something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the unmapped reads are not filtered out? then it makes sense...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the splitting of all vs only mapped reads is dealt with at the modules.config level.
withName: SAMTOOLS_FASTQ_MAPPED { tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" } ext.args = [ params.metagenomics_input == 'all' ? '' : '-F 4',
We could split the module/import into an additional SAMTOOLS_FASTQ_ALL, for clarity for anyone looking into this section of the pipeline... @jfy133 what are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adjusted this whole section to imporve clarity. please see latest commit.
essentially the splitting of the fastq and cat into unmapped and mapped versions is unnecessary. all parameters can be parsed fine within modules config.
i confirmed that the md5sum of the final files sent to the ch_fastq_for_metagenomics channel are consistent across all the parameter combos that we were there.
|
||
So: needs to be fixed up higher (eg in bamfiltering.nf, likely with a new adjustment to the SAMTOOLS_FASTQ_UNMAPPED, SAMTOOLS_FASTQ_MAPPED, and SAMTOOLS_VIEW_BAM_FILTERING modules ) | ||
|
||
ISSUE FOUND: while the outputting of PE reads is OK in bamfiltering.nf (fastq_mapped & fastq_unmapped) when overlap merging is not done cat_fastq weirdly merges singletons to one PE file and other to the other PE file, so then everything gets fucked up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a hackathon-thing to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me double check this behavior and add more context of what we might need to do in a hackathon... i this was more for my own notes reference but was not edited prior to opening the PR.
…tering --> fastq for metagenomics. improves clarity (IMO)
ch_fastq_for_metagenomics = SAMTOOLS_FASTQ_MAPPED.out.other | ||
} else if ( !params.run_metagenomics ) { | ||
// No length/quality filtering applies to metagenomic bam files (could be extension) | ||
// All bam -> fastq filtering options (-F 4, -f 4 or none) will be dealt with within modules.config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TCLamnidis please take a quick look -- i restructured everything into a single call, since the modules.config is easily robust enough to deal with the splitting. and it allows for much cleaner code in this section which should help going forward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or @jfy133 who i think wrote the initial implementation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes it much easier! Nice one :)
PR for metagenomics paired-end merging off, retaining that info for metagenomic tools that have PE modes.
This is a QOL improvement for tools that can use paired end information (kraken2, krakenuniq), and now allows for variable inputs into these tools (by splitting inputs into either PE or SE data and running separate instances).
Main issues resolved:
--preprocessing_skippairmerging
used.PR checklist
scrape_software_versions.py
nf-core lint .
).nextflow run . -profile test,docker
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).