Kraken save_no_host config saves both unclassified and classified reads and files #312

Closed
svarona opened this issue Aug 12, 2024 · 2 comments
@svarona
Member

svarona commented Aug 12, 2024

The Kraken save_no_host config saves these files:

  • {sample}.classified_1.fastq.gz and {sample}.classified_2.fastq.gz: Reads classified against the database; in viralrecon these are the host reads (we don't need these files)
  • {sample}.kraken2.classifiedreads.txt: The classification of every read; it is very large, similar in size to a FASTQ file (we don't need this file)
  • {sample}.kraken2.report.txt: The standard report with the results, which is always saved (we need this one)
  • {sample}.unclassified_1.fastq.gz and {sample}.unclassified_2.fastq.gz: Reads that were not classified against the database; in viralrecon these are the non-host reads (we need these files)

Find a way to remove or exclude .classified_{1,2}.fastq.gz and .kraken2.classifiedreads.txt

@victor5lm victor5lm self-assigned this Sep 5, 2024
@victor5lm
Contributor

In the develop branch of the buisciii tools, the save_nohost.config file no longer exists. In any case, wouldn't the solution be to add the following lines to the viralrecon.config file?

`withName: 'KRAKEN2_KRAKEN2' {
    publishDir = [
        pattern: "*.{unclassified.fastq.gz,unclassified_1.fastq.gz,unclassified_2.fastq.gz,report.txt}"
    ]
}`

However, this would have to be done manually whenever a researcher explicitly asks for the no-host reads. Another approach might be to modify /data/bi/pipelines/nf-core-viralrecon/nf-core-viralrecon-2.6.0/workflow/modules/nf-core/kraken2/kraken2/main.nf, which currently contains:

`output:
    tuple val(meta), path('*.classified{.,_}*')     , optional:true, emit: classified_reads_fastq
    tuple val(meta), path('*.unclassified{.,_}*')   , optional:true, emit: unclassified_reads_fastq
    tuple val(meta), path('*classifiedreads.txt')   , optional:true, emit: classified_reads_assignment
    tuple val(meta), path('*report.txt')                           , emit: report
    path "versions.yml"                                            , emit: versions`

I believe the lines referring to the classified reads and to classifiedreads.txt could simply be deleted, if that is what this issue is requesting, but any advice on this would be very welcome.

@saramonzon
Member

I think it would be the first solution you propose, but test it just in case. Since, as you mentioned, we no longer have the config for the no_host output, I think the best approach is to create a new config, like the sars_nanopore one, that adds this configuration.

Sounds good?
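
For reference, a minimal and untested sketch of what such a separate config could look like. The file name `kraken2_nohost.config` is hypothetical, `params.outdir` and `params.publish_dir_mode` are assumed to be defined by the pipeline as in other nf-core configs, and the `kraken2` output subdirectory is also an assumption. The pattern lists `report.txt` explicitly instead of a bare `txt`, so that `{sample}.kraken2.classifiedreads.txt` is not matched.

```groovy
// kraken2_nohost.config (hypothetical file name)
// Sketch only: publish the Kraken2 report and the unclassified (non-host)
// reads, and skip the classified reads and the per-read assignment file.
process {
    withName: 'KRAKEN2_KRAKEN2' {
        publishDir = [
            // Assumed output location; adjust to the layout used by the service.
            path: { "${params.outdir}/kraken2" },
            // Assumed nf-core parameter controlling copy/symlink behaviour.
            mode: params.publish_dir_mode,
            // Only files matching one of these suffixes are published:
            //   single-end non-host reads, paired-end non-host reads, report.
            pattern: "*.{unclassified.fastq.gz,unclassified_1.fastq.gz,unclassified_2.fastq.gz,report.txt}"
        ]
    }
}
```

A config like this could then be passed with Nextflow's `-c` option alongside viralrecon.config, and only when the researcher asks for the no-host reads. The published output should be checked on a test sample before relying on it.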
