Add kraken2 phylogenetic assignment subworkflow #47

ctuni · 2024-10-28T10:13:04Z

Added a subworkflow for a "phylogenetic QC" that runs kraken2 assignment for each sample and then plots the results on an interactive krona plot. I have not added the kraken2 reports to MultiQC yet; that should be the next step.

github-actions · 2024-10-28T10:40:31Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 758ce12

✅ 194 tests passed
❔ 1 test was ignored
❗ 22 tests had warnings

❗ Test warnings:

readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file
pipeline_todos - TODO string in nextflow.config: Specify your pipeline's command line flags
pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
pipeline_todos - TODO string in README.md: TODO nf-core:
pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
pipeline_todos - TODO string in test.config: Specify the paths to your test data on nf-core/test-datasets
pipeline_todos - TODO string in test.config: Give any required params for the test so that command line flags are not needed
pipeline_todos - TODO string in base.config: Check the defaults for all processes
pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
schema_description - Ungrouped param in schema: save_uncompressed_k2db

❔ Tests ignored:

files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-seqinspector_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-seqinspector_logo_light.png
files_exist - File found: docs/images/nf-core-seqinspector_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: conf/igenomes_ignored.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-seqinspector_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowSeqinspector.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-schema plugin
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: validation.help.enabled
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable found: validation.help.beforeText
nextflow_config - Config variable found: validation.help.afterText
nextflow_config - Config variable found: validation.help.command
nextflow_config - Config variable found: validation.summary.beforeText
nextflow_config - Config variable found: validation.summary.afterText
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config variable (correctly) not found: params.max_cpus
nextflow_config - Config variable (correctly) not found: params.max_memory
nextflow_config - Config variable (correctly) not found: params.max_time
nextflow_config - Config variable (correctly) not found: params.validationFailUnrecognisedParams
nextflow_config - Config variable (correctly) not found: params.validationLenientMode
nextflow_config - Config variable (correctly) not found: params.validationSchemaIgnoreParams
nextflow_config - Config variable (correctly) not found: params.validationShowHiddenParams
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 1.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.sample_size= 0
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-seqinspector_logo_light.png matches the template
files_unchanged - docs/images/nf-core-seqinspector_logo_light.png matches the template
files_unchanged - docs/images/nf-core-seqinspector_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 24.04.2, Config: 24.04.2
plugin_includes - No wrong validation plugin imports have been found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: nf-test.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: template_version_comment.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains a matching 'report_comment'.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
modules_config - conf/modules.config found and not ignored.
modules_config - SEQTK_SAMPLE found in conf/modules.config and Nextflow scripts.
modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
modules_config - KRAKEN2_KRAKEN2 found in conf/modules.config and Nextflow scripts.
modules_config - KRONA_KTUPDATETAXONOMY found in conf/modules.config and Nextflow scripts.
modules_config - KRONA_KTIMPORTTAXONOMY found in conf/modules.config and Nextflow scripts.
modules_config - UNTAR found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC_GLOBAL found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC_PER_TAG found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.0.2

Run details

nf-core/tools version 3.0.2
Run at 2024-10-30 15:14:41

nf-core-bot · 2024-10-28T14:33:23Z

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 2.14.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

docs/output.md

Co-authored-by: Natalia Garcia Garcia <[email protected]>

MatthiasZepper · 2024-10-29T18:26:13Z

I think you have this well covered here.

I just wanted to point out that similar functionality was recently added to the rnaseq pipeline. So in case you still need some inspiration or have some more "ugh" moments (fancy commit messages ^^), you might already find a suitable solution over there.

ctuni · 2024-10-30T09:25:16Z

> I think you have this well covered here.
>
> I just wanted to point out that similar functionality was recently added to the rnaseq pipeline. So in case you still need some inspiration or have some more "ugh" moments (fancy commit messages ^^), you might already find a suitable solution over there.

Thank you! I took inspiration from the taxprofiler pipeline and tried simplifying it. I'll check how the rnaseq pipeline does it and see if the PR could be improved.

nggvs

Hi! I have made some suggestions that might be useful, but feel free to apply them or not.

workflows/seqinspector.nf

nggvs · 2024-10-30T10:11:07Z

conf/modules.config

@@ -22,6 +22,31 @@ process {
        ext.args = '--quiet'
    }

+    withName: 'KRAKEN2_KRAKEN2' {
+        publishDir = [


I think there is already a general statement for this in this [file](https://github.com/nf-core/seqinspector/blob/31c1f829d97c4b98d21b68beed4af050fd331a37/conf/modules.config#L15), so I don't think it needs to be added twice, unless that part is going to be removed later?

There is! I added it here for two reasons. First, I wanted more descriptive folder names (kraken2_reports instead of just kraken2), and I wanted the krona plots to sit inside the kraken2_reports folder, also under a more descriptive name.
Second, kraken2 and KronaTools can produce more output than the pipeline currently publishes. I left these seemingly redundant lines in with the future in mind: they may need to be modified depending on the needs of the pipeline once it reaches a more stable state.
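
For illustration, a per-module override of the kind described here could look roughly like the sketch below. The kraken2_reports folder name comes from the comment above; the krona subfolder name and the file patterns are assumptions, not the exact values used in this PR.

```groovy
// conf/modules.config (sketch, not the PR's actual content)
process {
    withName: 'KRAKEN2_KRAKEN2' {
        publishDir = [
            // Descriptive folder name instead of the default tool-name folder.
            path: { "${params.outdir}/kraken2_reports" },
            mode: params.publish_dir_mode,
            // Assumed pattern: publish only the kraken2 report files.
            pattern: '*report.txt'
        ]
    }

    withName: 'KRONA_KTIMPORTTAXONOMY' {
        publishDir = [
            // Assumed subfolder name; the krona plots sit inside kraken2_reports.
            path: { "${params.outdir}/kraken2_reports/krona_reports" },
            mode: params.publish_dir_mode,
            // Assumed pattern: publish only the interactive HTML plots.
            pattern: '*.html'
        ]
    }
}
```

A selector-specific publishDir assignment like this replaces the generic one for the matched process, which is why it looks redundant next to the general statement.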

nggvs · 2024-10-30T10:11:32Z

conf/modules.config

+    }
+
+    withName: 'KRONA_KTIMPORTTAXONOMY' {
+        publishDir = [


same as before

nggvs · 2024-10-30T10:13:11Z

conf/modules.config

+    }
+
+    withName: 'UNTAR' {
+        publishDir = [


I'm not sure you want to output the kraken db, because its size can be huge (depending on the one selected) and the user has already downloaded it, so it is already on the user's device. You may want to use storeDir if you want to store the db and reuse it later without publishing it in the output.

I see what you mean! I could create a patch to the UNTAR module to add the storeDir directive. That would also need some changes to the config, but it can be done.
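
As a rough sketch of the storeDir idea: storeDir is a regular process directive, so it may be possible to set it from the config with a process selector instead of patching the module. The parameter `params.k2db_cache_dir` below is hypothetical and does not exist in this PR.

```groovy
// conf/modules.config (sketch under the assumptions stated above)
process {
    withName: 'UNTAR' {
        // Hypothetical parameter: a permanent location for the extracted database.
        // With storeDir, the task is skipped when its declared outputs already exist there.
        storeDir = "${params.k2db_cache_dir}/kraken2_db"
    }
}
```

This trades the publish step for a persistent cache, so the extracted database would be reused across runs rather than copied into the results folder.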

In any case, to avoid wasting space on the uncompressed database, the pipeline behaves differently depending on whether the user provides a gzipped database or an uncompressed one. If the database is gzipped, the UNTAR module uncompresses it and the pipeline uses the result, but by default the uncompressed copy is not saved.

Publishing the uncompressed kraken2 db is turned off by default through params.save_uncompressed_k2db, which is set to false. In the modules.config file this parameter is read by the `enabled` option of publishDir.
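
A minimal sketch of how such a switch is typically wired in the config, assuming the parameter name from this thread. The output folder name is an assumption.

```groovy
// conf/modules.config (sketch, not the PR's actual content)
process {
    withName: 'UNTAR' {
        publishDir = [
            // Assumed folder name for the extracted database.
            path: { "${params.outdir}/kraken2_db" },
            mode: params.publish_dir_mode,
            // false by default, so the extracted database is not copied to the results.
            enabled: params.save_uncompressed_k2db
        ]
    }
}
```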

If the database is uncompressed, and the user passes a path to the kraken2_db param, the UNTAR module is not called; the database is simply used and remains in the user's original directory.
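
The branching described in these paragraphs could be sketched as follows. The workflow name, the include path, and the `.tar.gz` check are assumptions for illustration, and the sketch assumes params.kraken2_db is set.

```groovy
// Hypothetical helper subworkflow; names and paths are illustrative only.
include { UNTAR } from '../../modules/nf-core/untar/main'

workflow PREPARE_KRAKEN2_DB {
    main:
    if (params.kraken2_db.endsWith('.tar.gz')) {
        // Compressed archive: extract it first, then drop the meta map.
        UNTAR([[id: 'kraken2_db'], file(params.kraken2_db, checkIfExists: true)])
        ch_kraken2_db = UNTAR.out.untar.map { meta, db -> db }
    } else {
        // Already-extracted directory: use it in place, UNTAR is never called.
        ch_kraken2_db = Channel.value(file(params.kraken2_db, checkIfExists: true))
    }

    emit:
    db = ch_kraken2_db
}
```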

nggvs · 2024-10-30T10:16:58Z

nextflow.config

@@ -19,6 +19,12 @@ params {
    igenomes_base              = 's3://ngi-igenomes/igenomes/'
    igenomes_ignore            = false

+    // Kraken2 options
+    kraken2_db                 = 'https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/kraken2/testdb-kraken2.tar.gz'


Which db is this? There are different ones (https://benlangmead.github.io/aws-indexes/k2), which require different resources depending on their size. The one used, even if it's only for testing, should be documented somewhere (maybe it is and I haven't seen it).

I used the smallest possible database for testing purposes, but I agree with you that it should not be the default one; the default should just be set to null. I used the taxprofiler test one, which was built like this: https://github.com/nf-core/test-datasets/blob/taxprofiler/README.md#kraken2
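
The change agreed on here could look roughly like this, assuming the test database URL from the diff above is moved into the test profile. The two blocks belong to two different files.

```groovy
// nextflow.config: no database is set by default.
params {
    // Kraken2 options
    kraken2_db = null
}

// conf/test.config: the small taxprofiler test database is used only by the test profile.
params {
    kraken2_db = 'https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/kraken2/testdb-kraken2.tar.gz'
}
```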

ctuni added 3 commits October 28, 2024 11:12

first commit with kraken2 module

036daf0

added a database to the config

e7c1e71

added kraken2 param to schema

4549e1a

ctuni added 13 commits October 28, 2024 11:40

readded modules

c158510

fixed kraken2 name

d058ccf

added missing kraken2 options

e942664

fixed something in the linting

033df6e

added output to multiqc

c5edd67

changed the kraken2 channels

6c55393

removed kraken2 from mutiqc for now

59c4108

added kraken2 reports to the multqc channel

bf8eaa2

trying other methods to creae the multiqc files channel

a9aba3e

trying to pass kraken2 report to multiqc

07bd918

why is multiqc not working?

f4dc19a

why is multiqc not working?

d430a27

ugh

3e0d357

ctuni force-pushed the feature/kraken2 branch 2 times, most recently from 75a8e33 to 3e0d357 Compare October 28, 2024 14:52

ctuni added 4 commits October 28, 2024 15:55

removed kraken2 from multiqc files

b6c0d88

updated schema

ff8efdd

Merge branch 'dev' into feature/kraken2

5b602dd

fixing some things

7b2f665

ctuni linked an issue Oct 28, 2024 that may be closed by this pull request

add kraken2 to seqinspector #44

Open

changed schema version

ca597f7

ctuni changed the title ~~first commit with kraken2 module~~ Add kraken2 phylogenetic assignment subworkflow Oct 28, 2024

ctuni added 3 commits October 28, 2024 16:25

trying to unbreak things

400a76b

added prettier

4906699

further unbreaking things

54ce903

ctuni added 2 commits October 29, 2024 13:06

several improvements

7e41b19

prettier

82c68e7

nggvs reviewed Oct 29, 2024

docs/output.md (outdated, resolved)

ctuni and others added 4 commits October 29, 2024 15:46

updated citations

3e9b674

Update docs/output.md

dbb92c1

Co-authored-by: Natalia Garcia Garcia <[email protected]>

updated output

0324ead

updated output

f1759e1

ctuni added 7 commits October 30, 2024 10:25

Merge branch 'dev' into feature/kraken2

dd9e07c

added kraken2 reports to multiqc

828abd6

Merge branch 'feature/kraken2' of https://github.com/ctuni/seqinspector…

31c1f82

… into feature/kraken2

schema

3a22a9e

Merge branch 'dev' into feature/kraken2

5a4ce9c

Merge branch 'feature/kraken2' of https://github.com/ctuni/seqinspector…

34b766d

… into feature/kraken2

prettier

b698d41

nggvs reviewed Oct 30, 2024

ctuni and others added 11 commits October 30, 2024 11:31

Update workflows/seqinspector.nf

6dd4859

Removed debugging commented statement Co-authored-by: Natalia Garcia Garcia <[email protected]>

disabled the publish of the taxonomy file

07dee47

Merge branch 'feature/kraken2' of https://github.com/ctuni/seqinspector…

abea138

… into feature/kraken2

removed default database for a null one

a60d573

added test kraken2 database to the test configs

2488887

fixed typo

e28397a

miseq test is failing

9d8f631

miseq test is failing

5d1a15b

promethion test is failing

4729ada

novaseq test is failing

bb8e3b0

novaseq test is failing

758ce12

ctuni requested review from Aratz and nggvs October 30, 2024 15:20
