New module : Pilon #3331

Merged: 7 commits, May 2, 2023

43 changes: 43 additions & 0 deletions modules/nf-core/pilon/main.nf
@@ -0,0 +1,43 @@
process PILON {
    tag "$meta.id"
    label 'process_medium'

    conda "bioconda::pilon=1.24"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/pilon:1.24--hdfd78af_0':
        'quay.io/biocontainers/pilon:1.24--hdfd78af_0' }"

    input:
    tuple val(meta), path(fasta)
    tuple val(meta_bam), path(bam), path(bai)
Comment on lines +11 to +12
Contributor

Because the fasta, bam and bai all have to come from the same sample, should it rather be:

Suggested change
- tuple val(meta), path(fasta)
- tuple val(meta_bam), path(bam), path(bai)
+ tuple val(meta), path(fasta), path(bam), path(bai)

I am not sure; that's just a question, because I do not know better.

Contributor Author

That is an interesting suggestion. Both would work.
I prefer to have them split, and it works if meta and meta_bam are the same.
In the case of purge_dups, they concatenated all the inputs and I had to split them because sometimes my meta reflects the input sequencing type (SR, PacBio, ONT), not only the sample name.

Contributor

> I prefer to have them split, and it works if meta and meta_bam are the same.

Hm, OK. I would prefer not to split them, because I think the typical use case is that some channel magic before the module joins the required inputs (genome = fasta, alignment = bam & bai), so that the files are matched by ensuring that their meta matches.

> In the case of purge_dups, they concatenated all the inputs and I had to split them because sometimes my meta reflects the input sequencing type (SR, PacBio, ONT), not only the sample name.

That seems to me the exception rather than the usual module output. But I might be wrong.

Unfortunately, the nf-core modules guidelines do not specify this clearly (see https://nf-co.re/developers/modules#inputoutput-options): "Directly associated auxiliary files to an input file MAY be defined within the same input channel alongside the main input channel (e.g. BAM and BAI)." So it seems I shouldn't insist on it!
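
The "channel magic" mentioned in this thread could look roughly like the sketch below. It is an illustration, not code from this PR: the channel names (`ch_fasta`, `ch_bam`, `ch_pilon_in`), the sample ids and the file names are all hypothetical. It assumes both channels carry an identical meta map as their first element, which is what `join` matches on by default. Because a process with two separate input channels pairs items by emission order rather than by meta, joining first and then splitting again keeps each assembly aligned with its own BAM/BAI.

```nextflow
nextflow.enable.dsl = 2

workflow {
    // Hypothetical inputs: one assembly and one alignment per sample.
    // The files need not exist for this sketch; file() only builds the paths.
    ch_fasta = Channel.of(
        [ [ id:'sampleA' ], file('sampleA.fasta') ],
        [ [ id:'sampleB' ], file('sampleB.fasta') ]
    )
    ch_bam = Channel.of(
        // Deliberately emitted in a different order than ch_fasta
        [ [ id:'sampleB' ], file('sampleB.bam'), file('sampleB.bam.bai') ],
        [ [ id:'sampleA' ], file('sampleA.bam'), file('sampleA.bam.bai') ]
    )

    // join matches on the first tuple element (the meta map) by default,
    // giving [ meta, fasta, bam, bai ] for each sample
    ch_joined = ch_fasta.join(ch_bam)

    // Split back into the two-channel shape used by this module;
    // both branches are emitted in the same order
    ch_joined
        .multiMap { meta, fasta, bam, bai ->
            assembly:  [ meta, fasta ]
            alignment: [ meta, bam, bai ]
        }
        .set { ch_pilon_in }

    ch_pilon_in.assembly.view  { "assembly:  ${it}" }
    ch_pilon_in.alignment.view { "alignment: ${it}" }

    // With the module included, the call would then be:
    // PILON ( ch_pilon_in.assembly, ch_pilon_in.alignment, 'bam' )
}
```

With the single-tuple signature from the suggested change, `ch_joined` could be passed to the module directly and the `multiMap` step would not be needed.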

    val pilon_mode

    output:
    tuple val(meta), path("*.fasta") , emit: improved_assembly
    tuple val(meta), path("*.vcf")   , emit: vcf          , optional : true
    tuple val(meta), path("*.change"), emit: change_record, optional : true
    tuple val(meta), path("*.bed")   , emit: tracks_bed   , optional : true
    tuple val(meta), path("*.wig")   , emit: tracks_wig   , optional : true
    path "versions.yml"              , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    def valid_mode = ["frags", "jumps", "unpaired", "bam"]
    if ( !valid_mode.contains(pilon_mode) ) { error "Unrecognised mode to run Pilon. Options: ${valid_mode.join(', ')}" }
    """
    pilon \\
        --genome $fasta \\
        --output ${prefix} \\
        --threads $task.cpus \\
        $args \\
        --$pilon_mode $bam

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        pilon: \$(echo \$(pilon --version) | sed 's/^.*version //; s/ .*\$//' )
    END_VERSIONS
    """
}
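
Since every output except the polished FASTA is declared `optional`, those outputs are only populated when the matching Pilon flags reach the command through `task.ext.args`. A minimal configuration sketch follows. It assumes a pipeline-level config file (the location is hypothetical) and uses Pilon's `--fix` and `--vcf` options purely as an illustration; check the flags against the Pilon usage documentation before relying on them.

```nextflow
// Hypothetical entry in a pipeline's modules config, not part of this PR
process {
    withName: 'PILON' {
        // Appended to the pilon command line via `$args` in the script block;
        // --vcf asks Pilon to write a VCF, which the optional `vcf` output picks up
        ext.args = '--fix bases --vcf'
    }
}
```

Keeping tool options in `ext.args` rather than in the process inputs leaves the module signature unchanged while each pipeline decides which optional files it wants.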
71 changes: 71 additions & 0 deletions modules/nf-core/pilon/meta.yml
@@ -0,0 +1,71 @@
name: "pilon"
description: Automatically improve draft assemblies and find variation among strains, including large event detection
keywords:
  - polishing
  - assembly
  - variant calling
tools:
  - "pilon":
      description: "Pilon is an automated genome assembly improvement and variant detection tool."
      homepage: "https://github.com/broadinstitute/pilon/wiki"
      documentation: "https://github.com/broadinstitute/pilon/wiki/Requirements-&-Usage"
      tool_dev_url: "https://github.com/broadinstitute/pilon"
      doi: "https://doi.org/10.1371/journal.pone.0112963"
      licence: "['GPL v2']"

input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - fasta:
      type: file
      description: FASTA of the input genome
      pattern: "*.{fasta}"
  - bam:
      type: file
      description: BAM file of reads aligned to the input genome
      pattern: "*.{bam}"
  - bai:
      type: file
      description: BAI file (BAM index) of the BAM reads aligned to the input genome
      pattern: "*.{bai}"
  - pilon_mode:
      type: value
      description: Type of BAM file supplied. "frags" for paired-end sequencing of DNA fragments, such as Illumina paired-end reads with fragment size <1000 bp; "jumps" for paired sequencing data with larger insert size, such as Illumina mate-pair libraries, typically with insert size >1000 bp; "unpaired" for unpaired sequencing reads; "bam" to let Pilon automatically classify the BAM as one of the three types above (Pilon version 1.17 and higher).
      pattern: ["frags", "jumps", "unpaired", "bam"]

output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
  - improved_assembly:
      type: file
      description: FASTA file of the improved assembly
      pattern: "*.{fasta}"
  - change_record:
      type: file
      description: File containing a space-delimited record of every change made in the assembly, as instructed by the --fix option
      pattern: "*.{change}"
  - vcf:
      type: file
      description: Pilon variant output
      pattern: "*.{vcf}"
  - tracks_bed:
      type: file
      description: Track files that may be viewed in genome browsers such as IGV, GenomeView, and other applications that support this format
      pattern: "*.{bed}"
  - tracks_wig:
      type: file
      description: Track files that may be viewed in genome browsers such as IGV, GenomeView, and other applications that support this format
      pattern: "*.{wig}"

authors:
  - "@scorreard"
4 changes: 4 additions & 0 deletions tests/config/pytest_modules.yml
@@ -2595,6 +2595,10 @@ picard/sortvcf:
  - modules/nf-core/picard/sortvcf/**
  - tests/modules/nf-core/picard/sortvcf/**

pilon:
  - modules/nf-core/pilon/**
  - tests/modules/nf-core/pilon/**

pindel/pindel:
  - modules/nf-core/pindel/pindel/**
  - tests/modules/nf-core/pindel/pindel/**
20 changes: 20 additions & 0 deletions tests/modules/nf-core/pilon/main.nf
@@ -0,0 +1,20 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

include { PILON } from '../../../../modules/nf-core/pilon/main.nf'

workflow test_pilon {

    input = [
        [ id:'test', single_end:false ], // meta map
        file(params.test_data['homo_sapiens']['genome']['genome_fasta'], checkIfExists: true)
    ]

    bam_tuple_ch = Channel.of([ [ id:'test', single_end:false ], // meta map
                                file(params.test_data['homo_sapiens']['illumina']['test_paired_end_sorted_bam'], checkIfExists: true),
                                file(params.test_data['homo_sapiens']['illumina']['test_paired_end_sorted_bam_bai'], checkIfExists: true),
                            ])

    PILON ( input, bam_tuple_ch, "bam" )
}
5 changes: 5 additions & 0 deletions tests/modules/nf-core/pilon/nextflow.config
@@ -0,0 +1,5 @@
process {

    publishDir = { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }

}
8 changes: 8 additions & 0 deletions tests/modules/nf-core/pilon/test.yml
@@ -0,0 +1,8 @@
- name: pilon test_pilon
  command: nextflow run ./tests/modules/nf-core/pilon -entry test_pilon -c ./tests/config/nextflow.config -c ./tests/modules/nf-core/pilon/nextflow.config
  tags:
    - pilon
  files:
    - path: output/pilon/test.fasta
      md5sum: 2e881994820a5a641da9ea594ab4958f
    - path: output/pilon/versions.yml
d4straub marked this conversation as resolved.