Add new module: Flye #1164

Merged on Jun 30, 2022 (35 commits)
Changes from 24 commits
00a752a
changing mv by gzip
mirpedrol Oct 29, 2021
c1cb4fe
changing mv by gzip
mirpedrol Oct 29, 2021
6963945
first module creation
mirpedrol Dec 7, 2021
21cf17e
add test.yml
mirpedrol Dec 13, 2021
146d99e
Merge branch 'flye' of https://github.com/asthara10/modules into flye
mirpedrol Dec 13, 2021
c573575
update changes
mirpedrol Jun 10, 2022
1318541
add flye to pyestes_modules.yml
mirpedrol Jun 10, 2022
8e3fb3c
update flye module
mirpedrol Jun 10, 2022
ff31209
delete functions.nf
mirpedrol Jun 10, 2022
fc7921e
generate test.yml
mirpedrol Jun 10, 2022
1a0fb3d
Apply changes made in gitpod
mirpedrol Jun 10, 2022
9109d46
fix contains from test.yml
mirpedrol Jun 10, 2022
08e2252
test file assembly_info.txt with regex
mirpedrol Jun 13, 2022
ddcd84d
check that file contains at least contig_1
mirpedrol Jun 13, 2022
b1527cd
fix typo in contains
mirpedrol Jun 13, 2022
6ae2a69
Merge branch 'master' into flye
mirpedrol Jun 13, 2022
4772601
update version
mirpedrol Jun 13, 2022
a5a1b15
split fastq file for raw runs
mirpedrol Jun 14, 2022
5064fab
use asm-coverage to reduce memory usage
mirpedrol Jun 29, 2022
2da96e8
fix module name error
mirpedrol Jun 29, 2022
93bd62c
add genome-size
mirpedrol Jun 29, 2022
ed20095
decrease coverage
mirpedrol Jun 29, 2022
0918e49
change test data for raw runs
mirpedrol Jun 29, 2022
2821de2
add coverage and genome size
mirpedrol Jun 29, 2022
fc65226
Apply comments from code review
mirpedrol Jun 29, 2022
1ef2c17
after many trys, add a stub run
mirpedrol Jun 29, 2022
14467f6
remove md5sum for stub run
mirpedrol Jun 29, 2022
0210875
Merge branch 'master' into flye
mirpedrol Jun 29, 2022
34e9819
Merge branch 'master' into flye
mirpedrol Jun 30, 2022
4d656a6
Apply suggestions from code review
mirpedrol Jun 30, 2022
f9302c4
fix review comments
mirpedrol Jun 30, 2022
f07c309
Apply suggestions from code review
mirpedrol Jun 30, 2022
42f74d3
no hardcoded version in stub run
mirpedrol Jun 30, 2022
0c913e2
Update modules/flye/main.nf
mirpedrol Jun 30, 2022
01fb2da
Merge branch 'master' into flye
mirpedrol Jun 30, 2022
43 changes: 43 additions & 0 deletions modules/flye/main.nf
@@ -0,0 +1,43 @@
process FLYE {
    tag "$meta.id"
    label 'process_high'

    conda (params.enable_conda ? "bioconda::flye=2.9" : null)
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/flye:2.9--py39h6935b12_1':
        'quay.io/biocontainers/flye:2.9--py39h6935b12_1' }"

    input:
    tuple val(meta), path(reads)
    val mode

    output:
    tuple val(meta), path("*.fasta.gz"), emit: fasta
    tuple val(meta), path("*.gfa.gz")  , emit: gfa
    tuple val(meta), path("*.gv.gz")   , emit: gv
    tuple val(meta), path("*.txt")     , emit: txt
    tuple val(meta), path("*.log")     , emit: log
    tuple val(meta), path("*.json")    , emit: json
    path "versions.yml"                , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    flye $mode $reads --out-dir . --threads $task.cpus $args

    gzip -c assembly.fasta > ${prefix}.assembly.fasta.gz
    gzip -c assembly_graph.gfa > ${prefix}.assembly_graph.gfa.gz
    gzip -c assembly_graph.gv > ${prefix}.assembly_graph.gv.gz
    mv assembly_info.txt ${prefix}.assembly_info.txt
    mv flye.log ${prefix}.flye.log
    mv params.json ${prefix}.params.json
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        flye: \$(echo \$(flye --version | sed 's/-b1768//' ))
    END_VERSIONS
    """
}
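The post-processing half of the script block can be tried outside Nextflow. The following is a minimal sketch, not the module itself: it assumes a prefix of `test`, uses made-up file contents, hard-codes the process name `FLYE`, and replaces `flye --version` with a hypothetical stand-in function (`fake_flye_version`) so that it runs without Flye installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for `flye --version`; the real tool is not invoked here.
fake_flye_version() { echo "2.9-b1768"; }

prefix="test"                 # assumed value of task.ext.prefix / meta.id
workdir="$(mktemp -d)"
cd "$workdir"

# Pretend Flye wrote these files into the output directory (made-up contents).
printf '>contig_1\nACGT\n' > assembly.fasta
printf '#seq_name\tlength\ncontig_1\t4\n' > assembly_info.txt

# gzip -c writes to stdout, so the uncompressed original stays in place.
gzip -c assembly.fasta > "${prefix}.assembly.fasta.gz"
# Plain-text reports are only renamed, not compressed.
mv assembly_info.txt "${prefix}.assembly_info.txt"

# Strip the build suffix so that only the release number is recorded.
version="$(fake_flye_version | sed 's/-b1768//')"

cat > versions.yml <<END_VERSIONS
"FLYE":
    flye: ${version}
END_VERSIONS

cat versions.yml
```

The `sed` expression mirrors the one in the module and only matches that exact build tag. A more general pattern such as `sed 's/-b[0-9]*$//'` might be preferable if the container version changes, but that is a suggestion rather than something this PR does.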
65 changes: 65 additions & 0 deletions modules/flye/meta.yml
@@ -0,0 +1,65 @@
name: "flye"
description: De novo assembler for single molecule sequencing reads
keywords:
  - assembly
  - genome
  - de novo
  - genome assembler
  - single molecule
tools:
  - "flye":
      description: "Fast and accurate de novo assembler for single molecule sequencing reads"
      homepage: "https://github.com/fenderglass/Flye"
      documentation: "https://github.com/fenderglass/Flye/blob/flye/docs/USAGE.md"
      tool_dev_url: "https://github.com/fenderglass/Flye"
      doi: "10.1038/s41592-020-00971-x"
      licence: ["BSD-3-clause"]

input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test' ]
  - reads:
      type: file
      description: Input file in FASTA/FASTQ format.
      pattern: "*.{fasta,fastq,fasta.gz,fastq.gz,fa,fq,fa.gz,fq.gz}"

output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test' ]
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
  - fasta:
      type: file
      description: Assembled FASTA file
      pattern: "*.fasta.gz"
  - gfa:
      type: file
      description: Repeat graph in GFA format
      pattern: "*.gfa.gz"
  - gv:
      type: file
      description: Repeat graph in Graphviz format
      pattern: "*.gv.gz"
  - txt:
      type: file
      description: Extra contig information
      pattern: "*.txt"
  - log:
      type: file
      description: Flye log file
      pattern: "*.log"
  - json:
      type: file
      description: Flye parameters
      pattern: "*.json"

authors:
  - "@mirpedrol"
4 changes: 4 additions & 0 deletions tests/config/pytest_modules.yml
@@ -711,6 +711,10 @@ flash:
  - modules/flash/**
  - tests/modules/flash/**

flye:
  - modules/flye/**
  - tests/modules/flye/**

freebayes:
  - modules/freebayes/**
  - tests/modules/freebayes/**
71 changes: 71 additions & 0 deletions tests/modules/flye/main.nf
@@ -0,0 +1,71 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

include { FLYE } from '../../../modules/flye/main.nf'

workflow test_flye_pacbio_raw {

    input = [
        [ id:'test' ], // meta map
        file(params.test_data['bacteroides_fragilis']['nanopore']['test_fastq_gz'], checkIfExists: true)
    ]
    mode = "--pacbio-raw"

    FLYE ( input, mode )
}

workflow test_flye_pacbio_corr {

    input = [
        [ id:'test' ], // meta map
        file(params.test_data['homo_sapiens']['pacbio']['hifi'], checkIfExists: true)
    ]
    mode = "--pacbio-corr"

    FLYE ( input, mode )
}

workflow test_flye_pacbio_hifi {

    input = [
        [ id:'test' ], // meta map
        file(params.test_data['homo_sapiens']['pacbio']['hifi'], checkIfExists: true)
    ]
    mode = "--pacbio-hifi"

    FLYE ( input, mode )
}

workflow test_flye_nano_raw {

    input = [
        [ id:'test' ], // meta map
        file(params.test_data['bacteroides_fragilis']['nanopore']['test_fastq_gz'], checkIfExists: true)
    ]
    mode = "--nano-raw"

    FLYE ( input, mode )
}

workflow test_flye_nano_corr {

    input = [
        [ id:'test' ], // meta map
        file(params.test_data['homo_sapiens']['pacbio']['hifi'], checkIfExists: true)
    ]
    mode = "--nano-corr"

    FLYE ( input, mode )
}

workflow test_flye_nano_hq {

    input = [
        [ id:'test' ], // meta map
        file(params.test_data['homo_sapiens']['pacbio']['hifi'], checkIfExists: true)
    ]
    mode = "--nano-hq"

    FLYE ( input, mode )
}
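Each of the six workflows above differs only in the `mode` value passed to the process. As a rough illustration of what that means for the command the module runs, the sketch below builds (but does not execute) one `flye` command line per mode. The read file name, thread count and extra arguments are placeholders chosen for the example, not values taken from the test data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# The six read-type flags exercised by the test workflows.
modes=(--pacbio-raw --pacbio-corr --pacbio-hifi --nano-raw --nano-corr --nano-hq)

reads="reads.fastq.gz"                          # placeholder input file name
threads=4                                       # placeholder for task.cpus
args="--asm-coverage 40 --genome-size 5.2k"     # same string as ext.args in the test config

commands=()
for mode in "${modes[@]}"; do
    # Same argument order as the module's script block.
    commands+=("flye ${mode} ${reads} --out-dir . --threads ${threads} ${args}")
done

# Print the command lines only; Flye itself is never called here.
printf '%s\n' "${commands[@]}"
```

Passing the flag as a `val` input keeps one process definition for all read types, at the cost of the caller having to supply a valid Flye flag.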
7 changes: 7 additions & 0 deletions tests/modules/flye/nextflow.config
@@ -0,0 +1,7 @@
process {

    publishDir = { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }
    withName: FLYE_LOW_MEM {
        ext.args = '--asm-coverage 40 --genome-size 5.2k'
    }
}
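The `publishDir` closure derives the output folder from the process name: it takes the last `:`-separated segment, then the first `_`-separated piece, and lower-cases it. The shell sketch below is an analogue of that Groovy expression using parameter expansion, not the code Nextflow runs. The fully qualified process name and the `output` directory are hypothetical examples.

```shell
#!/usr/bin/env bash
set -euo pipefail

process_name="TEST_FLYE:FLYE_LOW_MEM"   # hypothetical fully qualified process name
outdir="output"                          # assumed value of params.outdir

last_segment="${process_name##*:}"       # like tokenize(':')[-1]  -> FLYE_LOW_MEM
first_piece="${last_segment%%_*}"        # like tokenize('_')[0]   -> FLYE
tool_dir="$(printf '%s' "$first_piece" | tr '[:upper:]' '[:lower:]')"   # like toLowerCase()

publish_dir="${outdir}/${tool_dir}"
echo "$publish_dir"
```

With these inputs the result is `output/flye`, which is the directory the paths in `test.yml` point to.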
87 changes: 87 additions & 0 deletions tests/modules/flye/test.yml
@@ -0,0 +1,87 @@
# According to the issue https://github.com/fenderglass/Flye/issues/164,
# some fluctuations are expected because of the heuristics.
# Here we check that test.assembly_info.txt contains at least one contig.

- name: flye test_flye_pacbio_raw
  command: nextflow run ./tests/modules/flye -entry test_flye_pacbio_raw -c ./tests/config/nextflow.config -c ./tests/modules/flye/nextflow.config
  tags:
    - flye
  files:
    - path: output/flye/test.assembly.fasta.gz
    - path: output/flye/test.assembly_graph.gfa.gz
    - path: output/flye/test.assembly_graph.gv.gz
    - path: output/flye/test.assembly_info.txt
      contains: ["contig_1"]
    - path: output/flye/test.flye.log
    - path: output/flye/test.params.json
      md5sum: 54b576cb6d4d27656878a7fd3657bde9

- name: flye test_flye_pacbio_corr
  command: nextflow run ./tests/modules/flye -entry test_flye_pacbio_corr -c ./tests/config/nextflow.config -c ./tests/modules/flye/nextflow.config
  tags:
    - flye
  files:
    - path: output/flye/test.assembly.fasta.gz
    - path: output/flye/test.assembly_graph.gfa.gz
    - path: output/flye/test.assembly_graph.gv.gz
    - path: output/flye/test.assembly_info.txt
      contains: ["contig_1"]
    - path: output/flye/test.flye.log
    - path: output/flye/test.params.json
      md5sum: 54b576cb6d4d27656878a7fd3657bde9

- name: flye test_flye_pacbio_hifi
  command: nextflow run ./tests/modules/flye -entry test_flye_pacbio_hifi -c ./tests/config/nextflow.config -c ./tests/modules/flye/nextflow.config
  tags:
    - flye
  files:
    - path: output/flye/test.assembly.fasta.gz
    - path: output/flye/test.assembly_graph.gfa.gz
    - path: output/flye/test.assembly_graph.gv.gz
    - path: output/flye/test.assembly_info.txt
      contains: ["contig_1"]
    - path: output/flye/test.flye.log
    - path: output/flye/test.params.json
      md5sum: 54b576cb6d4d27656878a7fd3657bde9

- name: flye test_flye_nano_raw
  command: nextflow run ./tests/modules/flye -entry test_flye_nano_raw -c ./tests/config/nextflow.config -c ./tests/modules/flye/nextflow.config
  tags:
    - flye
  files:
    - path: output/flye/test.assembly.fasta.gz
    - path: output/flye/test.assembly_graph.gfa.gz
    - path: output/flye/test.assembly_graph.gv.gz
    - path: output/flye/test.assembly_info.txt
      contains: ["contig_1"]
    - path: output/flye/test.flye.log
    - path: output/flye/test.params.json
      md5sum: 54b576cb6d4d27656878a7fd3657bde9

- name: flye test_flye_nano_corr
  command: nextflow run ./tests/modules/flye -entry test_flye_nano_corr -c ./tests/config/nextflow.config -c ./tests/modules/flye/nextflow.config
  tags:
    - flye
  files:
    - path: output/flye/test.assembly.fasta.gz
    - path: output/flye/test.assembly_graph.gfa.gz
    - path: output/flye/test.assembly_graph.gv.gz
    - path: output/flye/test.assembly_info.txt
      contains: ["contig_1"]
    - path: output/flye/test.flye.log
    - path: output/flye/test.params.json
      md5sum: 54b576cb6d4d27656878a7fd3657bde9

- name: flye test_flye_nano_hq
  command: nextflow run ./tests/modules/flye -entry test_flye_nano_hq -c ./tests/config/nextflow.config -c ./tests/modules/flye/nextflow.config
  tags:
    - flye
  files:
    - path: output/flye/test.assembly.fasta.gz
    - path: output/flye/test.assembly_graph.gfa.gz
    - path: output/flye/test.assembly_graph.gv.gz
    - path: output/flye/test.assembly_info.txt
      contains: ["contig_1"]
    - path: output/flye/test.flye.log
    - path: output/flye/test.params.json
      md5sum: 54b576cb6d4d27656878a7fd3657bde9
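Because assembly output can vary between runs, the tests above check `test.assembly_info.txt` with `contains` instead of a checksum. The idea behind such a content check can be sketched in shell as follows. This is an illustration of the approach, not how pytest-workflow implements it. The helper name `contains_all` and the sample file contents are made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

info_file="$(mktemp)"
# Made-up stand-in for an assembly_info.txt; real values differ from run to run.
printf '#seq_name\tlength\tcov.\ncontig_1\t5000\t38\n' > "$info_file"

# Hypothetical helper: succeed only if every given string occurs in the file.
contains_all() {
    local file="$1"; shift
    local needle
    for needle in "$@"; do
        grep -qF -- "$needle" "$file" || return 1
    done
}

if contains_all "$info_file" "contig_1"; then
    result="pass"
else
    result="fail"
fi
echo "$result"
```

A check like this still passes when contig lengths or coverage change, as long as at least one contig named `contig_1` is reported. A checksum would fail on any such change.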