diff --git a/CHANGELOG.md b/CHANGELOG.md index ed5ccd3e..3803a9eb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,15 @@ # Changelog All notable changes to the call-gSV pipeline. +## [Unreleased] +### Added +- Added ability to call germline SVs with Manta +- Added parameters to control which SV caller is used (run_delly & run_manta) +- Added pipeline version from manifest to pipeline logging output + +### Changed +- Changed output directories to correspond with tool name casing + ## [2.2.0] - 2021-05-07 ### Changed - Updated modules to point to tool specific Docker Hub repos diff --git a/README.md b/README.md index d4789f4a..5a781552 100644 --- a/README.md +++ b/README.md @@ -20,7 +20,7 @@ ## Overview -The call-gSV nextflow pipeline, calls structural variants and copy number variants utilizing [Delly](https://github.com/dellytools/delly). It is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations and validates the output quality with [BCFtools](https://github.com/samtools/bcftools). The pipeline has been engineered to run in a 4 layer stack in a cloud-based scalable environment of CycleCloud, Slurm, Nextflow and Docker. Additionally it has been validated with the SMC-HET dataset and reference GRCh38 reference genome, where paired-end FASTQ's were created with BAM Surgeon. +The call-gSV nextflow pipeline, calls structural variants and copy number variants utilizing [Delly](https://github.com/dellytools/delly) and [Manta](https://github.com/Illumina/manta). It is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations and validates the output quality with [BCFtools](https://github.com/samtools/bcftools). The pipeline has been engineered to run in a 4 layer stack in a cloud-based scalable environment of CycleCloud, Slurm, Nextflow and Docker. 
Additionally it has been validated with the SMC-HET dataset and reference GRCh38 reference genome, where paired-end FASTQ's were created with BAM Surgeon. Developer's Notes: @@ -28,11 +28,11 @@ The call-gSV nextflow pipeline, calls structural variants and copy number varian ### Node Specific Config File Settings -| Config File | Available Node cpus / memory | Designated Process 1; cpus / memory | Designated Process 2; cpus / memory | -|:------------|:---------|:-------------------------|:-------------------------| -| `lowmem.config` | 2 / 3 GB | delly_call_sv; 1 / 3 GB | validate_file; 1 / 1 GB | -| `midmem.config` | 72 / 136.8 GB | delly_call_sv; 71 / 130 GB | validate_file; 1 / 1 GB | -| `execute.config` | 64 / 950 GB | delly_call_sv; 63 / 940 GB | validate_file; 1 / 1 GB | +| Config File | Available Node cpus / memory | Designated Process 1; cpus / memory | Designated Process 2; cpus / memory | Designated Process 3; cpus / memory | +|:------------|:---------|:-------------------------|:-------------------------|:-------------------------| +| `lowmem.config` | 2 / 3 GB | call_gSV_Delly; 1 / 2 GB | call_gSV_Manta; 1 / 2 GB | validate_file; 1 / 1 GB | +| `midmem.config` | 72 / 136.8 GB | call_gSV_Delly; 35 / 65 GB | call_gSV_Manta; 35 / 65 GB | validate_file; 1 / 1 GB | +| `execute.config` | 64 / 950 GB | call_gSV_Delly; 31 / 470 GB | call_gSV_Manta; 31 / 470 GB | validate_file; 1 / 1 GB | --- ## How To Run @@ -61,35 +61,48 @@ A directed acyclic graph of your pipeline. ### 1. Calling Structural Variants -The first step of the pipeline requires an aligned and sorted BAM file and BAM index as an input for variant calling with [Delly.](https://github.com/dellytools/delly) Delly combines short-range and long-range paired-end mapping and split-read analysis for the discovery of balanced and unbalanced structural variants at single-nucleotide breakpoint resolution (deletions, tandem duplications, inversions and translocations.) 
Structural variants are called, annotated and merged into a single BCF file. A default exclude map of Delly can be incorporated as an input which removes the telomeric and centromeric regions of all human chromosomes since these regions cannot be accurately analyzed with short-read data. - -Currently the following filters are applied and or considered for application and parameterization in subsequent releases: -* **map-qual:** >= 20 (Applied / Parameterized) -* **pe:** >= 5 (Not yet Applied / Non-parameterized) -* **sr:** >= 5 (Not yet Applied / Non-parameterized) -* **keep_imprecise:** >= true (Not yet Applied / Non-parameterized) +The first step of the pipeline requires an aligned and sorted BAM file and BAM index as an input for variant calling with [Delly](https://github.com/dellytools/delly) or [Manta](https://github.com/Illumina/manta). Delly combines short-range and long-range paired-end mapping and split-read analysis for the discovery of balanced and unbalanced structural variants at single-nucleotide breakpoint resolution (deletions, tandem duplications, inversions and translocations.) Structural variants are called, annotated and merged into a single BCF file. A default exclude map of Delly can be incorporated as an input which removes the telomeric and centromeric regions of all human chromosomes since these regions cannot be accurately analyzed with short-read data. +Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. Manta discovers, assembles and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow. + +Currently the following filters are applied by Delly when calling structural variants. Parameters with a "call-gSV default" can be updated in the nextflow.config file. +
+| Parameter | Delly default | call-gSV default | Description | +|:------------|:----------|:-------------------------|-------------| +| `svtype` | ALL | | SV type to compute (DEL, INS, DUP, INV, BND, ALL) | +| `map-qual` | 1 | 20 | Minimum paired-end (PE) mapping quality | +| `qual-tra` | 20 | | Minimum PE quality for translocation | +| `mad-cutoff` | 9 | | Insert size cutoff, median+s*MAD (deletions only) | +| `minclip` | 25 | | Minimum clipping length | +| `min-clique-size` | 2 | | Minimum PE/SR clique size | +| `minrefsep` | 25 | | Minimum reference separation | +| `maxreadsep` | 40 | | Maximum read separation | +
### 2. Calling Copy Number Variants The second step of the pipeline identifies any found copy number variants (CNVs). To do this, Delly requires an aligned and sorted BAM file and BAM index as an input, as well as the BCF output from the initial structural variant calling (to refine breakpoints) and a mappability map. Any CNVs identified are annotated and output as a single BCF file. -Currently the following filters are applied and or considered for application and parameterization in subsequent releases: -* **quality:** = 10 (Applied / Non-parameterized) -* **ploidy:** = 2 (Applied / Non-parameterized) -* **sdrd:** = 2 (Applied / Non-parameterized) -* **cn-offset** = 0.100000001 (Applied / Non-parameterized) -* **cnv-size** = 1000 (Applied / Non-parameterized) -* **window-size** = 10000 (Applied / Non-parameterized) -* **window-offset** = 10000 (Applied / Non-parameterized) -* **fraction-window** = 0.25 (Applied / Non-parameterized) -* **scan-window** = 10000 (Applied / Non-parameterized) -* **fraction-unique** = 0.800000012 (Applied / Non-parameterized) -* **mad-cutoff** = 3 (Applied / Non-parameterized) -* **percentile** = 0.000500000024 (Applied / Non-parameterized) +Currently the following filters are applied by Delly when calling copy number variants. Parameters with a "call-gSV default" can be updated in the nextflow.config file. +
+| Parameter | Delly default | call-gSV default | Description | +|:------------|:----------|:-------------------------|-------------| +| `quality` | 10 | | Minimum mapping quality | +| `ploidy` | 2 | | Baseline ploidy | +| `sdrd` | 2 | | Minimum SD read-depth shift | +| `cn-offset` | 0.100000001 | | Minimum CN offset | +| `cnv-size` | 1000 | | Minimum CNV size | +| `window-size` | 10000 | | Window size | +| `window-offset` | 10000 | | Window offset | +| `fraction-window` | 0.25 | | Minimum callable window fraction [0,1] | +| `scan-window` | 10000 | | Scanning window size | +| `fraction-unique` | 0.800000012 | | Uniqueness filter for scan windows [0,1] | +| `mad-cutoff` | 3 | | Median + 3 * mad count cutoff | +| `percentile` | 0.000500000024 | | Excl. extreme GC fraction | +
### 3. Check Output Quality -VCF files are generated from the BCFs to run the vcf-validate command from [VCFTools](https://vcftools.github.io/perl_module.html#vcf-validator) and vcfstats from [RTGTools](https://cdn.rawgit.com/RealTimeGenomics/rtg-tools/master/installer/resources/tools/RTGOperationsManual/rtg_command_reference.html#vcfstats). Outputs from both provide preliminary summary statistics that can be viewed and evaluated in preparation for downstream cohort-wide re-calling and re-genotyping. +For Delly, VCF files are generated from the BCFs to run the vcf-validate command from [VCFTools](https://vcftools.github.io/perl_module.html#vcf-validator) and vcfstats from [RTGTools](https://cdn.rawgit.com/RealTimeGenomics/rtg-tools/master/installer/resources/tools/RTGOperationsManual/rtg_command_reference.html#vcfstats). Outputs from both provide preliminary summary statistics that can be viewed and evaluated in preparation for downstream cohort-wide re-calling and re-genotyping. In the Manta branch of the pipeline, a stats directory is generated under the specific output directory /Manta-/results/stats where information can be found regarding the SVs identified. --- @@ -119,7 +132,9 @@ VCF files are generated from the BCFs to run the vcf-validate command from [VCFT | `exclusion_file` | yes | path | Absolute path to the delly reference genome `exclusion` file utilized to remove suggested regions for structural variant calling. On Slurm/SGE, an HG38 exclusion file is located at /[hot\|data]/ref/hg38/delly/human.hg38.excl.tsv | | `mappability_map` | yes | path | Absolute path to the delly mappability map to support GC and mappability fragment correction in CNV calling | | `map_qual` | no | path | minimum paired-end (PE) mapping quaility threshold for Delly). | -| `run_qc` | no | boolean | Optional parameter to indicate whether subsequent quality checks should be run. Default value is false. 
| +| `run_delly` | true | boolean | Whether or not the workflow should run Delly (either run_delly or run_manta must be set to true) | +| `run_manta` | true | boolean | Whether or not the workflow should run Manta (either run_delly or run_manta must be set to true) | +| `run_qc` | no | boolean | Optional parameter to indicate whether subsequent quality checks should be run on Delly outputs. Default value is false. | | `save_intermediate_files` | yes | boolean | Optional parameter to indicate whether intermediate files will be saved. Default value is true. | | `output_dir` | yes | path | Absolute path to the directory where the output files to be saved. | `temp_dir` | yes | path | Absolute path to the directory where the nextflow's intermediate files are saved. | @@ -131,7 +146,7 @@ VCF files are generated from the BCFs to run the vcf-validate command from [VCFT | Output | Output Type | Description | |:-------|:---------|:------------| | `.bcf` | final | Binary VCF output format with structural variants if found. | -| `.vcf` | intermediate | VCF output format with structural variants if found.| +| `.vcf` | intermediate | VCF output format with structural variants if found. If output by Manta, these VCFs will be compressed. | | `.bcf.csi` | final | CSI-format index for BAM files. | | `.validate.txt` | final | output file from vcf-validator. | | `.stats.txt` | final | output file from RTG Tools. | @@ -205,7 +220,8 @@ Included is a template for validating your input files. For more information on ## References 1. [Rausch T, Zichner T, Schlattl A, Stütz AM, Benes V, Korbel JO. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28(18):i333-i339. doi:10.1093/bioinformatics/bts378](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436805/) -2. [VCFtools - vcf-validator](https://vcftools.github.io/perl_module.html#vcf-validator) -3. 
[Real Time Genomics RTG Tools Operations Manual - vcfstats](https://cdn.rawgit.com/RealTimeGenomics/rtg-tools/master/installer/resources/tools/RTGOperationsManual/rtg_command_reference.html#vcfstats) -4. [Boutros Lab -CallSV Quality Control pipeline]() -5. [The 1000 Genomes Project Consortium., Corresponding authors., Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015). https://doi.org/10.1038/nature15393](https://www.nature.com/articles/nature15393) +2. Chen, X. et al. (2016) Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics, 32, 1220-1222. [doi:10.1093/bioinformatics/btv710](https://academic.oup.com/bioinformatics/article/32/8/1220/1743909) +3. [VCFtools - vcf-validator](https://vcftools.github.io/perl_module.html#vcf-validator) +4. [Real Time Genomics RTG Tools Operations Manual - vcfstats](https://cdn.rawgit.com/RealTimeGenomics/rtg-tools/master/installer/resources/tools/RTGOperationsManual/rtg_command_reference.html#vcfstats) +5. [Boutros Lab -CallSV Quality Control pipeline]() +6. [The 1000 Genomes Project Consortium., Corresponding authors., Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015). https://doi.org/10.1038/nature15393](https://www.nature.com/articles/nature15393) diff --git a/call-gSV-flowchart-diagram.drawio.svg b/call-gSV-flowchart-diagram.drawio.svg index 2e1ae1bc..6e3f3551 100644 --- a/call-gSV-flowchart-diagram.drawio.svg +++ b/call-gSV-flowchart-diagram.drawio.svg @@ -1,3 +1,3 @@ -
[SVG flowchart text residue; node labels: Sorted BAM and BAM Index, Validate Inputs, Call Structural Variants, Call Copy Number Variants, Check Quality, Validate Outputs, BCF and VCF, Create Sha512]
\ No newline at end of file +
[SVG flowchart text residue; node labels: Sorted BAM and BAM Index, Validate Inputs, Call Structural Variants (Delly), Call Structural Variants (Manta), Call Copy Number Variants, Check Quality, Validate Outputs, BCF and VCF, Create SHA-512]
\ No newline at end of file diff --git a/metadata.yaml b/metadata.yaml index 270e48f4..43074d11 100644 --- a/metadata.yaml +++ b/metadata.yaml @@ -1,11 +1,11 @@ --- Category: "pipeline" -Description: "a pipeline for calling structural variants and copy number variants utilizing Delly" +Description: "a pipeline for calling structural variants and copy number variants utilizing Delly and Manta" Maintainers: "Boutros Lab Infrastructure BoutrosLabInfrastructure@mednet.ucla.edu" Contributors: ["Yael Berkovich", "Tim Sanders"] Languages: ["Nextflow", "Docker"] Dependencies: ["Java", "Nextflow", "Docker"] References: "https://confluence.mednet.ucla.edu/display/BOUTROSLAB/Guide+to+Nextflow" -Tools: ["Delly:v0.8.7", "BCFtools:v1.12", "VCFtools:v0.1.16", "RTG-tools:v3.12", "Validate-nf:v2.1.0"] +Tools: ["Delly:v0.8.7", "Manta:v1.6.0", "BCFtools:v1.12", "VCFtools:v0.1.16", "RTG-tools:v3.12", "Validate-nf:v2.1.5"] Engineering_Owner: [] Scientific_Owner: [] diff --git a/pipeline/call-gSV.nf b/pipeline/call-gSV.nf index cac9512f..a6a3200e 100644 --- a/pipeline/call-gSV.nf +++ b/pipeline/call-gSV.nf @@ -31,9 +31,12 @@ Current Configuration: save_intermediate_files: ${params.save_intermediate_files} run_qc: ${params.run_qc} map_qual: ${params.map_qual} + run_delly: ${params.run_delly} + run_manta: ${params.run_manta} - tools: delly: ${params.delly_version} + manta: ${params.manta_version} bcftools: ${params.bcftools_version} vcftools: ${params.vcftools_version} rtgtools: ${params.rtgtools_version} @@ -48,10 +51,11 @@ Starting workflow... 
include { run_validate } from './modules/validation' include { call_gSV_Delly; call_gCNV_Delly } from './modules/delly' +include { call_gSV_Manta } from './modules/manta' include { convert_BCF2VCF_BCFtools as convert_gSV_BCF2VCF_BCFtools; convert_BCF2VCF_BCFtools as convert_gCNV_BCF2VCF_BCFtools } from './modules/bcftools' include { run_vcfstats_RTGTools } from './modules/rtgtools' include { run_vcf_validator_VCFtools } from './modules/vcftools' -include { run_sha512sum } from './modules/sha512' +include { run_sha512sum as run_sha512sum_Delly; run_sha512sum as run_sha512sum_Manta } from './modules/sha512' input_bam_ch = Channel .fromPath(params.input_csv, checkIfExists:true) @@ -67,19 +71,24 @@ input_bam_ch = Channel if (!params.reference_fasta) { // error out - must provide a reference FASTA file error "***Error: You must specify a reference FASTA file***" -} + } if (!params.exclusion_file) { // error out - must provide exclusion file error "***Error: You must provide an exclusion file***" -} + } + +if (!params.run_delly && !params.run_manta) { + // error out - must specify a valid SV caller + error "***Error: You must specify either Delly or Manta***" + } if (params.reference_fasta_index) { reference_fasta_index = params.reference_fasta_index -} + } else { reference_fasta_index = "${params.reference_fasta}.fai" -} + } // Create channel for validation validation_channel = Channel @@ -94,13 +103,19 @@ validation_channel = Channel workflow { run_validate(validation_channel) - call_gSV_Delly(input_bam_ch, params.reference_fasta, reference_fasta_index, params.exclusion_file) - call_gCNV_Delly(input_bam_ch, call_gSV_Delly.out.bcf_sv_file, params.reference_fasta, reference_fasta_index, params.mappability_map) - convert_gSV_BCF2VCF_BCFtools(call_gSV_Delly.out.bcf_sv_file, call_gSV_Delly.out.bam_sample_name, 'SV') - convert_gCNV_BCF2VCF_BCFtools(call_gCNV_Delly.out.bcf_cnv_file, call_gCNV_Delly.out.bam_sample_name, 'CNV') - if (params.run_qc) { - 
run_vcfstats_RTGTools(convert_gSV_BCF2VCF_BCFtools.out.vcf_file, call_gSV_Delly.out.bam_sample_name) - run_vcf_validator_VCFtools(convert_gSV_BCF2VCF_BCFtools.out.vcf_file, call_gSV_Delly.out.bam_sample_name) - } - run_sha512sum(call_gSV_Delly.out.bcf_sv_file.mix(convert_gSV_BCF2VCF_BCFtools.out.vcf_file,call_gCNV_Delly.out.bcf_cnv_file,convert_gCNV_BCF2VCF_BCFtools.out.vcf_file)) + if (params.run_manta) { + call_gSV_Manta(input_bam_ch, params.reference_fasta, reference_fasta_index) + run_sha512sum_Manta(call_gSV_Manta.out.vcf_small_indel_sv_file.mix(call_gSV_Manta.out.vcf_diploid_sv_file, call_gSV_Manta.out.vcf_candidate_sv_file)) + } + if (params.run_delly) { + call_gSV_Delly(input_bam_ch, params.reference_fasta, reference_fasta_index, params.exclusion_file) + call_gCNV_Delly(input_bam_ch, call_gSV_Delly.out.bcf_sv_file, params.reference_fasta, reference_fasta_index, params.mappability_map) + convert_gSV_BCF2VCF_BCFtools(call_gSV_Delly.out.bcf_sv_file, call_gSV_Delly.out.bam_sample_name, 'SV') + convert_gCNV_BCF2VCF_BCFtools(call_gCNV_Delly.out.bcf_cnv_file, call_gCNV_Delly.out.bam_sample_name, 'CNV') + if (params.run_qc) { + run_vcfstats_RTGTools(convert_gSV_BCF2VCF_BCFtools.out.vcf_file, call_gSV_Delly.out.bam_sample_name) + run_vcf_validator_VCFtools(convert_gSV_BCF2VCF_BCFtools.out.vcf_file, call_gSV_Delly.out.bam_sample_name) + } + run_sha512sum_Delly(call_gSV_Delly.out.bcf_sv_file.mix(convert_gSV_BCF2VCF_BCFtools.out.vcf_file, call_gCNV_Delly.out.bcf_cnv_file, convert_gCNV_BCF2VCF_BCFtools.out.vcf_file)) + } } diff --git a/pipeline/config/execute.config b/pipeline/config/execute.config index bb25b497..a390da5b 100644 --- a/pipeline/config/execute.config +++ b/pipeline/config/execute.config @@ -4,7 +4,11 @@ process { memory = 1.GB } withName: call_gSV_Delly { - cpus = 63 - memory = 940.GB + cpus = 31 + memory = 470.GB + } + withName: call_gSV_Manta { + cpus = 31 + memory = 470.GB } } \ No newline at end of file diff --git a/pipeline/config/lowmem.config 
b/pipeline/config/lowmem.config index 6bf4c59e..acdb0611 100644 --- a/pipeline/config/lowmem.config +++ b/pipeline/config/lowmem.config @@ -5,6 +5,10 @@ process { } withName: call_gSV_Delly { cpus = 1 - memory = 3.GB + memory = 2.GB + } + withName: call_gSV_Manta { + cpus = 1 + memory = 2.GB } } \ No newline at end of file diff --git a/pipeline/config/methods.config b/pipeline/config/methods.config index 047b5844..176af723 100644 --- a/pipeline/config/methods.config +++ b/pipeline/config/methods.config @@ -95,10 +95,11 @@ methods.setup() params { // Pipeline tool versions delly_version = '0.8.7' + manta_version = '1.6.0' bcftools_version = '1.12' vcftools_version = '0.1.16' rtgtools_version = '3.12' - validate_version = '2.1.0' + validate_version = '2.1.5' sha512_version = '0.1' } diff --git a/pipeline/config/midmem.config b/pipeline/config/midmem.config index 799de112..3ff7d991 100644 --- a/pipeline/config/midmem.config +++ b/pipeline/config/midmem.config @@ -4,7 +4,11 @@ process { memory = 1.GB } withName: call_gSV_Delly { - cpus = 71 - memory = 130.GB + cpus = 35 + memory = 65.GB + } + withName: call_gSV_Manta { + cpus = 35 + memory = 65.GB } } \ No newline at end of file diff --git a/pipeline/config/nextflow.config b/pipeline/config/nextflow.config index a7440f48..864627f6 100644 --- a/pipeline/config/nextflow.config +++ b/pipeline/config/nextflow.config @@ -5,8 +5,8 @@ manifest { nextflowVersion = '>=20.07.1' author = 'Tim Sanders & Yael Berkovich' homePage = 'https://github.com/uclahs-cds/pipeline-call-gSV' - description = 'A pipeline to call structural variants utilizing Delly' - version = '2.2.0' + description = 'A pipeline to call structural variants utilizing Delly and Manta' + version = '3.0.0' } params { @@ -23,6 +23,9 @@ params { exclusion_file = '/path/to/exclusion.tsv' mappability_map = '/path/to/mappability_map' map_qual = 20 // min. 
paired-end (PE) mapping quality for Delly + // run_delly or run_manta (or both) must be set to true + run_delly = true + run_manta = true run_qc = true save_intermediate_files = true output_dir = '/path/to/outputs' diff --git a/pipeline/modules/bcftools.nf b/pipeline/modules/bcftools.nf index c659b624..98dc903b 100644 --- a/pipeline/modules/bcftools.nf +++ b/pipeline/modules/bcftools.nf @@ -16,7 +16,7 @@ process convert_BCF2VCF_BCFtools { publishDir params.output_dir, pattern: "*.vcf", mode: "copy", - saveAs: { "bcftools-${params.bcftools_version}/${file(it).getName()}" } + saveAs: { "BCFtools-${params.bcftools_version}/${file(it).getName()}" } publishDir params.output_log_dir, pattern: ".command.*", diff --git a/pipeline/modules/delly.nf b/pipeline/modules/delly.nf index 81242294..8a4627bb 100644 --- a/pipeline/modules/delly.nf +++ b/pipeline/modules/delly.nf @@ -17,7 +17,7 @@ process call_gSV_Delly { enabled: params.save_intermediate_files, pattern: "*.bcf*", mode: "copy", - saveAs: { "delly-${params.delly_version}/${file(it).getName()}" } + saveAs: { "Delly-${params.delly_version}/${file(it).getName()}" } publishDir params.output_log_dir, pattern: ".command.*", @@ -56,7 +56,7 @@ process call_gCNV_Delly { enabled: params.save_intermediate_files, pattern: "*.bcf*", mode: "copy", - saveAs: { "delly-${params.delly_version}/${file(it).getName()}" } + saveAs: { "Delly-${params.delly_version}/${file(it).getName()}" } publishDir params.output_log_dir, pattern: ".command.*", diff --git a/pipeline/modules/manta.nf b/pipeline/modules/manta.nf new file mode 100644 index 00000000..3ac8f333 --- /dev/null +++ b/pipeline/modules/manta.nf @@ -0,0 +1,54 @@ +#!/usr/bin/env nextflow + +def docker_image_manta = "blcdsdockerregistry/manta:${params.manta_version}" + +log.info """\ +------------------------------------ + M A N T A +------------------------------------ +Docker Images: +- docker_image_manta: ${docker_image_manta} +""" + +process call_gSV_Manta { + container 
docker_image_manta + + publishDir params.output_dir, + pattern: "MantaWorkflow/results", + mode: "copy", + saveAs: { "Manta-${params.manta_version}/${file(it).getName()}" } + + publishDir params.output_log_dir, + pattern: ".command.*", + mode: "copy", + saveAs: { "call_gSV_Manta/${bam_sample_name}.log${file(it).getName()}" } + + input: + tuple val(patient), val(bam_sample_name), path(input_bam), path(input_bam_bai) + path(reference_fasta) + path(reference_fasta_fai) + + + output: + path("MantaWorkflow/results/variants/candidateSmallIndels.vcf.gz"), emit: vcf_small_indel_sv_file + path("MantaWorkflow/results/variants/candidateSmallIndels.vcf.gz.tbi") + path("MantaWorkflow/results/variants/diploidSV.vcf.gz"), emit: vcf_diploid_sv_file + path("MantaWorkflow/results/variants/diploidSV.vcf.gz.tbi") + path("MantaWorkflow/results/variants/candidateSV.vcf.gz"), emit: vcf_candidate_sv_file + path("MantaWorkflow/results/variants/candidateSV.vcf.gz.tbi") + //path "MANTA-${params.manta_version}_SV_${params.dataset_id}_${bam_sample_name}.vcf.gz", emit: vcf_sv_file + //path "MANTA-${params.manta_version}_SV_${params.dataset_id}_${bam_sample_name}.vcf.gz.tbi" + path "MantaWorkflow/results" + path ".command.*" + val bam_sample_name, emit: bam_sample_name + + """ + set -euo pipefail + configManta.py \ + --normalBam $input_bam \ + --referenceFasta $reference_fasta \ + --runDir MantaWorkflow + + MantaWorkflow/runWorkflow.py + """ +} \ No newline at end of file diff --git a/pipeline/modules/sha512.nf b/pipeline/modules/sha512.nf index b6b95147..f48c66eb 100644 --- a/pipeline/modules/sha512.nf +++ b/pipeline/modules/sha512.nf @@ -16,12 +16,17 @@ process run_sha512sum { publishDir params.output_dir, pattern: "*.vcf.sha512", mode: "copy", - saveAs: { "bcftools-${params.bcftools_version}/${file(it).getName()}" } + saveAs: { "BCFtools-${params.bcftools_version}/${file(it).getName()}" } publishDir params.output_dir, pattern: "*.bcf.sha512", mode: "copy", - saveAs: { 
"delly-${params.delly_version}/${file(it).getName()}" } + saveAs: { "Delly-${params.delly_version}/${file(it).getName()}" } + + publishDir params.output_dir, + pattern: "*.vcf.gz*.sha512", + mode: "copy", + saveAs: { "Manta-${params.manta_version}/results/variants/${file(it).getName()}" } publishDir params.output_log_dir, pattern: ".command.*", diff --git a/pipeline/modules/vcftools.nf b/pipeline/modules/vcftools.nf index 5edeb4a7..47962da3 100644 --- a/pipeline/modules/vcftools.nf +++ b/pipeline/modules/vcftools.nf @@ -16,7 +16,7 @@ process run_vcf_validator_VCFtools { publishDir params.output_dir, pattern: "*_validation.txt", mode: "copy", - saveAs: { "vcftools-${params.vcftools_version}/${file(it).getName()}" } + saveAs: { "VCFtools-${params.vcftools_version}/${file(it).getName()}" } publishDir params.output_log_dir, pattern: ".command.*",