From d1c468c0e7033690d5e8e3f4c32f86c0954375fa Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 13 May 2021 17:17:22 +0100 Subject: [PATCH 01/68] Update CHANGELOG --- CHANGELOG.md | 6 ++++++ nextflow.config | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index f35b20ab..2fe9a3ab 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [Unpublished Version / DEV] + +### Enhancements & fixes + +### Parameters + ## [[2.0](https://github.com/nf-core/rnaseq/releases/tag/2.0)] - 2021-05-13 ### :warning: Major enhancements diff --git a/nextflow.config b/nextflow.config index a05bbfbc..d2443436 100644 --- a/nextflow.config +++ b/nextflow.config @@ -242,7 +242,7 @@ manifest { description = 'Assembly and intrahost/low-frequency variant calling for viral samples' mainScript = 'main.nf' nextflowVersion = '!>=21.04.0' - version = '2.0' + version = '2.1dev' } // Function to ensure that resource requirements don't go beyond From a73f1d44f8bdea3753e65bad537754acb3ba20e6 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 10:08:51 +0100 Subject: [PATCH 02/68] Remove test_sra CI workflow --- .github/workflows/ci.yml | 26 -------------------------- 1 file changed, 26 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 6065fd0f..66867ad8 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -70,32 +70,6 @@ jobs: run: | nextflow run ${GITHUB_WORKSPACE} -profile test,docker ${{ matrix.parameters }} - test_sra: - name: Test SRA workflow - if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/viralrecon') }} - runs-on: ubuntu-latest - env: - NXF_VER: ${{ matrix.nxf_ver }} - NXF_ANSI_LOG: false - strategy: - matrix: - parameters: 
[--skip_sra_fastq_download, ""] - - steps: - - name: Check out pipeline code - uses: actions/checkout@v2 - - - name: Install Nextflow - env: - CAPSULE_LOG: none - run: | - wget -qO- get.nextflow.io | bash - sudo mv nextflow /usr/local/bin/ - - - name: Run pipeline to download SRA ids and various options - run: | - nextflow run ${GITHUB_WORKSPACE} -profile test_sra,docker ${{ matrix.parameters }} - test_sispa: name: Test SISPA workflow if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/viralrecon') }} From bb783dbd478880720e8256184e250d1efce9fe7e Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 10:09:03 +0100 Subject: [PATCH 03/68] Remove docs for fetching public data --- README.md | 30 +++++++++--------------------- 1 file changed, 9 insertions(+), 21 deletions(-) diff --git a/README.md b/README.md index e4d2e043..03413dc4 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,7 @@ ## Introduction -**nfcore/viralrecon** is a bioinformatics analysis pipeline used to perform assembly and intra-host/low-frequency variant calling for viral samples. The pipeline supports both Illumina and Nanopore sequencing data. For Illumina short-reads the pipeline is able to analyse metagenomics data typically obtained from shotgun sequencing (e.g. directly from clinical samples) and enrichment-based library preparation methods (e.g. amplicon-based: [ARTIC SARS-CoV-2 enrichment protocol](https://artic.network/ncov-2019); or probe-capture-based). For Nanopore data the pipeline only supports amplicon-based analysis obtained from primer sets created and maintained by the [ARTIC Network](https://artic.network/). +**nf-core/viralrecon** is a bioinformatics analysis pipeline used to perform assembly and intra-host/low-frequency variant calling for viral samples. The pipeline supports both Illumina and Nanopore sequencing data. 
For Illumina short-reads the pipeline is able to analyse metagenomics data typically obtained from shotgun sequencing (e.g. directly from clinical samples) and enrichment-based library preparation methods (e.g. amplicon-based: [ARTIC SARS-CoV-2 enrichment protocol](https://artic.network/ncov-2019); or probe-capture-based). For Nanopore data the pipeline only supports amplicon-based analysis obtained from primer sets created and maintained by the [ARTIC Network](https://artic.network/). On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from running the full-sized tests individually for each `--platform` option can be viewed on the [nf-core website](https://nf-co.re/viralrecon/results) and the output directories will be named accordingly i.e. `platform_illumina/` and `platform_nanopore/`. @@ -28,12 +28,11 @@ The pipeline has numerous options to allow you to run only specific aspects of t ### Illumina -1. Download samples via SRA, ENA or GEO ids ([`ENA FTP`](https://ena-docs.readthedocs.io/en/latest/retrieval/file-download.html); *if required*) -2. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html)) -3. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)) -4. Adapter trimming ([`fastp`](https://github.com/OpenGene/fastp)) -5. Removal of host reads ([`Kraken 2`](http://ccb.jhu.edu/software/kraken2/); *optional*) -6. Variant calling +1. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html)) +2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)) +3. Adapter trimming ([`fastp`](https://github.com/OpenGene/fastp)) +4. 
Removal of host reads ([`Kraken 2`](http://ccb.jhu.edu/software/kraken2/); *optional*) +5. Variant calling 1. Read alignment ([`Bowtie 2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)) 2. Sort and index alignments ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/)) 3. Primer sequence removal ([`iVar`](https://github.com/andersen-lab/ivar); *amplicon data only*) @@ -47,14 +46,14 @@ The pipeline has numerous options to allow you to run only specific aspects of t * Clade assignment, mutation calling and sequence quality checks ([`Nextclade`](https://github.com/nextstrain/nextclade)) * Individual variant screenshots with annotation tracks ([`ASCIIGenome`](https://asciigenome.readthedocs.io/en/latest/)) 8. Intersect variants across callers ([`BCFTools`](http://samtools.github.io/bcftools/bcftools.html)) -7. _De novo_ assembly +6. _De novo_ assembly 1. Primer trimming ([`Cutadapt`](https://cutadapt.readthedocs.io/en/stable/guide.html); *amplicon data only*) 2. Choice of multiple assembly tools ([`SPAdes`](http://cab.spbu.ru/software/spades/) *||* [`Unicycler`](https://github.com/rrwick/Unicycler) *||* [`minia`](https://github.com/GATB/minia)) * Blast to reference genome ([`blastn`](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch)) * Contiguate assembly ([`ABACAS`](https://www.sanger.ac.uk/science/tools/pagit)) * Assembly report ([`PlasmidID`](https://github.com/BU-ISCIII/plasmidID)) * Assembly assessment report ([`QUAST`](http://quast.sourceforge.net/quast)) -8. Present QC and visualisation for raw read, alignment, assembly and variant calling results ([`MultiQC`](http://multiqc.info/)) +7. 
Present QC and visualisation for raw read, alignment, assembly and variant calling results ([`MultiQC`](http://multiqc.info/)) ### Nanopore @@ -130,14 +129,6 @@ The pipeline has numerous options to allow you to run only specific aspects of t -profile ``` - * Typical command for downloading public data: - - ```bash - nextflow run nf-core/viralrecon \ - --public_data_ids ids.txt \ - -profile - ``` - * An executable Python script called [`fastq_dir_to_samplesheet.py`](https://github.com/nf-core/viralrecon/blob/master/bin/fastq_dir_to_samplesheet.py) has been provided if you are using `--platform illumina` and would like to auto-create an input samplesheet based on a directory containing FastQ files **before** you run the pipeline (requires Python 3 installed locally) e.g. ```console @@ -145,13 +136,10 @@ The pipeline has numerous options to allow you to run only specific aspects of t ``` * You can find the default keys used to specify `--genome` in the [genomes config file](https://github.com/nf-core/configs/blob/master/conf/pipeline/viralrecon/genomes.config). Where possible we are trying to collate links and settings for standard primer sets to make it easier to run the pipeline with standard keys. If you are able to get permissions from the vendor/supplier to share the primer information then we would be more than happy to support it within the pipeline. - * The commands to obtain public data and to run the main arm of the pipeline are completely independent. This is intentional because it allows you to download all of the raw data in an initial pipeline run (`results/public_data/`) and then to curate the auto-created samplesheet based on the available sample metadata before you run the pipeline again properly. - -See [usage](https://nf-co.re/viralrecon/usage) and [parameter](https://nf-co.re/viralrecon/parameters) docs for all of the available options when running the pipeline. 
## Documentation -The nf-core/viralrecon pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/viralrecon/usage) and [output](https://nf-co.re/viralrecon/output). +The nf-core/viralrecon pipeline comes with documentation about the pipeline [usage](https://nf-co.re/viralrecon/usage), [parameters](https://nf-co.re/viralrecon/parameters) and [output](https://nf-co.re/viralrecon/output). ## Credits From a3a825e008bc9988fdee36a59d908d326936c34c Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 10:09:19 +0100 Subject: [PATCH 04/68] Remove section for fetching public data --- docs/usage.md | 30 ------------------------------ 1 file changed, 30 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 7b1657c8..985a2301 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -54,36 +54,6 @@ sample,barcode | `sample` | Custom sample name, one per barcode. | | `barcode` | Barcode identifier attributed to that sample during multiplexing. Must be an integer. | -## Direct download of public repository data - -> **NB:** This is an experimental feature but should work beautifully when it does! :) - -The pipeline has a separate workflow to automatically download raw FastQ files from public repositories. Identifiers can be provided in a file, one-per-line via the `--public_data_ids` parameter. Currently, the following identifiers are supported: - -| `SRA` | `ENA` | `GEO` | -|--------------|--------------|------------| -| SRR11605097 | ERR4007730 | GSM4432381 | -| SRX8171613 | ERX4009132 | GSE147507 | -| SRS6531847 | ERS4399630 | | -| SAMN14689442 | SAMEA6638373 | | -| SRP256957 | ERP120836 | | -| SRA1068758 | ERA2420837 | | -| PRJNA625551 | PRJEB37513 | | - -If `SRR`/`ERR` run ids are provided then these will be resolved back to their appropriate `SRX`/`ERX` ids to be able to merge multiple runs from the same experiment. This is conceptually the same as merging multiple libraries sequenced from the same sample. 
- -The final sample information for all identifiers is obtained from the ENA which provides direct download links for FastQ files as well as their associated md5 sums. If download links exist, the files will be downloaded in parallel by FTP otherwise they will NOT be downloaded. - -As a bonus, the pipeline will also generate a valid samplesheet with paths to the downloaded data that can be used with the `--input` parameter to run the main analysis arm of the pipeline, however, it is highly recommended that you double-check that all of the identifiers you defined using `--public_data_ids` are represented in the samplesheet. All of the sample metadata obtained from the ENA has been appended as additional columns to help you manually curate the samplesheet before you run the pipeline if required. - -If you have a GEO accession (found in the data availability section of published papers) you can directly download a text file containing the appropriate SRA ids to pass to the pipeline: - -* Search for your GEO accession on [GEO](https://www.ncbi.nlm.nih.gov/geo) -* Click `SRA Run Selector` at the bottom of the GEO accession page -* Select the desired samples in the `SRA Run Selector` and then download the `Accession List` - -This downloads a text file called `SRR_Acc_List.txt` which can be directly provided to the pipeline e.g. `--public_data_ids SRR_Acc_List.txt`. - ## Running the pipeline The typical command for running the pipeline is as follows: From eb26d8bebbfa9b66ab0162c7d6612b74a1fd1697 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 10:49:50 +0100 Subject: [PATCH 05/68] Add section about Nanopore input format to usage docs --- docs/usage.md | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/docs/usage.md b/docs/usage.md index 985a2301..4958291d 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -54,6 +54,72 @@ sample,barcode | `sample` | Custom sample name, one per barcode. 
|
| `barcode` | Barcode identifier attributed to that sample during multiplexing. Must be an integer. |
+## Nanopore input format
+
+For Nanopore data the pipeline only supports amplicon-based analysis obtained from primer sets created and maintained by the [ARTIC Network](https://artic.network/). The [artic minion](https://artic.readthedocs.io/en/latest/commands/) tool from the [ARTIC field bioinformatics pipeline](https://github.com/artic-network/fieldbioinformatics) is used to align reads, call variants and to generate the consensus sequence.
+
+### Nanopolish
+
+The default variant caller used by artic minion is [Nanopolish](https://github.com/jts/nanopolish) and this requires that you provide `*.fastq`, `*.fast5` and `sequencing_summary.txt` files as input to the pipeline. These files can typically be obtained after demultiplexing and basecalling the sequencing data using [Guppy](https://nanoporetech.com/nanopore-sequencing-data-analysis) (see [ARTIC SOP docs](https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html)). This pipeline requires that the files are organised in the format outlined below:
+
+```console
+.
+└── fastq_pass
+    └── barcode01
+        ├── FAP51364_pass_barcode01_97ca62ca_0.fastq
+        ├── FAP51364_pass_barcode01_97ca62ca_1.fastq
+        ├── FAP51364_pass_barcode01_97ca62ca_2.fastq
+        ├── FAP51364_pass_barcode01_97ca62ca_3.fastq
+        ├── FAP51364_pass_barcode01_97ca62ca_4.fastq
+        ├── FAP51364_pass_barcode01_97ca62ca_5.fastq
+
+```
+
+```console
+.
+└── fast5_pass + ├── barcode01 + ├── FAP51364_pass_barcode01_97ca62ca_0.fast5 + ├── FAP51364_pass_barcode01_97ca62ca_1.fast5 + ├── FAP51364_pass_barcode01_97ca62ca_2.fast5 + ├── FAP51364_pass_barcode01_97ca62ca_3.fast5 + ├── FAP51364_pass_barcode01_97ca62ca_4.fast5 + ├── FAP51364_pass_barcode01_97ca62ca_5.fast5 + +``` + +The command to run the pipeline would then be: + +```console +nextflow run nf-core/viralrecon \ + --input samplesheet.csv \ + --platform nanopore \ + --genome 'MN908947.3' \ + --primer_set_version 3 \ + --fastq_dir fastq_pass/ \ + --fast5_dir fast5_pass/ \ + --sequencing_summary sequencing_summary.txt \ + -profile +``` + +### Medaka + +You also have the option of using [Medaka](https://github.com/nanoporetech/medaka) as an alternative variant caller to Nanopolish via the `--artic_minion_caller medaka` parameter. Medaka is faster than Nanopolish, performs mostly the same and can be run directly from `fastq` input files as opposed to requiring the `fastq`, `fast5` and `sequencing_summary.txt` files required to run Nanopolish. You must provide the appropriate [Medaka model](https://github.com/nanoporetech/medaka#models) via the `--artic_minion_medaka_model` parameter if using `--artic_minion_caller medaka`. The `fastq` files have to be organised in the same way as for Nanopolish as outlined in the section above. 
+ +The command to run the pipeline would then be: + +```console +nextflow run nf-core/viralrecon \ + --input samplesheet.csv \ + --platform nanopore \ + --genome 'MN908947.3' \ + --primer_set_version 3 \ + --fastq_dir fastq_pass/ \ + --artic_minion_caller medaka \ + --artic_minion_medaka_model ./r941_min_high_g360_model.hdf5 \ + -profile +``` + ## Running the pipeline The typical command for running the pipeline is as follows: From ee8a6f5a6d01d1834545f3178e4c9a8763697a67 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 10:49:59 +0100 Subject: [PATCH 06/68] Fix tyop --- bin/ivar_variants_to_vcf.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bin/ivar_variants_to_vcf.py b/bin/ivar_variants_to_vcf.py index 37727cac..d217a6be 100755 --- a/bin/ivar_variants_to_vcf.py +++ b/bin/ivar_variants_to_vcf.py @@ -38,7 +38,7 @@ def ivar_variants_to_vcf(FileIn,FileOut,passOnly=False,minAF=0): '##FORMAT=\n' '##FORMAT=\n' '##FORMAT=\n' - '##FORMAT=\n' + '##FORMAT=\n' '##FORMAT=\n' '##FORMAT=\n') header += '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t'+filename+'\n' From cb0b41d8e2a1522e8115d5ecb4bf24e084fe4ef2 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 10:50:38 +0100 Subject: [PATCH 07/68] Delete SRA download specific files --- bin/sra_ids_to_runinfo.py | 178 ------------------------- bin/sra_runinfo_to_ftp.py | 113 ---------------- bin/sra_runinfo_to_samplesheet.py | 91 ------------- conf/test_sra.config | 24 ---- lib/WorkflowSraDownload.groovy | 19 --- modules/local/sra_fastq_ftp.nf | 47 ------- modules/local/sra_ids_to_runinfo.nf | 31 ----- modules/local/sra_merge_samplesheet.nf | 31 ----- modules/local/sra_runinfo_to_ftp.nf | 28 ---- modules/local/sra_to_samplesheet.nf | 43 ------ workflows/sra_download.nf | 117 ---------------- 11 files changed, 722 deletions(-) delete mode 100755 bin/sra_ids_to_runinfo.py delete mode 100755 bin/sra_runinfo_to_ftp.py delete mode 100755 bin/sra_runinfo_to_samplesheet.py 
delete mode 100644 conf/test_sra.config delete mode 100755 lib/WorkflowSraDownload.groovy delete mode 100644 modules/local/sra_fastq_ftp.nf delete mode 100644 modules/local/sra_ids_to_runinfo.nf delete mode 100644 modules/local/sra_merge_samplesheet.nf delete mode 100644 modules/local/sra_runinfo_to_ftp.nf delete mode 100644 modules/local/sra_to_samplesheet.nf delete mode 100644 workflows/sra_download.nf diff --git a/bin/sra_ids_to_runinfo.py b/bin/sra_ids_to_runinfo.py deleted file mode 100755 index 002a859a..00000000 --- a/bin/sra_ids_to_runinfo.py +++ /dev/null @@ -1,178 +0,0 @@ -#!/usr/bin/env python - -import os -import re -import sys -import csv -import errno -import requests -import argparse - - -## Example ids supported by this script -SRA_IDS = ['PRJNA63463', 'SAMN00765663', 'SRA023522', 'SRP003255', 'SRR390278', 'SRS282569', 'SRX111814'] -ENA_IDS = ['ERA2421642', 'ERP120836', 'ERR674736', 'ERS4399631', 'ERX629702', 'PRJEB7743', 'SAMEA3121481'] -GEO_IDS = ['GSE18729', 'GSM465244'] -ID_REGEX = r'^[A-Z]+' -PREFIX_LIST = sorted(list(set([re.search(ID_REGEX,x).group() for x in SRA_IDS + ENA_IDS + GEO_IDS]))) - - -def parse_args(args=None): - Description = 'Download and create a run information metadata file from SRA/ENA/GEO identifiers.' - Epilog = 'Example usage: python fetch_sra_runinfo.py ' - - parser = argparse.ArgumentParser(description=Description, epilog=Epilog) - parser.add_argument('FILE_IN', help="File containing database identifiers, one per line.") - parser.add_argument('FILE_OUT', help="Output file in tab-delimited format.") - parser.add_argument('-pl', '--platform', type=str, dest="PLATFORM", default='', help="Comma-separated list of platforms to use for filtering. Accepted values = 'ILLUMINA', 'OXFORD_NANOPORE' (default: '').") - parser.add_argument('-ll', '--library_layout', type=str, dest="LIBRARY_LAYOUT", default='', help="Comma-separated list of library layouts to use for filtering. 
Accepted values = 'SINGLE', 'PAIRED' (default: '').") - return parser.parse_args(args) - - -def validate_csv_param(param,valid_vals,param_desc): - valid_list = [] - if param: - user_vals = param.split(',') - intersect = list(set(user_vals) & set(valid_vals)) - if len(intersect) == len(user_vals): - valid_list = intersect - else: - print("ERROR: Please provide a valid {} parameter!\nProvided values = {}\nAccepted values = {}".format(param_desc,param,','.join(validVals))) - sys.exit(1) - return valid_list - - -def make_dir(path): - if not len(path) == 0: - try: - os.makedirs(path) - except OSError as exception: - if exception.errno != errno.EEXIST: - raise - - -def fetch_url(url,encoding='utf-8'): - try: - r = requests.get(url) - except requests.exceptions.RequestException as e: - raise SystemExit(e) - if r.status_code != 200: - print("ERROR: Connection failed\nError code '{}'".format(r.status_code)) - sys.exit(1) - return r.content.decode(encoding).splitlines() - - -def id_to_srx(db_id): - ids = [] - url = 'https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term={}'.format(db_id) - for row in csv.DictReader(fetch_url(url), delimiter=','): - ids.append(row['Experiment']) - return ids - - -def id_to_erx(db_id): - ids = [] - fields = ['run_accession', 'experiment_accession'] - url = 'http://www.ebi.ac.uk/ena/data/warehouse/filereport?accession={}&result=read_run&fields={}'.format(db_id,','.join(fields)) - for row in csv.DictReader(fetch_url(url), delimiter='\t'): - ids.append(row['experiment_accession']) - return ids - - -def gse_to_srx(db_id): - ids = [] - url = 'https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={}&targ=gsm&view=data&form=text'.format(db_id) - gsm_ids = [x.split('=')[1].strip() for x in fetch_url(url) if x.find('GSM') != -1] - for gsm_id in gsm_ids: - ids += id_to_srx(gsm_id) - return ids - - -def get_ena_fields(): - fields = [] - url = 
'https://www.ebi.ac.uk/ena/portal/api/returnFields?dataPortal=ena&format=tsv&result=read_run' - for row in csv.DictReader(fetch_url(url), delimiter='\t'): - fields.append(row['columnId']) - return fields - - -def fetch_sra_runinfo(file_in,file_out,platform_list=[],library_layout_list=[]): - total_out = 0 - seen_ids = []; run_ids = [] - header = [] - make_dir(os.path.dirname(file_out)) - ena_fields = get_ena_fields() - with open(file_in,"r") as fin, open(file_out,"w") as fout: - for line in fin: - db_id = line.strip() - match = re.search(ID_REGEX, db_id) - if match: - prefix = match.group() - if prefix in PREFIX_LIST: - if not db_id in seen_ids: - - ids = [db_id] - ## Resolve/expand these ids against GEO URL - if prefix in ['GSE']: - ids = gse_to_srx(db_id) - - ## Resolve/expand these ids against SRA URL - elif prefix in ['GSM', 'PRJNA', 'SAMN', 'SRR']: - ids = id_to_srx(db_id) - - ## Resolve/expand these ids against ENA URL - elif prefix in ['ERR']: - ids = id_to_erx(db_id) - - ## Resolve/expand to get run identifier from ENA and write to file - for id in ids: - url = 'http://www.ebi.ac.uk/ena/data/warehouse/filereport?accession={}&result=read_run&fields={}'.format(id,','.join(ena_fields)) - csv_dict = csv.DictReader(fetch_url(url), delimiter='\t') - for row in csv_dict: - run_id = row['run_accession'] - if not run_id in run_ids: - - write_id = True - if platform_list: - if row['instrument_platform'] not in platform_list: - write_id = False - if library_layout_list: - if row['library_layout'] not in library_layout_list: - write_id = False - - if write_id: - if total_out == 0: - header = sorted(row.keys()) - fout.write('{}\n'.format('\t'.join(sorted(header)))) - else: - if header != sorted(row.keys()): - print("ERROR: Metadata columns do not match for id {}!\nLine: '{}'".format(run_id,line.strip())) - sys.exit(1) - fout.write('{}\n'.format('\t'.join([row[x] for x in header]))) - total_out += 1 - run_ids.append(run_id) - seen_ids.append(db_id) - - if not ids: - 
print("ERROR: No matches found for database id {}!\nLine: '{}'".format(db_id,line.strip())) - sys.exit(1) - - else: - id_str = ', '.join([x + "*" for x in PREFIX_LIST]) - print("ERROR: Please provide a valid database id starting with {}!\nLine: '{}'".format(id_str,line.strip())) - sys.exit(1) - else: - id_str = ', '.join([x + "*" for x in PREFIX_LIST]) - print("ERROR: Please provide a valid database id starting with {}!\nLine: '{}'".format(id_str,line.strip())) - sys.exit(1) - - -def main(args=None): - args = parse_args(args) - platform_list = validate_csv_param(args.PLATFORM,valid_vals=['ILLUMINA'],param_desc='--platform') - library_layout_list = validate_csv_param(args.LIBRARY_LAYOUT,valid_vals=['SINGLE', 'PAIRED'],param_desc='--library_layout') - fetch_sra_runinfo(args.FILE_IN,args.FILE_OUT,platform_list,library_layout_list) - - -if __name__ == '__main__': - sys.exit(main()) diff --git a/bin/sra_runinfo_to_ftp.py b/bin/sra_runinfo_to_ftp.py deleted file mode 100755 index 8056015f..00000000 --- a/bin/sra_runinfo_to_ftp.py +++ /dev/null @@ -1,113 +0,0 @@ -#!/usr/bin/env python - -import os -import sys -import errno -import argparse -import collections - - -def parse_args(args=None): - Description = "Create samplesheet with FTP download links and md5ums from sample information obtained via 'sra_ids_to_runinfo.py' script." 
- Epilog = 'Example usage: python sra_runinfo_to_ftp.py ' - - parser = argparse.ArgumentParser(description=Description, epilog=Epilog) - parser.add_argument('FILES_IN', help="Comma-separated list of metadata file created from 'sra_ids_to_runinfo.py' script.") - parser.add_argument('FILE_OUT', help="Output file containing paths to download FastQ files along with their associated md5sums.") - return parser.parse_args(args) - - -def make_dir(path): - if not len(path) == 0: - try: - os.makedirs(path) - except OSError as exception: - if exception.errno != errno.EEXIST: - raise - - -def parse_sra_runinfo(file_in): - runinfo_dict = {} - with open(file_in, "r") as fin: - header = fin.readline().strip().split('\t') - for line in fin: - line_dict = dict(zip(header,line.strip().split('\t'))) - line_dict = collections.OrderedDict(sorted(list(line_dict.items()))) - run_id = line_dict['run_accession'] - exp_id = line_dict['experiment_accession'] - library = line_dict['library_layout'] - fastq_files = line_dict['fastq_ftp'] - fastq_md5 = line_dict['fastq_md5'] - print(line_dict) - - db_id = exp_id - sample_dict = collections.OrderedDict() - if library == 'SINGLE': - sample_dict = collections.OrderedDict([('fastq_1',''), ('fastq_2',''), ('md5_1',''), ('md5_2',''), ('single_end','true')]) - if fastq_files: - sample_dict['fastq_1'] = fastq_files - sample_dict['md5_1'] = fastq_md5 - else: - ## In some instances FTP links don't exist for FastQ files - ## These have to be downloaded via fastq-dump / fasterq-dump / parallel-fastq-dump via the run id - db_id = run_id - - elif library == 'PAIRED': - sample_dict = collections.OrderedDict([('fastq_1',''), ('fastq_2',''), ('md5_1',''), ('md5_2',''), ('single_end','false')]) - if fastq_files: - fq_files = fastq_files.split(';')[-2:] - fq_md5 = fastq_md5.split(';')[-2:] - if len(fq_files) == 2: - if fq_files[0].find('_1.fastq.gz') != -1 and fq_files[1].find('_2.fastq.gz') != -1: - sample_dict['fastq_1'] = fq_files[0] - sample_dict['fastq_2'] = 
fq_files[1] - sample_dict['md5_1'] = fq_md5[0] - sample_dict['md5_2'] = fq_md5[1] - else: - print("Invalid FastQ files found for database id:'{}'!.".format(run_id)) - else: - print("Invalid number of FastQ files ({}) found for paired-end database id:'{}'!.".format(len(fq_files), run_id)) - else: - db_id = run_id - - if sample_dict: - sample_dict.update(line_dict) - if db_id not in runinfo_dict: - runinfo_dict[db_id] = [sample_dict] - else: - if sample_dict in runinfo_dict[db_id]: - print("Input run info file contains duplicate rows!\nLine: '{}'".format(line)) - else: - runinfo_dict[db_id].append(sample_dict) - return runinfo_dict - - -def sra_runinfo_to_ftp(files_in,file_out): - samplesheet_dict = {} - for file_in in files_in: - runinfo_dict = parse_sra_runinfo(file_in) - for db_id in runinfo_dict.keys(): - if db_id not in samplesheet_dict: - samplesheet_dict[db_id] = runinfo_dict[db_id] - else: - print("Duplicate sample identifier found!\nID: '{}'".format(db_id)) - - ## Write samplesheet with paths to FastQ files and md5 sums - if samplesheet_dict: - out_dir = os.path.dirname(file_out) - make_dir(out_dir) - with open(file_out, "w") as fout: - header = ['id'] + list(samplesheet_dict[list(samplesheet_dict.keys())[0]][0].keys()) - fout.write("\t".join(header) + "\n") - for db_id in sorted(samplesheet_dict.keys()): - for idx,val in enumerate(samplesheet_dict[db_id]): - fout.write('\t'.join(["{}_T{}".format(db_id,idx+1)] + [val[x] for x in header[1:]]) + '\n') - - -def main(args=None): - args = parse_args(args) - sra_runinfo_to_ftp([x.strip() for x in args.FILES_IN.split(',')], args.FILE_OUT) - - -if __name__ == '__main__': - sys.exit(main()) diff --git a/bin/sra_runinfo_to_samplesheet.py b/bin/sra_runinfo_to_samplesheet.py deleted file mode 100755 index 77c9eded..00000000 --- a/bin/sra_runinfo_to_samplesheet.py +++ /dev/null @@ -1,91 +0,0 @@ -#!/usr/bin/env python - -import os -import sys -import errno -import argparse - -def parse_args(args=None): - Description = 
"Create valid nf-core/viralrecon samplesheet file from output of 'fetch_sra_runinfo.py' script." - Epilog = """Example usage: python sra_runinfo_to_samplesheet.py """ - - parser = argparse.ArgumentParser(description=Description, epilog=Epilog) - parser.add_argument('FILE_IN', help="Input metadata file created from 'fetch_sra_runinfo.py' script.") - parser.add_argument('FILE_OUT', help="Output file.") - return parser.parse_args(args) - - -def make_dir(path): - if not len(path) == 0: - try: - os.makedirs(path) - except OSError as exception: - if exception.errno != errno.EEXIST: - raise - - -def sra_runinfo_to_samplesheet(FileIn,FileOut): - - sampleRunDict = {} - fin = open(FileIn,'r') - header = fin.readline().strip().split('\t') - while True: - line = fin.readline() - if line: - line_dict = dict(zip(header,line.strip().split('\t'))) - run_id = line_dict['run_accession'] - exp_id = line_dict['experiment_accession'] - library = line_dict['library_layout'] - fastq_files = line_dict['fastq_ftp'] - fastq_md5 = line_dict['fastq_md5'] - - db_id = exp_id - sample_info = [] ## [single_end, is_sra, is_ftp, fastq_1, fastq_2, md5_1, md5_2] - if library == 'SINGLE': - if fastq_files: - sample_info = ['1', '1', '1', fastq_files , '', fastq_md5, ''] - else: - db_id = run_id - sample_info = ['1', '1', '0', '', '', '', ''] - elif library == 'PAIRED': - if fastq_files: - fq_files = fastq_files.split(';')[-2:] - if fq_files[0].find('_1.fastq.gz') != -1 and fq_files[1].find('_2.fastq.gz') != -1: - sample_info = ['0', '1', '1'] + fq_files + fastq_md5.split(';')[-2:] - else: - print("Invalid FastQ files found for database id:'{}'!.".format(run_id)) - else: - db_id = run_id - sample_info = ['0', '1', '0', '', '', '', ''] - - if sample_info: - if db_id not in sampleRunDict: - sampleRunDict[db_id] = [sample_info] - else: - if sample_info in sampleRunDict[db_id]: - print("Input run info file contains duplicate rows!\nLine: '{}'".format(line)) - else: - 
sampleRunDict[db_id].append(sample_info) - else: - break - fin.close() - - ## Write samplesheet with appropriate columns - if len(sampleRunDict) != 0: - OutDir = os.path.dirname(FileOut) - make_dir(OutDir) - fout = open(FileOut,'w') - fout.write(','.join(['sample_id', 'single_end', 'is_sra', 'is_ftp', 'fastq_1', 'fastq_2', 'md5_1', 'md5_2']) + '\n') - for db_id in sorted(sampleRunDict.keys()): - for idx,val in enumerate(sampleRunDict[db_id]): - fout.write(','.join(["{}_T{}".format(db_id,idx+1)] + val) + '\n') - fout.close() - - -def main(args=None): - args = parse_args(args) - sra_runinfo_to_samplesheet(args.FILE_IN,args.FILE_OUT) - - -if __name__ == '__main__': - sys.exit(main()) diff --git a/conf/test_sra.config b/conf/test_sra.config deleted file mode 100644 index 8c20a32e..00000000 --- a/conf/test_sra.config +++ /dev/null @@ -1,24 +0,0 @@ -/* -======================================================================================== - Nextflow config file for running minimal tests -======================================================================================== - Defines input files and everything required to run a fast and simple pipeline test. 
- - Use as follows: - nextflow run nf-core/viralrecon -profile test_sra, - ----------------------------------------------------------------------------------------- -*/ - -params { - config_profile_name = 'Test profile' - config_profile_description = 'Minimal test dataset to check pipeline function' - - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = 6.GB - max_time = 6.h - - // Input data to test SRA download functionality using SISPA/metagenomics data - public_data_ids = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_sra.txt' -} diff --git a/lib/WorkflowSraDownload.groovy b/lib/WorkflowSraDownload.groovy deleted file mode 100755 index 69e697da..00000000 --- a/lib/WorkflowSraDownload.groovy +++ /dev/null @@ -1,19 +0,0 @@ -// -// This file holds functions specific to the workflow/sra_download.nf in the nf-core/viralrecon pipeline -// - -class WorkflowSraDownload { - - // - // Print a warning after SRA download has completed - // - public static void sraDownloadWarn(log) { - log.warn "=============================================================================\n" + - " Please double-check the samplesheet that has been auto-created using the\n" + - " public database ids provided via the '--public_data_ids' parameter.\n\n" + - " All of the sample metadata obtained from the ENA has been appended\n" + - " as additional columns to help you manually curate the samplesheet before\n" + - " you run the main workflow(s) in the pipeline.\n" + - "===================================================================================" - } -} diff --git a/modules/local/sra_fastq_ftp.nf b/modules/local/sra_fastq_ftp.nf deleted file mode 100644 index 4b800d19..00000000 --- a/modules/local/sra_fastq_ftp.nf +++ /dev/null @@ -1,47 +0,0 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = 
initOptions(params.options) - -process SRA_FASTQ_FTP { - tag "$meta.id" - label 'process_medium' - label 'error_retry' - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - - conda (params.enable_conda ? "conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } - - input: - tuple val(meta), val(fastq) - - output: - tuple val(meta), path("*fastq.gz"), emit: fastq - tuple val(meta), path("*md5") , emit: md5 - - script: - if (meta.single_end) { - """ - bash -c 'until curl $options.args -L ${fastq[0]} -o ${meta.id}.fastq.gz; do sleep 1; done'; - echo "${meta.md5_1} ${meta.id}.fastq.gz" > ${meta.id}.fastq.gz.md5 - md5sum -c ${meta.id}.fastq.gz.md5 - """ - } else { - """ - bash -c 'until curl $options.args -L ${fastq[0]} -o ${meta.id}_1.fastq.gz; do sleep 1; done'; - echo "${meta.md5_1} ${meta.id}_1.fastq.gz" > ${meta.id}_1.fastq.gz.md5 - md5sum -c ${meta.id}_1.fastq.gz.md5 - - bash -c 'until curl $options.args -L ${fastq[1]} -o ${meta.id}_2.fastq.gz; do sleep 1; done'; - echo "${meta.md5_2} ${meta.id}_2.fastq.gz" > ${meta.id}_2.fastq.gz.md5 - md5sum -c ${meta.id}_2.fastq.gz.md5 - """ - } -} diff --git a/modules/local/sra_ids_to_runinfo.nf b/modules/local/sra_ids_to_runinfo.nf deleted file mode 100644 index baee3eb9..00000000 --- a/modules/local/sra_ids_to_runinfo.nf +++ /dev/null @@ -1,31 +0,0 @@ -// Import generic module functions -include { saveFiles; getSoftwareName } from './functions' - -params.options = [:] - -process SRA_IDS_TO_RUNINFO { - tag "$id" - label 'error_retry' - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - 
saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } - - conda (params.enable_conda ? "conda-forge::requests=2.24.0" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/requests:2.24.0" - } else { - container "quay.io/biocontainers/requests:2.24.0" - } - - input: - val id - - output: - path "*.tsv", emit: tsv - - script: - """ - echo $id > id.txt - sra_ids_to_runinfo.py id.txt ${id}.runinfo.tsv - """ -} diff --git a/modules/local/sra_merge_samplesheet.nf b/modules/local/sra_merge_samplesheet.nf deleted file mode 100644 index 423be634..00000000 --- a/modules/local/sra_merge_samplesheet.nf +++ /dev/null @@ -1,31 +0,0 @@ -// Import generic module functions -include { saveFiles; getSoftwareName } from './functions' - -params.options = [:] - -process SRA_MERGE_SAMPLESHEET { - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } - - conda (params.enable_conda ? 
"conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } - - input: - path ('samplesheets/*') - - output: - path "*csv", emit: csv - - script: - """ - head -n 1 `ls ./samplesheets/* | head -n 1` > samplesheet.csv - for fileid in `ls ./samplesheets/*`; do - awk 'NR>1' \$fileid >> samplesheet.csv - done - """ -} diff --git a/modules/local/sra_runinfo_to_ftp.nf b/modules/local/sra_runinfo_to_ftp.nf deleted file mode 100644 index de210b5e..00000000 --- a/modules/local/sra_runinfo_to_ftp.nf +++ /dev/null @@ -1,28 +0,0 @@ -// Import generic module functions -include { saveFiles; getSoftwareName } from './functions' - -params.options = [:] - -process SRA_RUNINFO_TO_FTP { - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } - - conda (params.enable_conda ? 
"conda-forge::python=3.8.3" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/python:3.8.3" - } else { - container "quay.io/biocontainers/python:3.8.3" - } - - input: - path runinfo - - output: - path "*.tsv", emit: tsv - - script: - """ - sra_runinfo_to_ftp.py ${runinfo.join(',')} ${runinfo.toString().tokenize(".")[0]}.runinfo_ftp.tsv - """ -} diff --git a/modules/local/sra_to_samplesheet.nf b/modules/local/sra_to_samplesheet.nf deleted file mode 100644 index ef4487d9..00000000 --- a/modules/local/sra_to_samplesheet.nf +++ /dev/null @@ -1,43 +0,0 @@ -// Import generic module functions -include { saveFiles; getSoftwareName } from './functions' - -params.options = [:] -params.results_dir = '' - -process SRA_TO_SAMPLESHEET { - tag "$meta.id" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - - memory 100.MB - - input: - tuple val(meta), path(fastq) - - output: - tuple val(meta), path("*csv"), emit: csv - - exec: - // Remove custom keys needed to download the data - def meta_map = meta.clone() - meta_map.remove("id") - meta_map.remove("fastq_1") - meta_map.remove("fastq_2") - meta_map.remove("md5_1") - meta_map.remove("md5_2") - meta_map.remove("single_end") - - // Add required fields for the pipeline to the beginning of the map - pipeline_map = [ - sample : "${meta.id.split('_')[0..-2].join('_')}", - fastq_1 : "${params.outdir}/${params.results_dir}/${fastq[0]}", - fastq_2 : meta.single_end ? 
'' : "${params.outdir}/${params.results_dir}/${fastq[1]}" - ] - pipeline_map << meta_map - - // Write to file - def file = task.workDir.resolve("${meta.id}.samplesheet.csv") - file.write pipeline_map.keySet().collect{ '"' + it + '"'}.join(",") + '\n' - file.append(pipeline_map.values().collect{ '"' + it + '"'}.join(",")) + '\n' -} diff --git a/workflows/sra_download.nf b/workflows/sra_download.nf deleted file mode 100644 index e3353cc1..00000000 --- a/workflows/sra_download.nf +++ /dev/null @@ -1,117 +0,0 @@ -/* -======================================================================================== - VALIDATE INPUTS -======================================================================================== -*/ - -if (params.public_data_ids) { - Channel - .from(file(params.public_data_ids, checkIfExists: true)) - .splitCsv(header:false, sep:'', strip:true) - .map { it[0] } - .unique() - .set { ch_public_data_ids } -} else { - exit 1, 'Input file with public database ids not specified!' -} - -/* -======================================================================================== - IMPORT LOCAL MODULES/SUBWORKFLOWS -======================================================================================== -*/ - -// Don't overwrite global params.modules, create a copy instead and use that within the main script. 
-def modules = params.modules.clone() - -include { SRA_IDS_TO_RUNINFO } from '../modules/local/sra_ids_to_runinfo' addParams( options: modules['sra_ids_to_runinfo'] ) -include { SRA_RUNINFO_TO_FTP } from '../modules/local/sra_runinfo_to_ftp' addParams( options: modules['sra_runinfo_to_ftp'] ) -include { SRA_FASTQ_FTP } from '../modules/local/sra_fastq_ftp' addParams( options: modules['sra_fastq_ftp'] ) -include { SRA_TO_SAMPLESHEET } from '../modules/local/sra_to_samplesheet' addParams( options: modules['sra_to_samplesheet'], results_dir: modules['sra_fastq_ftp'].publish_dir ) -include { SRA_MERGE_SAMPLESHEET } from '../modules/local/sra_merge_samplesheet' addParams( options: modules['sra_merge_samplesheet'] ) - -/* -======================================================================================== - RUN MAIN WORKFLOW -======================================================================================== -*/ - -workflow SRA_DOWNLOAD { - - // - // MODULE: Get SRA run information for public database ids - // - SRA_IDS_TO_RUNINFO ( - ch_public_data_ids - ) - - // - // MODULE: Parse SRA run information, create file containing FTP links and read into workflow as [ meta, [reads] ] - // - SRA_RUNINFO_TO_FTP ( - SRA_IDS_TO_RUNINFO.out.tsv - ) - - SRA_RUNINFO_TO_FTP - .out - .tsv - .splitCsv(header:true, sep:'\t') - .map { - meta -> - meta.single_end = meta.single_end.toBoolean() - [ meta, [ meta.fastq_1, meta.fastq_2 ] ] - } - .unique() - .set { ch_sra_reads } - - if (!params.skip_sra_fastq_download) { - // - // MODULE: If FTP link is provided in run information then download FastQ directly via FTP and validate with md5sums - // - SRA_FASTQ_FTP ( - ch_sra_reads.map { meta, reads -> if (meta.fastq_1) [ meta, reads ] } - ) - - // - // MODULE: Stage FastQ files downloaded by SRA together and auto-create a samplesheet for the pipeline - // - SRA_TO_SAMPLESHEET ( - SRA_FASTQ_FTP.out.fastq - ) - - // - // MODULE: Create a merged samplesheet across all samples for the 
pipeline - // - SRA_MERGE_SAMPLESHEET ( - SRA_TO_SAMPLESHEET.out.csv.collect{it[1]} - ) - - // - // If ids don't have a direct FTP download link write them to file for download outside of the pipeline - // - def no_ids_file = ["${params.outdir}", "${modules['sra_fastq_ftp'].publish_dir}", "IDS_NOT_DOWNLOADED.txt" ].join(File.separator) - ch_sra_reads - .map { meta, reads -> if (!meta.fastq_1) "${meta.id.split('_')[0..-2].join('_')}" } - .unique() - .collectFile(name: no_ids_file, sort: true, newLine: true) - } -} - -/* -======================================================================================== - COMPLETION EMAIL AND SUMMARY -======================================================================================== -*/ - -workflow.onComplete { - def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params) - NfcoreTemplate.email(workflow, params, summary_params, projectDir, log) - NfcoreTemplate.summary(workflow, params, log) - WorkflowSraDownload.sraDownloadWarn(log) -} - -/* -======================================================================================== - THE END -======================================================================================== -*/ From 909bd6d9f3c67ae66410a1cb2ff9decf4d99aa4e Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 10:51:15 +0100 Subject: [PATCH 08/68] Add camelCase bug fix for Schema --- lib/NfcoreSchema.groovy | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lib/NfcoreSchema.groovy b/lib/NfcoreSchema.groovy index 852a2ad1..04b18edb 100755 --- a/lib/NfcoreSchema.groovy +++ b/lib/NfcoreSchema.groovy @@ -121,7 +121,8 @@ class NfcoreSchema { def params_ignore = params.schema_ignore_params.split(',') + 'schema_ignore_params' def expectedParamsLowerCase = expectedParams.collect{ it.replace("-", "").toLowerCase() } def specifiedParamLowerCase = specifiedParam.replace("-", "").toLowerCase() - if (!expectedParams.contains(specifiedParam) && 
!params_ignore.contains(specifiedParam) && !expectedParamsLowerCase.contains(specifiedParamLowerCase)) { + def isCamelCaseBug = (specifiedParam.contains("-") && !expectedParams.contains(specifiedParam) && expectedParamsLowerCase.contains(specifiedParamLowerCase)) + if (!expectedParams.contains(specifiedParam) && !params_ignore.contains(specifiedParam) && !isCamelCaseBug) { // Temporarily remove camelCase/camel-case params #1035 def unexpectedParamsLowerCase = unexpectedParams.collect{ it.replace("-", "").toLowerCase()} if (!unexpectedParamsLowerCase.contains(specifiedParamLowerCase)){ From 5cb43a1546f7656a7cd5cabb849ea9c75f89746f Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 10:51:30 +0100 Subject: [PATCH 09/68] Remove SRA download workflow call --- main.nf | 7 ------- 1 file changed, 7 deletions(-) diff --git a/main.nf b/main.nf index 32b1f140..9ac1dcde 100644 --- a/main.nf +++ b/main.nf @@ -49,13 +49,6 @@ WorkflowMain.initialise(workflow, params, log) workflow NFCORE_VIRALRECON { - // - // WORKFLOW: Get SRA run information for public database ids, download and md5sum check FastQ files, auto-create samplesheet - // - if (params.public_data_ids) { - include { SRA_DOWNLOAD } from './workflows/sra_download' - SRA_DOWNLOAD () - // // WORKFLOW: Variant and de novo assembly analysis for Illumina data // From 70095e410d3d8277f78dc6c764b855ac78ae172d Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 10:51:51 +0100 Subject: [PATCH 10/68] Remove SRA download specific parameters --- nextflow.config | 11 +---------- nextflow_schema.json | 26 -------------------------- 2 files changed, 1 insertion(+), 36 deletions(-) diff --git a/nextflow.config b/nextflow.config index d2443436..0d1cb5bd 100644 --- a/nextflow.config +++ b/nextflow.config @@ -14,10 +14,6 @@ params { platform = null protocol = null - // SRA download options - public_data_ids = null - skip_sra_fastq_download = false - // Reference genome options genome = null primer_set = 
null @@ -155,11 +151,7 @@ profiles { } docker { docker.enabled = true - // Avoid this error: - // WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. - // Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351 - // once this is established and works well, nextflow might implement this behavior as new default. - docker.runOptions = '-u \$(id -u):\$(id -g)' + docker.userEmulation = true singularity.enabled = false podman.enabled = false shifter.enabled = false @@ -195,7 +187,6 @@ profiles { shifter.enabled = false } test { includeConfig 'conf/test.config' } - test_sra { includeConfig 'conf/test_sra.config' } test_sispa { includeConfig 'conf/test_sispa.config' } test_nanopore { includeConfig 'conf/test_nanopore.config' } test_full { includeConfig 'conf/test_full.config' } diff --git a/nextflow_schema.json b/nextflow_schema.json index 86fa9ba4..46e6885e 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -47,29 +47,6 @@ } } }, - "sra_download_options": { - "title": "SRA download options", - "type": "object", - "description": "Options for downloading publicy available data from the SRA.", - "default": "", - "fa_icon": "fas fa-cloud-download-alt", - "properties": { - "public_data_ids": { - "type": "string", - "format": "file-path", - "mimetype": "text/plain", - "pattern": "\\.txt$", - "schema": "assets/schema_public_data_ids.json", - "description": "File containing SRA/ENA/GEO identifiers one per line in order to download their associated FastQ files.", - "fa_icon": "fas fa-database" - }, - "skip_sra_fastq_download": { - "type": "boolean", - "description": "Only download metadata for public data database ids and don't download the FastQ files.", - "fa_icon": "fas fa-fast-forward" - } - } - }, "reference_genome_options": { "title": "Reference genome options", "type": "object", @@ -688,9 +665,6 @@ { "$ref": "#/definitions/input_output_options" }, - { - "$ref": 
"#/definitions/sra_download_options" - }, { "$ref": "#/definitions/reference_genome_options" }, From 47fec28b9cb8d1e9655d71794789a2fa85e7b98a Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 10:58:26 +0100 Subject: [PATCH 11/68] Remove schema for public_data_ids --- assets/schema_public_data_ids.json | 15 --------------- 1 file changed, 15 deletions(-) delete mode 100644 assets/schema_public_data_ids.json diff --git a/assets/schema_public_data_ids.json b/assets/schema_public_data_ids.json deleted file mode 100644 index 95b87403..00000000 --- a/assets/schema_public_data_ids.json +++ /dev/null @@ -1,15 +0,0 @@ -{ - "$schema": "http://json-schema.org/draft-07/schema", - "$id": "https://raw.githubusercontent.com/nf-core/viralrecon/master/schema_public_data_ids.json", - "title": "nf-core/viralrecon pipeline - params.public_data_ids schema", - "description": "Schema for the file provided with params.public_data_ids", - "type": "array", - "items": { - "type": "array", - "items": { - "type": "string", - "pattern": "^[SEPG][RAS][RXSMPAJXE][EN]?[AB]?\\d{4,9}$", - "errorMessage": "Please provide a valid SRA, GEO or ENA identifier" - } - } -} From 49fd0e5bd1890f21c055af2b9d0f9826ddff400d Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 11:38:04 +0100 Subject: [PATCH 12/68] Add links to new docs in README --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 03413dc4..834b53e6 100644 --- a/README.md +++ b/README.md @@ -135,7 +135,7 @@ The pipeline has numerous options to allow you to run only specific aspects of t ~/.nextflow/assets/nf-core/viralrecon/bin/fastq_dir_to_samplesheet.py samplesheet.csv ``` - * You can find the default keys used to specify `--genome` in the [genomes config file](https://github.com/nf-core/configs/blob/master/conf/pipeline/viralrecon/genomes.config). 
Where possible we are trying to collate links and settings for standard primer sets to make it easier to run the pipeline with standard keys. If you are able to get permissions from the vendor/supplier to share the primer information then we would be more than happy to support it within the pipeline. + * You can find the default keys used to specify `--genome` in the [genomes config file](https://github.com/nf-core/configs/blob/master/conf/pipeline/viralrecon/genomes.config). Where possible we are trying to collate links and settings for standard primer sets to make it easier to run the pipeline with standard keys; see [usage docs](https://nf-co.re/viralrecon/usage#illumina-primer-sets). ## Documentation From efede866662bd6edb2e1716798766b7ed2ca9e4a Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 11:38:24 +0100 Subject: [PATCH 13/68] Add description for SWIFT protocol to usage docs --- docs/usage.md | 60 ++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 52 insertions(+), 8 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 4958291d..7c011657 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -10,7 +10,7 @@ You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below. -```bash +```console --input '[path to samplesheet file]' ``` @@ -18,7 +18,7 @@ The `sample` identifiers have to be the same when you have re-sequenced the same A final samplesheet file may look something like the one below. `SAMPLE_1` was sequenced twice in Illumina PE format, `SAMPLE_2` was sequenced once in Illumina SE format. 
-```bash +```console sample,fastq_1,fastq_2 SAMPLE_1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz SAMPLE_1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz @@ -35,13 +35,13 @@ SAMPLE_2,AEG588A2_S4_L003_R1_001.fastq.gz, You have the option to provide a samplesheet to the pipeline that maps sample ids to barcode ids. This allows you to associate barcode ids to clinical/public database identifiers that can be used to QC or pre-process the data with more appropriate sample names. -```bash +```console --input '[path to samplesheet file]' ``` It has to be a comma-separated file with 2 columns. A final samplesheet file may look something like the one below: -```bash +```console sample,barcode 21X983255,1 70H209408,2 @@ -120,11 +120,55 @@ nextflow run nf-core/viralrecon \ -profile ``` +## Illumina primer sets + +The Illumina processing mode of the pipeline has been tested on numerous different primer sets. Where possible we are trying to collate links and settings for standard primer sets to make it easier to run the pipeline with standard parameter keys. If you are able to get permissions from the vendor/supplier to share the primer information then we would be more than happy to support it within the pipeline. + +For SARS-CoV-2 data we recommend using the "MN908947.3" genome because it is supported out-of-the-box by the most commonly used primer sets available from the [ARTIC Network](https://artic.network/). For ease of use, we are also maintaining a version of the "MN908947.3" genome along with the appropriate links to the ARTIC primer sets in the [genomes config file](https://github.com/nf-core/configs/blob/master/conf/pipeline/viralrecon/genomes.config) used by the pipeline. The genomes config file can be updated independently from the main pipeline code to make it possible to dynamically extend this file for other viral genomes/primer sets on request. 
+ +For further information or help, don't hesitate to get in touch on the [Slack `#viralrecon` channel](https://nfcore.slack.com/channels/viralrecon) (you can join with [this invite](https://nf-co.re/join/slack)). + +### ARTIC primer sets + +An example command using v3 ARTIC primers with "MN908947.3": + +```console +nextflow run nf-core/viralrecon \ + --input samplesheet.csv \ + --platform illumina \ + --protocol amplicon \ + --genome 'MN908947.3' \ + --primer_set artic \ + --primer_set_version 3 \ + --skip_assembly \ + -profile +``` + +### SWIFT primer sets + +The [SWIFT amplicon panel](https://swiftbiosci.com/swift-amplicon-sars-cov-2-panel/) is another commonly used method used to prep and sequence SARS-CoV-2 samples. We haven't been able to obtain explicit permission to host standard SWIFT primer sets but you can obtain a masterfile which is freely available from their website that contains the primer sequences as well as genomic co-ordinates. You just need to convert this file to [BED6](https://genome.ucsc.edu/FAQ/FAQformat.html#format1) format and provide it to the pipeline with `--primer_bed swift_primers.bed`. Be sure to check the values provided to [`--primer_left_suffix`] and [`--primer_right_suffix`] match the primer names defined in the BED file as highlighted in [this issue](https://github.com/nf-core/viralrecon/issues/169). For an explanation behind the usage of the `--ivar_trim_offset 5` for SWIFT primer sets see [this issue](https://github.com/nf-core/viralrecon/issues/170). 
+ +An example command using SWIFT primers with "MN908947.3": + +```console +nextflow run nf-core/viralrecon \ + --input samplesheet.csv \ + --platform illumina \ + --protocol amplicon \ + --genome 'MN908947.3' \ + --primer_bed swift_primers.bed \ + --primer_left_suffix '_F' \ + --primer_right_suffix '_R' \ + --ivar_trim_offset 5 \ + --skip_assembly \ + -profile +``` + ## Running the pipeline The typical command for running the pipeline is as follows: -```bash +```console nextflow run nf-core/viralrecon --input samplesheet.csv --genome 'MN908947.3' -profile docker ``` @@ -132,7 +176,7 @@ This will launch the pipeline with the `docker` configuration profile. See below Note that the pipeline will create the following files in your working directory: -```bash +```console work # Directory containing the nextflow working files results # Finished results (configurable, see below) .nextflow_log # Log file from Nextflow @@ -143,7 +187,7 @@ results # Finished results (configurable, see below) When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: -```bash +```console nextflow pull nf-core/viralrecon ``` @@ -307,6 +351,6 @@ Some HPC setups also allow you to run nextflow within a cluster job submitted yo In some cases, the Nextflow Java virtual machines can start to request a large amount of memory. 
We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~./bash_profile`): -```bash +```console NXF_OPTS='-Xms1g -Xmx4g' ``` From 54224ad182ea4d7627701870704038886953e6ac Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 12:00:27 +0100 Subject: [PATCH 14/68] Fix tests --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 9ac1dcde..27d6ea36 100644 --- a/main.nf +++ b/main.nf @@ -52,7 +52,7 @@ workflow NFCORE_VIRALRECON { // // WORKFLOW: Variant and de novo assembly analysis for Illumina data // - } else if (params.platform == 'illumina') { + if (params.platform == 'illumina') { include { ILLUMINA } from './workflows/illumina' ILLUMINA () From 012ae502c47ca962e7f10d3ac4bc78d02685f4d2 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 12:11:16 +0100 Subject: [PATCH 15/68] Auto-replace dashes with underscores in sample names --- bin/check_samplesheet.py | 2 ++ docs/usage.md | 4 ++++ 2 files changed, 6 insertions(+) diff --git a/bin/check_samplesheet.py b/bin/check_samplesheet.py index 9d433f29..7e002e50 100755 --- a/bin/check_samplesheet.py +++ b/bin/check_samplesheet.py @@ -77,6 +77,7 @@ def check_illumina_samplesheet(file_in, file_out): ## Check sample name entries sample, fastq_1, fastq_2 = lspl[: len(HEADER)] + sample = sample.replace('-', '_') if sample: if sample.find(" ") != -1: print_error("Sample entry contains spaces!", "Line", line) @@ -176,6 +177,7 @@ def check_nanopore_samplesheet(file_in, file_out): ## Check sample entry sample, barcode = lspl[: len(HEADER)] + sample = sample.replace('-', '_') if sample: if sample.find(" ") != -1: print_error("Sample entry contains spaces!", "Line", line) diff --git a/docs/usage.md b/docs/usage.md index 7c011657..d9a41449 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -31,6 +31,8 @@ SAMPLE_2,AEG588A2_S4_L003_R1_001.fastq.gz, | `fastq_1` | Full path to FastQ file for Illumina short reads 1. 
File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | | `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | +> **NB:** Dashes (`-`) in sample names are converted to underscores (`_`) when running [QUAST](http://quast.sourceforge.net/quast) and this causes issues when creating the summary metrics for the pipeline. As a result, dashes in sample names will automatically be replaced with underscores to bypass this issue. + ### Nanopore samplesheet format You have the option to provide a samplesheet to the pipeline that maps sample ids to barcode ids. This allows you to associate barcode ids to clinical/public database identifiers that can be used to QC or pre-process the data with more appropriate sample names. @@ -54,6 +56,8 @@ sample,barcode | `sample` | Custom sample name, one per barcode. | | `barcode` | Barcode identifier attributed to that sample during multiplexing. Must be an integer. | +> **NB:** Dashes (`-`) in sample names are converted to underscores (`_`) when running [QUAST](http://quast.sourceforge.net/quast) and this causes issues when creating the summary metrics for the pipeline. As a result, dashes in sample names will automatically be replaced with underscores to bypass this issue. + ## Nanopore input format For Nanopore data the pipeline only supports amplicon-based analysis obtained from primer sets created and maintained by the [ARTIC Network](https://artic.network/). The [artic minion](https://artic.readthedocs.io/en/latest/commands/) tool from the [ARTIC field bioinformatics pipeline](https://github.com/artic-network/fieldbioinformatics) is used to align reads, call variants and to generate the consensus sequence. 
From 75da4afdca3e2b415c1bc39f57da4bd7547931b3 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 12:22:44 +0100 Subject: [PATCH 16/68] Add docs about moving SRA download workflow to fetchngs --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 834b53e6..f80dd57d 100644 --- a/README.md +++ b/README.md @@ -26,6 +26,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool The pipeline has numerous options to allow you to run only specific aspects of the workflow if you so wish. For example, for Illumina data you can skip the host read filtering step with Kraken 2 with `--skip_kraken2` or you can skip all of the assembly steps with the `--skip_assembly` parameter. See the [usage](https://nf-co.re/viralrecon/usage) and [parameter](https://nf-co.re/viralrecon/parameters) docs for all of the available options when running the pipeline. +The SRA download functionality has been removed from the pipeline (`>=2.1`) and ported to an independent workflow called [nf-core/fetchngs](https://nf-co.re/fetchngs). You can provide `--nf_core_pipeline viralrecon` when running nf-core/fetchngs to download and auto-create a samplesheet containing publicly available samples that can be accepted directly by the Illumina processing mode of nf-core/viralrecon. + ### Illumina 1. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html)) From d4c5d91fa8042a2a1bc65a04f56bce92b0f748af Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 12:28:52 +0100 Subject: [PATCH 17/68] Fix tyop --- docs/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index d9a41449..658439c0 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -69,7 +69,7 @@ The default variant caller used by artic minion is [Nanopolish](https://github.c ```console . 
└── fastq_pass - └── barcode26 + └── barcode01 ├── FAP51364_pass_barcode01_97ca62ca_0.fastq ├── FAP51364_pass_barcode01_97ca62ca_1.fastq ├── FAP51364_pass_barcode01_97ca62ca_2.fastq From e4c8817ab75544fd6060769a521003f298c1ded1 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 12:55:32 +0100 Subject: [PATCH 18/68] Add usage docs about updating container version --- docs/usage.md | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/docs/usage.md b/docs/usage.md index 658439c0..3f6bc49c 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -333,6 +333,46 @@ params { } ``` +### Updating containers + +The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. If for some reason you need to use a different version of a particular tool with the pipeline then you just need to identify the `process` name and override the Nextflow `container` definition for that process using the `withName` declaration. For example, in the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline a tool called [Pangolin](https://github.com/cov-lineages/pangolin) has been used during the COVID-19 pandemic to assign lineages to SARS-CoV-2 genome sequenced samples. Given that the lineage assignments change quite frequently it doesn't make sense to re-release the nf-core/viralrecon everytime a new version of Pangolin has been release. However, you can override the default container used by the pipeline by creating a custom config file and passing it as a command-line argument via `-c custom.config`. + +1. Check the default versions used by the pipeline in the module file for [Pangolin](https://github.com/nf-core/viralrecon/blob/a85d5969f9025409e3618d6c280ef15ce417df65/modules/nf-core/software/pangolin/main.nf#L14-L19) +2. 
Find the latest version of the Biocontainer available on [Quay.io](https://quay.io/repository/biocontainers/pangolin?tag=latest&tab=tags) +3. Create the custom config accordingly: + + * For Docker: + + ```nextflow + process { + withName: PANGOLIN { + container = 'quay.io/biocontainers/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` + + * For Singularity: + + ```nextflow + process { + withName: PANGOLIN { + container = 'https://depot.galaxyproject.org/singularity/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` + + * For Conda: + + ```nextflow + process { + withName: PANGOLIN { + conda = 'bioconda::pangolin=3.0.5' + } + } + ``` + +> **NB:** If you wish to periodically update individual tool-specific results (e.g. Pangolin) generated by the pipeline then you must ensure to keep the `work/` directory otherwise the `-resume` ability of the pipeline will be compromised and it will restart from scratch. + ### nf-core/configs In most cases, you will only need to create a custom config as a one-off but if you and others within your organisation are likely to be running nf-core pipelines regularly and need to use the same settings regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter. You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. 
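[Editor's note: the shared-config route described in the patch above boils down to registering a new profile in `nfcore_custom.config`. As a rough sketch — the institute name and config filename below are hypothetical placeholders, not real entries in that repository — the addition would look something like:]

```nextflow
// Hypothetical institutional profile added to nfcore_custom.config.
// 'my_institute' and conf/my_institute.config are illustrative placeholders only;
// real entries follow the same includeConfig pattern used by existing profiles.
profiles {
    my_institute { includeConfig "${params.custom_config_base}/conf/my_institute.config" }
}
```

[The matching `conf/my_institute.config` would then hold the site-specific settings (executor, queue, resource limits) that the documentation file describes.]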
From 7d5812e143fb3f12bf88a4eb49524194387bd0ed Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 12:58:56 +0100 Subject: [PATCH 19/68] fix Tyops --- docs/usage.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 3f6bc49c..25e18c65 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -335,9 +335,9 @@ params { ### Updating containers -The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. If for some reason you need to use a different version of a particular tool with the pipeline then you just need to identify the `process` name and override the Nextflow `container` definition for that process using the `withName` declaration. For example, in the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline a tool called [Pangolin](https://github.com/cov-lineages/pangolin) has been used during the COVID-19 pandemic to assign lineages to SARS-CoV-2 genome sequenced samples. Given that the lineage assignments change quite frequently it doesn't make sense to re-release the nf-core/viralrecon everytime a new version of Pangolin has been release. However, you can override the default container used by the pipeline by creating a custom config file and passing it as a command-line argument via `-c custom.config`. +The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. If for some reason you need to use a different version of a particular tool with the pipeline then you just need to identify the `process` name and override the Nextflow `container` definition for that process using the `withName` declaration. 
For example, in the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline a tool called [Pangolin](https://github.com/cov-lineages/pangolin) has been used during the COVID-19 pandemic to assign lineages to SARS-CoV-2 genome sequenced samples. Given that the lineage assignments change quite frequently it doesn't make sense to re-release the nf-core/viralrecon everytime a new version of Pangolin has been released. However, you can override the default container used by the pipeline by creating a custom config file and passing it as a command-line argument via `-c custom.config`. -1. Check the default versions used by the pipeline in the module file for [Pangolin](https://github.com/nf-core/viralrecon/blob/a85d5969f9025409e3618d6c280ef15ce417df65/modules/nf-core/software/pangolin/main.nf#L14-L19) +1. Check the default version used by the pipeline in the module file for [Pangolin](https://github.com/nf-core/viralrecon/blob/a85d5969f9025409e3618d6c280ef15ce417df65/modules/nf-core/software/pangolin/main.nf#L14-L19) 2. Find the latest version of the Biocontainer available on [Quay.io](https://quay.io/repository/biocontainers/pangolin?tag=latest&tab=tags) 3. Create the custom config accordingly: From 5cf9e29fae17a308fe12a529c85ae2fcc2db8414 Mon Sep 17 00:00:00 2001 From: JoseEspinosa Date: Wed, 9 Jun 2021 15:47:38 +0200 Subject: [PATCH 20/68] Update nextclade version to 0.14.4 --- modules/nf-core/software/nextclade/main.nf | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/modules/nf-core/software/nextclade/main.nf b/modules/nf-core/software/nextclade/main.nf index 49161732..24ca7309 100644 --- a/modules/nf-core/software/nextclade/main.nf +++ b/modules/nf-core/software/nextclade/main.nf @@ -11,11 +11,11 @@ process NEXTCLADE { mode: params.publish_dir_mode, saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? 
"bioconda::nextclade_js=0.14.2" : null) + conda (params.enable_conda ? "bioconda::nextclade_js=0.14.4" : null) if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/nextclade_js:0.14.2--h9ee0642_0" + container "https://depot.galaxyproject.org/singularity/nextclade_js:0.14.4--h9ee0642_0" } else { - container "quay.io/biocontainers/nextclade_js:0.14.2--h9ee0642_0" + container "quay.io/biocontainers/nextclade_js:0.14.4--h9ee0642_0" } input: From 05e0b53b71917d1f12f71f4f1221fb70806076fe Mon Sep 17 00:00:00 2001 From: JoseEspinosa Date: Wed, 9 Jun 2021 15:48:03 +0200 Subject: [PATCH 21/68] Update pangolin version to 2.4.2 --- modules/nf-core/software/pangolin/main.nf | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/modules/nf-core/software/pangolin/main.nf b/modules/nf-core/software/pangolin/main.nf index c78bef20..4efd103f 100644 --- a/modules/nf-core/software/pangolin/main.nf +++ b/modules/nf-core/software/pangolin/main.nf @@ -11,11 +11,11 @@ process PANGOLIN { mode: params.publish_dir_mode, saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? 'bioconda::pangolin=2.4.2' : null) + conda (params.enable_conda ? 
'bioconda::pangolin=3.0.5' : null) if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container 'https://depot.galaxyproject.org/singularity/pangolin:2.4.2--pyhdfd78af_0' + container 'https://depot.galaxyproject.org/singularity/pangolin:3.0.5--pyhdfd78af_0' } else { - container 'quay.io/biocontainers/pangolin:2.4.2--pyhdfd78af_0' + container 'quay.io/biocontainers/pangolin:3.0.5--pyhdfd78af_0' } input: From 5165fe34b970923749814c023b1918fea6195e7f Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 16:09:28 +0100 Subject: [PATCH 22/68] Update CHANGELOG --- CHANGELOG.md | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 2fe9a3ab..1543b6b2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,11 +3,28 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## [Unpublished Version / DEV] +## [[2.1](https://github.com/nf-core/rnaseq/releases/tag/2.1)] - 2021-06-11 ### Enhancements & fixes -### Parameters +* Removed workflow to download data from public databases in favour of using [nf-core/fetchngs](https://nf-co.re/fetchngs) +* Added docs about structure of data required for running Nanopore data +* Added docs about using other primer sets for Illumina data +* Added docs about overwriting default container definitions to use latest versions e.g. Pangolin +* Dashes in sample names will be converted to underscores to avoid issues when creating the summary metrics via QUAST + +### Software dependencies + +Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. 
However, the overall software dependency changes compared to the last release have been listed below for reference. + +| Dependency | Old version | New version | +|-------------------------------|-------------|-------------| +| `nextclade_js` | 0.14.2 | 0.14.4 | +| `pangolin` | 2.4.2 | 3.0.5 | + +> **NB:** Dependency has been __updated__ if both old and new version information is present. +> **NB:** Dependency has been __added__ if just the new version information is present. +> **NB:** Dependency has been __removed__ if new version information isn't present. ## [[2.0](https://github.com/nf-core/rnaseq/releases/tag/2.0)] - 2021-05-13 From 80f408dff22a9cc90eb0433e7b2dbbcdb6e8695d Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 16:09:47 +0100 Subject: [PATCH 23/68] Bump pipeline version to 2.1 --- nextflow.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index 0d1cb5bd..aedd3855 100644 --- a/nextflow.config +++ b/nextflow.config @@ -233,7 +233,7 @@ manifest { description = 'Assembly and intrahost/low-frequency variant calling for viral samples' mainScript = 'main.nf' nextflowVersion = '!>=21.04.0' - version = '2.1dev' + version = '2.1' } // Function to ensure that resource requirements don't go beyond From fac02c24279b045df83aeb67681645d598cb4dfd Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 17:32:19 +0100 Subject: [PATCH 24/68] Add warning to MultiQC if 0 reads after adapter trimming --- assets/multiqc_config_illumina.yaml | 524 +++++++++++---------- modules/local/multiqc_custom_onecol_txt.nf | 35 ++ modules/local/multiqc_illumina.nf | 1 + workflows/illumina.nf | 37 +- 4 files changed, 332 insertions(+), 265 deletions(-) create mode 100644 modules/local/multiqc_custom_onecol_txt.nf diff --git a/assets/multiqc_config_illumina.yaml b/assets/multiqc_config_illumina.yaml index a8d57bf9..4e8f8206 100644 --- a/assets/multiqc_config_illumina.yaml +++ 
b/assets/multiqc_config_illumina.yaml @@ -1,270 +1,278 @@ report_comment: > - This report has been generated by the nf-core/viralrecon - analysis pipeline. For information about how to interpret these results, please see the - documentation. + This report has been generated by the nf-core/viralrecon + analysis pipeline. For information about how to interpret these results, please see the + documentation. -data_format: 'yaml' +data_format: "yaml" max_table_rows: 10000 run_modules: - - custom_content - - fastqc - - fastp - - kraken - - bowtie2 - - samtools - - mosdepth - - bcftools - - snpeff - - quast - - cutadapt + - custom_content + - fastqc + - fastp + - kraken + - bowtie2 + - samtools + - mosdepth + - bcftools + - snpeff + - quast + - cutadapt module_order: - - fastqc: - name: 'PREPROCESS: FastQC (raw reads)' - anchor: 'fastqc_raw' - info: 'This section of the report shows FastQC results for the raw reads before adapter trimming.' - path_filters: - - './fastqc/*.zip' - - fastp: - name: 'PREPROCESS: fastp (adapter trimming)' - info: 'This section of the report shows fastp results for reads after adapter and quality trimming.' - - kraken: - name: 'PREPROCESS: Kraken 2' - info: 'This section of the report shows Kraken 2 classification results for reads after adapter trimming with fastp.' - - bowtie2: - name: 'VARIANTS: Bowtie 2' - info: 'This section of the report shows Bowtie 2 mapping results for reads after adapter trimming and quality trimming.' - - samtools: - name: 'VARIANTS: SAMTools (raw)' - anchor: 'samtools_bowtie2' - info: 'This section of the report shows SAMTools counts/statistics after mapping with Bowtie 2.' - path_filters: - - './bowtie2/*' - - samtools: - name: 'VARIANTS: SAMTools (iVar)' - anchor: 'samtools_ivar' - info: 'This section of the report shows SAMTools counts/statistics after primer sequence removal with iVar.' 
- path_filters: - - './ivar_trim/*' - - samtools: - name: 'VARIANTS: SAMTools (MarkDuplicates)' - anchor: 'samtools_markduplicates' - info: 'This section of the report shows SAMTools counts/statistics after duplicate removal with picard MarkDuplicates.' - path_filters: - - './picard_markduplicates/*' - - mosdepth: - name: 'VARIANTS: mosdepth' - info: 'This section of the report shows genome-wide coverage metrics generated by mosdepth.' - - bcftools: - name: 'VARIANTS: BCFTools (iVar)' - anchor: 'bcftools_ivar' - info: 'This section of the report shows BCFTools stats results for variants called by iVar.' - path_filters: - - './variants_ivar/*.txt' - - snpeff: - name: 'VARIANTS: SnpEff (iVar)' - anchor: 'snpeff_ivar' - info: 'This section of the report shows SnpEff results for variants called by iVar.' - path_filters: - - './variants_ivar/*.csv' - - quast: - name: 'VARIANTS: QUAST (iVar)' - anchor: 'quast_ivar' - info: 'This section of the report shows QUAST results for consensus sequences generated from variants with iVar.' - path_filters: - - './variants_ivar/*.tsv' - - bcftools: - name: 'VARIANTS: BCFTools (BCFTools)' - anchor: 'bcftools_bcftools' - info: 'This section of the report shows BCFTools stats results for variants called by BCFTools.' - path_filters: - - './variants_bcftools/*.txt' - - snpeff: - name: 'VARIANTS: SnpEff (BCFTools)' - anchor: 'snpeff_bcftools' - info: 'This section of the report shows SnpEff results for variants called by BCFTools.' - path_filters: - - './variants_bcftools/*.csv' - - quast: - name: 'VARIANTS: QUAST (BCFTools)' - anchor: 'quast_bcftools' - info: 'This section of the report shows QUAST results for consensus sequence generated from BCFTools variants.' - path_filters: - - './variants_bcftools/*.tsv' - - cutadapt: - name: 'ASSEMBLY: Cutadapt (primer trimming)' - info: 'This section of the report shows Cutadapt results for reads after primer sequence trimming.' 
- - quast: - name: 'ASSEMBLY: QUAST (SPAdes)' - anchor: 'quast_spades' - info: 'This section of the report shows QUAST results from SPAdes de novo assembly.' - path_filters: - - './assembly_spades/*.tsv' - - quast: - name: 'ASSEMBLY: QUAST (Unicycler)' - anchor: 'quast_unicycler' - info: 'This section of the report shows QUAST results from Unicycler de novo assembly.' - path_filters: - - './assembly_unicycler/*.tsv' - - quast: - name: 'ASSEMBLY: QUAST (minia)' - anchor: 'quast_minia' - info: 'This section of the report shows QUAST results from minia de novo assembly.' - path_filters: - - './assembly_minia/*.tsv' + - fastqc: + name: "PREPROCESS: FastQC (raw reads)" + anchor: "fastqc_raw" + info: "This section of the report shows FastQC results for the raw reads before adapter trimming." + path_filters: + - "./fastqc/*.zip" + - fastp: + name: "PREPROCESS: fastp (adapter trimming)" + info: "This section of the report shows fastp results for reads after adapter and quality trimming." + - kraken: + name: "PREPROCESS: Kraken 2" + info: "This section of the report shows Kraken 2 classification results for reads after adapter trimming with fastp." + - bowtie2: + name: "VARIANTS: Bowtie 2" + info: "This section of the report shows Bowtie 2 mapping results for reads after adapter trimming and quality trimming." + - samtools: + name: "VARIANTS: SAMTools (raw)" + anchor: "samtools_bowtie2" + info: "This section of the report shows SAMTools counts/statistics after mapping with Bowtie 2." + path_filters: + - "./bowtie2/*" + - samtools: + name: "VARIANTS: SAMTools (iVar)" + anchor: "samtools_ivar" + info: "This section of the report shows SAMTools counts/statistics after primer sequence removal with iVar." + path_filters: + - "./ivar_trim/*" + - samtools: + name: "VARIANTS: SAMTools (MarkDuplicates)" + anchor: "samtools_markduplicates" + info: "This section of the report shows SAMTools counts/statistics after duplicate removal with picard MarkDuplicates." 
+ path_filters: + - "./picard_markduplicates/*" + - mosdepth: + name: "VARIANTS: mosdepth" + info: "This section of the report shows genome-wide coverage metrics generated by mosdepth." + - bcftools: + name: "VARIANTS: BCFTools (iVar)" + anchor: "bcftools_ivar" + info: "This section of the report shows BCFTools stats results for variants called by iVar." + path_filters: + - "./variants_ivar/*.txt" + - snpeff: + name: "VARIANTS: SnpEff (iVar)" + anchor: "snpeff_ivar" + info: "This section of the report shows SnpEff results for variants called by iVar." + path_filters: + - "./variants_ivar/*.csv" + - quast: + name: "VARIANTS: QUAST (iVar)" + anchor: "quast_ivar" + info: "This section of the report shows QUAST results for consensus sequences generated from variants with iVar." + path_filters: + - "./variants_ivar/*.tsv" + - bcftools: + name: "VARIANTS: BCFTools (BCFTools)" + anchor: "bcftools_bcftools" + info: "This section of the report shows BCFTools stats results for variants called by BCFTools." + path_filters: + - "./variants_bcftools/*.txt" + - snpeff: + name: "VARIANTS: SnpEff (BCFTools)" + anchor: "snpeff_bcftools" + info: "This section of the report shows SnpEff results for variants called by BCFTools." + path_filters: + - "./variants_bcftools/*.csv" + - quast: + name: "VARIANTS: QUAST (BCFTools)" + anchor: "quast_bcftools" + info: "This section of the report shows QUAST results for consensus sequence generated from BCFTools variants." + path_filters: + - "./variants_bcftools/*.tsv" + - cutadapt: + name: "ASSEMBLY: Cutadapt (primer trimming)" + info: "This section of the report shows Cutadapt results for reads after primer sequence trimming." + - quast: + name: "ASSEMBLY: QUAST (SPAdes)" + anchor: "quast_spades" + info: "This section of the report shows QUAST results from SPAdes de novo assembly." 
+ path_filters: + - "./assembly_spades/*.tsv" + - quast: + name: "ASSEMBLY: QUAST (Unicycler)" + anchor: "quast_unicycler" + info: "This section of the report shows QUAST results from Unicycler de novo assembly." + path_filters: + - "./assembly_unicycler/*.tsv" + - quast: + name: "ASSEMBLY: QUAST (minia)" + anchor: "quast_minia" + info: "This section of the report shows QUAST results from minia de novo assembly." + path_filters: + - "./assembly_minia/*.tsv" report_section_order: - summary_assembly_metrics: - before: summary_variants_metrics - ivar_variants: - before: mosdepth - software_versions: - order: -1001 - nf-core-viralrecon-summary: - order: -1002 + summary_assembly_metrics: + before: summary_variants_metrics + ivar_variants: + before: mosdepth + software_versions: + order: -1001 + nf-core-viralrecon-summary: + order: -1002 bcftools: - collapse_complementary_changes: true + collapse_complementary_changes: true extra_fn_clean_exts: - - '.markduplicates' - - '.unclassified' + - ".markduplicates" + - ".unclassified" # See https://github.com/ewels/MultiQC_TestData/blob/master/data/custom_content/with_config/table_headerconfig/multiqc_config.yaml custom_data: - summary_variants_metrics: - section_name: 'Variant calling metrics' - description: 'generated by the nf-core/viralrecon pipeline' - plot_type: 'table' - headers: - '# Input reads': - description: 'Total number of reads in raw fastq file' - format: '{:,.0f}' - '% Non-host reads (Kraken 2)': - description: 'Total number of non-host reads identified by Kraken2' - format: '{:,.2f}' - '# Trimmed reads (fastp)': - description: 'Total number of reads remaining after adapter/quality trimming with fastp' - format: '{:,.0f}' - '# Mapped reads': - description: 'Total number of Bowtie2 mapped reads relative to the viral genome' - format: '{:,.0f}' - '% Mapped reads': - description: 'Percentage of Bowtie2 mapped reads relative to the viral genome' - format: '{:,.2f}' - '# Trimmed reads (iVar)': - description: 'Total 
number of reads remaining after primer trimming with iVar' - format: '{:,.0f}' - 'Coverage median': - description: 'Median coverage calculated by mosdepth' - format: '{:,.2f}' - '% Coverage > 1x': - description: 'Coverage > 1x calculated by mosdepth' - format: '{:,.2f}' - '% Coverage > 10x': - description: 'Coverage > 10x calculated by mosdepth' - format: '{:,.2f}' - '# SNPs (iVar)': - description: 'Total number of SNPs called by iVar' - format: '{:,.0f}' - '# INDELs (iVar)': - description: 'Total number of INDELs called by iVar' - format: '{:,.0f}' - '# Missense variants (iVar)': - description: 'Total number of variants called by iVar and identified as missense mutations with SnpEff' - format: '{:,.0f}' - '# Ns per 100kb consensus (iVar)': - description: 'Number of N bases per 100kb in consensus sequence generated by iVar' - format: '{:,.2f}' - 'Pangolin lineage (iVar)': - description: 'Pangolin lineage inferred from the consensus sequence generated by iVar' - '# SNPs (BCFTools)': - description: 'Total number of SNPs called by BCFTools' - format: '{:,.0f}' - '# INDELs (BCFTools)': - description: 'Total number of INDELs called by BCFTools' - format: '{:,.0f}' - '# Missense variants (BCFTools)': - description: 'Total number of variants called by BCFTools and identified as missense mutations with SnpEff' - format: '{:,.0f}' - '# Ns per 100kb consensus (BCFTools)': - description: 'Number of N bases per 100kb in consensus sequence generated by BCFTools' - format: '{:,.2f}' - 'Pangolin lineage (BCFTools)': - description: 'Pangolin lineage inferred from the consensus sequence generated by BCFTools' - pconfig: - id: 'summary_variants_metrics_plot' - table_title: 'Variant calling metrics' - namespace: 'Variant calling metrics' - only_defined_headers: False - format: '{:.0f}' - summary_assembly_metrics: - section_name: 'De novo assembly metrics' - description: 'generated by the nf-core/viralrecon pipeline' - plot_type: 'table' - headers: - '# Input reads': - description: 
'Total number of reads in raw fastq file' - format: '{:,.0f}' - '# Trimmed reads (Cutadapt)': - description: 'Total number of reads remaining after adapter/quality trimming with fastp' - format: '{:,.0f}' - '% Non-host reads (Kraken 2)': - description: 'Total number of non-host reads identified by Kraken2' - format: '{:,.2f}' - '# Contigs (SPAdes)': - description: 'Total number of contigs in SPAdes assembly as calculated by QUAST' - format: '{:,.0f}' - 'Largest contig (SPAdes)': - description: 'Size of largest contig in SPAdes assembly as calculated by QUAST' - format: '{:,.0f}' - '% Genome fraction (SPAdes)': - description: '% genome fraction for SPAdes assembly as calculated by QUAST' - format: '{:,.2f}' - 'N50 (SPAdes)': - description: 'N50 metric for SPAdes assembly as calculated by QUAST' - format: '{:,.2f}' - '# Contigs (Unicycler)': - description: 'Total number of contigs in Unicycler assembly as calculated by QUAST' - format: '{:,.0f}' - 'Largest contig (Unicycler)': - description: 'Size of largest contig in Unicycler assembly as calculated by QUAST' - format: '{:,.0f}' - '% Genome fraction (Unicycler)': - description: '% genome fraction for Unicycler assembly as calculated by QUAST' - format: '{:,.2f}' - 'N50 (Unicycler)': - description: 'N50 metric for Unicycler assembly as calculated by QUAST' - format: '{:,.2f}' - '# Contigs (minia)': - description: 'Total number of contigs in minia assembly as calculated by QUAST' - format: '{:,.0f}' - 'Largest contig (minia)': - description: 'Size of largest contig in minia assembly as calculated by QUAST' - format: '{:,.0f}' - '% Genome fraction (minia)': - description: '% genome fraction for minia assembly as calculated by QUAST' - format: '{:,.2f}' - 'N50 (minia)': - description: 'N50 metric for minia assembly as calculated by QUAST' - format: '{:,.2f}' - pconfig: - id: 'summary_assembly_metrics_plot' - table_title: 'De novo assembly metrics' - namespace: 'De novo assembly metrics' - only_defined_headers: False - 
format: '{:.0f}' - fail_mapped_samples: - section_name: 'WARNING: Fail Alignment Check' - description: "List of samples that failed the Bowtie2 minimum mapped reads threshold specified via the '--min_mapped_reads' parameter, and hence were ignored for the downstream processing steps." - plot_type: 'table' - pconfig: - id: 'fail_mapped_samples_table' - table_title: 'Samples failed mapped read threshold' - namespace: 'Samples failed mapping read threshold' - format: '{:,.0f}' + summary_variants_metrics: + section_name: "Variant calling metrics" + description: "generated by the nf-core/viralrecon pipeline" + plot_type: "table" + headers: + "# Input reads": + description: "Total number of reads in raw fastq file" + format: "{:,.0f}" + "% Non-host reads (Kraken 2)": + description: "Total number of non-host reads identified by Kraken2" + format: "{:,.2f}" + "# Trimmed reads (fastp)": + description: "Total number of reads remaining after adapter/quality trimming with fastp" + format: "{:,.0f}" + "# Mapped reads": + description: "Total number of Bowtie2 mapped reads relative to the viral genome" + format: "{:,.0f}" + "% Mapped reads": + description: "Percentage of Bowtie2 mapped reads relative to the viral genome" + format: "{:,.2f}" + "# Trimmed reads (iVar)": + description: "Total number of reads remaining after primer trimming with iVar" + format: "{:,.0f}" + "Coverage median": + description: "Median coverage calculated by mosdepth" + format: "{:,.2f}" + "% Coverage > 1x": + description: "Coverage > 1x calculated by mosdepth" + format: "{:,.2f}" + "% Coverage > 10x": + description: "Coverage > 10x calculated by mosdepth" + format: "{:,.2f}" + "# SNPs (iVar)": + description: "Total number of SNPs called by iVar" + format: "{:,.0f}" + "# INDELs (iVar)": + description: "Total number of INDELs called by iVar" + format: "{:,.0f}" + "# Missense variants (iVar)": + description: "Total number of variants called by iVar and identified as missense mutations with SnpEff" + format: 
"{:,.0f}" + "# Ns per 100kb consensus (iVar)": + description: "Number of N bases per 100kb in consensus sequence generated by iVar" + format: "{:,.2f}" + "Pangolin lineage (iVar)": + description: "Pangolin lineage inferred from the consensus sequence generated by iVar" + "# SNPs (BCFTools)": + description: "Total number of SNPs called by BCFTools" + format: "{:,.0f}" + "# INDELs (BCFTools)": + description: "Total number of INDELs called by BCFTools" + format: "{:,.0f}" + "# Missense variants (BCFTools)": + description: "Total number of variants called by BCFTools and identified as missense mutations with SnpEff" + format: "{:,.0f}" + "# Ns per 100kb consensus (BCFTools)": + description: "Number of N bases per 100kb in consensus sequence generated by BCFTools" + format: "{:,.2f}" + "Pangolin lineage (BCFTools)": + description: "Pangolin lineage inferred from the consensus sequence generated by BCFTools" + pconfig: + id: "summary_variants_metrics_plot" + table_title: "Variant calling metrics" + namespace: "Variant calling metrics" + only_defined_headers: False + format: "{:.0f}" + summary_assembly_metrics: + section_name: "De novo assembly metrics" + description: "generated by the nf-core/viralrecon pipeline" + plot_type: "table" + headers: + "# Input reads": + description: "Total number of reads in raw fastq file" + format: "{:,.0f}" + "# Trimmed reads (Cutadapt)": + description: "Total number of reads remaining after adapter/quality trimming with fastp" + format: "{:,.0f}" + "% Non-host reads (Kraken 2)": + description: "Total number of non-host reads identified by Kraken2" + format: "{:,.2f}" + "# Contigs (SPAdes)": + description: "Total number of contigs in SPAdes assembly as calculated by QUAST" + format: "{:,.0f}" + "Largest contig (SPAdes)": + description: "Size of largest contig in SPAdes assembly as calculated by QUAST" + format: "{:,.0f}" + "% Genome fraction (SPAdes)": + description: "% genome fraction for SPAdes assembly as calculated by QUAST" + format: 
"{:,.2f}" + "N50 (SPAdes)": + description: "N50 metric for SPAdes assembly as calculated by QUAST" + format: "{:,.2f}" + "# Contigs (Unicycler)": + description: "Total number of contigs in Unicycler assembly as calculated by QUAST" + format: "{:,.0f}" + "Largest contig (Unicycler)": + description: "Size of largest contig in Unicycler assembly as calculated by QUAST" + format: "{:,.0f}" + "% Genome fraction (Unicycler)": + description: "% genome fraction for Unicycler assembly as calculated by QUAST" + format: "{:,.2f}" + "N50 (Unicycler)": + description: "N50 metric for Unicycler assembly as calculated by QUAST" + format: "{:,.2f}" + "# Contigs (minia)": + description: "Total number of contigs in minia assembly as calculated by QUAST" + format: "{:,.0f}" + "Largest contig (minia)": + description: "Size of largest contig in minia assembly as calculated by QUAST" + format: "{:,.0f}" + "% Genome fraction (minia)": + description: "% genome fraction for minia assembly as calculated by QUAST" + format: "{:,.2f}" + "N50 (minia)": + description: "N50 metric for minia assembly as calculated by QUAST" + format: "{:,.2f}" + pconfig: + id: "summary_assembly_metrics_plot" + table_title: "De novo assembly metrics" + namespace: "De novo assembly metrics" + only_defined_headers: False + format: "{:.0f}" + fail_mapped_reads: + section_name: "WARNING: Fail Reads Check" + description: "List of samples that had 0 reads after adapter trimming, and hence were ignored for the downstream processing steps." + plot_type: "table" + pconfig: + id: "fail_mapped_reads_table" + table_title: "Samples failed read threshold" + namespace: "Samples failed read threshold" + fail_mapped_samples: + section_name: "WARNING: Fail Alignment Check" + description: "List of samples that failed the Bowtie2 minimum mapped reads threshold specified via the '--min_mapped_reads' parameter, and hence were ignored for the downstream processing steps." 
+ plot_type: "table" + pconfig: + id: "fail_mapped_samples_table" + table_title: "Samples failed mapped read threshold" + namespace: "Samples failed mapping read threshold" + format: "{:,.0f}" # # Customise the module search patterns to speed up execution time # # - Skip module sub-tools that we are not interested in @@ -272,11 +280,11 @@ custom_data: # # - Don't add anything that is the same as the MultiQC default # # See https://multiqc.info/docs/#optimise-file-search-patterns for details sp: - fastp: - fn: '*.fastp.json' - bowtie2: - fn: '*.bowtie2.log' - mosdepth/global_dist: - fn: '*.global.dist.txt' - cutadapt: - fn: '*.cutadapt.log' + fastp: + fn: "*.fastp.json" + bowtie2: + fn: "*.bowtie2.log" + mosdepth/global_dist: + fn: "*.global.dist.txt" + cutadapt: + fn: "*.cutadapt.log" diff --git a/modules/local/multiqc_custom_onecol_txt.nf b/modules/local/multiqc_custom_onecol_txt.nf new file mode 100644 index 00000000..677768d4 --- /dev/null +++ b/modules/local/multiqc_custom_onecol_txt.nf @@ -0,0 +1,35 @@ +// Import generic module functions +include { saveFiles; getSoftwareName } from './functions' + +params.options = [:] + +process MULTIQC_CUSTOM_ONECOL_TXT { + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? 
"conda-forge::sed=4.7" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" + } else { + container "biocontainers/biocontainers:v1.2.0_cv1" + } + + input: + val data + val out_prefix + + output: + path "*.txt" + + script: + if (data.size() > 0) { + """ + echo "${data.join('\n')}" >> ${out_prefix}_mqc.txt + """ + } else { + """ + touch ${out_prefix}_mqc.txt + """ + } +} diff --git a/modules/local/multiqc_illumina.nf b/modules/local/multiqc_illumina.nf index 306ffa1d..b25379f1 100644 --- a/modules/local/multiqc_illumina.nf +++ b/modules/local/multiqc_illumina.nf @@ -22,6 +22,7 @@ process MULTIQC { path multiqc_custom_config path software_versions path workflow_summary + path fail_reads_summary path fail_mapping_summary path ('fastqc/*') path ('fastp/*') diff --git a/workflows/illumina.nf b/workflows/illumina.nf index d4be0c6f..0f80ff90 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -66,15 +66,16 @@ if (!params.skip_variants) { multiqc_options.publish_files.put('variants_metrics_mqc.csv','') } -include { BCFTOOLS_ISEC } from '../modules/local/bcftools_isec' addParams( options: modules['illumina_bcftools_isec'] ) -include { CUTADAPT } from '../modules/local/cutadapt' addParams( options: modules['illumina_cutadapt'] ) -include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['csv':'']] ) -include { MULTIQC } from '../modules/local/multiqc_illumina' addParams( options: multiqc_options ) -include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_genome'] ) -include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: 
modules['illumina_plot_mosdepth_regions_amplicon'] ) +include { BCFTOOLS_ISEC } from '../modules/local/bcftools_isec' addParams( options: modules['illumina_bcftools_isec'] ) +include { CUTADAPT } from '../modules/local/cutadapt' addParams( options: modules['illumina_cutadapt'] ) +include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['csv':'']] ) +include { MULTIQC } from '../modules/local/multiqc_illumina' addParams( options: multiqc_options ) +include { MULTIQC_CUSTOM_ONECOL_TXT } from '../modules/local/multiqc_custom_onecol_txt' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_FAIL_MAPPED } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_IVAR_PANGOLIN } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_BCFTOOLS_PANGOLIN } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) +include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_genome'] ) +include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_amplicon'] ) // // SUBWORKFLOW: Consisting of a mix of local and nf-core/modules @@ -244,9 +245,30 @@ workflow ILLUMINA { .join(FASTQC_FASTP.out.trim_json) .map { meta, reads, json -> - if (WorkflowIllumina.getFastpReadsAfterFiltering(json) > 0) [ meta, reads ] + pass = WorkflowIllumina.getFastpReadsAfterFiltering(json) > 0 + [ meta, reads, pass ] } + .set { ch_pass_fail_reads } + + ch_pass_fail_reads + .map { meta, reads, pass -> if (pass) [ meta, reads ] } 
.set { ch_variants_fastq } + + ch_pass_fail_reads + .map { + meta, reads, pass -> + if (!pass) { + fail_mapped_reads[meta.id] = 0 + return [ "$meta.id" ] + } + } + .set { ch_pass_fail_reads } + + MULTIQC_CUSTOM_ONECOL_TXT ( + ch_pass_fail_reads.collect(), + 'fail_mapped_reads' + ) + .set { ch_fail_reads_multiqc } } // @@ -631,6 +653,7 @@ workflow ILLUMINA { ch_multiqc_custom_config.collect().ifEmpty([]), GET_SOFTWARE_VERSIONS.out.yaml.collect(), ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'), + ch_fail_reads_multiqc.ifEmpty([]), ch_fail_mapping_multiqc.ifEmpty([]), FASTQC_FASTP.out.fastqc_raw_zip.collect{it[1]}.ifEmpty([]), FASTQC_FASTP.out.trim_json.collect{it[1]}.ifEmpty([]), From 691865c52016c212070dab290aae381ee8ff2f23 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 17:33:56 +0100 Subject: [PATCH 25/68] EClint fix indents in yaml --- assets/multiqc_config_nanopore.yaml | 210 ++++++++++++++-------------- 1 file changed, 105 insertions(+), 105 deletions(-) diff --git a/assets/multiqc_config_nanopore.yaml b/assets/multiqc_config_nanopore.yaml index 3412dcdd..339601f0 100644 --- a/assets/multiqc_config_nanopore.yaml +++ b/assets/multiqc_config_nanopore.yaml @@ -1,123 +1,123 @@ report_comment: > - This report has been generated by the nf-core/viralrecon - analysis pipeline. For information about how to interpret these results, please see the - documentation. + This report has been generated by the nf-core/viralrecon + analysis pipeline. For information about how to interpret these results, please see the + documentation. 
-data_format: 'yaml' +data_format: "yaml" max_table_rows: 10000 run_modules: - - custom_content - - pycoqc - - samtools - - bcftools - - mosdepth - - snpeff - - quast + - custom_content + - pycoqc + - samtools + - bcftools + - mosdepth + - snpeff + - quast module_order: - - pycoqc - - samtools: - path_filters: - - './samtools_stats/*' - - mosdepth - - bcftools: - path_filters: - - './bcftools_stats/*.txt' - - snpeff: - path_filters: - - './snpeff/*.csv' - - quast: - path_filters: - - './quast/*.tsv' + - pycoqc + - samtools: + path_filters: + - "./samtools_stats/*" + - mosdepth + - bcftools: + path_filters: + - "./bcftools_stats/*.txt" + - snpeff: + path_filters: + - "./snpeff/*.csv" + - quast: + path_filters: + - "./quast/*.tsv" report_section_order: - software_versions: - order: -1001 - nf-core-viralrecon-summary: - order: -1002 + software_versions: + order: -1001 + nf-core-viralrecon-summary: + order: -1002 bcftools: - collapse_complementary_changes: true + collapse_complementary_changes: true extra_fn_clean_exts: - - '.pass' + - ".pass" # See https://github.com/ewels/MultiQC_TestData/blob/master/data/custom_content/with_config/table_headerconfig/multiqc_config.yaml custom_data: - fail_barcodes_no_sample: - section_name: 'WARNING: Barcodes without sample id' - description: "List of barcodes that appear to have reads in the '--fastq_dir' folder but were not specified in mappings samplesheet via '--input'." - plot_type: 'table' - pconfig: - id: 'fail_barcodes_no_sample_table' - table_title: 'Barcodes without sample id' - namespace: 'Barcodes without sample id' - format: '{:,.0f}' - fail_no_barcode_samples: - section_name: 'WARNING: No barcode' - description: "List of samples that were specified in mappings samplesheet via '--input' but didn't have an associated barcode in the '--fastq_dir' folder." 
- plot_type: 'table' - pconfig: - id: 'fail_no_barcode_samples_table' - table_title: 'Sample ids without barcode' - namespace: 'Sample ids without barcode' - fail_barcode_count_samples: - section_name: 'WARNING: Fail barcode read count' - description: "Samples that failed the minimum number of reads required per barcode specified via the '--min_barcode_reads' parameter, and hence were ignored for the downstream processing steps." - plot_type: 'bargraph' - pconfig: - id: 'fail_barcode_count_samples_table' - table_title: 'Samples failed barcode read count threshold' - namespace: 'Samples failed barcode read count threshold' - format: '{:,.0f}' - fail_guppyplex_count_samples: - section_name: 'WARNING: Fail guppyplex read count' - description: "Samples that failed the minimum number of reads required per sample specified via the '--min_guppyplex_reads' parameter, and hence were ignored for the downstream processing steps." - plot_type: 'bargraph' - pconfig: - id: 'fail_guppyplex_count_samples_table' - table_title: 'Samples failed artic guppyplex read count threshold' - namespace: 'Samples failed artic guppyplex read count threshold' - format: '{:,.0f}' - summary_variants_metrics: - section_name: 'Variant calling metrics' - description: 'generated by the nf-core/viralrecon pipeline' - plot_type: 'table' - headers: - '# Mapped reads': - description: 'Total number of mapped reads relative to the viral genome' - format: '{:,.0f}' - 'Coverage median': - description: 'Median coverage calculated by mosdepth' - format: '{:,.2f}' - '% Coverage > 1x': - description: 'Coverage > 1x calculated by mosdepth' - format: '{:,.2f}' - '% Coverage > 10x': - description: 'Coverage > 10x calculated by mosdepth' - format: '{:,.2f}' - '# SNPs': - description: 'Total number of SNPs called by artic minion that pass quality filters' - format: '{:,.0f}' - '# INDELs': - description: 'Total number of INDELs called by artic minion that pass quality filters' - format: '{:,.0f}' - '# Missense 
variants': - description: 'Total number of missense mutations identified by variant annotation with SnpEff' - format: '{:,.0f}' - '# Ns per 100kb consensus': - description: 'Number of N bases per 100kb in consensus sequence generated by artic minion' - format: '{:,.2f}' - 'Pangolin lineage': - description: 'Pangolin lineage inferred from the consensus sequence generated by artic minion' - pconfig: - id: 'summary_variants_metrics_plot_table' - table_title: 'Variant calling metrics' - namespace: 'Variant calling metrics' - only_defined_headers: False - format: '{:,.0f}' + fail_barcodes_no_sample: + section_name: "WARNING: Barcodes without sample id" + description: "List of barcodes that appear to have reads in the '--fastq_dir' folder but were not specified in mappings samplesheet via '--input'." + plot_type: "table" + pconfig: + id: "fail_barcodes_no_sample_table" + table_title: "Barcodes without sample id" + namespace: "Barcodes without sample id" + format: "{:,.0f}" + fail_no_barcode_samples: + section_name: "WARNING: No barcode" + description: "List of samples that were specified in mappings samplesheet via '--input' but didn't have an associated barcode in the '--fastq_dir' folder." + plot_type: "table" + pconfig: + id: "fail_no_barcode_samples_table" + table_title: "Sample ids without barcode" + namespace: "Sample ids without barcode" + fail_barcode_count_samples: + section_name: "WARNING: Fail barcode read count" + description: "Samples that failed the minimum number of reads required per barcode specified via the '--min_barcode_reads' parameter, and hence were ignored for the downstream processing steps." 
+ plot_type: "bargraph" + pconfig: + id: "fail_barcode_count_samples_table" + table_title: "Samples failed barcode read count threshold" + namespace: "Samples failed barcode read count threshold" + format: "{:,.0f}" + fail_guppyplex_count_samples: + section_name: "WARNING: Fail guppyplex read count" + description: "Samples that failed the minimum number of reads required per sample specified via the '--min_guppyplex_reads' parameter, and hence were ignored for the downstream processing steps." + plot_type: "bargraph" + pconfig: + id: "fail_guppyplex_count_samples_table" + table_title: "Samples failed artic guppyplex read count threshold" + namespace: "Samples failed artic guppyplex read count threshold" + format: "{:,.0f}" + summary_variants_metrics: + section_name: "Variant calling metrics" + description: "generated by the nf-core/viralrecon pipeline" + plot_type: "table" + headers: + "# Mapped reads": + description: "Total number of mapped reads relative to the viral genome" + format: "{:,.0f}" + "Coverage median": + description: "Median coverage calculated by mosdepth" + format: "{:,.2f}" + "% Coverage > 1x": + description: "Coverage > 1x calculated by mosdepth" + format: "{:,.2f}" + "% Coverage > 10x": + description: "Coverage > 10x calculated by mosdepth" + format: "{:,.2f}" + "# SNPs": + description: "Total number of SNPs called by artic minion that pass quality filters" + format: "{:,.0f}" + "# INDELs": + description: "Total number of INDELs called by artic minion that pass quality filters" + format: "{:,.0f}" + "# Missense variants": + description: "Total number of missense mutations identified by variant annotation with SnpEff" + format: "{:,.0f}" + "# Ns per 100kb consensus": + description: "Number of N bases per 100kb in consensus sequence generated by artic minion" + format: "{:,.2f}" + "Pangolin lineage": + description: "Pangolin lineage inferred from the consensus sequence generated by artic minion" + pconfig: + id: 
"summary_variants_metrics_plot_table" + table_title: "Variant calling metrics" + namespace: "Variant calling metrics" + only_defined_headers: False + format: "{:,.0f}" # # Customise the module search patterns to speed up execution time # # - Skip module sub-tools that we are not interested in @@ -125,5 +125,5 @@ custom_data: # # - Don't add anything that is the same as the MultiQC default # # See https://multiqc.info/docs/#optimise-file-search-patterns for details sp: - mosdepth/global_dist: - fn: '*.global.dist.txt' + mosdepth/global_dist: + fn: "*.global.dist.txt" From d75af8bf9e26bc72f53db885a0f29305fae81165 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 20:40:50 +0100 Subject: [PATCH 26/68] Finesse code to report samples with no reads after trimming --- CHANGELOG.md | 3 +- assets/multiqc_config_illumina.yaml | 3 +- lib/WorkflowIllumina.groovy | 8 +++++ modules/local/multiqc_custom_onecol_txt.nf | 35 ---------------------- workflows/illumina.nf | 15 ++++++---- 5 files changed, 21 insertions(+), 43 deletions(-) delete mode 100644 modules/local/multiqc_custom_onecol_txt.nf diff --git a/CHANGELOG.md b/CHANGELOG.md index 1543b6b2..ef45a67e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,10 +8,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Enhancements & fixes * Removed workflow to download data from public databases in favour of using [nf-core/fetchngs](https://nf-co.re/fetchngs) +* Dashes in sample names will be converted to underscores to avoid issues when creating the summary metrics via QUAST +* Add warning to MultiQC report for samples that have no reads after adapter trimming * Added docs about structure of data required for running Nanopore data * Added docs about using other primer sets for Illumina data * Added docs about overwriting default container definitions to use latest versions e.g. 
Pangolin -* Dashes in sample names will be converted to underscores to avoid issues when creating the summary metrics via QUAST ### Software dependencies diff --git a/assets/multiqc_config_illumina.yaml b/assets/multiqc_config_illumina.yaml index 4e8f8206..94c3069f 100644 --- a/assets/multiqc_config_illumina.yaml +++ b/assets/multiqc_config_illumina.yaml @@ -258,12 +258,13 @@ custom_data: format: "{:.0f}" fail_mapped_reads: section_name: "WARNING: Fail Reads Check" - description: "List of samples that had 0 reads after adapter trimming, and hence were ignored for the downstream processing steps." + description: "List of samples that had no reads after adapter trimming, and hence were ignored for the downstream processing steps." plot_type: "table" pconfig: id: "fail_mapped_reads_table" table_title: "Samples failed read threshold" namespace: "Samples failed read threshold" + format: "{:,.0f}" fail_mapped_samples: section_name: "WARNING: Fail Alignment Check" description: "List of samples that failed the Bowtie2 minimum mapped reads threshold specified via the '--min_mapped_reads' parameter, and hence were ignored for the downstream processing steps." 
diff --git a/lib/WorkflowIllumina.groovy b/lib/WorkflowIllumina.groovy index fbcc540f..f9531376 100755 --- a/lib/WorkflowIllumina.groovy +++ b/lib/WorkflowIllumina.groovy @@ -129,4 +129,12 @@ class WorkflowIllumina { def Map json = (Map) new JsonSlurper().parseText(json_file.text).get('summary') return json['after_filtering']['total_reads'].toInteger() } + + // + // Function that parses fastp json output file to get total number of reads before trimming + // + public static Integer getFastpReadsBeforeFiltering(json_file) { + def Map json = (Map) new JsonSlurper().parseText(json_file.text).get('summary') + return json['before_filtering']['total_reads'].toInteger() + } } diff --git a/modules/local/multiqc_custom_onecol_txt.nf b/modules/local/multiqc_custom_onecol_txt.nf deleted file mode 100644 index 677768d4..00000000 --- a/modules/local/multiqc_custom_onecol_txt.nf +++ /dev/null @@ -1,35 +0,0 @@ -// Import generic module functions -include { saveFiles; getSoftwareName } from './functions' - -params.options = [:] - -process MULTIQC_CUSTOM_ONECOL_TXT { - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } - - conda (params.enable_conda ? 
"conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } - - input: - val data - val out_prefix - - output: - path "*.txt" - - script: - if (data.size() > 0) { - """ - echo "${data.join('\n')}" >> ${out_prefix}_mqc.txt - """ - } else { - """ - touch ${out_prefix}_mqc.txt - """ - } -} diff --git a/workflows/illumina.nf b/workflows/illumina.nf index 0f80ff90..d8cef239 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -70,7 +70,7 @@ include { BCFTOOLS_ISEC } from '../modules/local/bcftools_isec' include { CUTADAPT } from '../modules/local/cutadapt' addParams( options: modules['illumina_cutadapt'] ) include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['csv':'']] ) include { MULTIQC } from '../modules/local/multiqc_illumina' addParams( options: multiqc_options ) -include { MULTIQC_CUSTOM_ONECOL_TXT } from '../modules/local/multiqc_custom_onecol_txt' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_FAIL_READS } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_FAIL_MAPPED } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_IVAR_PANGOLIN } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_BCFTOOLS_PANGOLIN } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) @@ -246,26 +246,29 @@ workflow ILLUMINA { .map { meta, 
reads, json -> pass = WorkflowIllumina.getFastpReadsAfterFiltering(json) > 0 - [ meta, reads, pass ] + [ meta, reads, json, pass ] } .set { ch_pass_fail_reads } ch_pass_fail_reads - .map { meta, reads, pass -> if (pass) [ meta, reads ] } + .map { meta, reads, json, pass -> if (pass) [ meta, reads ] } .set { ch_variants_fastq } ch_pass_fail_reads .map { - meta, reads, pass -> + meta, reads, json, pass -> if (!pass) { fail_mapped_reads[meta.id] = 0 - return [ "$meta.id" ] + num_reads = WorkflowIllumina.getFastpReadsBeforeFiltering(json) + return [ "$meta.id\t$num_reads" ] } } .set { ch_pass_fail_reads } - MULTIQC_CUSTOM_ONECOL_TXT ( + MULTIQC_CUSTOM_TWOCOL_TSV_FAIL_READS ( ch_pass_fail_reads.collect(), + 'Sample', + 'Reads before trimming', 'fail_mapped_reads' ) .set { ch_fail_reads_multiqc } From d0da25ef3919704268e8341d54330eba89784cbe Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 20:45:03 +0100 Subject: [PATCH 27/68] Create empty channel if skipping trimming --- workflows/illumina.nf | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/workflows/illumina.nf b/workflows/illumina.nf index d8cef239..084c397e 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -240,6 +240,7 @@ workflow ILLUMINA { // // Filter empty FastQ files after adapter trimming // + ch_fail_reads_multiqc = Channel.empty() if (!params.skip_fastp) { ch_variants_fastq .join(FASTQC_FASTP.out.trim_json) @@ -249,11 +250,11 @@ workflow ILLUMINA { [ meta, reads, json, pass ] } .set { ch_pass_fail_reads } - + ch_pass_fail_reads .map { meta, reads, json, pass -> if (pass) [ meta, reads ] } .set { ch_variants_fastq } - + ch_pass_fail_reads .map { meta, reads, json, pass -> From 9c63f9ba6ddbec3e77487dbc19e2c7e0ec60fcf8 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Wed, 9 Jun 2021 23:33:29 +0100 Subject: [PATCH 28/68] Tweak tests --- CHANGELOG.md | 11 +++++++++++ conf/test.config | 2 +- lib/WorkflowMain.groovy | 14 ++++++-------- main.nf | 4 ++-- 4 
files changed, 20 insertions(+), 11 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index ef45a67e..4cb824b4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,6 +14,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * Added docs about using other primer sets for Illumina data * Added docs about overwriting default container definitions to use latest versions e.g. Pangolin +### Parameters + +| Old parameter | New parameter | +|-------------------------------|---------------------------------------| +| `--public_data_ids` | | +| `--skip_sra_fastq_download` | | + +> **NB:** Parameter has been __updated__ if both old and new parameter information is present. +> **NB:** Parameter has been __added__ if just the new parameter information is present. +> **NB:** Parameter has been __removed__ if new parameter information isn't present. + ### Software dependencies Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference. diff --git a/conf/test.config b/conf/test.config index 28450b79..0b2295b8 100644 --- a/conf/test.config +++ b/conf/test.config @@ -32,5 +32,5 @@ params { // Other pipeline options callers = 'ivar,bcftools' - assemblers = 'spades,unicycler,minia' + assemblers = 'spades,unicycler' } diff --git a/lib/WorkflowMain.groovy b/lib/WorkflowMain.groovy index b12aacc9..dddc3ea0 100755 --- a/lib/WorkflowMain.groovy +++ b/lib/WorkflowMain.groovy @@ -73,14 +73,12 @@ class WorkflowMain { // Check sequencing platform def platformList = ['illumina', 'nanopore'] - if (!params.public_data_ids) { - if (!params.platform) { - log.error "Platform not specified with e.g. '--platform illumina'. 
Valid options: ${platformList.join(', ')}." - System.exit(1) - } else if (!platformList.contains(params.platform)) { - log.error "Invalid platform option: '${params.platform}'. Valid options: ${platformList.join(', ')}." - System.exit(1) - } + if (!params.platform) { + log.error "Platform not specified with e.g. '--platform illumina'. Valid options: ${platformList.join(', ')}." + System.exit(1) + } else if (!platformList.contains(params.platform)) { + log.error "Invalid platform option: '${params.platform}'. Valid options: ${platformList.join(', ')}." + System.exit(1) } } diff --git a/main.nf b/main.nf index 27d6ea36..66e0377c 100644 --- a/main.nf +++ b/main.nf @@ -19,10 +19,10 @@ nextflow.enable.dsl = 2 def primer_set = '' def primer_set_version = 0 -if (!params.public_data_ids && params.platform == 'illumina' && params.protocol == 'amplicon') { +if (params.platform == 'illumina' && params.protocol == 'amplicon') { primer_set = params.primer_set primer_set_version = params.primer_set_version -} else if (!params.public_data_ids && params.platform == 'nanopore') { +} else if (params.platform == 'nanopore') { primer_set = 'artic' primer_set_version = params.primer_set_version params.artic_scheme = WorkflowMain.getGenomeAttribute(params, 'scheme', log, primer_set, primer_set_version) From 5fe1fc5b91c06a30997eb9314df2e947a56e4f04 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 00:00:46 +0100 Subject: [PATCH 29/68] Only add summary metrics if skip parameters arent provided --- modules/local/multiqc_illumina.nf | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/modules/local/multiqc_illumina.nf b/modules/local/multiqc_illumina.nf index b25379f1..075dd796 100644 --- a/modules/local/multiqc_illumina.nf +++ b/modules/local/multiqc_illumina.nf @@ -57,8 +57,21 @@ process MULTIQC { def software = getSoftwareName(task.process) def custom_config = params.multiqc_config ? 
"--config $multiqc_custom_config" : '' """ + ## Run MultiQC once to parse tool logs multiqc -f $options.args $custom_config . + + ## Parse YAML files dumped by MultiQC to obtain metrics multiqc_to_custom_csv.py --platform illumina + + if grep -q skip_assembly workflow_summary_mqc.yaml; then + rm -f *assembly_metrics_mqc.csv + fi + + if grep -q skip_variants workflow_summary_mqc.yaml; then + rm -f *variants_metrics_mqc.csv + fi + + ## Run MultiQC a second time multiqc -f $options.args -e general_stats --ignore *pangolin_lineage_mqc.tsv $custom_config . """ } From 3f910a2e839a06184f0e6aca8761c1375070adb5 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 00:39:44 +0100 Subject: [PATCH 30/68] Update test.config --- conf/test.config | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/conf/test.config b/conf/test.config index 0b2295b8..5fc7629e 100644 --- a/conf/test.config +++ b/conf/test.config @@ -31,6 +31,7 @@ params { kraken2_db = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/genome/kraken2/kraken2_hs22.tar.gz' // Other pipeline options - callers = 'ivar,bcftools' - assemblers = 'spades,unicycler' + callers = 'ivar,bcftools' + assemblers = 'spades,unicycler,minia' + skip_plasmidid = true } From 71c8adc46bae39db0303d07768c602a242ced836 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 11:49:40 +0100 Subject: [PATCH 31/68] Get tests working --- .github/workflows/ci.yml | 1 + conf/test.config | 10 ++++++---- conf/test_sispa.config | 7 +++++-- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 66867ad8..61bdec80 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -54,6 +54,7 @@ jobs: --skip_assembly, "--spades_mode corona", "--spades_mode metaviral", + "--skip_plasmidid false --skip_asciigenome", ] steps: - name: Check out pipeline code diff --git a/conf/test.config b/conf/test.config index 
5fc7629e..ba5e0938 100644 --- a/conf/test.config +++ b/conf/test.config @@ -2,8 +2,8 @@ ======================================================================================== Nextflow config file for running minimal tests ======================================================================================== - Defines input files and everything required to run a fast and simple pipeline test. - + Defines input files and everything required to run a fast and simple pipeline test. + Use as follows: nextflow run nf-core/viralrecon -profile test, @@ -31,7 +31,9 @@ params { kraken2_db = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/genome/kraken2/kraken2_hs22.tar.gz' // Other pipeline options - callers = 'ivar,bcftools' - assemblers = 'spades,unicycler,minia' + callers = 'ivar,bcftools' + assemblers = 'spades,unicycler,minia' + + // Skip this by default to bypass Github Actions disk quota errors skip_plasmidid = true } diff --git a/conf/test_sispa.config b/conf/test_sispa.config index b631f985..bce9406d 100644 --- a/conf/test_sispa.config +++ b/conf/test_sispa.config @@ -2,8 +2,8 @@ ======================================================================================== Nextflow config file for running minimal tests ======================================================================================== - Defines input files and everything required to run a fast and simple pipeline test. - + Defines input files and everything required to run a fast and simple pipeline test. 
+ Use as follows: nextflow run nf-core/viralrecon -profile test_sispa, @@ -31,4 +31,7 @@ params { // Other pipeline options callers = 'ivar,bcftools' assemblers = 'spades,unicycler,minia' + + // Skip this by default to bypass Github Actions disk quota errors + skip_plasmidid = true } From 5929e1200eb2e368159cd15cb16a8e1a66b88b4f Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 13:17:45 +0100 Subject: [PATCH 32/68] Remove QUAST section from MultiQC for variant calling --- assets/multiqc_config_illumina.yaml | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/assets/multiqc_config_illumina.yaml b/assets/multiqc_config_illumina.yaml index 94c3069f..54a3663b 100644 --- a/assets/multiqc_config_illumina.yaml +++ b/assets/multiqc_config_illumina.yaml @@ -69,12 +69,6 @@ module_order: info: "This section of the report shows SnpEff results for variants called by iVar." path_filters: - "./variants_ivar/*.csv" - - quast: - name: "VARIANTS: QUAST (iVar)" - anchor: "quast_ivar" - info: "This section of the report shows QUAST results for consensus sequences generated from variants with iVar." - path_filters: - - "./variants_ivar/*.tsv" - bcftools: name: "VARIANTS: BCFTools (BCFTools)" anchor: "bcftools_bcftools" @@ -87,12 +81,6 @@ module_order: info: "This section of the report shows SnpEff results for variants called by BCFTools." path_filters: - "./variants_bcftools/*.csv" - - quast: - name: "VARIANTS: QUAST (BCFTools)" - anchor: "quast_bcftools" - info: "This section of the report shows QUAST results for consensus sequence generated from BCFTools variants." - path_filters: - - "./variants_bcftools/*.tsv" - cutadapt: name: "ASSEMBLY: Cutadapt (primer trimming)" info: "This section of the report shows Cutadapt results for reads after primer sequence trimming." 
From f889766353df3ef558967016d6d564f813c4a8a2 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 14:02:21 +0100 Subject: [PATCH 33/68] Fixes #201 --- main.nf | 5 +++-- workflows/illumina.nf | 12 ++++++++---- workflows/nanopore.nf | 25 ++++++++++++------------- 3 files changed, 23 insertions(+), 19 deletions(-) diff --git a/main.nf b/main.nf index 66e0377c..35f7dbd4 100644 --- a/main.nf +++ b/main.nf @@ -47,20 +47,21 @@ WorkflowMain.initialise(workflow, params, log) ======================================================================================== */ +include { ILLUMINA } from './workflows/illumina' +include { NANOPORE } from './workflows/nanopore' + workflow NFCORE_VIRALRECON { // // WORKFLOW: Variant and de novo assembly analysis for Illumina data // if (params.platform == 'illumina') { - include { ILLUMINA } from './workflows/illumina' ILLUMINA () // // WORKFLOW: Variant analysis for Nanopore data // } else if (params.platform == 'nanopore') { - include { NANOPORE } from './workflows/nanopore' NANOPORE () } } diff --git a/workflows/illumina.nf b/workflows/illumina.nf index 084c397e..b3651912 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -14,7 +14,9 @@ def valid_params = [ def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params) // Validate input parameters -WorkflowIllumina.initialise(params, log, valid_params) +if (params.platform == 'illumina') { + WorkflowIllumina.initialise(params, log, valid_params) +} // Check input path parameters to see if they exist def checkPathParamList = [ @@ -691,9 +693,11 @@ workflow ILLUMINA { ======================================================================================== */ -workflow.onComplete { - NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report, fail_mapped_reads) - NfcoreTemplate.summary(workflow, params, log, fail_mapped_reads, pass_mapped_reads) +if (params.platform == 'illumina') { + workflow.onComplete { + 
NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report, fail_mapped_reads) + NfcoreTemplate.summary(workflow, params, log, fail_mapped_reads, pass_mapped_reads) + } } /* diff --git a/workflows/nanopore.nf b/workflows/nanopore.nf index 23b378a4..ec685360 100644 --- a/workflows/nanopore.nf +++ b/workflows/nanopore.nf @@ -12,7 +12,9 @@ def valid_params = [ def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params) // Validate input parameters -WorkflowNanopore.initialise(params, log, valid_params) +if (params.platform == 'nanopore') { + WorkflowNanopore.initialise(params, log, valid_params) +} def checkPathParamList = [ params.input, params.fastq_dir, params.fast5_dir, @@ -23,6 +25,10 @@ for (param in checkPathParamList) { if (param) { file(param, checkIfExists: true // Stage dummy file to be used as an optional input where required ch_dummy_file = file("$projectDir/assets/dummy_file.txt", checkIfExists: true) +// MultiQC config files +ch_multiqc_config = file("$projectDir/assets/multiqc_config_nanopore.yaml", checkIfExists: true) +ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config) : Channel.empty() + if (params.input) { ch_input = file(params.input) } if (params.fast5_dir) { ch_fast5_dir = file(params.fast5_dir) } else { ch_fast5_dir = ch_dummy_file } if (params.sequencing_summary) { ch_sequencing_summary = file(params.sequencing_summary) } else { ch_sequencing_summary = ch_multiqc_config } @@ -35,15 +41,6 @@ if (params.artic_minion_caller == 'medaka') { } } -/* -======================================================================================== - CONFIG FILES -======================================================================================== -*/ - -ch_multiqc_config = file("$projectDir/assets/multiqc_config_nanopore.yaml", checkIfExists: true) -ch_multiqc_custom_config = params.multiqc_config ? 
Channel.fromPath(params.multiqc_config) : Channel.empty() - /* ======================================================================================== IMPORT LOCAL MODULES/SUBWORKFLOWS @@ -511,9 +508,11 @@ workflow NANOPORE { ======================================================================================== */ -workflow.onComplete { - NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report) - NfcoreTemplate.summary(workflow, params, log) +if (params.platform == 'nanopore') { + workflow.onComplete { + NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report) + NfcoreTemplate.summary(workflow, params, log) + } } /* From fff152d15ee36d47a4eaf9b25675d1a46723f357 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 14:13:16 +0100 Subject: [PATCH 34/68] Rename software versions from csv to tsv --- bin/scrape_software_versions.py | 4 ++-- docs/output.md | 2 +- modules/local/get_software_versions.nf | 2 +- workflows/illumina.nf | 2 +- workflows/nanopore.nf | 2 +- 5 files changed, 6 insertions(+), 6 deletions(-) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index 4394e5af..fa933f47 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -30,7 +30,7 @@ print("
<dt>{}</dt><dd><samp>{}</samp></dd>".format(k, v))
print("    </dl>")
-# Write out regexes as csv file: -with open("software_versions.csv", "w") as f: +# Write out as tsv file: +with open("software_versions.tsv", "w") as f: for k, v in sorted(results.items()): f.write("{}\t{}\n".format(k, v)) diff --git a/docs/output.md index cae772ae..c40add44 100644 --- a/docs/output.md +++ b/docs/output.md @@ -854,7 +854,7 @@ A number of genome-specific files are generated by the pipeline because they are * `pipeline_info/` * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. - * Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.csv`. + * Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.tsv`. * Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`. * Documentation for interpretation of results in HTML format: `results_description.html`. 
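The patch above renames the scraped versions file from `software_versions.csv` to `software_versions.tsv` while keeping the same tab-separated, two-column layout. A minimal sketch of that layout, assuming a hypothetical `results` dict of tool-to-version pairs (the values here are illustrative, not taken from the pipeline):

```python
# Hypothetical tool -> version mapping; the real script scrapes these
# from each tool's version output.
results = {"Nextflow": "21.04.0", "samtools": "1.10"}

# Write one "name<TAB>version" row per tool, sorted by name, mirroring
# the software_versions.tsv produced by the patched script.
with open("software_versions.tsv", "w") as f:
    for k, v in sorted(results.items()):
        f.write("{}\t{}\n".format(k, v))

# Read it back to confirm the two-column layout.
with open("software_versions.tsv") as f:
    lines = f.read().splitlines()
print(lines)  # → ['Nextflow\t21.04.0', 'samtools\t1.10']
```

Only the file extension and the `emit:` name change in the patch; the row format written by the script is unchanged.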
diff --git a/modules/local/get_software_versions.nf b/modules/local/get_software_versions.nf index 7078394a..112258bf 100644 --- a/modules/local/get_software_versions.nf +++ b/modules/local/get_software_versions.nf @@ -21,7 +21,7 @@ process GET_SOFTWARE_VERSIONS { path versions output: - path "software_versions.csv" , emit: csv + path "software_versions.tsv" , emit: tsv path 'software_versions_mqc.yaml', emit: yaml script: diff --git a/workflows/illumina.nf b/workflows/illumina.nf index b3651912..eb9d430a 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -70,7 +70,7 @@ if (!params.skip_variants) { include { BCFTOOLS_ISEC } from '../modules/local/bcftools_isec' addParams( options: modules['illumina_bcftools_isec'] ) include { CUTADAPT } from '../modules/local/cutadapt' addParams( options: modules['illumina_cutadapt'] ) -include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['csv':'']] ) +include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['tsv':'']] ) include { MULTIQC } from '../modules/local/multiqc_illumina' addParams( options: multiqc_options ) include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_FAIL_READS } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_FAIL_MAPPED } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) diff --git a/workflows/nanopore.nf b/workflows/nanopore.nf index ec685360..1e18e0f8 100644 --- a/workflows/nanopore.nf +++ b/workflows/nanopore.nf @@ -54,7 +54,7 @@ def multiqc_options = modules['nanopore_multiqc'] multiqc_options.args += params.multiqc_title ? 
Utils.joinModuleArgs(["--title \"$params.multiqc_title\""]) : '' include { ASCIIGENOME } from '../modules/local/asciigenome' addParams( options: modules['nanopore_asciigenome'] ) -include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['csv':'']] ) +include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['tsv':'']] ) include { MULTIQC } from '../modules/local/multiqc_nanopore' addParams( options: multiqc_options ) include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_FAIL_NO_SAMPLE_NAME } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) From 85bfd4ed1938e698bd9b1e22b25dabefe2182371 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 14:40:15 +0100 Subject: [PATCH 35/68] Generalise MultiQC module to generate custom tsv --- ...om_twocol_tsv.nf => multiqc_custom_tsv.nf} | 7 ++-- workflows/illumina.nf | 32 ++++++++----------- workflows/nanopore.nf | 25 ++++++--------- 3 files changed, 27 insertions(+), 37 deletions(-) rename modules/local/{multiqc_custom_twocol_tsv.nf => multiqc_custom_tsv.nf} (87%) diff --git a/modules/local/multiqc_custom_twocol_tsv.nf b/modules/local/multiqc_custom_tsv.nf similarity index 87% rename from modules/local/multiqc_custom_twocol_tsv.nf rename to modules/local/multiqc_custom_tsv.nf index 89a603a9..dc0f0883 100644 --- a/modules/local/multiqc_custom_twocol_tsv.nf +++ b/modules/local/multiqc_custom_tsv.nf @@ -3,7 +3,7 @@ include { saveFiles; getSoftwareName } from './functions' params.options = [:] -process MULTIQC_CUSTOM_TWOCOL_TSV { +process MULTIQC_CUSTOM_TSV { publishDir "${params.outdir}", mode: params.publish_dir_mode, saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } @@ -17,8 +17,7 @@ process MULTIQC_CUSTOM_TWOCOL_TSV { input: val tsv_data - val col1_name - 
val col2_name + val col_names val out_prefix output: @@ -27,7 +26,7 @@ process MULTIQC_CUSTOM_TWOCOL_TSV { script: if (tsv_data.size() > 0) { """ - echo "${col1_name}\t${col2_name}" > ${out_prefix}_mqc.tsv + echo "${col_names}" > ${out_prefix}_mqc.tsv echo "${tsv_data.join('\n')}" >> ${out_prefix}_mqc.tsv """ } else { diff --git a/workflows/illumina.nf b/workflows/illumina.nf index eb9d430a..1ca9562c 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -72,12 +72,12 @@ include { BCFTOOLS_ISEC } from '../modules/local/bcftools_isec' include { CUTADAPT } from '../modules/local/cutadapt' addParams( options: modules['illumina_cutadapt'] ) include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['tsv':'']] ) include { MULTIQC } from '../modules/local/multiqc_illumina' addParams( options: multiqc_options ) -include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_FAIL_READS } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_FAIL_MAPPED } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_IVAR_PANGOLIN } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_TWOCOL_TSV_BCFTOOLS_PANGOLIN } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) -include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_genome'] ) -include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_amplicon'] ) +include { MULTIQC_CUSTOM_TSV as 
MULTIQC_CUSTOM_TSV_FAIL_READS } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_FAIL_MAPPED } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_IVAR_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_BCFTOOLS_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_genome'] ) +include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_amplicon'] ) // // SUBWORKFLOW: Consisting of a mix of local and nf-core/modules @@ -268,10 +268,9 @@ workflow ILLUMINA { } .set { ch_pass_fail_reads } - MULTIQC_CUSTOM_TWOCOL_TSV_FAIL_READS ( + MULTIQC_CUSTOM_TSV_FAIL_READS ( ch_pass_fail_reads.collect(), - 'Sample', - 'Reads before trimming', + 'Sample\tReads before trimming', 'fail_mapped_reads' ) .set { ch_fail_reads_multiqc } @@ -349,10 +348,9 @@ workflow ILLUMINA { } .set { ch_pass_fail_mapped } - MULTIQC_CUSTOM_TWOCOL_TSV_FAIL_MAPPED ( + MULTIQC_CUSTOM_TSV_FAIL_MAPPED ( ch_pass_fail_mapped.fail.collect(), - 'Sample', - 'Mapped reads', + 'Sample\tMapped reads', 'fail_mapped_samples' ) .set { ch_fail_mapping_multiqc } @@ -476,10 +474,9 @@ workflow ILLUMINA { } .set { ch_ivar_pangolin_multiqc } - MULTIQC_CUSTOM_TWOCOL_TSV_IVAR_PANGOLIN ( + MULTIQC_CUSTOM_TSV_IVAR_PANGOLIN ( ch_ivar_pangolin_multiqc.collect(), - 'Sample', - 'Lineage', + 'Sample\tLineage', 'ivar_pangolin_lineage' ) .set { ch_ivar_pangolin_multiqc } @@ -528,10 +525,9 @@ workflow ILLUMINA { } .set { 
ch_bcftools_pangolin_multiqc } - MULTIQC_CUSTOM_TWOCOL_TSV_BCFTOOLS_PANGOLIN ( + MULTIQC_CUSTOM_TSV_BCFTOOLS_PANGOLIN ( ch_bcftools_pangolin_multiqc.collect(), - 'Sample', - 'Lineage', + 'Sample\tLineage', 'bcftools_pangolin_lineage' ) .set { ch_bcftools_pangolin_multiqc } diff --git a/workflows/nanopore.nf b/workflows/nanopore.nf index 1e18e0f8..68af87f9 100644 --- a/workflows/nanopore.nf +++ b/workflows/nanopore.nf @@ -57,11 +57,11 @@ include { ASCIIGENOME } from '../modules/local/asciigenome' include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['tsv':'']] ) include { MULTIQC } from '../modules/local/multiqc_nanopore' addParams( options: multiqc_options ) -include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_FAIL_NO_SAMPLE_NAME } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_FAIL_NO_BARCODES } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_FAIL_BARCODE_COUNT } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_FAIL_GUPPYPLEX_COUNT } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TWOCOL_TSV as MULTIQC_CUSTOM_PANGOLIN } from '../modules/local/multiqc_custom_twocol_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_SAMPLE_NAME } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_BARCODES } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_BARCODE_COUNT } from 
'../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_GUPPYPLEX_COUNT } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_genome'] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_amplicon'] ) @@ -189,8 +189,7 @@ workflow NANOPORE { MULTIQC_CUSTOM_FAIL_NO_SAMPLE_NAME ( ch_barcodes_no_sample.collect(), - 'Barcode', - 'Read count', + 'Barcode\tRead count', 'fail_barcodes_no_sample' ) ch_custom_no_sample_name_multiqc = MULTIQC_CUSTOM_FAIL_NO_SAMPLE_NAME.out @@ -205,8 +204,7 @@ workflow NANOPORE { MULTIQC_CUSTOM_FAIL_NO_BARCODES ( ch_samples_no_barcode.collect(), - 'Sample', - 'Missing barcode', + 'Sample\tMissing barcode', 'fail_no_barcode_samples' ) ch_custom_no_barcodes_multiqc = MULTIQC_CUSTOM_FAIL_NO_BARCODES.out @@ -247,8 +245,7 @@ workflow NANOPORE { MULTIQC_CUSTOM_FAIL_BARCODE_COUNT ( ch_pass_fail_barcode_count.fail.collect(), - 'Sample', - 'Barcode count', + 'Sample\tBarcode count', 'fail_barcode_count_samples' ) @@ -283,8 +280,7 @@ workflow NANOPORE { MULTIQC_CUSTOM_FAIL_GUPPYPLEX_COUNT ( ch_pass_fail_guppyplex_count.fail.collect(), - 'Sample', - 'Read count', + 'Sample\tRead count', 'fail_guppyplex_count_samples' ) @@ -381,8 +377,7 @@ workflow NANOPORE { MULTIQC_CUSTOM_PANGOLIN ( ch_pangolin_multiqc.collect(), - 'Sample', - 'Lineage', + 'Sample\tLineage', 'pangolin_lineage' ) .set { ch_pangolin_multiqc } From 40d520e3d642a0b85fcec8609af8fd1352c132a6 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: 
Thu, 10 Jun 2021 14:43:32 +0100 Subject: [PATCH 36/68] Revert to conditional include in main script --- main.nf | 7 +++++-- workflows/illumina.nf | 12 ++++-------- workflows/nanopore.nf | 12 ++++-------- 3 files changed, 13 insertions(+), 18 deletions(-) diff --git a/main.nf b/main.nf index 35f7dbd4..9d9eb79d 100644 --- a/main.nf +++ b/main.nf @@ -47,8 +47,11 @@ WorkflowMain.initialise(workflow, params, log) ======================================================================================== */ -include { ILLUMINA } from './workflows/illumina' -include { NANOPORE } from './workflows/nanopore' +if (params.platform == 'illumina') { + include { ILLUMINA } from './workflows/illumina' +} else if (params.platform == 'nanopore') { + include { NANOPORE } from './workflows/nanopore' +} workflow NFCORE_VIRALRECON { diff --git a/workflows/illumina.nf b/workflows/illumina.nf index 1ca9562c..fabadb31 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -14,9 +14,7 @@ def valid_params = [ def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params) // Validate input parameters -if (params.platform == 'illumina') { - WorkflowIllumina.initialise(params, log, valid_params) -} +WorkflowIllumina.initialise(params, log, valid_params) // Check input path parameters to see if they exist def checkPathParamList = [ @@ -689,11 +687,9 @@ workflow ILLUMINA { ======================================================================================== */ -if (params.platform == 'illumina') { - workflow.onComplete { - NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report, fail_mapped_reads) - NfcoreTemplate.summary(workflow, params, log, fail_mapped_reads, pass_mapped_reads) - } +workflow.onComplete { + NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report, fail_mapped_reads) + NfcoreTemplate.summary(workflow, params, log, fail_mapped_reads, pass_mapped_reads) } /* diff --git a/workflows/nanopore.nf 
b/workflows/nanopore.nf index 68af87f9..977b7d5b 100644 --- a/workflows/nanopore.nf +++ b/workflows/nanopore.nf @@ -12,9 +12,7 @@ def valid_params = [ def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params) // Validate input parameters -if (params.platform == 'nanopore') { - WorkflowNanopore.initialise(params, log, valid_params) -} +WorkflowNanopore.initialise(params, log, valid_params) def checkPathParamList = [ params.input, params.fastq_dir, params.fast5_dir, @@ -503,11 +501,9 @@ workflow NANOPORE { ======================================================================================== */ -if (params.platform == 'nanopore') { - workflow.onComplete { - NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report) - NfcoreTemplate.summary(workflow, params, log) - } +workflow.onComplete { + NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report) + NfcoreTemplate.summary(workflow, params, log) } /* From ea3e475ae19b9dbc5ac51313ebd7e0f604208f57 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 16:25:14 +0100 Subject: [PATCH 37/68] Add Pangolin scorpio calls to default variant calling summary metrics --- CHANGELOG.md | 1 + assets/multiqc_config_illumina.yaml | 4 ++++ assets/multiqc_config_nanopore.yaml | 2 ++ bin/multiqc_to_custom_csv.py | 14 ++++++++++---- lib/WorkflowCommons.groovy | 24 +++++++++++++++++------ workflows/illumina.nf | 30 +++++++++++++++-------------- workflows/nanopore.nf | 21 ++++++++++---------- 7 files changed, 62 insertions(+), 34 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 4cb824b4..646d9f24 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Enhancements & fixes * Removed workflow to download data from public databases in favour of using [nf-core/fetchngs](https://nf-co.re/fetchngs) +* Added Pangolin VOC scorpio calls to default variant calling 
summary metrics * Dashes in sample names will be converted to underscores to avoid issues when creating the summary metrics via QUAST * Add warning to MultiQC report for samples that have no reads after adapter trimming * Added docs about structure of data required for running Nanopore data diff --git a/assets/multiqc_config_illumina.yaml b/assets/multiqc_config_illumina.yaml index 54a3663b..c407d97b 100644 --- a/assets/multiqc_config_illumina.yaml +++ b/assets/multiqc_config_illumina.yaml @@ -168,6 +168,8 @@ custom_data: format: "{:,.2f}" "Pangolin lineage (iVar)": description: "Pangolin lineage inferred from the consensus sequence generated by iVar" + "Pangolin scorpio call (iVar)": + description: "Pangolin scorpio call inferred from the consensus sequence generated by iVar" "# SNPs (BCFTools)": description: "Total number of SNPs called by BCFTools" format: "{:,.0f}" @@ -182,6 +184,8 @@ custom_data: format: "{:,.2f}" "Pangolin lineage (BCFTools)": description: "Pangolin lineage inferred from the consensus sequence generated by BCFTools" + "Pangolin scorpio call (BCFTools)": + description: "Pangolin scorpio call inferred from the consensus sequence generated by BCFTools" pconfig: id: "summary_variants_metrics_plot" table_title: "Variant calling metrics" diff --git a/assets/multiqc_config_nanopore.yaml b/assets/multiqc_config_nanopore.yaml index 339601f0..3d0135e0 100644 --- a/assets/multiqc_config_nanopore.yaml +++ b/assets/multiqc_config_nanopore.yaml @@ -112,6 +112,8 @@ custom_data: format: "{:,.2f}" "Pangolin lineage": description: "Pangolin lineage inferred from the consensus sequence generated by artic minion" + "Pangolin scorpio call": + description: "Pangolin scorpio call inferred from the consensus sequence generated by artic minion" pconfig: id: "summary_variants_metrics_plot_table" table_title: "Variant calling metrics" diff --git a/bin/multiqc_to_custom_csv.py b/bin/multiqc_to_custom_csv.py index f6f5bfb0..895e611f 100755 --- 
a/bin/multiqc_to_custom_csv.py +++ b/bin/multiqc_to_custom_csv.py @@ -109,7 +109,10 @@ def metrics_dict_to_file(file_field_list, multiqc_data_dir, out_file, valid_samp row_list = [k] for field in field_list: if field in metrics_dict[k]: - row_list.append(metrics_dict[k][field]) + if metrics_dict[k][field]: + row_list.append(metrics_dict[k][field]) + else: + row_list.append('NA') else: row_list.append('NA') fout.write('{}\n'.format(','.join(map(str,row_list)))) @@ -135,12 +138,14 @@ def main(args=None): ('# INDELs (iVar)', ['number_of_indels'])]), ('multiqc_snpeff_snpeff_ivar.yaml', [('# Missense variants (iVar)', ['MISSENSE'])]), ('multiqc_quast_quast_ivar.yaml', [('# Ns per 100kb consensus (iVar)', ["# N's per 100 kbp"])]), - ('multiqc_ivar_pangolin_lineage.yaml', [('Pangolin lineage (iVar)', ["Lineage"])]), + ('multiqc_ivar_pangolin_lineage.yaml', [('Pangolin lineage (iVar)', ["Lineage"]), + ('Pangolin scorpio call (iVar)', ["Scorpio call"])]), ('multiqc_bcftools_stats_bcftools_bcftools.yaml', [('# SNPs (BCFTools)', ['number_of_SNPs']), ('# INDELs (BCFTools)', ['number_of_indels'])]), ('multiqc_snpeff_snpeff_bcftools.yaml', [('# Missense variants (BCFTools)', ['MISSENSE'])]), ('multiqc_quast_quast_bcftools.yaml', [('# Ns per 100kb consensus (BCFTools)', ["# N's per 100 kbp"])]), - ('multiqc_bcftools_pangolin_lineage.yaml', [('Pangolin lineage (BCFTools)', ["Lineage"])]) + ('multiqc_bcftools_pangolin_lineage.yaml', [('Pangolin lineage (BCFTools)', ["Lineage"]), + ('Pangolin scorpio call (BCFTools)', ["Scorpio call"])]) ] illumina_assembly_files = [ @@ -170,7 +175,8 @@ def main(args=None): ('# INDELs', ['number_of_indels'])]), ('multiqc_snpeff.yaml', [('# Missense variants', ['MISSENSE'])]), ('multiqc_quast.yaml', [('# Ns per 100kb consensus', ["# N's per 100 kbp"])]), - ('multiqc_pangolin_lineage.yaml', [('Pangolin lineage', ["Lineage"])]) + ('multiqc_pangolin_lineage.yaml', [('Pangolin lineage', ["Lineage"]), + ('Pangolin scorpio call', ["Scorpio call"])]) ] if 
args.PLATFORM == 'illumina': diff --git a/lib/WorkflowCommons.groovy b/lib/WorkflowCommons.groovy index a9a94421..e095f81b 100755 --- a/lib/WorkflowCommons.groovy +++ b/lib/WorkflowCommons.groovy @@ -74,14 +74,26 @@ class WorkflowCommons { } // - // Function to get lineage from Pangolin output file + // Function to get field entry from Pangolin output file // - public static String getPangolinLineage(pangolin_report) { - def lineage = '' - pangolin_report.eachLine { line -> - lineage = line.split(',')[1] + // See: https://stackoverflow.com/a/67766919 + public static String getFieldFromPangolinReport(pangolin_report, col_name) { + def headers = [] + def field = '' + pangolin_report.readLines().eachWithIndex { row, row_index -> + if (row_index == 0) { + headers = row.split(',') + } else { + def col_map = [:] + def cells = row.split(',').eachWithIndex { cell, cell_index -> + col_map[headers[cell_index]] = cell + } + if (col_map.containsKey(col_name)) { + field = col_map[col_name] + } + } } - return lineage + return field } // diff --git a/workflows/illumina.nf b/workflows/illumina.nf index fabadb31..96853a74 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -66,14 +66,14 @@ if (!params.skip_variants) { multiqc_options.publish_files.put('variants_metrics_mqc.csv','') } -include { BCFTOOLS_ISEC } from '../modules/local/bcftools_isec' addParams( options: modules['illumina_bcftools_isec'] ) -include { CUTADAPT } from '../modules/local/cutadapt' addParams( options: modules['illumina_cutadapt'] ) -include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['tsv':'']] ) -include { MULTIQC } from '../modules/local/multiqc_illumina' addParams( options: multiqc_options ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_FAIL_READS } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_FAIL_MAPPED } from 
'../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_IVAR_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_BCFTOOLS_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { BCFTOOLS_ISEC } from '../modules/local/bcftools_isec' addParams( options: modules['illumina_bcftools_isec'] ) +include { CUTADAPT } from '../modules/local/cutadapt' addParams( options: modules['illumina_cutadapt'] ) +include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['tsv':'']] ) +include { MULTIQC } from '../modules/local/multiqc_illumina' addParams( options: multiqc_options ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_FAIL_READS } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_FAIL_MAPPED } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_IVAR_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_BCFTOOLS_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_genome'] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_amplicon'] ) @@ -467,14 +467,15 @@ workflow ILLUMINA { // ch_ivar_pangolin_report .map { meta, report -> - def lineage = 
WorkflowCommons.getPangolinLineage(report) - return [ "$meta.id\t$lineage" ] + def lineage = WorkflowCommons.getFieldFromPangolinReport(report, 'lineage') + def scorpio_call = WorkflowCommons.getFieldFromPangolinReport(report, 'scorpio_call') + return [ "$meta.id\t$lineage\t$scorpio_call" ] } .set { ch_ivar_pangolin_multiqc } MULTIQC_CUSTOM_TSV_IVAR_PANGOLIN ( ch_ivar_pangolin_multiqc.collect(), - 'Sample\tLineage', + 'Sample\tLineage\tScorpio call', 'ivar_pangolin_lineage' ) .set { ch_ivar_pangolin_multiqc } @@ -518,14 +519,15 @@ workflow ILLUMINA { // ch_bcftools_pangolin_report .map { meta, report -> - def lineage = WorkflowCommons.getPangolinLineage(report) - return [ "$meta.id\t$lineage" ] + def lineage = WorkflowCommons.getFieldFromPangolinReport(report, 'lineage') + def scorpio_call = WorkflowCommons.getFieldFromPangolinReport(report, 'scorpio_call') + return [ "$meta.id\t$lineage\t$scorpio_call" ] } .set { ch_bcftools_pangolin_multiqc } MULTIQC_CUSTOM_TSV_BCFTOOLS_PANGOLIN ( ch_bcftools_pangolin_multiqc.collect(), - 'Sample\tLineage', + 'Sample\tLineage\tScorpio call', 'bcftools_pangolin_lineage' ) .set { ch_bcftools_pangolin_multiqc } diff --git a/workflows/nanopore.nf b/workflows/nanopore.nf index 977b7d5b..7d01b88a 100644 --- a/workflows/nanopore.nf +++ b/workflows/nanopore.nf @@ -55,13 +55,13 @@ include { ASCIIGENOME } from '../modules/local/asciigenome' include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['tsv':'']] ) include { MULTIQC } from '../modules/local/multiqc_nanopore' addParams( options: multiqc_options ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_SAMPLE_NAME } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_BARCODES } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_BARCODE_COUNT } 
from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_GUPPYPLEX_COUNT } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_genome'] ) -include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_amplicon'] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_SAMPLE_NAME } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_BARCODES } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_BARCODE_COUNT } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_GUPPYPLEX_COUNT } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_genome'] ) +include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_amplicon'] ) // // SUBWORKFLOW: Consisting of a mix of local and nf-core/modules @@ -368,14 
+368,15 @@ workflow NANOPORE { .out .report .map { meta, report -> - def lineage = WorkflowCommons.getPangolinLineage(report) - return [ "$meta.id\t$lineage" ] + def lineage = WorkflowCommons.getFieldFromPangolinReport(report, 'lineage') + def scorpio_call = WorkflowCommons.getFieldFromPangolinReport(report, 'scorpio_call') + return [ "$meta.id\t$lineage\t$scorpio_call" ] } .set { ch_pangolin_multiqc } MULTIQC_CUSTOM_PANGOLIN ( ch_pangolin_multiqc.collect(), - 'Sample\tLineage', + 'Sample\tLineage\tScorpio call', 'pangolin_lineage' ) .set { ch_pangolin_multiqc } From f08e1826beed64862e98820f36547ef9c61836dc Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 16:31:50 +0100 Subject: [PATCH 38/68] Fix EClint --- lib/WorkflowCommons.groovy | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/WorkflowCommons.groovy b/lib/WorkflowCommons.groovy index e095f81b..218f9bf1 100755 --- a/lib/WorkflowCommons.groovy +++ b/lib/WorkflowCommons.groovy @@ -81,8 +81,8 @@ class WorkflowCommons { def headers = [] def field = '' pangolin_report.readLines().eachWithIndex { row, row_index -> - if (row_index == 0) { - headers = row.split(',') + if (row_index == 0) { + headers = row.split(',') } else { def col_map = [:] def cells = row.split(',').eachWithIndex { cell, cell_index -> From 3035d3228c61e17c2eba703393ca1f92d96646fd Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 21:42:49 +0100 Subject: [PATCH 39/68] Fix review comments --- bin/check_samplesheet.py | 12 ++++-------- docs/usage.md | 10 +++++----- 2 files changed, 9 insertions(+), 13 deletions(-) diff --git a/bin/check_samplesheet.py b/bin/check_samplesheet.py index 7e002e50..e9ee86b7 100755 --- a/bin/check_samplesheet.py +++ b/bin/check_samplesheet.py @@ -78,10 +78,8 @@ def check_illumina_samplesheet(file_in, file_out): ## Check sample name entries sample, fastq_1, fastq_2 = lspl[: len(HEADER)] sample = sample.replace('-', '_') - if sample: - if sample.find(" ") != -1: - 
print_error("Sample entry contains spaces!", "Line", line) - else: + sample = sample.replace(' ', '_') + if not sample: print_error("Sample entry has not been specified!", "Line", line) ## Check FastQ file extension @@ -178,10 +176,8 @@ def check_nanopore_samplesheet(file_in, file_out): ## Check sample entry sample, barcode = lspl[: len(HEADER)] sample = sample.replace('-', '_') - if sample: - if sample.find(" ") != -1: - print_error("Sample entry contains spaces!", "Line", line) - else: + sample = sample.replace(' ', '_') + if not sample: print_error("Sample entry has not been specified!", "Line", line) ## Check barcode entry diff --git a/docs/usage.md b/docs/usage.md index 25e18c65..f7ec1a1d 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -4,9 +4,9 @@ > _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._ -## Introduction +## Samplesheet format -### Illumina samplesheet format +### Illumina You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below. @@ -31,9 +31,9 @@ SAMPLE_2,AEG588A2_S4_L003_R1_001.fastq.gz, | `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | | `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | -> **NB:** Dashes (`-`) in sample names are converted to underscores (`_`) when running [QUAST](http://quast.sourceforge.net/quast) and this causes issues when creating the summary metrics for the pipeline. As a result, dashes in sample names will automatically be replaced with underscores to bypass this issue. 
+> **NB:** Dashes (`-`) and spaces in sample names are automatically converted to underscores (`_`) to avoid downstream issues in the pipeline. -### Nanopore samplesheet format +### Nanopore You have the option to provide a samplesheet to the pipeline that maps sample ids to barcode ids. This allows you to associate barcode ids to clinical/public database identifiers that can be used to QC or pre-process the data with more appropriate sample names. @@ -56,7 +56,7 @@ sample,barcode | `sample` | Custom sample name, one per barcode. | | `barcode` | Barcode identifier attributed to that sample during multiplexing. Must be an integer. | -> **NB:** Dashes (`-`) in sample names are converted to underscores (`_`) when running [QUAST](http://quast.sourceforge.net/quast) and this causes issues when creating the summary metrics for the pipeline. As a result, dashes in sample names will automatically be replaced with underscores to bypass this issue. +> **NB:** Dashes (`-`) and spaces in sample names are automatically converted to underscores (`_`) to avoid downstream issues in the pipeline. 
## Nanopore input format From 0c835671b516a6c6558d833074dbc5db3c7279a1 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 21:49:10 +0100 Subject: [PATCH 40/68] Update MultiQC config for Illumina --- assets/multiqc_config_illumina.yaml | 35 +++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/assets/multiqc_config_illumina.yaml b/assets/multiqc_config_illumina.yaml index c407d97b..53ec5f3d 100644 --- a/assets/multiqc_config_illumina.yaml +++ b/assets/multiqc_config_illumina.yaml @@ -106,6 +106,8 @@ module_order: report_section_order: summary_assembly_metrics: before: summary_variants_metrics + amplicon_heatmap: + before: summary_assembly_metrics ivar_variants: before: mosdepth software_versions: @@ -122,6 +124,39 @@ extra_fn_clean_exts: # See https://github.com/ewels/MultiQC_TestData/blob/master/data/custom_content/with_config/table_headerconfig/multiqc_config.yaml custom_data: + amplicon_heatmap: + section_name: "Amplicon coverage heatmap" + description: "Heatmap to show median log10(coverage+1) per amplicon across samples." 
+ plot_type: "heatmap" + pconfig: + id: "amplicon_heatmap" + xTitle: "Amplicon" + namespace: "Heatmap to show median log10(coverage+1) per amplicon across samples" + square: False + colstops: + [ + [0, "#440154"], + [0.05, "#471365"], + [0.1, "#482475"], + [0.15, "#463480"], + [0.2, "#414487"], + [0.25, "#3b528b"], + [0.3, "#355f8d"], + [0.35, "#2f6c8e"], + [0.4, "#2a788e"], + [0.45, "#25848e"], + [0.5, "#21918c"], + [0.55, "#1e9c89"], + [0.6, "#22a884"], + [0.65, "#2fb47c"], + [0.7, "#44bf70"], + [0.75, "#5ec962"], + [0.8, "#7ad151"], + [0.85, "#9bd93c"], + [0.9, "#bddf26"], + [0.95, "#dfe318"], + [1, "#fde725"], + ] summary_variants_metrics: section_name: "Variant calling metrics" description: "generated by the nf-core/viralrecon pipeline" From 3505e2801f74b9250d2de4bc5acd67a3a9f34b90 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 21:49:17 +0100 Subject: [PATCH 41/68] Update MultiQC config for Nanopore --- assets/multiqc_config_nanopore.yaml | 35 +++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/assets/multiqc_config_nanopore.yaml b/assets/multiqc_config_nanopore.yaml index 3d0135e0..b94bbf88 100644 --- a/assets/multiqc_config_nanopore.yaml +++ b/assets/multiqc_config_nanopore.yaml @@ -33,6 +33,8 @@ module_order: - "./quast/*.tsv" report_section_order: + amplicon_heatmap: + before: summary_variants_metrics software_versions: order: -1001 nf-core-viralrecon-summary: @@ -46,6 +48,39 @@ extra_fn_clean_exts: # See https://github.com/ewels/MultiQC_TestData/blob/master/data/custom_content/with_config/table_headerconfig/multiqc_config.yaml custom_data: + amplicon_heatmap: + section_name: "Amplicon coverage heatmap" + description: "Heatmap to show median log10(coverage+1) per amplicon across samples." 
+ plot_type: "heatmap" + pconfig: + id: "amplicon_heatmap" + xTitle: "Amplicon" + namespace: "Heatmap to show median log10(coverage+1) per amplicon across samples" + square: False + colstops: + [ + [0, "#440154"], + [0.05, "#471365"], + [0.1, "#482475"], + [0.15, "#463480"], + [0.2, "#414487"], + [0.25, "#3b528b"], + [0.3, "#355f8d"], + [0.35, "#2f6c8e"], + [0.4, "#2a788e"], + [0.45, "#25848e"], + [0.5, "#21918c"], + [0.55, "#1e9c89"], + [0.6, "#22a884"], + [0.65, "#2fb47c"], + [0.7, "#44bf70"], + [0.75, "#5ec962"], + [0.8, "#7ad151"], + [0.85, "#9bd93c"], + [0.9, "#bddf26"], + [0.95, "#dfe318"], + [1, "#fde725"], + ] fail_barcodes_no_sample: section_name: "WARNING: Barcodes without sample id" description: "List of barcodes that appear to have reads in the '--fastq_dir' folder but were not specified in mappings samplesheet via '--input'." From e55c776043c279127f3265e8c1bac8349a90976b Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 21:49:33 +0100 Subject: [PATCH 42/68] Update MultiQC module for Illumina --- modules/local/multiqc_illumina.nf | 1 + 1 file changed, 1 insertion(+) diff --git a/modules/local/multiqc_illumina.nf b/modules/local/multiqc_illumina.nf index 075dd796..f09b498f 100644 --- a/modules/local/multiqc_illumina.nf +++ b/modules/local/multiqc_illumina.nf @@ -24,6 +24,7 @@ process MULTIQC { path workflow_summary path fail_reads_summary path fail_mapping_summary + path 'amplicon_heatmap_mqc.tsv' path ('fastqc/*') path ('fastp/*') path ('kraken2/*') From 56d5adca824a9836d0a46b2bde91c10fa1c1f2e8 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 21:49:41 +0100 Subject: [PATCH 43/68] Update MultiQC module for Nanopore --- modules/local/multiqc_nanopore.nf | 1 + 1 file changed, 1 insertion(+) diff --git a/modules/local/multiqc_nanopore.nf b/modules/local/multiqc_nanopore.nf index c4421d27..6ab8794e 100644 --- a/modules/local/multiqc_nanopore.nf +++ b/modules/local/multiqc_nanopore.nf @@ -26,6 +26,7 @@ process MULTIQC { 
path fail_no_barcode_samples path fail_barcode_count_samples path fail_guppyplex_count_samples + path 'amplicon_heatmap_mqc.tsv' path ('pycoqc/*') path ('artic_minion/*') path ('samtools_stats/*') From 03c73602fc3e564605781695b882b0b0e3eb87d1 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 21:49:57 +0100 Subject: [PATCH 44/68] Export amplicon heatmap from R script --- bin/plot_mosdepth_regions.r | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/bin/plot_mosdepth_regions.r b/bin/plot_mosdepth_regions.r index 558d1733..7db55824 100755 --- a/bin/plot_mosdepth_regions.r +++ b/bin/plot_mosdepth_regions.r @@ -170,6 +170,11 @@ if (ncol(dat) == 6 && length(INPUT_FILES) > 1) { pdf(file=outfile, height=height, width=width) draw(heatmap, heatmap_legend_side="bottom") dev.off() + + ## Write heatmap to file + mat <- mat[row_order(heatmap),] + outfile <- paste(OUTDIR,"all_samples.",OUTSUFFIX,".heatmap.tsv", sep='') + write.table(cbind(sample = rownames(mat), mat), file=outfile, row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE) } ################################################ From dd1cdf4b9611b7b35108813db062de5964fb5422 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 21:50:08 +0100 Subject: [PATCH 45/68] Change output channel names --- modules/local/plot_mosdepth_regions.nf | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/modules/local/plot_mosdepth_regions.nf b/modules/local/plot_mosdepth_regions.nf index 2af85c52..66dc082d 100644 --- a/modules/local/plot_mosdepth_regions.nf +++ b/modules/local/plot_mosdepth_regions.nf @@ -21,8 +21,10 @@ process PLOT_MOSDEPTH_REGIONS { path beds output: - path '*.pdf', emit: pdf - path '*.tsv', emit: tsv + path '*coverage.pdf', emit: coverage_pdf + path '*coverage.tsv', emit: coverage_tsv + path '*heatmap.pdf' , optional:true, emit: heatmap_pdf + path '*heatmap.tsv' , optional:true, emit: heatmap_tsv script: def prefix = options.suffix ?: "mosdepth" From 
0597bad79815dd0575c45bb659b35c71d69f80a1 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 21:50:19 +0100 Subject: [PATCH 46/68] Pass heatmap to MultiQC --- workflows/illumina.nf | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/workflows/illumina.nf b/workflows/illumina.nf index 96853a74..fcbbf1b7 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -397,7 +397,8 @@ workflow ILLUMINA { // // MODULE: Genome-wide and amplicon-specific coverage QC plots // - ch_mosdepth_multiqc = Channel.empty() + ch_mosdepth_multiqc = Channel.empty() + ch_amplicon_heatmap_multiqc = Channel.empty() if (!params.skip_variants && !params.skip_mosdepth) { MOSDEPTH_GENOME ( @@ -422,6 +423,7 @@ workflow ILLUMINA { PLOT_MOSDEPTH_REGIONS_AMPLICON ( MOSDEPTH_AMPLICON.out.regions_bed.collect { it[1] } ) + ch_amplicon_heatmap_multiqc = PLOT_MOSDEPTH_REGIONS_AMPLICON.out.heatmap_tsv } } @@ -657,6 +659,7 @@ workflow ILLUMINA { ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'), ch_fail_reads_multiqc.ifEmpty([]), ch_fail_mapping_multiqc.ifEmpty([]), + ch_amplicon_heatmap_multiqc.ifEmpty([]), FASTQC_FASTP.out.fastqc_raw_zip.collect{it[1]}.ifEmpty([]), FASTQC_FASTP.out.trim_json.collect{it[1]}.ifEmpty([]), ch_kraken2_multiqc.collect{it[1]}.ifEmpty([]), From 140e6eea8bb73da582edc1d23d744499aea1ddea Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 21:50:27 +0100 Subject: [PATCH 47/68] Pass heatmap to MultiQC --- workflows/nanopore.nf | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/workflows/nanopore.nf b/workflows/nanopore.nf index 7d01b88a..29296115 100644 --- a/workflows/nanopore.nf +++ b/workflows/nanopore.nf @@ -325,7 +325,8 @@ workflow NANOPORE { // // MODULE: Genome-wide and amplicon-specific coverage QC plots // - ch_mosdepth_multiqc = Channel.empty() + ch_mosdepth_multiqc = Channel.empty() + ch_amplicon_heatmap_multiqc = Channel.empty() if (!params.skip_mosdepth) { MOSDEPTH_GENOME ( @@ 
-349,6 +350,7 @@ workflow NANOPORE { PLOT_MOSDEPTH_REGIONS_AMPLICON ( MOSDEPTH_AMPLICON.out.regions_bed.collect { it[1] } ) + ch_amplicon_heatmap_multiqc = PLOT_MOSDEPTH_REGIONS_AMPLICON.out.heatmap_tsv } // @@ -483,6 +485,7 @@ workflow NANOPORE { ch_custom_no_barcodes_multiqc.ifEmpty([]), MULTIQC_CUSTOM_FAIL_BARCODE_COUNT.out.ifEmpty([]), MULTIQC_CUSTOM_FAIL_GUPPYPLEX_COUNT.out.ifEmpty([]), + ch_amplicon_heatmap_multiqc.ifEmpty([]), PYCOQC.out.json.collect().ifEmpty([]), ARTIC_MINION.out.json.collect{it[1]}.ifEmpty([]), FILTER_BAM_SAMTOOLS.out.flagstat.collect{it[1]}.ifEmpty([]), From 1ff7724b1388a9bee348da1ad2e1ed5816f7772d Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 22:04:48 +0100 Subject: [PATCH 48/68] Update CHANGELOG --- CHANGELOG.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 646d9f24..675c6c66 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,6 +14,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * Added docs about structure of data required for running Nanopore data * Added docs about using other primer sets for Illumina data * Added docs about overwriting default container definitions to use latest versions e.g. 
Pangolin +* [[#196](https://github.com/nf-core/viralrecon/issues/196)] - Add mosdepth heatmap to MultiQC report +* [[#198](https://github.com/nf-core/viralrecon/issues/198)] - ASCIIGenome failing during analysis +* [[#201](https://github.com/nf-core/viralrecon/issues/201)] - Conditional include are not expected to work ### Parameters From 403cfcd47b5b6a5a68e88a5ca9b91318c9c5ecd7 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 23:09:20 +0100 Subject: [PATCH 49/68] Adjust asciigenome module to take genome sizes file --- modules/local/asciigenome.nf | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/modules/local/asciigenome.nf b/modules/local/asciigenome.nf index d9c6acb3..6f83c915 100644 --- a/modules/local/asciigenome.nf +++ b/modules/local/asciigenome.nf @@ -11,16 +11,17 @@ process ASCIIGENOME { mode: params.publish_dir_mode, saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? "bioconda::asciigenome=1.16.0" : null) + conda (params.enable_conda ? 
"bioconda::asciigenome=1.16.0 bioconda::bedtools=2.30.0" : null) if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/asciigenome:1.16.0--0" + container "https://depot.galaxyproject.org/singularity/mulled-v2-093691b47d719890dc19ac0c13c4528e9776897f:27211b8c38006480d69eb1be3ef09a7bf0a49d76-0" } else { - container "quay.io/biocontainers/asciigenome:1.16.0--0" + container "quay.io/biocontainers/mulled-v2-093691b47d719890dc19ac0c13c4528e9776897f:27211b8c38006480d69eb1be3ef09a7bf0a49d76-0" } input: tuple val(meta), path(bam), path(vcf) path fasta + path sizes path gff path bed val window @@ -39,13 +40,20 @@ process ASCIIGENOME { """ zcat $vcf \\ | grep -v '#' \\ - | awk -v FS='\t' -v OFS='\t' '{print \$1, (\$2-$window-1), (\$2+$window)}' \\ + | awk -v FS='\t' -v OFS='\t' '{print \$1, (\$2-1), (\$2)}' \\ > variants.bed + bedtools \\ + slop \\ + -i variants.bed \\ + -g $sizes \\ + -b $window \\ + > variants.slop.bed + ASCIIGenome \\ -ni \\ -x "trackHeight 0 bam#1 && trackHeight $track_height bam@2 $paired_end && filterVariantReads && save ${prefix}.%r.pdf" \\ - --batchFile variants.bed \\ + --batchFile variants.slop.bed \\ --fasta $fasta \\ $bam \\ $vcf \\ From 9379631a9e5858fed67465c2be4b3f0b012879fc Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 23:09:50 +0100 Subject: [PATCH 50/68] Create genome sizes file if running ASCIIGenome --- subworkflows/local/prepare_genome_illumina.nf | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/subworkflows/local/prepare_genome_illumina.nf b/subworkflows/local/prepare_genome_illumina.nf index 35f61fd6..1d1053d7 100644 --- a/subworkflows/local/prepare_genome_illumina.nf +++ b/subworkflows/local/prepare_genome_illumina.nf @@ -23,6 +23,7 @@ include { UNTAR as UNTAR_BLAST_DB } from '../../modules/nf-core/software/un include { BOWTIE2_BUILD } from '../../modules/nf-core/software/bowtie2/build/main' addParams( 
options: params.bowtie2_build_options ) include { BLAST_MAKEBLASTDB } from '../../modules/nf-core/software/blast/makeblastdb/main' addParams( options: params.makeblastdb_options ) include { BEDTOOLS_GETFASTA } from '../../modules/nf-core/software/bedtools/getfasta/main' addParams( options: params.bedtools_getfasta_options ) +include { GET_CHROM_SIZES } from '../../modules/local/get_chrom_sizes' addParams( options: params.genome_options ) include { COLLAPSE_PRIMERS } from '../../modules/local/collapse_primers' addParams( options: params.collapse_primers_options ) include { KRAKEN2_BUILD } from '../../modules/local/kraken2_build' addParams( options: params.kraken2_build_options ) include { SNPEFF_BUILD } from '../../modules/local/snpeff_build' addParams( options: params.snpeff_build_options ) @@ -55,6 +56,14 @@ workflow PREPARE_GENOME { ch_gff = dummy_file } + // + // Create chromosome sizes file + // + ch_chrom_sizes = Channel.empty() + if (!params.skip_asciigenome) { + ch_chrom_sizes = GET_CHROM_SIZES ( ch_fasta ).sizes + } + // // Prepare reference files required for variant calling // @@ -151,6 +160,7 @@ workflow PREPARE_GENOME { emit: fasta = ch_fasta // path: genome.fasta gff = ch_gff // path: genome.gff + chrom_sizes = ch_chrom_sizes // path: genome.sizes bowtie2_index = ch_bowtie2_index // path: bowtie2/index/ primer_bed = ch_primer_bed // path: primer.bed primer_collapsed_bed = ch_primer_collapsed_bed // path: primer.collapsed.bed From a0c97d906ffc88bd32d534205f8d2cb27028fdfd Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 23:09:56 +0100 Subject: [PATCH 51/68] Create genome sizes file if running ASCIIGenome --- subworkflows/local/prepare_genome_nanopore.nf | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/subworkflows/local/prepare_genome_nanopore.nf b/subworkflows/local/prepare_genome_nanopore.nf index dfe7442a..8d780467 100644 --- a/subworkflows/local/prepare_genome_nanopore.nf +++ 
b/subworkflows/local/prepare_genome_nanopore.nf @@ -10,6 +10,7 @@ include { GUNZIP as GUNZIP_FASTA GUNZIP as GUNZIP_GFF GUNZIP as GUNZIP_PRIMER_BED } from '../../modules/nf-core/software/gunzip/main' addParams( options: params.genome_options ) +include { GET_CHROM_SIZES } from '../../modules/local/get_chrom_sizes' addParams( options: params.genome_options ) include { COLLAPSE_PRIMERS } from '../../modules/local/collapse_primers' addParams( options: params.collapse_primers_options ) include { SNPEFF_BUILD } from '../../modules/local/snpeff_build' addParams( options: params.snpeff_build_options ) @@ -41,6 +42,14 @@ workflow PREPARE_GENOME { ch_gff = dummy_file } + // + // Create chromosome sizes file + // + ch_chrom_sizes = Channel.empty() + if (!params.skip_asciigenome) { + ch_chrom_sizes = GET_CHROM_SIZES ( ch_fasta ).sizes + } + // // Uncompress primer BED file // @@ -75,6 +84,7 @@ workflow PREPARE_GENOME { emit: fasta = ch_fasta // path: genome.fasta gff = ch_gff // path: genome.gff + chrom_sizes = ch_chrom_sizes // path: genome.sizes primer_bed = ch_primer_bed // path: primer.bed primer_collapsed_bed = ch_primer_collapsed_bed // path: primer.collapsed.bed snpeff_db = ch_snpeff_db // path: snpeff_db From c74b33735b86ca1024d54f2287b51168272b1c38 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 23:10:10 +0100 Subject: [PATCH 52/68] Adjust channels to add sizes file --- workflows/illumina.nf | 2 ++ 1 file changed, 2 insertions(+) diff --git a/workflows/illumina.nf b/workflows/illumina.nf index fcbbf1b7..e6bcb72f 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -441,6 +441,7 @@ workflow ILLUMINA { VARIANTS_IVAR ( ch_bam, PREPARE_GENOME.out.fasta, + PREPARE_GENOME.out.chrom_sizes, params.gff ? PREPARE_GENOME.out.gff : [], (params.protocol == 'amplicon' && params.primer_bed) ? 
PREPARE_GENOME.out.primer_bed : [], PREPARE_GENOME.out.snpeff_db, @@ -496,6 +497,7 @@ workflow ILLUMINA { VARIANTS_BCFTOOLS ( ch_bam, PREPARE_GENOME.out.fasta, + PREPARE_GENOME.out.chrom_sizes, params.gff ? PREPARE_GENOME.out.gff : [], (params.protocol == 'amplicon' && params.primer_bed) ? PREPARE_GENOME.out.primer_bed : [], PREPARE_GENOME.out.snpeff_db, From 91b540d012405a108b8aad92116ead7546812d9b Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 23:10:13 +0100 Subject: [PATCH 53/68] Adjust channels to add sizes file --- workflows/nanopore.nf | 1 + 1 file changed, 1 insertion(+) diff --git a/workflows/nanopore.nf b/workflows/nanopore.nf index 29296115..336d0e9e 100644 --- a/workflows/nanopore.nf +++ b/workflows/nanopore.nf @@ -446,6 +446,7 @@ workflow NANOPORE { ASCIIGENOME ( ch_asciigenome, PREPARE_GENOME.out.fasta, + PREPARE_GENOME.out.chrom_sizes, params.gff ? PREPARE_GENOME.out.gff : [], PREPARE_GENOME.out.primer_bed, params.asciigenome_window_size, From c0336b4fb89650058b85c40db3441298ebfdccdf Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 23:10:31 +0100 Subject: [PATCH 54/68] Use sizes file when calling ASCIIGenome --- subworkflows/local/variants_bcftools.nf | 2 ++ 1 file changed, 2 insertions(+) diff --git a/subworkflows/local/variants_bcftools.nf b/subworkflows/local/variants_bcftools.nf index 22f84c07..12717936 100644 --- a/subworkflows/local/variants_bcftools.nf +++ b/subworkflows/local/variants_bcftools.nf @@ -31,6 +31,7 @@ workflow VARIANTS_BCFTOOLS { take: bam // channel: [ val(meta), [ bam ] ] fasta // channel: /path/to/genome.fasta + sizes // channel: /path/to/genome.sizes gff // channel: /path/to/genome.gff bed // channel: /path/to/primers.bed snpeff_db // channel: /path/to/snpeff_db/ @@ -128,6 +129,7 @@ workflow VARIANTS_BCFTOOLS { ASCIIGENOME ( ch_asciigenome, fasta, + sizes, gff, bed, params.asciigenome_window_size, From 58a67c41e5197648e6929015638f4861f0531b44 Mon Sep 17 00:00:00 2001 From: Harshil Patel 
Date: Thu, 10 Jun 2021 23:10:36 +0100 Subject: [PATCH 55/68] Use sizes file when calling ASCIIGenome --- subworkflows/local/variants_ivar.nf | 2 ++ 1 file changed, 2 insertions(+) diff --git a/subworkflows/local/variants_ivar.nf b/subworkflows/local/variants_ivar.nf index 66afa779..26f5e449 100644 --- a/subworkflows/local/variants_ivar.nf +++ b/subworkflows/local/variants_ivar.nf @@ -34,6 +34,7 @@ workflow VARIANTS_IVAR { take: bam // channel: [ val(meta), [ bam ] ] fasta // channel: /path/to/genome.fasta + sizes // channel: /path/to/genome.sizes gff // channel: /path/to/genome.gff bed // channel: /path/to/primers.bed snpeff_db // channel: /path/to/snpeff_db/ @@ -141,6 +142,7 @@ workflow VARIANTS_IVAR { ASCIIGENOME ( ch_asciigenome, fasta, + sizes, gff, bed, params.asciigenome_window_size, From ec5dce504f5bdcfb03c06eff12c8f3cc7c263da2 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 10 Jun 2021 23:10:54 +0100 Subject: [PATCH 56/68] Add local module to get chromosome sizes --- modules/local/get_chrom_sizes.nf | 34 ++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) create mode 100644 modules/local/get_chrom_sizes.nf diff --git a/modules/local/get_chrom_sizes.nf b/modules/local/get_chrom_sizes.nf new file mode 100644 index 00000000..e20fb395 --- /dev/null +++ b/modules/local/get_chrom_sizes.nf @@ -0,0 +1,34 @@ +// Import generic module functions +include { saveFiles } from './functions' + +params.options = [:] + +process GET_CHROM_SIZES { + tag "$fasta" + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:'genome', meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? 
"bioconda::samtools=1.10" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/samtools:1.10--h9402c20_2" + } else { + container "quay.io/biocontainers/samtools:1.10--h9402c20_2" + } + + input: + path fasta + + output: + path '*.sizes' , emit: sizes + path '*.fai' , emit: fai + path "*.version.txt", emit: version + + script: + def software = 'samtools' + """ + samtools faidx $fasta + cut -f 1,2 ${fasta}.fai > ${fasta}.sizes + echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//' > ${software}.version.txt + """ +} From 5c131acf4e9b32a9389a67610eb0acfbe392b15d Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Fri, 11 Jun 2021 17:27:54 +0100 Subject: [PATCH 57/68] Add Pangolin results to MultiQC report --- assets/headers/ivar_variants_header_mqc.txt | 2 +- assets/multiqc_config_illumina.yaml | 26 ++++++- assets/multiqc_config_nanopore.yaml | 79 +++++++++++--------- bin/multiqc_to_custom_csv.py | 9 +-- lib/WorkflowCommons.groovy | 22 +++--- modules/local/multiqc_custom_csv_from_map.nf | 27 +++++++ modules/local/multiqc_illumina.nf | 4 +- modules/local/multiqc_nanopore.nf | 9 ++- workflows/illumina.nf | 24 +++--- workflows/nanopore.nf | 16 ++-- 10 files changed, 133 insertions(+), 85 deletions(-) create mode 100644 modules/local/multiqc_custom_csv_from_map.nf diff --git a/assets/headers/ivar_variants_header_mqc.txt b/assets/headers/ivar_variants_header_mqc.txt index 8dcb12ef..a4678706 100644 --- a/assets/headers/ivar_variants_header_mqc.txt +++ b/assets/headers/ivar_variants_header_mqc.txt @@ -1,5 +1,5 @@ #id: 'ivar_variants' -#section_name: 'VARIANTS: iVar variant counts' +#section_name: 'VARIANTS: Total variants (iVar)' #description: "is calculated from the total number of variants called by # iVar." 
#plot_type: 'bargraph' diff --git a/assets/multiqc_config_illumina.yaml b/assets/multiqc_config_illumina.yaml index 53ec5f3d..124894c5 100644 --- a/assets/multiqc_config_illumina.yaml +++ b/assets/multiqc_config_illumina.yaml @@ -110,6 +110,10 @@ report_section_order: before: summary_assembly_metrics ivar_variants: before: mosdepth + ivar_pangolin_lineage: + after: bcftools_ivar + bcftools_pangolin_lineage: + after: bcftools_bcftools software_versions: order: -1001 nf-core-viralrecon-summary: @@ -124,6 +128,24 @@ extra_fn_clean_exts: # See https://github.com/ewels/MultiQC_TestData/blob/master/data/custom_content/with_config/table_headerconfig/multiqc_config.yaml custom_data: + ivar_pangolin_lineage: + section_name: "VARIANTS: Pangolin (iVar)" + description: "This section of the report shows Pangolin lineage analysis results for variants called by iVar." + plot_type: "table" + pconfig: + id: "ivar_pangolin_lineage_table" + table_title: "Pangolin lineage assignment" + namespace: "Pangolin lineage assignment" + scale: False + bcftools_pangolin_lineage: + section_name: "VARIANTS: Pangolin (BCFTools)" + description: "This section of the report shows Pangolin lineage analysis results for variants called by BCFTools." + plot_type: "table" + pconfig: + id: "bcftools_pangolin_lineage_table" + table_title: "Pangolin lineage assignment" + namespace: "Pangolin lineage assignment" + scale: False amplicon_heatmap: section_name: "Amplicon coverage heatmap" description: "Heatmap to show median log10(coverage+1) per amplicon across samples." 
@@ -203,8 +225,6 @@ custom_data: format: "{:,.2f}" "Pangolin lineage (iVar)": description: "Pangolin lineage inferred from the consensus sequence generated by iVar" - "Pangolin scorpio call (iVar)": - description: "Pangolin scorpio call inferred from the consensus sequence generated by iVar" "# SNPs (BCFTools)": description: "Total number of SNPs called by BCFTools" format: "{:,.0f}" @@ -219,8 +239,6 @@ custom_data: format: "{:,.2f}" "Pangolin lineage (BCFTools)": description: "Pangolin lineage inferred from the consensus sequence generated by BCFTools" - "Pangolin scorpio call (BCFTools)": - description: "Pangolin scorpio call inferred from the consensus sequence generated by BCFTools" pconfig: id: "summary_variants_metrics_plot" table_title: "Variant calling metrics" diff --git a/assets/multiqc_config_nanopore.yaml b/assets/multiqc_config_nanopore.yaml index b94bbf88..c8e39ba4 100644 --- a/assets/multiqc_config_nanopore.yaml +++ b/assets/multiqc_config_nanopore.yaml @@ -35,6 +35,8 @@ module_order: report_section_order: amplicon_heatmap: before: summary_variants_metrics + pangolin_lineage: + after: amplicon_heatmap software_versions: order: -1001 nf-core-viralrecon-summary: @@ -48,39 +50,6 @@ extra_fn_clean_exts: # See https://github.com/ewels/MultiQC_TestData/blob/master/data/custom_content/with_config/table_headerconfig/multiqc_config.yaml custom_data: - amplicon_heatmap: - section_name: "Amplicon coverage heatmap" - description: "Heatmap to show median log10(coverage+1) per amplicon across samples." 
- plot_type: "heatmap" - pconfig: - id: "amplicon_heatmap" - xTitle: "Amplicon" - namespace: "Heatmap to show median log10(coverage+1) per amplicon across samples" - square: False - colstops: - [ - [0, "#440154"], - [0.05, "#471365"], - [0.1, "#482475"], - [0.15, "#463480"], - [0.2, "#414487"], - [0.25, "#3b528b"], - [0.3, "#355f8d"], - [0.35, "#2f6c8e"], - [0.4, "#2a788e"], - [0.45, "#25848e"], - [0.5, "#21918c"], - [0.55, "#1e9c89"], - [0.6, "#22a884"], - [0.65, "#2fb47c"], - [0.7, "#44bf70"], - [0.75, "#5ec962"], - [0.8, "#7ad151"], - [0.85, "#9bd93c"], - [0.9, "#bddf26"], - [0.95, "#dfe318"], - [1, "#fde725"], - ] fail_barcodes_no_sample: section_name: "WARNING: Barcodes without sample id" description: "List of barcodes that appear to have reads in the '--fastq_dir' folder but were not specified in mappings samplesheet via '--input'." @@ -116,6 +85,48 @@ custom_data: table_title: "Samples failed artic guppyplex read count threshold" namespace: "Samples failed artic guppyplex read count threshold" format: "{:,.0f}" + pangolin_lineage: + section_name: "Pangolin" + description: "Results generated from Pangolin lineage analysis" + plot_type: "table" + pconfig: + id: "pangolin_lineage_table" + table_title: "Pangolin lineage assignment" + namespace: "Pangolin lineage assignment" + scale: False + amplicon_heatmap: + section_name: "Amplicon coverage heatmap" + description: "Heatmap to show median log10(coverage+1) per amplicon across samples." 
+ plot_type: "heatmap" + pconfig: + id: "amplicon_heatmap" + xTitle: "Amplicon" + namespace: "Heatmap to show median log10(coverage+1) per amplicon across samples" + square: False + colstops: + [ + [0, "#440154"], + [0.05, "#471365"], + [0.1, "#482475"], + [0.15, "#463480"], + [0.2, "#414487"], + [0.25, "#3b528b"], + [0.3, "#355f8d"], + [0.35, "#2f6c8e"], + [0.4, "#2a788e"], + [0.45, "#25848e"], + [0.5, "#21918c"], + [0.55, "#1e9c89"], + [0.6, "#22a884"], + [0.65, "#2fb47c"], + [0.7, "#44bf70"], + [0.75, "#5ec962"], + [0.8, "#7ad151"], + [0.85, "#9bd93c"], + [0.9, "#bddf26"], + [0.95, "#dfe318"], + [1, "#fde725"], + ] summary_variants_metrics: section_name: "Variant calling metrics" description: "generated by the nf-core/viralrecon pipeline" @@ -147,8 +158,6 @@ custom_data: format: "{:,.2f}" "Pangolin lineage": description: "Pangolin lineage inferred from the consensus sequence generated by artic minion" - "Pangolin scorpio call": - description: "Pangolin scorpio call inferred from the consensus sequence generated by artic minion" pconfig: id: "summary_variants_metrics_plot_table" table_title: "Variant calling metrics" diff --git a/bin/multiqc_to_custom_csv.py b/bin/multiqc_to_custom_csv.py index 895e611f..c4d02586 100755 --- a/bin/multiqc_to_custom_csv.py +++ b/bin/multiqc_to_custom_csv.py @@ -138,14 +138,12 @@ def main(args=None): ('# INDELs (iVar)', ['number_of_indels'])]), ('multiqc_snpeff_snpeff_ivar.yaml', [('# Missense variants (iVar)', ['MISSENSE'])]), ('multiqc_quast_quast_ivar.yaml', [('# Ns per 100kb consensus (iVar)', ["# N's per 100 kbp"])]), - ('multiqc_ivar_pangolin_lineage.yaml', [('Pangolin lineage (iVar)', ["Lineage"]), - ('Pangolin scorpio call (iVar)', ["Scorpio call"])]), + ('multiqc_variants:_pangolin_(ivar).yaml', [('Pangolin lineage (iVar)', ["lineage"])]), ('multiqc_bcftools_stats_bcftools_bcftools.yaml', [('# SNPs (BCFTools)', ['number_of_SNPs']), ('# INDELs (BCFTools)', ['number_of_indels'])]), ('multiqc_snpeff_snpeff_bcftools.yaml', [('# 
Missense variants (BCFTools)', ['MISSENSE'])]), ('multiqc_quast_quast_bcftools.yaml', [('# Ns per 100kb consensus (BCFTools)', ["# N's per 100 kbp"])]), - ('multiqc_bcftools_pangolin_lineage.yaml', [('Pangolin lineage (BCFTools)', ["Lineage"]), - ('Pangolin scorpio call (BCFTools)', ["Scorpio call"])]) + ('multiqc_variants:_pangolin_(bcftools).yaml', [('Pangolin lineage (BCFTools)', ["lineage"])]) ] illumina_assembly_files = [ @@ -175,8 +173,7 @@ def main(args=None): ('# INDELs', ['number_of_indels'])]), ('multiqc_snpeff.yaml', [('# Missense variants', ['MISSENSE'])]), ('multiqc_quast.yaml', [('# Ns per 100kb consensus', ["# N's per 100 kbp"])]), - ('multiqc_pangolin_lineage.yaml', [('Pangolin lineage', ["Lineage"]), - ('Pangolin scorpio call', ["Scorpio call"])]) + ('multiqc_pangolin.yaml', [('Pangolin lineage', ["lineage"])]) ] if args.PLATFORM == 'illumina': diff --git a/lib/WorkflowCommons.groovy b/lib/WorkflowCommons.groovy index 218f9bf1..801824c1 100755 --- a/lib/WorkflowCommons.groovy +++ b/lib/WorkflowCommons.groovy @@ -74,26 +74,24 @@ class WorkflowCommons { } // - // Function to get field entry from Pangolin output file + // Function to read in all fields into a Groovy Map from Pangolin output file // // See: https://stackoverflow.com/a/67766919 - public static String getFieldFromPangolinReport(pangolin_report, col_name) { - def headers = [] - def field = '' + public static Map getPangolinFieldMap(pangolin_report, log) { + def headers = [] + def field_map = [:] pangolin_report.readLines().eachWithIndex { row, row_index -> + def vals = row.split(',') if (row_index == 0) { - headers = row.split(',') + headers = vals } else { - def col_map = [:] - def cells = row.split(',').eachWithIndex { cell, cell_index -> - col_map[headers[cell_index]] = cell - } - if (col_map.containsKey(col_name)) { - field = col_map[col_name] + def cells = headers.eachWithIndex { header, header_index -> + def val = (header_index <= vals.size()-1) ? 
vals[header_index] : '' + field_map[header] = val ?: 'NA' } } } - return field + return field_map } // diff --git a/modules/local/multiqc_custom_csv_from_map.nf b/modules/local/multiqc_custom_csv_from_map.nf new file mode 100644 index 00000000..49968fd4 --- /dev/null +++ b/modules/local/multiqc_custom_csv_from_map.nf @@ -0,0 +1,27 @@ +// Import generic module functions +include { saveFiles; getSoftwareName } from './functions' + +params.options = [:] + +process MULTIQC_CUSTOM_CSV_FROM_MAP { + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } + + memory 100.MB + + input: + val csv_data + val out_prefix + + output: + path "*.csv" + + exec: + // Write to file + def file = task.workDir.resolve("${out_prefix}_mqc.csv") + file.write csv_data[0].keySet().join(",") + '\n' + csv_data.each { data -> + file.append(data.values().join(",") + '\n') + } +} diff --git a/modules/local/multiqc_illumina.nf b/modules/local/multiqc_illumina.nf index f09b498f..de9fce4c 100644 --- a/modules/local/multiqc_illumina.nf +++ b/modules/local/multiqc_illumina.nf @@ -73,6 +73,6 @@ process MULTIQC { fi ## Run MultiQC a second time - multiqc -f $options.args -e general_stats --ignore *pangolin_lineage_mqc.tsv $custom_config . + multiqc -f $options.args -e general_stats $custom_config . """ -} +} \ No newline at end of file diff --git a/modules/local/multiqc_nanopore.nf b/modules/local/multiqc_nanopore.nf index 6ab8794e..1da091cd 100644 --- a/modules/local/multiqc_nanopore.nf +++ b/modules/local/multiqc_nanopore.nf @@ -34,7 +34,7 @@ process MULTIQC { path ('mosdepth/*') path ('quast/*') path ('snpeff/*') - path ('pangolin/*') + path pangolin_lineage output: path "*multiqc_report.html", emit: report @@ -46,8 +46,13 @@ process MULTIQC { def software = getSoftwareName(task.process) def custom_config = params.multiqc_config ? 
"--config $multiqc_custom_config" : '' """ + ## Run MultiQC once to parse tool logs multiqc -f $options.args $custom_config . + + ## Parse YAML files dumped by MultiQC to obtain metrics multiqc_to_custom_csv.py --platform nanopore - multiqc -f $options.args -e general_stats --ignore *pangolin_lineage_mqc.tsv $custom_config . + + ## Run MultiQC a second time + multiqc -f $options.args -e general_stats $custom_config . """ } diff --git a/workflows/illumina.nf b/workflows/illumina.nf index e6bcb72f..4f50dfd5 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -70,12 +70,12 @@ include { BCFTOOLS_ISEC } from '../modules/local/bcftools_isec' include { CUTADAPT } from '../modules/local/cutadapt' addParams( options: modules['illumina_cutadapt'] ) include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['tsv':'']] ) include { MULTIQC } from '../modules/local/multiqc_illumina' addParams( options: multiqc_options ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_FAIL_READS } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_FAIL_MAPPED } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_IVAR_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_BCFTOOLS_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_genome'] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_amplicon'] ) +include { 
MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_FAIL_READS } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_FAIL_MAPPED } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_CSV_FROM_MAP as MULTIQC_CUSTOM_CSV_IVAR_PANGOLIN } from '../modules/local/multiqc_custom_csv_from_map' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_CSV_FROM_MAP as MULTIQC_CUSTOM_CSV_BCFTOOLS_PANGOLIN } from '../modules/local/multiqc_custom_csv_from_map' addParams( options: [publish_files: false] ) // // SUBWORKFLOW: Consisting of a mix of local and nf-core/modules @@ -470,15 +470,13 @@ workflow ILLUMINA { // ch_ivar_pangolin_report .map { meta, report -> - def lineage = WorkflowCommons.getFieldFromPangolinReport(report, 'lineage') - def scorpio_call = WorkflowCommons.getFieldFromPangolinReport(report, 'scorpio_call') - return [ "$meta.id\t$lineage\t$scorpio_call" ] + def fields = WorkflowCommons.getPangolinFieldMap(report, log) + return [sample:meta.id] << fields } .set { ch_ivar_pangolin_multiqc } - MULTIQC_CUSTOM_TSV_IVAR_PANGOLIN ( + MULTIQC_CUSTOM_CSV_IVAR_PANGOLIN ( ch_ivar_pangolin_multiqc.collect(), - 'Sample\tLineage\tScorpio call', 'ivar_pangolin_lineage' ) .set { ch_ivar_pangolin_multiqc } @@ -523,15 +521,13 @@ workflow ILLUMINA { // ch_bcftools_pangolin_report .map { meta, report -> - def lineage = WorkflowCommons.getFieldFromPangolinReport(report, 'lineage') - def scorpio_call = WorkflowCommons.getFieldFromPangolinReport(report, 'scorpio_call') - return [ "$meta.id\t$lineage\t$scorpio_call" ] + def fields = WorkflowCommons.getPangolinFieldMap(report, log) + return [sample:meta.id] << fields } .set { ch_bcftools_pangolin_multiqc } - MULTIQC_CUSTOM_TSV_BCFTOOLS_PANGOLIN ( + MULTIQC_CUSTOM_CSV_BCFTOOLS_PANGOLIN ( ch_bcftools_pangolin_multiqc.collect(), - 'Sample\tLineage\tScorpio call', 'bcftools_pangolin_lineage' 
) .set { ch_bcftools_pangolin_multiqc } diff --git a/workflows/nanopore.nf b/workflows/nanopore.nf index 336d0e9e..d220e1d7 100644 --- a/workflows/nanopore.nf +++ b/workflows/nanopore.nf @@ -51,15 +51,15 @@ def modules = params.modules.clone() def multiqc_options = modules['nanopore_multiqc'] multiqc_options.args += params.multiqc_title ? Utils.joinModuleArgs(["--title \"$params.multiqc_title\""]) : '' -include { ASCIIGENOME } from '../modules/local/asciigenome' addParams( options: modules['nanopore_asciigenome'] ) -include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['tsv':'']] ) -include { MULTIQC } from '../modules/local/multiqc_nanopore' addParams( options: multiqc_options ) +include { ASCIIGENOME } from '../modules/local/asciigenome' addParams( options: modules['nanopore_asciigenome'] ) +include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['tsv':'']] ) +include { MULTIQC } from '../modules/local/multiqc_nanopore' addParams( options: multiqc_options ) +include { MULTIQC_CUSTOM_CSV_FROM_MAP } from '../modules/local/multiqc_custom_csv_from_map' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_SAMPLE_NAME } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_BARCODES } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_BARCODE_COUNT } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_GUPPYPLEX_COUNT } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_PANGOLIN } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: 
false] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_genome'] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_amplicon'] ) @@ -370,15 +370,13 @@ workflow NANOPORE { .out .report .map { meta, report -> - def lineage = WorkflowCommons.getFieldFromPangolinReport(report, 'lineage') - def scorpio_call = WorkflowCommons.getFieldFromPangolinReport(report, 'scorpio_call') - return [ "$meta.id\t$lineage\t$scorpio_call" ] + def fields = WorkflowCommons.getPangolinFieldMap(report, log) + return [sample:meta.id] << fields } .set { ch_pangolin_multiqc } - MULTIQC_CUSTOM_PANGOLIN ( + MULTIQC_CUSTOM_CSV_FROM_MAP ( ch_pangolin_multiqc.collect(), - 'Sample\tLineage\tScorpio call', 'pangolin_lineage' ) .set { ch_pangolin_multiqc } From aaec38e34095741bafe18d71410fc747b56ab8e5 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Fri, 11 Jun 2021 17:29:35 +0100 Subject: [PATCH 58/68] Update CHANGELOG --- CHANGELOG.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 675c6c66..1691e9e0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,17 +3,17 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
-## [[2.1](https://github.com/nf-core/rnaseq/releases/tag/2.1)] - 2021-06-11 +## [[2.1](https://github.com/nf-core/rnaseq/releases/tag/2.1)] - 2021-06-14 ### Enhancements & fixes * Removed workflow to download data from public databases in favour of using [nf-core/fetchngs](https://nf-co.re/fetchngs) -* Added Pangolin VOC scorpio calls to default variant calling summary metrics -* Dashes in sample names will be converted to underscores to avoid issues when creating the summary metrics via QUAST +* Added Pangolin results to MultiQC report * Add warning to MultiQC report for samples that have no reads after adapter trimming * Added docs about structure of data required for running Nanopore data * Added docs about using other primer sets for Illumina data * Added docs about overwriting default container definitions to use latest versions e.g. Pangolin +* Dashes and spaces in sample names will be converted to underscores to avoid issues when creating the summary metrics * [[#196](https://github.com/nf-core/viralrecon/issues/196)] - Add mosdepth heatmap to MultiQC report * [[#198](https://github.com/nf-core/viralrecon/issues/198)] - ASCIIGenome failing during analysis * [[#201](https://github.com/nf-core/viralrecon/issues/201)] - Conditional include are not expected to work From 0820b1701e897dc0e7cf332adf532d3dc27e9b82 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Sun, 13 Jun 2021 09:48:21 +0100 Subject: [PATCH 59/68] Add QUAST metrics to MultiQC report --- assets/multiqc_config_illumina.yaml | 12 ++++++++++++ modules/local/multiqc_illumina.nf | 6 +++++- modules/local/multiqc_nanopore.nf | 3 +++ 3 files changed, 20 insertions(+), 1 deletion(-) diff --git a/assets/multiqc_config_illumina.yaml b/assets/multiqc_config_illumina.yaml index 124894c5..c920d605 100644 --- a/assets/multiqc_config_illumina.yaml +++ b/assets/multiqc_config_illumina.yaml @@ -69,6 +69,12 @@ module_order: info: "This section of the report shows SnpEff results for variants called by iVar." 
path_filters: - "./variants_ivar/*.csv" + - quast: + name: "VARIANTS: QUAST (iVar)" + anchor: "quast_ivar" + info: "This section of the report shows QUAST results for consensus sequences generated from variants with iVar." + path_filters: + - "./variants_ivar/*.tsv" - bcftools: name: "VARIANTS: BCFTools (BCFTools)" anchor: "bcftools_bcftools" @@ -81,6 +87,12 @@ module_order: info: "This section of the report shows SnpEff results for variants called by BCFTools." path_filters: - "./variants_bcftools/*.csv" + - quast: + name: "VARIANTS: QUAST (BCFTools)" + anchor: "quast_bcftools" + info: "This section of the report shows QUAST results for consensus sequence generated from BCFTools variants." + path_filters: + - "./variants_bcftools/*.tsv" - cutadapt: name: "ASSEMBLY: Cutadapt (primer trimming)" info: "This section of the report shows Cutadapt results for reads after primer sequence trimming." diff --git a/modules/local/multiqc_illumina.nf b/modules/local/multiqc_illumina.nf index de9fce4c..fb45939a 100644 --- a/modules/local/multiqc_illumina.nf +++ b/modules/local/multiqc_illumina.nf @@ -64,6 +64,7 @@ process MULTIQC { ## Parse YAML files dumped by MultiQC to obtain metrics multiqc_to_custom_csv.py --platform illumina + ## Manually remove files that we don't want in the report if grep -q skip_assembly workflow_summary_mqc.yaml; then rm -f *assembly_metrics_mqc.csv fi @@ -72,7 +73,10 @@ process MULTIQC { rm -f *variants_metrics_mqc.csv fi + rm -f variants_ivar/report.tsv + rm -f variants_bcftools/report.tsv + ## Run MultiQC a second time multiqc -f $options.args -e general_stats $custom_config . 
""" -} \ No newline at end of file +} diff --git a/modules/local/multiqc_nanopore.nf b/modules/local/multiqc_nanopore.nf index 1da091cd..89ac15cd 100644 --- a/modules/local/multiqc_nanopore.nf +++ b/modules/local/multiqc_nanopore.nf @@ -52,6 +52,9 @@ process MULTIQC { ## Parse YAML files dumped by MultiQC to obtain metrics multiqc_to_custom_csv.py --platform nanopore + ## Manually remove files that we don't want in the report + rm -rf quast + ## Run MultiQC a second time multiqc -f $options.args -e general_stats $custom_config . """ From 2496da0062adc8f1c38552bb9c5ac41ffc61f734 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Mon, 14 Jun 2021 11:49:51 +0100 Subject: [PATCH 60/68] Fix snpEff OutofMemory errors --- CHANGELOG.md | 1 + modules/local/snpeff_ann.nf | 16 +++++++++++++--- modules/local/snpeff_build.nf | 19 ++++++++++++++++--- modules/local/snpsift_extractfields.nf | 11 +++++++++-- 4 files changed, 39 insertions(+), 8 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 1691e9e0..9b902220 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -17,6 +17,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * [[#196](https://github.com/nf-core/viralrecon/issues/196)] - Add mosdepth heatmap to MultiQC report * [[#198](https://github.com/nf-core/viralrecon/issues/198)] - ASCIIGenome failing during analysis * [[#201](https://github.com/nf-core/viralrecon/issues/201)] - Conditional include are not expected to work +* [[#204](https://github.com/nf-core/viralrecon/issues/204)] - Memory errors for SNP_EFF step ### Parameters diff --git a/modules/local/snpeff_ann.nf b/modules/local/snpeff_ann.nf index 19ee442a..6c18142f 100644 --- a/modules/local/snpeff_ann.nf +++ b/modules/local/snpeff_ann.nf @@ -18,6 +18,8 @@ process SNPEFF_ANN { container 'quay.io/biocontainers/snpeff:5.0--0' } + cache false + input: tuple val(meta), path(vcf) path db @@ -32,10 +34,18 @@ process SNPEFF_ANN { path '*.version.txt' , emit: version script: - def 
software = getSoftwareName(task.process) - def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def avail_mem = 4 + if (!task.memory) { + log.info '[snpEff] Available memory not known - defaulting to 4GB. Specify process memory requirements to change this.' + } else { + avail_mem = task.memory.giga + } """ - snpEff ${fasta.baseName} \\ + snpEff \\ + -Xmx${avail_mem}g \\ + ${fasta.baseName} \\ -config $config \\ -dataDir $db \\ $options.args \\ diff --git a/modules/local/snpeff_build.nf b/modules/local/snpeff_build.nf index 0bbddffe..5e61d105 100644 --- a/modules/local/snpeff_build.nf +++ b/modules/local/snpeff_build.nf @@ -28,8 +28,14 @@ process SNPEFF_BUILD { path '*.version.txt', emit: version script: - def software = getSoftwareName(task.process) - def basename = fasta.baseName + def software = getSoftwareName(task.process) + def basename = fasta.baseName + def avail_mem = 4 + if (!task.memory) { + log.info '[snpEff] Available memory not known - defaulting to 4GB. Specify process memory requirements to change this.' 
+ } else { + avail_mem = task.memory.giga + } """ mkdir -p snpeff_db/genomes/ cd snpeff_db/genomes/ @@ -43,7 +49,14 @@ process SNPEFF_BUILD { cd ../../ echo "${basename}.genome : ${basename}" > snpeff.config - snpEff build -config snpeff.config -dataDir ./snpeff_db -gff3 -v ${basename} + snpEff \\ + -Xmx${avail_mem}g \\ + build \\ + -config snpeff.config \\ + -dataDir ./snpeff_db \\ + -gff3 \\ + -v \\ + ${basename} echo \$(snpEff -version 2>&1) | sed 's/^.*SnpEff //; s/ .*\$//' > ${software}.version.txt """ diff --git a/modules/local/snpsift_extractfields.nf b/modules/local/snpsift_extractfields.nf index 8e2cf30e..a02c42a5 100644 --- a/modules/local/snpsift_extractfields.nf +++ b/modules/local/snpsift_extractfields.nf @@ -26,10 +26,17 @@ process SNPSIFT_EXTRACTFIELDS { path '*.version.txt' , emit: version script: - def software = getSoftwareName(task.process) - def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def avail_mem = 4 + if (!task.memory) { + log.info '[SnpSift] Available memory not known - defaulting to 4GB. Specify process memory requirements to change this.' + } else { + avail_mem = task.memory.giga + } """ SnpSift \\ + -Xmx${avail_mem}g \\ extractFields \\ -s "," \\ -e "." 
\\ From 9c2ca298f7915f84255c3698ac64411fc2472c1a Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Mon, 14 Jun 2021 11:51:50 +0100 Subject: [PATCH 61/68] Remove cache statement used for testing --- modules/local/snpeff_ann.nf | 2 -- 1 file changed, 2 deletions(-) diff --git a/modules/local/snpeff_ann.nf b/modules/local/snpeff_ann.nf index 6c18142f..9bd6c15b 100644 --- a/modules/local/snpeff_ann.nf +++ b/modules/local/snpeff_ann.nf @@ -18,8 +18,6 @@ process SNPEFF_ANN { container 'quay.io/biocontainers/snpeff:5.0--0' } - cache false - input: tuple val(meta), path(vcf) path db From cfbd52a018ed0bef25525fcfa6d28499c70c2c73 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Mon, 14 Jun 2021 12:06:51 +0100 Subject: [PATCH 62/68] Rename module to create custom MultiQC content --- ...qc_custom_tsv.nf => multiqc_custom_tsv_from_string.nf} | 2 +- workflows/illumina.nf | 8 ++++---- workflows/nanopore.nf | 8 ++++---- 3 files changed, 9 insertions(+), 9 deletions(-) rename modules/local/{multiqc_custom_tsv.nf => multiqc_custom_tsv_from_string.nf} (96%) diff --git a/modules/local/multiqc_custom_tsv.nf b/modules/local/multiqc_custom_tsv_from_string.nf similarity index 96% rename from modules/local/multiqc_custom_tsv.nf rename to modules/local/multiqc_custom_tsv_from_string.nf index dc0f0883..c2ece8bc 100644 --- a/modules/local/multiqc_custom_tsv.nf +++ b/modules/local/multiqc_custom_tsv_from_string.nf @@ -3,7 +3,7 @@ include { saveFiles; getSoftwareName } from './functions' params.options = [:] -process MULTIQC_CUSTOM_TSV { +process MULTIQC_CUSTOM_TSV_FROM_STRING { publishDir "${params.outdir}", mode: params.publish_dir_mode, saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } diff --git a/workflows/illumina.nf b/workflows/illumina.nf index 4f50dfd5..5ec1514c 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -72,10 +72,10 @@ include { GET_SOFTWARE_VERSIONS } from 
'../modules/local/get_software_versions' include { MULTIQC } from '../modules/local/multiqc_illumina' addParams( options: multiqc_options ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_genome'] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_amplicon'] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_FAIL_READS } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_TSV_FAIL_MAPPED } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_CSV_FROM_MAP as MULTIQC_CUSTOM_CSV_IVAR_PANGOLIN } from '../modules/local/multiqc_custom_csv_from_map' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_CSV_FROM_MAP as MULTIQC_CUSTOM_CSV_BCFTOOLS_PANGOLIN } from '../modules/local/multiqc_custom_csv_from_map' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_FAIL_READS } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_FAIL_MAPPED } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_CSV_FROM_MAP as MULTIQC_CUSTOM_CSV_IVAR_PANGOLIN } from '../modules/local/multiqc_custom_csv_from_map' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_CSV_FROM_MAP as MULTIQC_CUSTOM_CSV_BCFTOOLS_PANGOLIN } from '../modules/local/multiqc_custom_csv_from_map' addParams( options: [publish_files: false] ) // // SUBWORKFLOW: Consisting of a mix of local and nf-core/modules diff --git a/workflows/nanopore.nf 
b/workflows/nanopore.nf index d220e1d7..692477e0 100644 --- a/workflows/nanopore.nf +++ b/workflows/nanopore.nf @@ -56,10 +56,10 @@ include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_vers include { MULTIQC } from '../modules/local/multiqc_nanopore' addParams( options: multiqc_options ) include { MULTIQC_CUSTOM_CSV_FROM_MAP } from '../modules/local/multiqc_custom_csv_from_map' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_SAMPLE_NAME } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_BARCODES } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_BARCODE_COUNT } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) -include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_GUPPYPLEX_COUNT } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_FAIL_NO_SAMPLE_NAME } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_FAIL_NO_BARCODES } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_FAIL_BARCODE_COUNT } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_FAIL_GUPPYPLEX_COUNT } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_genome'] ) include { 
PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_amplicon'] ) From 2f391085783925b0c10a9de20d541b15390cad79 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Mon, 14 Jun 2021 12:51:57 +0100 Subject: [PATCH 63/68] Add Nextclade clade info to summary metrics and MultiQC report - nanopore --- assets/multiqc_config_nanopore.yaml | 2 ++ bin/multiqc_to_custom_csv.py | 3 ++- lib/WorkflowCommons.groovy | 23 ++++++++++++++++++++++- modules/local/multiqc_nanopore.nf | 3 ++- workflows/illumina.nf | 4 ++-- workflows/nanopore.nf | 26 ++++++++++++++++++++++++-- 6 files changed, 54 insertions(+), 7 deletions(-) diff --git a/assets/multiqc_config_nanopore.yaml b/assets/multiqc_config_nanopore.yaml index c8e39ba4..9ca49951 100644 --- a/assets/multiqc_config_nanopore.yaml +++ b/assets/multiqc_config_nanopore.yaml @@ -158,6 +158,8 @@ custom_data: format: "{:,.2f}" "Pangolin lineage": description: "Pangolin lineage inferred from the consensus sequence generated by artic minion" + "Nextclade clade": + description: "Nextclade clade inferred from the consensus sequence generated by artic minion" pconfig: id: "summary_variants_metrics_plot_table" table_title: "Variant calling metrics" diff --git a/bin/multiqc_to_custom_csv.py b/bin/multiqc_to_custom_csv.py index c4d02586..1f28a8a9 100755 --- a/bin/multiqc_to_custom_csv.py +++ b/bin/multiqc_to_custom_csv.py @@ -173,7 +173,8 @@ def main(args=None): ('# INDELs', ['number_of_indels'])]), ('multiqc_snpeff.yaml', [('# Missense variants', ['MISSENSE'])]), ('multiqc_quast.yaml', [('# Ns per 100kb consensus', ["# N's per 100 kbp"])]), - ('multiqc_pangolin.yaml', [('Pangolin lineage', ["lineage"])]) + ('multiqc_pangolin.yaml', [('Pangolin lineage', ["lineage"])]), + ('multiqc_nextclade_clade.yaml', [('Nextclade clade', ["clade"])]) ] if args.PLATFORM == 'illumina': diff --git a/lib/WorkflowCommons.groovy b/lib/WorkflowCommons.groovy 
index 801824c1..a432d841 100755 --- a/lib/WorkflowCommons.groovy +++ b/lib/WorkflowCommons.groovy @@ -77,7 +77,7 @@ class WorkflowCommons { // Function to read in all fields into a Groovy Map from Pangolin output file // // See: https://stackoverflow.com/a/67766919 - public static Map getPangolinFieldMap(pangolin_report, log) { + public static Map getPangolinFieldMap(pangolin_report) { def headers = [] def field_map = [:] pangolin_report.readLines().eachWithIndex { row, row_index -> @@ -94,6 +94,27 @@ class WorkflowCommons { return field_map } + // + // Function to read in all fields into a Groovy Map from Nextclade CSV output file + // + // See: https://stackoverflow.com/a/67766919 + public static Map getNextcladeFieldMapFromCsv(nextclade_report) { + def headers = [] + def field_map = [:] + nextclade_report.readLines().eachWithIndex { row, row_index -> + def vals = row.split(';') + if (row_index == 0) { + headers = vals + } else { + def cells = headers.eachWithIndex { header, header_index -> + def val = (header_index <= vals.size()-1) ? vals[header_index] : '' + field_map[header] = val ?: 'NA' + } + } + } + return field_map + } + // // Function to get number of variants reported in BCFTools stats file // diff --git a/modules/local/multiqc_nanopore.nf b/modules/local/multiqc_nanopore.nf index 89ac15cd..bc6c196d 100644 --- a/modules/local/multiqc_nanopore.nf +++ b/modules/local/multiqc_nanopore.nf @@ -35,6 +35,7 @@ process MULTIQC { path ('quast/*') path ('snpeff/*') path pangolin_lineage + path nextclade_clade output: path "*multiqc_report.html", emit: report @@ -56,6 +57,6 @@ process MULTIQC { rm -rf quast ## Run MultiQC a second time - multiqc -f $options.args -e general_stats $custom_config . + multiqc -f $options.args -e general_stats --ignore *nextclade_clade_mqc.tsv $custom_config . 
""" } diff --git a/workflows/illumina.nf b/workflows/illumina.nf index 4f50dfd5..77e6fec4 100644 --- a/workflows/illumina.nf +++ b/workflows/illumina.nf @@ -470,7 +470,7 @@ workflow ILLUMINA { // ch_ivar_pangolin_report .map { meta, report -> - def fields = WorkflowCommons.getPangolinFieldMap(report, log) + def fields = WorkflowCommons.getPangolinFieldMap(report) return [sample:meta.id] << fields } .set { ch_ivar_pangolin_multiqc } @@ -521,7 +521,7 @@ workflow ILLUMINA { // ch_bcftools_pangolin_report .map { meta, report -> - def fields = WorkflowCommons.getPangolinFieldMap(report, log) + def fields = WorkflowCommons.getPangolinFieldMap(report) return [sample:meta.id] << fields } .set { ch_bcftools_pangolin_multiqc } diff --git a/workflows/nanopore.nf b/workflows/nanopore.nf index d220e1d7..9fe4915f 100644 --- a/workflows/nanopore.nf +++ b/workflows/nanopore.nf @@ -60,6 +60,7 @@ include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_SAMPLE_NAME } from '../m include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_NO_BARCODES } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_BARCODE_COUNT } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_FAIL_GUPPYPLEX_COUNT } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) +include { MULTIQC_CUSTOM_TSV as MULTIQC_CUSTOM_NEXTCLADE } from '../modules/local/multiqc_custom_tsv' addParams( options: [publish_files: false] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_genome'] ) include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_amplicon'] ) @@ -370,7 +371,7 @@ workflow NANOPORE { 
         .out
         .report
         .map { meta, report ->
-            def fields = WorkflowCommons.getPangolinFieldMap(report, log)
+            def fields = WorkflowCommons.getPangolinFieldMap(report)
             return [sample:meta.id] << fields
         }
         .set { ch_pangolin_multiqc }
@@ -385,12 +386,32 @@ workflow NANOPORE {
     //
     // MODULE: Clade assignment, mutation calling, and sequence quality checks with Nextclade
     //
+    ch_nextclade_multiqc = Channel.empty()
     if (!params.skip_nextclade) {
         NEXTCLADE (
             ARTIC_MINION.out.fasta,
             'csv'
         )
         ch_software_versions = ch_software_versions.mix(NEXTCLADE.out.version.ifEmpty(null))
+
+        //
+        // MODULE: Get Nextclade clade information for MultiQC report
+        //
+        NEXTCLADE
+            .out
+            .csv
+            .map { meta, csv ->
+                def clade = WorkflowCommons.getNextcladeFieldMapFromCsv(csv)['clade']
+                return [ "$meta.id\t$clade" ]
+            }
+            .set { ch_nextclade_multiqc }
+
+        MULTIQC_CUSTOM_NEXTCLADE (
+            ch_nextclade_multiqc.collect(),
+            'Sample\tclade',
+            'nextclade_clade'
+        )
+        .set { ch_nextclade_multiqc }
     }

     //
@@ -492,7 +513,8 @@
         ch_mosdepth_multiqc.collect{it[1]}.ifEmpty([]),
         ch_quast_multiqc.collect().ifEmpty([]),
         ch_snpeff_multiqc.collect{it[1]}.ifEmpty([]),
-        ch_pangolin_multiqc.collect().ifEmpty([])
+        ch_pangolin_multiqc.collect().ifEmpty([]),
+        ch_nextclade_multiqc.collect().ifEmpty([])
     )
     multiqc_report = MULTIQC.out.report.toList()
 }

From 0681d0d82933e5d7d8d6b8209396138f872b2d69 Mon Sep 17 00:00:00 2001
From: Harshil Patel
Date: Mon, 14 Jun 2021 12:57:25 +0100
Subject: [PATCH 64/68] Update CHANGELOG

---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9b902220..7d374d8e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 * Added docs about overwriting default container definitions to use latest versions e.g. Pangolin
 * Dashes and spaces in sample names will be converted to underscores to avoid issues when creating the summary metrics
 * [[#196](https://github.com/nf-core/viralrecon/issues/196)] - Add mosdepth heatmap to MultiQC report
+* [[#197](https://github.com/nf-core/viralrecon/issues/197)] - Output a .tsv comprising the Nextclade and Pangolin results for all samples processed
 * [[#198](https://github.com/nf-core/viralrecon/issues/198)] - ASCIIGenome failing during analysis
 * [[#201](https://github.com/nf-core/viralrecon/issues/201)] - Conditional include are not expected to work
 * [[#204](https://github.com/nf-core/viralrecon/issues/204)] - Memory errors for SNP_EFF step

From af854a489472e6203baa8af7761f4fa88e53cda3 Mon Sep 17 00:00:00 2001
From: Harshil Patel
Date: Mon, 14 Jun 2021 13:22:50 +0100
Subject: [PATCH 65/68] Add Nextclade clade info to summary metrics and MultiQC report - illumina

---
 assets/multiqc_config_illumina.yaml |  4 ++
 bin/multiqc_to_custom_csv.py        |  4 +-
 modules/local/multiqc_illumina.nf   |  4 +-
 workflows/illumina.nf               | 86 +++++++++++++++++++++--------
 4 files changed, 74 insertions(+), 24 deletions(-)

diff --git a/assets/multiqc_config_illumina.yaml b/assets/multiqc_config_illumina.yaml
index c920d605..fcb68808 100644
--- a/assets/multiqc_config_illumina.yaml
+++ b/assets/multiqc_config_illumina.yaml
@@ -237,6 +237,8 @@ custom_data:
                 format: "{:,.2f}"
             "Pangolin lineage (iVar)":
                 description: "Pangolin lineage inferred from the consensus sequence generated by iVar"
+            "Nextclade clade (iVar)":
+                description: "Nextclade clade inferred from the consensus sequence generated by iVar"
             "# SNPs (BCFTools)":
                 description: "Total number of SNPs called by BCFTools"
                 format: "{:,.0f}"
@@ -251,6 +253,8 @@ custom_data:
                 format: "{:,.2f}"
             "Pangolin lineage (BCFTools)":
                 description: "Pangolin lineage inferred from the consensus sequence generated by BCFTools"
+            "Nextclade clade (BCFTools)":
+                description: "Nextclade clade inferred from the consensus sequence generated by BCFTools"
             pconfig:
                 id: "summary_variants_metrics_plot"
                 table_title: "Variant calling metrics"
diff --git a/bin/multiqc_to_custom_csv.py b/bin/multiqc_to_custom_csv.py
index 1f28a8a9..87486393 100755
--- a/bin/multiqc_to_custom_csv.py
+++ b/bin/multiqc_to_custom_csv.py
@@ -139,11 +139,13 @@ def main(args=None):
         ('multiqc_snpeff_snpeff_ivar.yaml', [('# Missense variants (iVar)', ['MISSENSE'])]),
         ('multiqc_quast_quast_ivar.yaml', [('# Ns per 100kb consensus (iVar)', ["# N's per 100 kbp"])]),
         ('multiqc_variants:_pangolin_(ivar).yaml', [('Pangolin lineage (iVar)', ["lineage"])]),
+        ('multiqc_ivar_nextclade_clade.yaml', [('Nextclade clade (iVar)', ["clade"])]),
         ('multiqc_bcftools_stats_bcftools_bcftools.yaml', [('# SNPs (BCFTools)', ['number_of_SNPs']),
                                                            ('# INDELs (BCFTools)', ['number_of_indels'])]),
         ('multiqc_snpeff_snpeff_bcftools.yaml', [('# Missense variants (BCFTools)', ['MISSENSE'])]),
         ('multiqc_quast_quast_bcftools.yaml', [('# Ns per 100kb consensus (BCFTools)', ["# N's per 100 kbp"])]),
-        ('multiqc_variants:_pangolin_(bcftools).yaml', [('Pangolin lineage (BCFTools)', ["lineage"])])
+        ('multiqc_variants:_pangolin_(bcftools).yaml', [('Pangolin lineage (BCFTools)', ["lineage"])]),
+        ('multiqc_bcftools_nextclade_clade.yaml', [('Nextclade clade (BCFTools)', ["clade"])])
     ]

     illumina_assembly_files = [
diff --git a/modules/local/multiqc_illumina.nf b/modules/local/multiqc_illumina.nf
index fb45939a..26952dc6 100644
--- a/modules/local/multiqc_illumina.nf
+++ b/modules/local/multiqc_illumina.nf
@@ -38,6 +38,8 @@ process MULTIQC {
     path ('variants_ivar/*')
     path ('variants_ivar/*')
    path ('variants_ivar/*')
+    path ('variants_ivar/*')
+    path ('variants_bcftools/*')
     path ('variants_bcftools/*')
     path ('variants_bcftools/*')
     path ('variants_bcftools/*')
@@ -77,6 +79,6 @@ process MULTIQC {
     rm -f variants_bcftools/report.tsv

     ## Run MultiQC a second time
-    multiqc -f $options.args -e general_stats $custom_config .
+    multiqc -f $options.args -e general_stats --ignore *nextclade_clade_mqc.tsv $custom_config .
     """
 }
diff --git a/workflows/illumina.nf b/workflows/illumina.nf
index e3097dcb..dc8662d3 100644
--- a/workflows/illumina.nf
+++ b/workflows/illumina.nf
@@ -72,8 +72,10 @@ include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions'
 include { MULTIQC } from '../modules/local/multiqc_illumina' addParams( options: multiqc_options )
 include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_genome'] )
 include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['illumina_plot_mosdepth_regions_amplicon'] )
-include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_FAIL_READS } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] )
-include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_FAIL_MAPPED } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] )
+include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_FAIL_READS          } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] )
+include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_FAIL_MAPPED         } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] )
+include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_IVAR_NEXTCLADE      } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] )
+include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_BCFTOOLS_NEXTCLADE  } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] )
 include { MULTIQC_CUSTOM_CSV_FROM_MAP as MULTIQC_CUSTOM_CSV_IVAR_PANGOLIN } from '../modules/local/multiqc_custom_csv_from_map' addParams( options: [publish_files: false] )
 include { MULTIQC_CUSTOM_CSV_FROM_MAP as MULTIQC_CUSTOM_CSV_BCFTOOLS_PANGOLIN } from '../modules/local/multiqc_custom_csv_from_map' addParams( options: [publish_files: false] )
@@ -430,13 +432,14 @@ workflow ILLUMINA {
     //
     // SUBWORKFLOW: Call variants with IVar
     //
-    ch_ivar_vcf              = Channel.empty()
-    ch_ivar_tbi              = Channel.empty()
-    ch_ivar_counts_multiqc   = Channel.empty()
-    ch_ivar_stats_multiqc    = Channel.empty()
-    ch_ivar_snpeff_multiqc   = Channel.empty()
-    ch_ivar_quast_multiqc    = Channel.empty()
-    ch_ivar_pangolin_multiqc = Channel.empty()
+    ch_ivar_vcf               = Channel.empty()
+    ch_ivar_tbi               = Channel.empty()
+    ch_ivar_counts_multiqc    = Channel.empty()
+    ch_ivar_stats_multiqc     = Channel.empty()
+    ch_ivar_snpeff_multiqc    = Channel.empty()
+    ch_ivar_quast_multiqc     = Channel.empty()
+    ch_ivar_pangolin_multiqc  = Channel.empty()
+    ch_ivar_nextclade_multiqc = Channel.empty()
     if (!params.skip_variants && 'ivar' in callers) {
         VARIANTS_IVAR (
             ch_bam,
@@ -448,13 +451,14 @@
             PREPARE_GENOME.out.snpeff_config,
             ch_ivar_variants_header_mqc
         )
-        ch_ivar_vcf             = VARIANTS_IVAR.out.vcf
-        ch_ivar_tbi             = VARIANTS_IVAR.out.tbi
-        ch_ivar_counts_multiqc  = VARIANTS_IVAR.out.multiqc_tsv
-        ch_ivar_stats_multiqc   = VARIANTS_IVAR.out.stats
-        ch_ivar_snpeff_multiqc  = VARIANTS_IVAR.out.snpeff_csv
-        ch_ivar_quast_multiqc   = VARIANTS_IVAR.out.quast_tsv
-        ch_ivar_pangolin_report = VARIANTS_IVAR.out.pangolin_report
+        ch_ivar_vcf              = VARIANTS_IVAR.out.vcf
+        ch_ivar_tbi              = VARIANTS_IVAR.out.tbi
+        ch_ivar_counts_multiqc   = VARIANTS_IVAR.out.multiqc_tsv
+        ch_ivar_stats_multiqc    = VARIANTS_IVAR.out.stats
+        ch_ivar_snpeff_multiqc   = VARIANTS_IVAR.out.snpeff_csv
+        ch_ivar_quast_multiqc    = VARIANTS_IVAR.out.quast_tsv
+        ch_ivar_pangolin_report  = VARIANTS_IVAR.out.pangolin_report
+        ch_ivar_nextclade_report = VARIANTS_IVAR.out.nextclade_report
         ch_software_versions = ch_software_versions.mix(VARIANTS_IVAR.out.ivar_version.first().ifEmpty(null))
         ch_software_versions = ch_software_versions.mix(VARIANTS_IVAR.out.tabix_version.first().ifEmpty(null))
         ch_software_versions = ch_software_versions.mix(VARIANTS_IVAR.out.bcftools_version.first().ifEmpty(null))
@@ -480,17 +484,35 @@
                 'ivar_pangolin_lineage'
             )
             .set { ch_ivar_pangolin_multiqc }
+
+            //
+            // MODULE: Get Nextclade clade information for MultiQC report
+            //
+            ch_ivar_nextclade_report
+                .map { meta, csv ->
+                    def clade = WorkflowCommons.getNextcladeFieldMapFromCsv(csv)['clade']
+                    return [ "$meta.id\t$clade" ]
+                }
+                .set { ch_ivar_nextclade_multiqc }
+
+            MULTIQC_CUSTOM_TSV_IVAR_NEXTCLADE (
+                ch_ivar_nextclade_multiqc.collect(),
+                'Sample\tclade',
+                'ivar_nextclade_clade'
+            )
+            .set { ch_ivar_nextclade_multiqc }
     }

     //
     // SUBWORKFLOW: Call variants with BCFTools
     //
-    ch_bcftools_vcf              = Channel.empty()
-    ch_bcftools_tbi              = Channel.empty()
-    ch_bcftools_stats_multiqc    = Channel.empty()
-    ch_bcftools_snpeff_multiqc   = Channel.empty()
-    ch_bcftools_quast_multiqc    = Channel.empty()
-    ch_bcftools_pangolin_multiqc = Channel.empty()
+    ch_bcftools_vcf               = Channel.empty()
+    ch_bcftools_tbi               = Channel.empty()
+    ch_bcftools_stats_multiqc     = Channel.empty()
+    ch_bcftools_snpeff_multiqc    = Channel.empty()
+    ch_bcftools_quast_multiqc     = Channel.empty()
+    ch_bcftools_pangolin_multiqc  = Channel.empty()
+    ch_bcftools_nextclade_multiqc = Channel.empty()
     if (!params.skip_variants && 'bcftools' in callers) {
         VARIANTS_BCFTOOLS (
             ch_bam,
@@ -507,6 +529,7 @@
         ch_bcftools_snpeff_multiqc   = VARIANTS_BCFTOOLS.out.snpeff_csv
         ch_bcftools_quast_multiqc    = VARIANTS_BCFTOOLS.out.quast_tsv
         ch_bcftools_pangolin_report  = VARIANTS_BCFTOOLS.out.pangolin_report
+        ch_bcftools_nextclade_report = VARIANTS_BCFTOOLS.out.nextclade_report
         ch_software_versions = ch_software_versions.mix(VARIANTS_BCFTOOLS.out.bcftools_version.first().ifEmpty(null))
         ch_software_versions = ch_software_versions.mix(VARIANTS_BCFTOOLS.out.bedtools_version.first().ifEmpty(null))
         ch_software_versions = ch_software_versions.mix(VARIANTS_BCFTOOLS.out.quast_version.ifEmpty(null))
@@ -531,6 +554,23 @@
                 'bcftools_pangolin_lineage'
             )
             .set { ch_bcftools_pangolin_multiqc }
+
+            //
+            // MODULE: Get Nextclade clade information for MultiQC report
+            //
+            ch_bcftools_nextclade_report
+                .map { meta, csv ->
+                    def clade = WorkflowCommons.getNextcladeFieldMapFromCsv(csv)['clade']
+                    return [ "$meta.id\t$clade" ]
+                }
+                .set { ch_bcftools_nextclade_multiqc }
+
+            MULTIQC_CUSTOM_TSV_BCFTOOLS_NEXTCLADE (
+                ch_bcftools_nextclade_multiqc.collect(),
+                'Sample\tclade',
+                'bcftools_nextclade_clade'
+            )
+            .set { ch_bcftools_nextclade_multiqc }
     }

     //
@@ -671,10 +711,12 @@
             ch_ivar_snpeff_multiqc.collect{it[1]}.ifEmpty([]),
             ch_ivar_quast_multiqc.collect().ifEmpty([]),
             ch_ivar_pangolin_multiqc.collect().ifEmpty([]),
+            ch_ivar_nextclade_multiqc.collect().ifEmpty([]),
             ch_bcftools_stats_multiqc.collect{it[1]}.ifEmpty([]),
             ch_bcftools_snpeff_multiqc.collect{it[1]}.ifEmpty([]),
             ch_bcftools_quast_multiqc.collect().ifEmpty([]),
             ch_bcftools_pangolin_multiqc.collect().ifEmpty([]),
+            ch_bcftools_nextclade_multiqc.collect().ifEmpty([]),
             ch_cutadapt_multiqc.collect{it[1]}.ifEmpty([]),
             ch_spades_quast_multiqc.collect().ifEmpty([]),
             ch_unicycler_quast_multiqc.collect().ifEmpty([]),

From 513146b3db9740c8561c2b32453114c1c7487a6d Mon Sep 17 00:00:00 2001
From: Harshil Patel
Date: Mon, 14 Jun 2021 13:27:26 +0100
Subject: [PATCH 66/68] Update CHANGELOG

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7d374d8e..d7b9c543 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,7 +3,7 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [[2.1](https://github.com/nf-core/rnaseq/releases/tag/2.1)] - 2021-06-14
+## [[2.1](https://github.com/nf-core/rnaseq/releases/tag/2.1)] - 2021-06-15

 ### Enhancements & fixes

From cbfc3d52a83014a5dfdc4df414ee94a49b4ef47d Mon Sep 17 00:00:00 2001
From: Harshil Patel
Date: Mon, 14 Jun 2021 13:30:40 +0100
Subject: [PATCH 67/68] Add correct channel name after erroneous merge conflict fix

---
 workflows/nanopore.nf | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/workflows/nanopore.nf b/workflows/nanopore.nf
index daed1e9c..a02c63a9 100644
--- a/workflows/nanopore.nf
+++ b/workflows/nanopore.nf
@@ -60,6 +60,7 @@ include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_NO_SAMPLE_NAME }
 include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_NO_BARCODES } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] )
 include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_BARCODE_COUNT } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] )
 include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_GUPPYPLEX_COUNT } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] )
+include { MULTIQC_CUSTOM_TSV_FROM_STRING as MULTIQC_CUSTOM_TSV_NEXTCLADE } from '../modules/local/multiqc_custom_tsv_from_string' addParams( options: [publish_files: false] )

 include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_GENOME } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_genome'] )
 include { PLOT_MOSDEPTH_REGIONS as PLOT_MOSDEPTH_REGIONS_AMPLICON } from '../modules/local/plot_mosdepth_regions' addParams( options: modules['nanopore_plot_mosdepth_regions_amplicon'] )
@@ -405,7 +406,7 @@
            }
            .set { ch_nextclade_multiqc }

-        MULTIQC_CUSTOM_NEXTCLADE (
+        MULTIQC_CUSTOM_TSV_NEXTCLADE (
             ch_nextclade_multiqc.collect(),
             'Sample\tclade',
             'nextclade_clade'

From 70cd8db4aa3eec79d7d44b0f668b48589dbf762f Mon Sep 17 00:00:00 2001
From: Harshil Patel
Date: Mon, 14 Jun 2021 14:45:42 +0100
Subject: [PATCH 68/68] Remove braces in docs

---
 docs/usage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/usage.md b/docs/usage.md
index f7ec1a1d..c4c4e0de 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -150,7 +150,7 @@ nextflow run nf-core/viralrecon \

 ### SWIFT primer sets

-The [SWIFT amplicon panel](https://swiftbiosci.com/swift-amplicon-sars-cov-2-panel/) is another commonly used method used to prep and sequence SARS-CoV-2 samples. We haven't been able to obtain explicit permission to host standard SWIFT primer sets but you can obtain a masterfile which is freely available from their website that contains the primer sequences as well as genomic co-ordinates. You just need to convert this file to [BED6](https://genome.ucsc.edu/FAQ/FAQformat.html#format1) format and provide it to the pipeline with `--primer_bed swift_primers.bed`. Be sure to check the values provided to [`--primer_left_suffix`] and [`--primer_right_suffix`] match the primer names defined in the BED file as highlighted in [this issue](https://github.com/nf-core/viralrecon/issues/169). For an explanation behind the usage of the `--ivar_trim_offset 5` for SWIFT primer sets see [this issue](https://github.com/nf-core/viralrecon/issues/170).
+The [SWIFT amplicon panel](https://swiftbiosci.com/swift-amplicon-sars-cov-2-panel/) is another commonly used method used to prep and sequence SARS-CoV-2 samples. We haven't been able to obtain explicit permission to host standard SWIFT primer sets but you can obtain a masterfile which is freely available from their website that contains the primer sequences as well as genomic co-ordinates. You just need to convert this file to [BED6](https://genome.ucsc.edu/FAQ/FAQformat.html#format1) format and provide it to the pipeline with `--primer_bed swift_primers.bed`. Be sure to check the values provided to `--primer_left_suffix` and `--primer_right_suffix` match the primer names defined in the BED file as highlighted in [this issue](https://github.com/nf-core/viralrecon/issues/169). For an explanation behind the usage of the `--ivar_trim_offset 5` for SWIFT primer sets see [this issue](https://github.com/nf-core/viralrecon/issues/170).

 An example command using SWIFT primers with "MN908947.3":
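The masterfile-to-BED6 conversion described in the patched docs above can be sketched roughly as follows. This is a minimal illustration and not part of the pipeline: the `masterfile_to_bed6` helper name, the assumed tab-separated column order (chrom, start, end, primer name, strand), and the primer names are all hypothetical — check the column layout of the file actually downloaded from SWIFT's website before converting.

```python
import csv
import io


def masterfile_to_bed6(masterfile_text):
    """Convert a tab-separated primer masterfile to BED6 lines.

    Assumed (hypothetical) input columns: chrom, start, end, primer name,
    strand. BED6 inserts a score column (set to 0 here) between the name
    and the strand.
    """
    bed_lines = []
    reader = csv.reader(io.StringIO(masterfile_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        chrom, start, end, name, strand = row[:5]
        bed_lines.append("\t".join([chrom, start, end, name, "0", strand]))
    return "\n".join(bed_lines)


# Hypothetical rows; real primer names must end in the suffixes passed to
# --primer_left_suffix / --primer_right_suffix.
example = (
    "MN908947.3\t25\t50\tswift_1_LEFT\t+\n"
    "MN908947.3\t400\t425\tswift_1_RIGHT\t-\n"
)
print(masterfile_to_bed6(example))
```

Keeping the primer name in column 4 matters because the pipeline matches the left/right suffixes against exactly that column of the BED file.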