Merge pull request #203 from nf-core/dev
Dev -> Master for 2.1 release
drpatelh committed Jun 15, 2021
2 parents a85d596 + 9fd5ef2 commit f017132
Showing 53 changed files with 1,141 additions and 1,379 deletions.
27 changes: 1 addition & 26 deletions .github/workflows/ci.yml
@@ -54,6 +54,7 @@ jobs:
--skip_assembly,
"--spades_mode corona",
"--spades_mode metaviral",
"--skip_plasmidid false --skip_asciigenome",
]
steps:
- name: Check out pipeline code
@@ -70,32 +71,6 @@ jobs:
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker ${{ matrix.parameters }}
test_sra:
name: Test SRA workflow
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/viralrecon') }}
runs-on: ubuntu-latest
env:
NXF_VER: ${{ matrix.nxf_ver }}
NXF_ANSI_LOG: false
strategy:
matrix:
parameters: [--skip_sra_fastq_download, ""]

steps:
- name: Check out pipeline code
uses: actions/checkout@v2

- name: Install Nextflow
env:
CAPSULE_LOG: none
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Run pipeline to download SRA ids and various options
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_sra,docker ${{ matrix.parameters }}
test_sispa:
name: Test SISPA workflow
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/viralrecon') }}
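
For reference, each `parameters` entry in the CI matrix above is substituted into the single `nextflow run` step, so the newly added matrix entry expands to roughly the following (a sketch; `${GITHUB_WORKSPACE}` is set by GitHub Actions at runtime):

```bash
# One expansion of the CI matrix above (illustrative)
nextflow run "$GITHUB_WORKSPACE" -profile test,docker --skip_plasmidid false --skip_asciigenome
```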
41 changes: 41 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,47 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[2.1](https://github.com/nf-core/viralrecon/releases/tag/2.1)] - 2021-06-15

### Enhancements & fixes

* Removed workflow to download data from public databases in favour of using [nf-core/fetchngs](https://nf-co.re/fetchngs)
* Added Pangolin results to MultiQC report
* Add warning to MultiQC report for samples that have no reads after adapter trimming
* Added docs about structure of data required for running Nanopore data
* Added docs about using other primer sets for Illumina data
* Added docs about overwriting default container definitions to use latest versions e.g. Pangolin
* Dashes and spaces in sample names will be converted to underscores to avoid issues when creating the summary metrics
* [[#196](https://github.com/nf-core/viralrecon/issues/196)] - Add mosdepth heatmap to MultiQC report
* [[#197](https://github.com/nf-core/viralrecon/issues/197)] - Output a .tsv comprising the Nextclade and Pangolin results for all samples processed
* [[#198](https://github.com/nf-core/viralrecon/issues/198)] - ASCIIGenome failing during analysis
* [[#201](https://github.com/nf-core/viralrecon/issues/201)] - Conditional includes are not expected to work
* [[#204](https://github.com/nf-core/viralrecon/issues/204)] - Memory errors for SNP_EFF step

### Parameters

| Old parameter | New parameter |
|-------------------------------|---------------------------------------|
| `--public_data_ids` | |
| `--skip_sra_fastq_download` | |

> **NB:** Parameter has been __updated__ if both old and new parameter information is present.
> **NB:** Parameter has been __added__ if just the new parameter information is present.
> **NB:** Parameter has been __removed__ if new parameter information isn't present.

### Software dependencies

Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.

| Dependency | Old version | New version |
|-------------------------------|-------------|-------------|
| `nextclade_js` | 0.14.2 | 0.14.4 |
| `pangolin` | 2.4.2 | 3.0.5 |

> **NB:** Dependency has been __updated__ if both old and new version information is present.
> **NB:** Dependency has been __added__ if just the new version information is present.
> **NB:** Dependency has been __removed__ if new version information isn't present.

## [[2.0](https://github.com/nf-core/viralrecon/releases/tag/2.0)] - 2021-05-13

### :warning: Major enhancements
34 changes: 12 additions & 22 deletions README.md
@@ -16,7 +16,7 @@

## Introduction

**nfcore/viralrecon** is a bioinformatics analysis pipeline used to perform assembly and intra-host/low-frequency variant calling for viral samples. The pipeline supports both Illumina and Nanopore sequencing data. For Illumina short-reads the pipeline is able to analyse metagenomics data typically obtained from shotgun sequencing (e.g. directly from clinical samples) and enrichment-based library preparation methods (e.g. amplicon-based: [ARTIC SARS-CoV-2 enrichment protocol](https://artic.network/ncov-2019); or probe-capture-based). For Nanopore data the pipeline only supports amplicon-based analysis obtained from primer sets created and maintained by the [ARTIC Network](https://artic.network/).
**nf-core/viralrecon** is a bioinformatics analysis pipeline used to perform assembly and intra-host/low-frequency variant calling for viral samples. The pipeline supports both Illumina and Nanopore sequencing data. For Illumina short-reads the pipeline is able to analyse metagenomics data typically obtained from shotgun sequencing (e.g. directly from clinical samples) and enrichment-based library preparation methods (e.g. amplicon-based: [ARTIC SARS-CoV-2 enrichment protocol](https://artic.network/ncov-2019); or probe-capture-based). For Nanopore data the pipeline only supports amplicon-based analysis obtained from primer sets created and maintained by the [ARTIC Network](https://artic.network/).

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from running the full-sized tests individually for each `--platform` option can be viewed on the [nf-core website](https://nf-co.re/viralrecon/results) and the output directories will be named accordingly i.e. `platform_illumina/` and `platform_nanopore/`.

@@ -26,14 +26,15 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool

The pipeline has numerous options to allow you to run only specific aspects of the workflow if you so wish. For example, for Illumina data you can skip the host read filtering step with Kraken 2 with `--skip_kraken2` or you can skip all of the assembly steps with the `--skip_assembly` parameter. See the [usage](https://nf-co.re/viralrecon/usage) and [parameter](https://nf-co.re/viralrecon/parameters) docs for all of the available options when running the pipeline.
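
As a minimal sketch (assuming an Illumina samplesheet prepared as described in the usage docs; other required options such as `--protocol` and `--genome` are omitted for brevity), a run that skips both host-read filtering and assembly might look like:

```bash
# Hypothetical invocation combining the skip flags mentioned above
nextflow run nf-core/viralrecon \
    --input samplesheet.csv \
    --platform illumina \
    --skip_kraken2 \
    --skip_assembly \
    -profile docker
```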

The SRA download functionality has been removed from the pipeline (`>=2.1`) and ported to an independent workflow called [nf-core/fetchngs](https://nf-co.re/fetchngs). You can provide `--nf_core_pipeline viralrecon` when running nf-core/fetchngs to download and auto-create a samplesheet containing publicly available samples that can be accepted directly by the Illumina processing mode of nf-core/viralrecon.
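
For example (a sketch only, assuming a plain-text/CSV file of database identifiers named `ids.csv`; see the nf-core/fetchngs documentation for its exact input format and options):

```bash
# Hypothetical nf-core/fetchngs run that auto-creates a viralrecon-compatible samplesheet
nextflow run nf-core/fetchngs \
    --input ids.csv \
    --nf_core_pipeline viralrecon \
    -profile docker
```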

### Illumina

1. Download samples via SRA, ENA or GEO ids ([`ENA FTP`](https://ena-docs.readthedocs.io/en/latest/retrieval/file-download.html); *if required*)
2. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html))
3. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
4. Adapter trimming ([`fastp`](https://github.com/OpenGene/fastp))
5. Removal of host reads ([`Kraken 2`](http://ccb.jhu.edu/software/kraken2/); *optional*)
6. Variant calling
1. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html))
2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
3. Adapter trimming ([`fastp`](https://github.com/OpenGene/fastp))
4. Removal of host reads ([`Kraken 2`](http://ccb.jhu.edu/software/kraken2/); *optional*)
5. Variant calling
1. Read alignment ([`Bowtie 2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml))
2. Sort and index alignments ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
3. Primer sequence removal ([`iVar`](https://github.com/andersen-lab/ivar); *amplicon data only*)
@@ -47,14 +48,14 @@ The pipeline has numerous options to allow you to run only specific aspects of t
* Clade assignment, mutation calling and sequence quality checks ([`Nextclade`](https://github.com/nextstrain/nextclade))
* Individual variant screenshots with annotation tracks ([`ASCIIGenome`](https://asciigenome.readthedocs.io/en/latest/))
8. Intersect variants across callers ([`BCFTools`](http://samtools.github.io/bcftools/bcftools.html))
7. _De novo_ assembly
6. _De novo_ assembly
1. Primer trimming ([`Cutadapt`](https://cutadapt.readthedocs.io/en/stable/guide.html); *amplicon data only*)
2. Choice of multiple assembly tools ([`SPAdes`](http://cab.spbu.ru/software/spades/) *||* [`Unicycler`](https://github.com/rrwick/Unicycler) *||* [`minia`](https://github.com/GATB/minia))
* Blast to reference genome ([`blastn`](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch))
* Contiguate assembly ([`ABACAS`](https://www.sanger.ac.uk/science/tools/pagit))
* Assembly report ([`PlasmidID`](https://github.com/BU-ISCIII/plasmidID))
* Assembly assessment report ([`QUAST`](http://quast.sourceforge.net/quast))
8. Present QC and visualisation for raw read, alignment, assembly and variant calling results ([`MultiQC`](http://multiqc.info/))
7. Present QC and visualisation for raw read, alignment, assembly and variant calling results ([`MultiQC`](http://multiqc.info/))

### Nanopore

@@ -130,28 +131,17 @@ The pipeline has numerous options to allow you to run only specific aspects of t
-profile <docker/singularity/podman/conda/institute>
```

* Typical command for downloading public data:

```bash
nextflow run nf-core/viralrecon \
--public_data_ids ids.txt \
-profile <docker/singularity/podman/conda/institute>
```

* An executable Python script called [`fastq_dir_to_samplesheet.py`](https://github.com/nf-core/viralrecon/blob/master/bin/fastq_dir_to_samplesheet.py) has been provided if you are using `--platform illumina` and would like to auto-create an input samplesheet based on a directory containing FastQ files **before** you run the pipeline (requires Python 3 installed locally) e.g.

```console
~/.nextflow/assets/nf-core/viralrecon/bin/fastq_dir_to_samplesheet.py <FASTQ_DIR> samplesheet.csv
```

* You can find the default keys used to specify `--genome` in the [genomes config file](https://github.com/nf-core/configs/blob/master/conf/pipeline/viralrecon/genomes.config). Where possible we are trying to collate links and settings for standard primer sets to make it easier to run the pipeline with standard keys. If you are able to get permissions from the vendor/supplier to share the primer information then we would be more than happy to support it within the pipeline.
* The commands to obtain public data and to run the main arm of the pipeline are completely independent. This is intentional because it allows you to download all of the raw data in an initial pipeline run (`results/public_data/`) and then to curate the auto-created samplesheet based on the available sample metadata before you run the pipeline again properly.

See [usage](https://nf-co.re/viralrecon/usage) and [parameter](https://nf-co.re/viralrecon/parameters) docs for all of the available options when running the pipeline.
* You can find the default keys used to specify `--genome` in the [genomes config file](https://github.com/nf-core/configs/blob/master/conf/pipeline/viralrecon/genomes.config). Where possible we are trying to collate links and settings for standard primer sets to make it easier to run the pipeline with standard keys; see [usage docs](https://nf-co.re/viralrecon/usage#illumina-primer-sets).
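
For instance (illustrative only: `MN908947.3` is commonly used as the key for the SARS-CoV-2 Wuhan-Hu-1 reference, but check the genomes config above for the keys that are actually defined):

```bash
# Hypothetical run selecting a reference via a --genome key
nextflow run nf-core/viralrecon --input samplesheet.csv --platform illumina --genome 'MN908947.3' -profile docker
```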

## Documentation

The nf-core/viralrecon pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/viralrecon/usage) and [output](https://nf-co.re/viralrecon/output).
The nf-core/viralrecon pipeline comes with documentation about the pipeline [usage](https://nf-co.re/viralrecon/usage), [parameters](https://nf-co.re/viralrecon/parameters) and [output](https://nf-co.re/viralrecon/output).

## Credits

2 changes: 1 addition & 1 deletion assets/headers/ivar_variants_header_mqc.txt
@@ -1,5 +1,5 @@
#id: 'ivar_variants'
#section_name: 'VARIANTS: iVar variant counts'
#section_name: 'VARIANTS: Total variants (iVar)'
#description: "is calculated from the total number of variants called by
# <a href='https://andersen-lab.github.io/ivar/html/manualpage.html' target='_blank'>iVar</a>."
#plot_type: 'bargraph'