GitHub - EBI-Metagenomics/miassembler

Introduction

ebi-metagenomics/miassembler is a bioinformatics pipeline for the assembly of short metagenomic reads using SPAdes or MEGAHIT.

Performs read QC using FastQC
Presents QC for raw reads and the assembly using MultiQC
Performs assembly using MEGAHIT or SPAdes, and checks assembly quality using QUAST
Removes contaminated contigs using BLASTN and SeqKit
Calculates assembly coverage using MetaBAT2 (metabat2_jgi_summarizebamcontigdepths) for per-contig depth and Samtools idxstats for alignment summary statistics
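
As a rough illustration of the alignment summary step, the sketch below parses the tab-separated output of samtools idxstats (reference name, sequence length, mapped read-segments, unmapped read-segments) and reports the fraction of mapped reads. The function name and the sample data are hypothetical and are not taken from the pipeline.

```python
import io


def summarize_idxstats(handle):
    """Return (mapped, unmapped, fraction_mapped) from samtools idxstats output."""
    mapped = 0
    unmapped = 0
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        # Columns: reference name, sequence length, mapped, unmapped
        _name, _length, n_mapped, n_unmapped = line.split("\t")
        mapped += int(n_mapped)
        unmapped += int(n_unmapped)
    total = mapped + unmapped
    fraction = mapped / total if total else 0.0
    return mapped, unmapped, fraction


# Made-up idxstats output for two contigs; the final "*" row holds unplaced reads.
example = io.StringIO(
    "contig_1\t1500\t120\t0\n"
    "contig_2\t800\t30\t0\n"
    "*\t0\t0\t50\n"
)
print(summarize_idxstats(example))
```

In a real run the handle would be the file produced by samtools idxstats on the sorted, indexed BAM of reads mapped back to the assembly.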

This pipeline is still in early development. It is mostly a direct port of the mi-automation assembly generation pipeline. Some of the bespoke scripts used to remove contaminated contigs or to calculate the coverage of the assembly were replaced with community-provided tools (SeqKit and QUAST, respectively).

Note

This pipeline uses the nf-core template with some tweaks, but it's not part of nf-core.

Usage

Warning

At the moment, the pipeline only runs on Codon using Slurm.

Pipeline help:

Typical pipeline command:

  nextflow run ebi-metagenomics/miassembler --help

Input/output options
  --study_accession                       [string]  The ENA Study secondary accession
  --reads_accession                       [string]  The ENA Run primary accession
  --private_study                         [boolean] To use if the ENA study is private
  --assembler                             [string]  The short reads assembler (accepted: spades, metaspades, megahit)
  --single_end                            [boolean] Force the single_end value for the study / reads
  --library_strategy                      [string]  Force the library_strategy value for the study / reads (accepted: metagenomic, metatranscriptomic,
                                                    genomic, transcriptomic, other)
  --library_layout                        [string]  Force the library_layout value for the study / reads (accepted: single, paired)
  --spades_version                        [string]  null [default: 3.15.5]
  --megahit_version                       [string]  null [default: 1.2.9]
  --reference_genome                      [string]  The genome to be used to clean the assembly, the genome will be taken from the Microbiome Informatics
                                                    internal directory (accepted: chicken.fna, salmon.fna, cod.fna, pig.fna, cow.fna, mouse.fna,
                                                    honeybee.fna, rainbow_trout.fna, ...)
  --blast_reference_genomes_folder        [string]  The folder with the reference genome blast indexes, defaults to the Microbiome Informatics internal
                                                    directory.
  --bwamem2_reference_genomes_folder      [string]  The folder with the reference genome bwa-mem2 indexes, defaults to the Microbiome Informatics internal
                                                    directory.
  --remove_human_phix                     [boolean] Remove human and phiX reads pre assembly, and contigs matching those genomes. [default: true]
  --human_phix_blast_index_name           [string]  Combined Human and phiX BLAST db. [default: human_phix]
  --human_phix_bwamem2_index_name         [string]  Combined Human and phiX bwa-mem2 index. [default: human_phix]
  --min_contig_length                     [integer] Minimum contig length filter. [default: 500]
  --min_contig_length_metatranscriptomics [integer] Minimum contig length filter for metaT. [default: 200]
  --assembly_memory                       [integer] Default memory allocated for the assembly process. [default: 100]
  --spades_only_assembler                 [boolean] Run SPAdes/metaSPAdes without the error correction step. [default: true]
  --outdir                                [string]  The output directory where the results will be saved. You have to use absolute paths to storage on Cloud
                                                    infrastructure. [default: results]
  --email                                 [string]  Email address for completion summary.
  --multiqc_title                         [string]  MultiQC report title. Printed as page header, used for filename if not otherwise specified.

Generic options
  --multiqc_methods_description           [string]  Custom MultiQC yaml file containing HTML including a methods description.
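
The two contig length options suggest that the length threshold depends on the library strategy. The sketch below shows one way such a filter could work. It assumes that the metaT threshold applies only when the library strategy is metatranscriptomic and that contigs exactly at the threshold are kept; both rules and both function names are assumptions, not the pipeline's actual code.

```python
def contig_length_threshold(
    library_strategy,
    min_contig_length=500,
    min_contig_length_metatranscriptomics=200,
):
    """Pick the minimum contig length for a library strategy (assumed rule)."""
    if library_strategy == "metatranscriptomic":
        return min_contig_length_metatranscriptomics
    return min_contig_length


def filter_contigs(contigs, threshold):
    """Keep contigs whose sequence is at least `threshold` bases long."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= threshold}


# Made-up contigs: one of 800 bases and one of 300 bases.
contigs = {"contig_1": "ACGT" * 200, "contig_2": "ACGT" * 75}
for strategy in ("metagenomic", "metatranscriptomic"):
    threshold = contig_length_threshold(strategy)
    print(strategy, threshold, sorted(filter_contigs(contigs, threshold)))
```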

Example:

nextflow run ebi-metagenomics/miassembler \
  -profile codon_slurm \
  --assembler metaspades \
  --reference_genome human \
  --outdir testing_results \
  --study_accession SRP002480 \
  --reads_accession SRR1631361
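
To assemble several runs of the same study, the command above can be generated programmatically. The sketch below only builds and prints the command lines, using the flags from the example. The helper name build_command is hypothetical, and the default values are copied from the example rather than from the pipeline's configuration.

```python
import shlex


def build_command(
    study_accession,
    reads_accession,
    assembler="metaspades",
    outdir="testing_results",
    profile="codon_slurm",
    reference_genome=None,
):
    """Build the argument list for one pipeline run (hypothetical helper)."""
    command = [
        "nextflow", "run", "ebi-metagenomics/miassembler",
        "-profile", profile,
        "--assembler", assembler,
    ]
    # The reference genome is only passed when one is requested.
    if reference_genome is not None:
        command += ["--reference_genome", reference_genome]
    command += [
        "--outdir", outdir,
        "--study_accession", study_accession,
        "--reads_accession", reads_accession,
    ]
    return command


# Print one command per run; nothing is executed here.
for run in ["SRR1631361"]:
    print(shlex.join(build_command("SRP002480", run, reference_genome="human")))
```

Passing each list to subprocess.run would launch the runs one after another. Printing first makes it easy to review the commands before submitting them.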

Outputs

The outputs of the pipeline are organized as follows:

results/SRP1154
└── SRP115494
    └── SRR6180
        └── SRR6180434
            ├── assembly
            │   └── metaspades
            │       └── 3.15.5
            │           ├── coverage
            │           ├── decontamination
            │           └── qc
            │               ├── multiqc
            │               └── quast
            └── qc
                ├── fastp
                └── fastqc

The nested structure, based on the ENA Study and Reads accessions, was created to suit the Microbiome Informatics team’s needs. Its benefit is that results from different runs of the same study don’t overwrite one another.
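
The layout in the tree above can be reproduced with a few lines of Python. This is a minimal sketch which assumes that each prefix directory is the first seven characters of the accession, as inferred from the example (SRP1154 for SRP115494, SRR6180 for SRR6180434). The function names are hypothetical.

```python
from pathlib import PurePosixPath


def accession_prefix(accession, length=7):
    """Return the leading characters used as the grouping directory (assumed rule)."""
    return accession[:length]


def assembly_dir(outdir, study, run, assembler, version):
    """Build the per-run assembly directory following the layout shown above."""
    run_dir = (
        PurePosixPath(outdir)
        / accession_prefix(study)
        / study
        / accession_prefix(run)
        / run
    )
    return run_dir / "assembly" / assembler / version


print(assembly_dir("results", "SRP115494", "SRR6180434", "metaspades", "3.15.5"))
# results/SRP1154/SRP115494/SRR6180/SRR6180434/assembly/metaspades/3.15.5
```

Because the assembler name and version are part of the path, reassembling the same run with a different assembler or version lands in a separate directory.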

Tests

There is a very small test data set ready to use:

nextflow run main.nf -resume -profile test,docker

End to end tests

Two end-to-end tests (one with MEGAHIT and one with metaSPAdes) can be launched with the following command:

pytest tests/workflows/ --verbose

