The pipeline performs reference-guided de novo assembly of bacterial genomes, starting from Illumina PE150 FASTQ files.

NMRL/Ardetype

ARDETYPE

An NGS data processing pipeline for species-agnostic and species-specific analysis of short paired-end (PE) bacterial reads.

The pipeline is organized into modules. Each module corresponds to a Snakemake script (snakefile) and a Python module object. Snakefiles define the data processing rules.

Module class objects store the inputs and expected outputs of these rules and handle file placement, module configuration, information transfer between modules, and other tasks that do not involve running analyses on the files directly.
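To make the split between snakefiles and module objects concrete, here is a minimal sketch of what such a module object could look like. The class name `PipelineModule`, its fields and the `{sample}` output templates are hypothetical illustrations, not Ardetype's actual classes; the sketch only covers the responsibilities named above (tracking inputs and expected outputs, and handing outputs on to the next module).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class PipelineModule:
    """Hypothetical stand-in for an Ardetype module object (not the real class)."""

    name: str
    snakefile: Path
    output_dir: Path
    # sample_id -> input files for that sample (e.g. R1/R2 reads or contigs)
    inputs: dict[str, list[Path]] = field(default_factory=dict)
    # Hypothetical output filename templates; "{sample}" is replaced by the sample ID
    output_templates: list[str] = field(default_factory=list)

    def expected_outputs(self) -> dict[str, list[Path]]:
        """Expand every output template for every registered sample."""
        return {
            sample: [self.output_dir / t.format(sample=sample) for t in self.output_templates]
            for sample in self.inputs
        }

    def missing_outputs(self) -> dict[str, list[Path]]:
        """Expected outputs that are not on disk yet (e.g. after a failed job)."""
        missing = {}
        for sample, paths in self.expected_outputs().items():
            absent = [p for p in paths if not p.exists()]
            if absent:
                missing[sample] = absent
        return missing

    def hand_over(self, next_module: PipelineModule) -> None:
        """Information transfer: this module's outputs become the next module's inputs."""
        next_module.inputs = self.expected_outputs()
```

Keeping file bookkeeping in a plain object like this leaves the snakefiles free to describe only the analysis rules.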

Module | Type | Description | Tools
--- | --- | --- | ---
bact_core | species-agnostic | QC, host filtering, de novo assembly, taxonomic classification | fastp, kraken2, shovill, krakentools, krona
bact_shell | species-agnostic | assembly QC, resistance profiling, plasmid reconstruction & typing | quast, rgi-card, amr++v2.0, resfinder, mob-suite
bact_tip | species-specific | species-dependent sub-typing | hicap, meningotype, legsta, Kleborate, AgrVATE, spaTyper, Staphopia-sccmec, emmtyper, seqsero, sistr, lissero, PubMLST database API, Institute Pasteur MLST database API, Legionella pneumophila in silico Serogroup Prediction, ectyper, seroba
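Because bact_tip is species-dependent, it presumably picks its tools based on the taxonomic classification produced by bact_core. The sketch below assumes a simple lookup from the top species name (e.g. a Kraken2 hit) to tools. The mapping is a hypothetical illustration based on each tool's usual target organism, not the pipeline's own routing logic, and the MLST database APIs are left out.

```python
from __future__ import annotations

# Hypothetical routing table: species or genus -> bact_tip tools.
# Based on each tool's usual target organism; NOT Ardetype's actual logic.
TIP_TOOLS = {
    "Haemophilus influenzae": ["hicap"],
    "Neisseria meningitidis": ["meningotype"],
    "Legionella pneumophila": ["legsta", "Legionella pneumophila in silico Serogroup Prediction"],
    "Klebsiella": ["Kleborate"],
    "Staphylococcus aureus": ["AgrVATE", "spaTyper", "Staphopia-sccmec"],
    "Streptococcus pyogenes": ["emmtyper"],
    "Streptococcus pneumoniae": ["seroba"],
    "Salmonella": ["seqsero", "sistr"],
    "Listeria monocytogenes": ["lissero"],
    "Escherichia coli": ["ectyper"],
}


def select_tip_tools(species: str) -> list[str]:
    """Return bact_tip tools for a species name.

    An exact species match wins; otherwise fall back to a genus-level entry.
    Unknown organisms get an empty list (species-agnostic modules only).
    """
    species = species.strip()
    if species in TIP_TOOLS:
        return TIP_TOOLS[species]
    genus = species.split(" ")[0]
    return TIP_TOOLS.get(genus, [])
```

For example, `select_tip_tools("Klebsiella pneumoniae")` falls back to the genus entry and returns `["Kleborate"]`.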

Configuration

The pipeline is designed to be run by NMRL users on the RTU HPC, where HPC-level configuration is available out of the box.

Configuration level | Dependencies
--- | ---
HPC | Torque/PBS, Conda, Singularity
Conda | Snakemake (install it for each user by running the ardetype.py script with the -s flag)
Python | numpy==1.22.3, pandas==1.4.2, PyYAML==6.0, requests==2.27.1, bs4==0.0.1
Kraken2 | Pre-built or custom databases for human and bacterial sequences
Resfinder4 | Resfinder database
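A quick way to confirm that a Python environment matches the pinned versions above is to compare them against the installed distributions. This is a minimal sketch using the standard library's `importlib.metadata`; it is not part of ardetype.py and does not check the HPC, Conda, Kraken2 or Resfinder4 levels.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional

# Python-level pins from the configuration table
REQUIRED = {
    "numpy": "1.22.3",
    "pandas": "1.4.2",
    "PyYAML": "6.0",
    "requests": "2.27.1",
    "bs4": "0.0.1",
}


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def check_pins(required: dict[str, str],
               lookup: Callable[[str], Optional[str]] = installed_version,
               ) -> dict[str, tuple[str, Optional[str]]]:
    """Return {package: (pinned, installed)} for every package not matching its pin."""
    mismatches = {}
    for dist, pinned in required.items():
        found = lookup(dist)
        if found != pinned:
            mismatches[dist] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for dist, (pinned, found) in check_pins(REQUIRED).items():
        print(f"{dist}: expected {pinned}, found {found or 'not installed'}")
```

The `lookup` parameter exists so the comparison can be exercised without touching the real environment.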

Installation

To install from scratch, you need a Linux system with root access and Singularity installed to build the containers. WSL or a virtual machine should also work.

Clone the repository to your local machine, use the Singularity recipe files to build the containers, then copy the images to the HPC cluster so that the pipeline scripts can access them.
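The container build step can be scripted. The sketch below assumes that the recipe files use a `.def` extension and that each image is named after its recipe (`<recipe>.sif`); both are assumptions, so adjust the pattern to the repository's actual recipe filenames. It emits the standard `singularity build <image> <recipe>` form under `sudo`, which is why root access is required.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_commands(recipe_dir: Path, image_dir: Path, pattern: str = "*.def") -> list[list[str]]:
    """One `sudo singularity build <image>.sif <recipe>` command per recipe file.

    The "*.def" pattern and the <recipe-stem>.sif naming are assumptions.
    """
    commands = []
    for recipe in sorted(recipe_dir.glob(pattern)):
        image = image_dir / f"{recipe.stem}.sif"
        commands.append(["sudo", "singularity", "build", str(image), str(recipe)])
    return commands


def run_all(commands: list[list[str]]) -> None:
    """Run each build in order, stopping at the first failure."""
    for cmd in commands:
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)
```

After `run_all(build_commands(...))` finishes, copy the resulting `.sif` files to the cluster with your usual transfer tool (for example scp or rsync).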

Clone the repository on the cluster and edit the files in the config_files folder to match your local setup:

File | Scope
--- | ---
module_data.json | Paths to the cluster_config file and the snakefiles
config_modular.yaml | Paths to the Singularity image files, Kraken2 databases, Resfinder database, and the Legionella pneumophila in silico Serogroup Prediction tool
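Before the first run, it can help to check that every path in these files actually exists on the cluster. The sketch below walks a parsed config and reports string values that look like paths (they contain `/` and are not URLs) but are missing. The "contains `/`" rule is a heuristic assumption, and the key names in the test data are hypothetical. config_modular.yaml can be checked the same way after loading it with `yaml.safe_load` from PyYAML, which is already a listed dependency.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterator


def iter_strings(node: Any, prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (dotted.key, value) for every string leaf in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from iter_strings(value, f"{prefix}[{i}]")
    elif isinstance(node, str):
        yield prefix, node


def missing_paths(config: Any) -> list[tuple[str, str]]:
    """Path-like values (contain '/', not URLs) that do not exist on disk."""
    return [
        (key, value)
        for key, value in iter_strings(config)
        if "/" in value and "://" not in value and not Path(value).exists()
    ]


def check_json(path: str) -> list[tuple[str, str]]:
    """Load a JSON config such as module_data.json and report missing paths."""
    with open(path) as fh:
        return missing_paths(json.load(fh))
```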

Using the pipeline

  • Note: the pipeline only accepts FASTQ files named according to Illumina conventions (sample_id_R{1,2}_001.fastq.gz).

  • Testing (to see what jobs will be executed):

    python ardetype.py -t -i path_to_folder_with_fastq/ -o path_to_output_folder -m all

  • Running:

    python ardetype.py -i path_to_folder_with_fastq/ -o path_to_output_folder -m all
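Because only Illumina-style names are accepted, a pre-flight check of the input folder can catch misnamed files or missing mates before running `python ardetype.py -t ...`. This is a minimal sketch that assumes the `sample_id_R{1,2}_001.fastq.gz` pattern from the note above; it is not the pipeline's own validation code.

```python
from __future__ import annotations

import re
from pathlib import Path

# <sample_id>_R1_001.fastq.gz / <sample_id>_R2_001.fastq.gz
FASTQ_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def pair_fastqs(folder: Path) -> tuple[dict[str, tuple[Path, Path]], list[str]]:
    """Group FASTQ files into (R1, R2) pairs per sample ID.

    Returns the complete pairs and a list of problems (bad names, missing mates).
    """
    reads: dict[str, dict[str, Path]] = {}
    problems = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        match = FASTQ_RE.match(path.name)
        if match is None:
            problems.append(f"not Illumina-named: {path.name}")
            continue
        reads.setdefault(match["sample"], {})[match["read"]] = path
    pairs = {}
    for sample, mates in reads.items():
        if set(mates) == {"1", "2"}:
            pairs[sample] = (mates["1"], mates["2"])
        else:
            problems.append(f"missing mate for {sample}")
    return pairs, problems
```

An empty problem list is a reasonable signal that the folder is ready for the testing command.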
