ARDETYPE

An NGS data processing pipeline for species-agnostic and species-specific analysis of short paired-end (PE) bacterial reads.

The pipeline is organized into modules. Each module corresponds to a Snakemake script (snakefile) and a Python module class object. Snakefiles define the data processing rules.

Module class objects store information about inputs and expected outputs for these rules, handle file placement operations, module configuration, information transfer between modules, and other processes that are not directly involved in running analysis on files.
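To make the division of labour concrete, here is a minimal sketch of what such a module object could look like. It is not the class used by ardetype: the class name `ModuleSketch`, its fields, the snakefile paths and the `_contigs.fasta` suffix are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ModuleSketch:
    """Hypothetical stand-in for a module object; NOT the actual ardetype class."""

    name: str
    snakefile: Path              # snakefile holding the processing rules (path is made up)
    output_dir: Path
    output_suffixes: list = field(default_factory=list)  # e.g. "_contigs.fasta" (made-up suffix)
    sample_ids: list = field(default_factory=list)

    def expected_outputs(self):
        """List the files the rules should produce: one per sample and suffix."""
        return [
            self.output_dir / f"{sample_id}{suffix}"
            for sample_id in self.sample_ids
            for suffix in self.output_suffixes
        ]

    def missing_outputs(self):
        """Expected files that are not on disk (for example, after a failed job)."""
        return [path for path in self.expected_outputs() if not path.exists()]

    def hand_over(self, next_module):
        """Pass the sample list on to the next module (information transfer)."""
        next_module.sample_ids = list(self.sample_ids)
        return next_module
```

The object only does bookkeeping (which files to expect, where they live, what the next module needs); the analysis itself stays in the snakefile rules.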

Module	Type	Description	Tools
bact_core	agnostic	QC, host filtering, de novo assembly, taxonomic classification	fastp, kraken2, shovill, krakentools, krona
bact_shell	agnostic	assembly QC, resistance profiling, plasmid reconstruction & typing	quast, rgi-card, amr++v2.0, resfinder, mob-suite
bact_tip	specific	species-dependent sub-typing	hicap, meningotype, legsta, Kleborate, AgrVATE, spaTyper, Staphopia-sccmec, emmtyper, seqsero, sistr, lissero, PubMLST database API, Institut Pasteur MLST database API, Legionella pneumophila in silico Serogroup Prediction, ectyper, seroba

Configuration

The pipeline is designed to be run by NMRL users on RTU HPC, where HPC-level configuration is available out of the box.

Configuration level	Dependencies
HPC	Torque/PBS, Conda, Singularity
Conda	Snakemake (install it for each user by passing the -s flag to the ardetype.py script)
Python	numpy==1.22.3, pandas==1.4.2, PyYAML==6.0, requests==2.27.1, bs4==0.0.1
Kraken2	Pre-built or custom databases for human and bacteria
Resfinder4	Database
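
The Python pins in the table can be checked against the active environment before a run. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `find_mismatches` and the `PINNED` mapping are assumptions made for this example and are not part of the pipeline.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the "Python" row of the dependency table above.
PINNED = {
    "numpy": "1.22.3",
    "pandas": "1.4.2",
    "PyYAML": "6.0",
    "requests": "2.27.1",
    "bs4": "0.0.1",
}


def find_mismatches(pinned, lookup=version):
    """Return {package: (expected, installed_or_None)} for every pin that is not met.

    `lookup` defaults to importlib.metadata.version; it is a parameter only so
    the function can be exercised without touching the real environment.
    """
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches
```

Calling `find_mismatches(PINNED)` inside the Conda environment returns an empty dictionary when every pin is satisfied.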

Installation

To install from scratch, you will need a Linux system with root access and Singularity installed in order to build containers. WSL or a virtual machine should also work.

Clone the repository to your local machine and use the Singularity recipe files to build the containers, then copy the containers to the HPC cluster so that the pipeline scripts can access them.

Clone the repository to the cluster and edit the files in the config_files folder to match your local setup:

File	Scope
module_data.json	paths to cluster_config file and snakefiles
config_modular.yaml	paths to singularity image files, kraken2 databases, resfinder database, path to Legionella pneumophila in silico Serogroup Prediction tool
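
After editing, it can help to confirm that the configured paths actually exist on the cluster. The sketch below does this for a JSON file such as module_data.json. The internal layout of that file is not shown here, so the code makes a loud assumption: it treats every string value in the file as a candidate path. The function names are hypothetical. The same `missing_paths` check would apply to config_modular.yaml once it is loaded into a dictionary with PyYAML.

```python
import json
from pathlib import Path


def iter_string_leaves(node, prefix=""):
    """Yield (dotted_key, value) for every string found in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_string_leaves(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_string_leaves(value, f"{prefix}[{index}]")
    elif isinstance(node, str):
        yield prefix, node


def missing_paths(config):
    """Return {dotted_key: value} for string entries that do not exist on disk.

    Assumption: every string in the config is meant to be a path.
    """
    return {
        key: value
        for key, value in iter_string_leaves(config)
        if not Path(value).exists()
    }


def check_json_config(config_file):
    """Load a JSON config file and report entries that are not existing paths."""
    with open(config_file, encoding="utf-8") as handle:
        return missing_paths(json.load(handle))
```

For example, `check_json_config("config_files/module_data.json")` returns an empty dictionary when every string entry points at an existing file or folder. Entries that are not paths will show up as false positives under the assumption above.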

Using the pipeline

Note: the pipeline accepts only fastq files that are named according to Illumina conventions (sample_id_R{1,2}_001.fastq.gz).
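
The naming rule in the note above can be checked before submitting a run. This is a minimal sketch, assuming the pattern is exactly `sample_id_R1_001.fastq.gz` / `sample_id_R2_001.fastq.gz` as written in the note; the helper `pair_fastq_names` is hypothetical and is not part of ardetype.py.

```python
import re
from collections import defaultdict

# Pattern taken from the note above: <sample_id>_R1_001.fastq.gz and <sample_id>_R2_001.fastq.gz
FASTQ_NAME = re.compile(r"^(?P<sample_id>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def pair_fastq_names(file_names):
    """Split file names into complete R1/R2 pairs and rejected names."""
    reads = defaultdict(dict)
    rejected = []
    for name in file_names:
        match = FASTQ_NAME.match(name)
        if match is None:
            rejected.append(name)  # does not follow the naming convention
            continue
        reads[match["sample_id"]][match["read"]] = name

    pairs = {}
    for sample_id, found in reads.items():
        if set(found) == {"1", "2"}:
            pairs[sample_id] = (found["1"], found["2"])
        else:
            rejected.extend(found.values())  # only one mate of the pair is present
    return pairs, rejected
```

Passing the file names of the input folder (for example, `[p.name for p in Path(folder).iterdir()]`) gives the sample ids with both mates, plus a list of files that are misnamed or unpaired.
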
Testing (to see what jobs will be executed):

python ardetype.py -t -i path_to_folder_with_fastq/ -o path_to_output_folder -m all

Running:

python ardetype.py -i path_to_folder_with_fastq/ -o path_to_output_folder -m all
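
When runs are launched from another script, the two invocations above can be assembled programmatically. The sketch below uses only the flags shown in the commands above (-t, -i, -o, -m); the helper name `build_ardetype_command` is hypothetical, and "all" is the only -m value it assumes to be valid.

```python
import sys


def build_ardetype_command(input_dir, output_dir, modules="all", dry_run=False):
    """Build the argument list for ardetype.py using the flags shown above."""
    command = [sys.executable, "ardetype.py"]
    if dry_run:
        command.append("-t")  # testing mode: show what jobs will be executed
    command += ["-i", str(input_dir), "-o", str(output_dir), "-m", modules]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root, first with `dry_run=True` and then without it.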
