C-BIRD

CT-PHL Bacterial Identification and Resistance Detection

Overview

C-BIRD is a small and accurate pipeline for rapid bacterial identification and antimicrobial resistance (AMR) gene detection in common pathogenic bacteria. It works with Illumina paired-end reads and uses a de novo assembly approach. C-BIRD can run on the Terra.bio platform as well as on any Linux machine that has a WDL workflow engine such as miniwdl. Producing clinically meaningful results and generating an individual report for each sample are within this project's scope. For this reason, automatic updates of tools and databases are deliberately avoided to allow strict control and validation.

C-BIRD now uses machine learning models (CheckM2) to assess the completeness and contamination of the assembly. A new custom Mash sketch was created to identify bacteria in selected genera. It includes all species of Acinetobacter, Burkholderia, Citrobacter, Enterobacter, Escherichia, Klebsiella, Kluyvera, Metapseudomonas, Morganella, Neisseria, Proteus, Providencia, Pseudomonas, Raoultella, Salmonella, Serratia, and Streptococcus. Kraken2 and Bracken are still used for taxonomic profiling of reads and for organisms that fall outside this scope. Detection of AMR genes relies on NCBI's AMRFinderPlus program and its database.

Example Outputs

The following programs and tools are used in the C-BIRD pipeline.

Tools	Version	Comments
FastP	0.24.0	QC, adapter removal, quality filtering and trimming
BBTools	39.13	phiX removal & normalization (non-random downsampling)
Kraken2 / Bracken	2.1.3 / 2.9	Taxonomic profiling and abundance estimation of reads
SPAdes	4.1.0	De novo assembly
Mash	2.3	Bacterial identification
QUAST	5.3.0	Genome assembly evaluation
CheckM2	1.0.2	Completeness and contamination
mlst	2.23.0	MLST typing
AMRFinderPlus	4.0.15	AMR gene identification
BLAST+	2.15.0	Target gene search
PlasmidFinder	2.1.6	Plasmid detection
Cbird-Util	2.0	Scripts for summary report generation

Quick Start

C-BIRD is available in Dockstore for Terra. The following inputs are required to run the C-BIRD pipeline.

Input	Description
`read1`	first FASTQ file (paired-end)
`read2`	second FASTQ file (paired-end)
`samplename`	Name of the sample being processed
`kraken2_db`	Kraken2/Bracken database (Download)
`checkm2_db`	CheckM2 database (Download)

Please check the wiki for optional inputs, additional details, and reports.

Installation

You can obtain C-BIRD via git, but downloading a release version is recommended to avoid unreleased development changes.

# Download a C-BIRD release
wget https://github.com/Kincekara/C-BIRD/archive/refs/tags/2.0.0.tar.gz
tar -xvf 2.0.0.tar.gz

# Download required databases
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240904.tar.gz
# -O sets the local file name; without it wget names the file "content"
wget -O checkm2_database.tar.gz https://zenodo.org/api/records/5571251/files/checkm2_database.tar.gz/content

Running C-BIRD

You need to install miniwdl to run C-BIRD on a Linux machine. miniwdl can be obtained:

via PyPI: pip3 install miniwdl or
via conda: conda install -c conda-forge miniwdl

Please see the miniwdl repository for further instructions.

# single sample
miniwdl run ~/C-BIRD-2.0.0/workflows/wf_c-bird.wdl samplename="samplename" read1="read1.fastq.gz" read2="read2.fastq.gz" kraken2_db="k2_standard_20240904.tar.gz" checkm2_db="checkm2_database.tar.gz"
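Because the pipeline expects paired-end reads, it can be worth checking the two FASTQ files before starting a run. The sketch below is not part of C-BIRD; `count_reads` and `check_pair` are hypothetical helper names, and the check assumes standard four-line FASTQ records compressed with gzip.

```shell
# Hypothetical pre-flight helpers, not part of C-BIRD.

# Print the number of reads in a gzipped FASTQ file (4 lines per record assumed).
count_reads() {
  local lines
  lines=$(gzip -cd "$1" | wc -l) || return 1
  echo $(( lines / 4 ))
}

# Succeed only if both files are intact gzip archives with equal read counts.
check_pair() {
  local r1="$1" r2="$2" n1 n2
  gzip -t "$r1" && gzip -t "$r2" || { echo "corrupt gzip input" >&2; return 1; }
  n1=$(count_reads "$r1")
  n2=$(count_reads "$r2")
  if [ "$n1" -ne "$n2" ]; then
    echo "read count mismatch: $n1 vs $n2" >&2
    return 1
  fi
  echo "OK: $n1 read pairs"
}

# Usage: check_pair read1.fastq.gz read2.fastq.gz
```

A non-zero exit status from `check_pair` suggests the inputs should be inspected before they are passed to `miniwdl run`.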

multiBIRD

A wrapper workflow is available to make it easier to run multiple samples on local machines. multiBIRD accepts the list of input samples as a tab-separated file like the one below.

samples.tsv

samplename1	/path/to/sample1_read1.fastq.gz	/path/to/sample1_read2.fastq.gz
samplename2	/path/to/sample2_read1.fastq.gz	/path/to/sample2_read2.fastq.gz
samplename3	/path/to/sample3_read1.fastq.gz	/path/to/sample3_read2.fastq.gz
...
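For a directory with many samples, a list in this three-column layout can be generated instead of typed by hand. The following is a minimal sketch, not part of C-BIRD: `make_samples_tsv` is a hypothetical helper, and it assumes the reads are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, which may not match your naming scheme.

```shell
# Hypothetical helper, not part of C-BIRD.
# Assumes paired files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
make_samples_tsv() {
  local dir="$1" r1 r2 name
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue              # no match: the glob stays literal
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"   # derive the mate's path
    if [ ! -e "$r2" ]; then
      echo "skipping $r1: missing mate $r2" >&2
      continue
    fi
    name=$(basename "$r1" _R1.fastq.gz)   # sample name = file name minus suffix
    printf '%s\t%s\t%s\n' "$name" "$r1" "$r2"
  done
}

# Usage: make_samples_tsv /path/to/fastq_dir > samples.tsv
```

Passing an absolute directory yields absolute paths, in the same form as the example above. Samples whose second read file is missing are reported on stderr and left out of the list.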

inputs.json

{
  "multibird.inputSamplesFile": "/path/to/samples.tsv",
  "multibird.kraken2_db": "/path/to/k2_standard_20240904.tar.gz",
  "multibird.checkm2_db": "/path/to/checkm2_database.tar.gz"
}

After preparing the samples.tsv and inputs.json files, you can run multiple samples at once.

# multiple samples
miniwdl run ~/C-BIRD-2.0.0/workflows/wf_multibird.wdl -i inputs.json
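A malformed sample list is easy to catch before the run starts. The sketch below is not part of C-BIRD; `validate_samples_tsv` is a hypothetical helper that checks each line for exactly three tab-separated columns and confirms that both read files exist. It also flags blank lines and lines separated by spaces instead of tabs.

```shell
# Hypothetical helper, not part of C-BIRD.
# Returns non-zero if any line is malformed or points to a missing file.
validate_samples_tsv() {
  local tsv="$1" status=0 n=0 tab name r1 r2 extra
  tab=$(printf '\t')
  # "|| [ -n "$name" ]" also processes a last line without a trailing newline.
  while IFS="$tab" read -r name r1 r2 extra || [ -n "$name" ]; do
    n=$((n + 1))
    if [ -z "$r2" ] || [ -n "$extra" ]; then
      echo "line $n: expected 3 tab-separated columns" >&2
      status=1
    elif [ ! -f "$r1" ] || [ ! -f "$r2" ]; then
      echo "line $n ($name): read file not found" >&2
      status=1
    fi
  done < "$tsv"
  return "$status"
}

# Usage: validate_samples_tsv /path/to/samples.tsv && echo "samples.tsv looks fine"
```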

Disclaimer

The results generated by this pipeline should not be used as the sole basis for any clinical decision-making. Users are responsible for ensuring that the pipeline is used in compliance with all applicable laws, regulations, and guidelines. The authors and contributors of this pipeline do not assume any liability for any direct, indirect, incidental, or consequential damages arising from the use of the pipeline or the information generated by it. Additionally, please note that genotypic results obtained from this pipeline may not always correlate with phenotypic resistance profiles. It is essential to confirm any findings with appropriate phenotypic testing and clinical correlation.

Additional Notes

C-BIRD includes traces of code from Theiagen's Public Health Bacterial Genomics workflows. If you need a more universal and sophisticated pipeline, please check Theiagen's TheiaProk workflow.
