Kincekara/C-BIRD

C-BIRD

CT-PHL Bacterial Identification and Resistance Detection

Dockstore Terra.bio Cromwell MiniWDL

Overview

C-BIRD is a small and accurate pipeline for rapid bacterial identification and antimicrobial resistance (AMR) gene detection for common pathogenic bacteria. It works with Illumina paired-end reads using a de novo assembly approach. C-BIRD can run on the Terra.Bio platform as well as on any Linux machine that has a WDL workflow engine such as miniwdl. Producing clinically meaningful results and generating individual reports for each sample are within this project's scope. For that reason, automatic updating of any tool or database is deliberately avoided, to keep strict control and allow validation.

C-BIRD now uses machine learning models (CheckM2) to assess the completeness and contamination of the assembly. A new custom sketch was created to identify bacteria from selected genera via Mash. It includes all the species of Acinetobacter, Burkholderia, Citrobacter, Enterobacter, Escherichia, Klebsiella, Kluyvera, Metapseudomonas, Morganella, Neisseria, Proteus, Providencia, Pseudomonas, Raoultella, Salmonella, Serratia, and Streptococcus. Kraken2 and Bracken are still used for taxonomic profiling of reads and for organisms outside this scope. Detection of AMR genes depends on NCBI's AMRFinderPlus program and its database.

Example Outputs

The following programs and tools are currently used in the C-BIRD pipeline.

| Tool | Version | Comments |
| --- | --- | --- |
| FastP | 0.24.0 | QC, adapter removal, quality filtering and trimming |
| BBTools | 39.13 | phiX removal & normalization (non-random downsampling) |
| Kraken2 / Bracken | 2.1.3 / 2.9 | Taxonomic profiling and abundance estimation of reads |
| SPAdes | 4.1.0 | De novo assembly |
| Mash | 2.3 | Bacterial identification |
| QUAST | 5.3.0 | Genome assembly evaluation |
| CheckM2 | 1.0.2 | Completeness and contamination |
| mlst | 2.23.0 | MLST typing |
| AMRFinderPlus | 4.0.15 | AMR gene identification |
| BLAST+ | 2.15.0 | Target gene search |
| PlasmidFinder | 2.1.6 | Plasmid detection |
| Cbird-Util | 2.0 | Scripts for summary report generation |

Quick Start

C-BIRD is available in Dockstore for Terra. The following inputs are required to run the C-BIRD pipeline.

| Input | Description |
| --- | --- |
| read1 | First FASTQ file (paired-end) |
| read2 | Second FASTQ file (paired-end) |
| samplename | Name of the sample being processed |
| kraken2_db | Kraken2/Bracken database (Download) |
| checkm2_db | CheckM2 database (Download) |

Please check the wiki for optional inputs, additional details, and reports.

Installation

You can obtain C-BIRD via git, but downloading a release version is advised to avoid unreleased development changes.

# Download a C-BIRD release
wget https://github.com/Kincekara/C-BIRD/archive/refs/tags/2.0.0.tar.gz
tar -xvf 2.0.0.tar.gz

# Download required databases
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240904.tar.gz
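# -O sets the local file name; without it wget saves this URL as "content"
wget -O checkm2_database.tar.gz https://zenodo.org/api/records/5571251/files/checkm2_database.tar.gz/content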
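
The database archives are large, so an interrupted download is easy to miss. The sketch below is one way to confirm that each downloaded file is a complete gzip stream before passing it to the workflow. The function name check_archives is hypothetical and is not part of C-BIRD. It relies only on gzip -t, which tests the compression layer and does not inspect the tar contents.

```shell
# Hypothetical helper (not part of C-BIRD): report whether each archive
# passes a gzip integrity test. Returns non-zero if any file fails.
check_archives() (
  status=0
  for archive in "$@"; do
    # gzip -t reads the whole stream and verifies its checksum
    if gzip -t "$archive" 2>/dev/null; then
      echo "ok: $archive"
    else
      echo "corrupt or missing: $archive" >&2
      status=1
    fi
  done
  exit "$status"
)
```

Call it with the archive names as they were saved on disk, for example check_archives k2_standard_20240904.tar.gz.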

Running C-BIRD

To run C-BIRD on a Linux machine, you need to install miniwdl. It can be obtained:

  • via PyPI: pip3 install miniwdl or
  • via conda: conda install -c conda-forge miniwdl

Please see the miniwdl repo for further instructions.

# single sample
miniwdl run ~/C-BIRD-2.0.0/workflows/wf_c-bird.wdl samplename="samplename" read1="read1.fastq.gz" read2="read2.fastq.gz" kraken2_db="k2_standard_20240904.tar.gz" checkm2_db="checkm2_database.tar.gz"

multiBIRD

A wrapper workflow is available to make it easier to run multiple samples on local machines. multiBIRD allows users to provide the input list as a tab-separated file, as shown below.

samples.tsv

samplename1 /path/to/sample1_read1.fastq.gz /path/to/sample1_read2.fastq.gz
samplename2 /path/to/sample2_read1.fastq.gz /path/to/sample2_read2.fastq.gz
samplename3 /path/to/sample3_read1.fastq.gz /path/to/sample3_read2.fastq.gz
...
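
Writing this file by hand is error-prone for large batches. The sketch below shows one way to generate it from a directory of reads. It assumes, purely for illustration, that the files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz. That naming convention and the function name make_samples_tsv are assumptions and do not come from C-BIRD.

```shell
# Hypothetical helper (not part of C-BIRD): print one tab-separated line
# per sample: name, read1 path, read2 path.
# Assumes files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
make_samples_tsv() (
  reads_dir="$1"
  for r1 in "$reads_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                  # the glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"       # expected mate file
    if [ ! -e "$r2" ]; then
      echo "warning: no mate for $r1, skipping" >&2
      continue
    fi
    name="$(basename "$r1" _R1.fastq.gz)"     # sample name from file name
    printf '%s\t%s\t%s\n' "$name" "$r1" "$r2"
  done
)
```

For example, make_samples_tsv /path/to/reads > samples.tsv writes the list. Passing an absolute directory gives absolute paths in the output, matching the layout shown above.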

inputs.json

{
  "multibird.inputSamplesFile": "/path/to/samples.tsv",
  "multibird.kraken2_db": "/path/to/k2_standard_20240904.tar.gz",
  "multibird.checkm2_db": "/path/to/checkm2_database.tar.gz"
}

After preparing the samples.tsv and inputs.json files, you can run multiple samples at once.

# multiple samples
miniwdl run ~/C-BIRD-2.0.0/workflows/wf_multibird.wdl -i inputs.json
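
Before launching a multi-sample run, a quick pre-flight check of samples.tsv can catch malformed lines or missing read files early. The sketch below assumes each non-empty line must have exactly three tab-separated fields, as in the example file above. The function name check_samples_tsv is hypothetical and is not part of C-BIRD.

```shell
# Hypothetical helper (not part of C-BIRD): verify that every non-empty
# line has three tab-separated fields and that both read files exist.
# Returns non-zero if any problem is found.
check_samples_tsv() (
  tsv="$1"
  status=0
  line_no=0
  while IFS= read -r line || [ -n "$line" ]; do
    line_no=$((line_no + 1))
    [ -n "$line" ] || continue                # ignore blank lines
    n_fields="$(printf '%s\n' "$line" | awk -F '\t' '{ print NF }')"
    if [ "$n_fields" -ne 3 ]; then
      echo "line $line_no: expected 3 tab-separated fields, found $n_fields" >&2
      status=1
      continue
    fi
    r1="$(printf '%s\n' "$line" | cut -f 2)"
    r2="$(printf '%s\n' "$line" | cut -f 3)"
    for f in "$r1" "$r2"; do
      if [ ! -f "$f" ]; then
        echo "line $line_no: file not found: $f" >&2
        status=1
      fi
    done
  done < "$tsv"
  exit "$status"
)
```

A possible use is check_samples_tsv /path/to/samples.tsv && miniwdl run ~/C-BIRD-2.0.0/workflows/wf_multibird.wdl -i inputs.json, so the workflow starts only when the list looks consistent.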

Disclaimer

The results generated by this pipeline should not be used as the sole basis for any clinical decision-making. Users are responsible for ensuring that the pipeline is used in compliance with all applicable laws, regulations, and guidelines. The authors and contributors of this pipeline do not assume any liability for any direct, indirect, incidental, or consequential damages arising from the use of the pipeline or the information generated by it. Additionally, please note that genotypic results obtained from this pipeline may not always correlate with phenotypic resistance profiles. It is essential to confirm any findings with appropriate phenotypic testing and clinical correlation.

Additional Notes

C-BIRD includes traces of code from Theiagen's Public Health Bacterial Genomics workflows. If you need a universal and more sophisticated pipeline, please check Theiagen's TheiaProk workflow.
