CT-PHL Bacterial Identification and Resistance Detection
C-BIRD is a small and accurate pipeline for rapid bacterial identification and antimicrobial resistance (AMR) gene detection in common pathogenic bacteria. It works with Illumina paired-end reads and uses a de novo assembly approach. C-BIRD can run on the Terra.Bio platform as well as on any Linux machine that has a WDL workflow engine such as miniwdl. Producing clinically meaningful results and generating an individual report for each sample are within this project's scope. For that reason, automatic updates of tools and databases are deliberately avoided to keep strict control and allow validation.
C-BIRD now uses machine learning models (CheckM2) to assess the completeness and contamination of the assembly.
A new custom sketch was created to identify bacteria of selected genera via Mash.
It includes all species of Acinetobacter, Burkholderia, Citrobacter, Enterobacter, Escherichia, Klebsiella, Kluyvera, Metapseudomonas, Morganella, Neisseria, Proteus, Providencia, Pseudomonas, Raoultella, Salmonella, Serratia, and Streptococcus.
Kraken2 and Bracken are still used for taxonomic profiling of reads and for organisms that are outside the scope of the custom sketch.
Detection of AMR genes depends on NCBI's AMRFinderPlus program and its database.
The following programs and tools are currently used in the C-BIRD pipeline.
Tools | Version | Comments |
---|---|---|
FastP | 0.24.0 | QC, adapter removal, quality filtering and trimming |
BBTools | 39.13 | phiX removal & normalization (non-random downsampling) |
Kraken2 / Bracken | 2.1.3 / 2.9 | Taxonomic profiling and abundance estimation of reads |
SPAdes | 4.1.0 | De novo assembly |
Mash | 2.3 | Bacterial identification |
QUAST | 5.3.0 | Genome assembly evaluation |
CheckM2 | 1.0.2 | Completeness and contamination |
mlst | 2.23.0 | MLST typing |
AMRFinderPlus | 4.0.15 | AMR gene identification |
BLAST+ | 2.15.0 | Target gene search |
PlasmidFinder | 2.1.6 | Plasmid detection |
Cbird-Util | 2.0 | Scripts for summary report generation |
C-BIRD is available in Dockstore for Terra. The following inputs are required to run the C-BIRD pipeline.
Input | Description |
---|---|
read1 | First FASTQ file (paired-end) |
read2 | Second FASTQ file (paired-end) |
samplename | Name of the sample being processed |
kraken2_db | Kraken2/Bracken database (see the download commands below) |
checkm2_db | CheckM2 database (see the download commands below) |
Please check the wiki for optional inputs, additional details and reports.
You can obtain C-BIRD via git, but downloading a release version is advised to avoid picking up unreleased development changes.
```shell
# Download a C-BIRD release
wget https://github.com/Kincekara/C-BIRD/archive/refs/tags/2.0.0.tar.gz
tar -xvf 2.0.0.tar.gz
# Download required databases
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240904.tar.gz
# The Zenodo URL ends in /content, so name the output file explicitly
wget -O checkm2_database.tar.gz https://zenodo.org/api/records/5571251/files/checkm2_database.tar.gz/content
```
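The database archives are large, and an interrupted download can leave a truncated file that only fails later inside the workflow. The sketch below is one way to test an archive before using it. The `check_archive` function is a hypothetical helper and not part of C-BIRD; it only assumes that the databases are gzip-compressed tarballs, as their file names suggest.

```shell
# Hypothetical helper (not part of C-BIRD): test that a downloaded
# database archive is a complete, readable gzip file.
check_archive() {
    local archive="$1"
    if [ ! -s "$archive" ]; then
        echo "missing or empty: $archive" >&2
        return 1
    fi
    # gzip -t reads the whole stream and verifies its checksum
    # without extracting anything to disk
    if ! gzip -t < "$archive" 2>/dev/null; then
        echo "corrupt or truncated: $archive" >&2
        return 1
    fi
    echo "ok: $archive"
}
```

After defining the function in your shell, run `check_archive k2_standard_20240904.tar.gz` and `check_archive checkm2_database.tar.gz`. The test reads the entire file, so it can take a while for the Kraken2 database.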
You need to install miniwdl to run C-BIRD on a Linux machine. miniwdl can be installed:
- via PyPI: `pip3 install miniwdl`
- or via conda: `conda install -c conda-forge miniwdl`

Please see the miniwdl repo for further instructions.
```shell
# single sample
miniwdl run ~/C-BIRD-2.0.0/workflows/wf_c-bird.wdl \
  samplename="samplename" \
  read1="read1.fastq.gz" \
  read2="read2.fastq.gz" \
  kraken2_db="k2_standard_20240904.tar.gz" \
  checkm2_db="checkm2_database.tar.gz"
```
A wrapper workflow, multiBIRD, is available to make it easier to run multiple samples on local machines. It takes the input list as a tab-separated file with the sample name and the paths to both read files on each line, as in the example below.
samples.tsv

```
samplename1	/path/to/sample1_read1.fastq.gz	/path/to/sample1_read2.fastq.gz
samplename2	/path/to/sample2_read1.fastq.gz	/path/to/sample2_read2.fastq.gz
samplename3	/path/to/sample3_read1.fastq.gz	/path/to/sample3_read2.fastq.gz
...
```
inputs.json

```json
{
  "multibird.inputSamplesFile": "/path/to/samples.tsv",
  "multibird.kraken2_db": "/path/to/k2_standard_20240904.tar.gz",
  "multibird.checkm2_db": "/path/to/checkm2_database.tar.gz"
}
```
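For a large batch, the sample list shown above can be generated instead of typed by hand. The following is a minimal sketch, not part of C-BIRD. It assumes that all reads sit in one directory and follow a `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme; both the naming scheme and the `make_samples_tsv` function name are illustrative assumptions, so adjust the suffixes to match your own files.

```shell
# Hypothetical helper (not part of C-BIRD): print one tab-separated row
# (sample name, read 1 path, read 2 path) per read pair in a directory.
# Assumes files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
make_samples_tsv() {
    local dir r1 r2 sample
    # Resolve to an absolute path, since the sample list uses full paths
    dir="$(cd "$1" && pwd)" || return 1
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue   # the glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no matching R2 file" >&2
            continue
        fi
        sample="$(basename "$r1" _R1.fastq.gz)"
        printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    done
}
```

With the function defined, `make_samples_tsv /path/to/reads > samples.tsv` writes the list. Unpaired files are reported on standard error and left out of the output.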
After preparing the samples.tsv and inputs.json files, you can run multiple samples at once.

```shell
# multiple samples
miniwdl run ~/C-BIRD-2.0.0/workflows/wf_multibird.wdl -i inputs.json
```
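A multi-sample run can take a long time, so it may be worth checking the sample list before launching it. The sketch below is a hypothetical pre-flight check and not part of C-BIRD. It assumes the three-column, tab-separated layout shown in the samples.tsv example, and the `check_samples_tsv` name is made up for illustration.

```shell
# Hypothetical pre-flight check (not part of C-BIRD): every row must have
# exactly three tab-separated fields, and both read files must exist.
check_samples_tsv() {
    local tsv="$1" tab name r1 r2 extra f
    local line=0 errors=0
    if [ ! -r "$tsv" ]; then
        echo "cannot read sample list: $tsv" >&2
        return 1
    fi
    tab="$(printf '\t')"
    # The extra test keeps a final line that lacks a trailing newline
    while IFS="$tab" read -r name r1 r2 extra || [ -n "$name" ]; do
        line=$((line + 1))
        if [ -z "$r2" ] || [ -n "$extra" ]; then
            echo "line $line: expected 3 tab-separated fields" >&2
            errors=$((errors + 1))
            continue
        fi
        for f in "$r1" "$r2"; do
            if [ ! -s "$f" ]; then
                echo "line $line: missing or empty file: $f" >&2
                errors=$((errors + 1))
            fi
        done
    done < "$tsv"
    # Succeed only when no problem was found
    [ "$errors" -eq 0 ]
}
```

One way to use it is `check_samples_tsv samples.tsv && miniwdl run ~/C-BIRD-2.0.0/workflows/wf_multibird.wdl -i inputs.json`, so the workflow starts only when every listed read file is present.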
The results generated by this pipeline should not be used as the sole basis for any clinical decision-making. Users are responsible for ensuring that the pipeline is used in compliance with all applicable laws, regulations, and guidelines. The authors and contributors of this pipeline do not assume any liability for any direct, indirect, incidental, or consequential damages arising from the use of the pipeline or the information generated by it. Additionally, please note that genotypic results obtained from this pipeline may not always correlate with phenotypic resistance profiles. It is essential to confirm any findings with appropriate phenotypic testing and clinical correlation.
C-BIRD contains traces of code from Theiagen's Public Health Bacterial Genomics workflows. If you need a more universal and sophisticated pipeline, please check Theiagen's TheiaProk workflow.