CAMITAX: Taxon labels for microbial genomes

The CAMITAX taxonomic assignment workflow. CAMITAX assigns one NCBI Taxonomy ID (taxID) to an input genome G by combining genome distance-, 16S rRNA gene-, and gene homology-based taxonomic assignments with phylogenetic placement. (A) Genome distance-based assignment. CAMITAX uses Mash to estimate the average nucleotide identity (ANI) between G and more than a hundred thousand microbial genomes in RefSeq, and assigns the lowest common ancestor (LCA) of genomes showing >95% ANI, which was found to be a clear species boundary. (B) 16S rRNA gene-based assignment. CAMITAX uses Dada2 to label G's 16S rRNA gene sequences using the naïve Bayesian classifier method to assign taxonomy across multiple ranks (down to genus level), and exact sequence matching for species-level assignments, against the SILVA or RDP database. (C) Gene homology-based assignments. CAMITAX uses Centrifuge and Kaiju to perform gene homology searches against nucleotide and amino acid sequences in NCBI's nr and nt (or proGenomes' genes and proteins datasets), respectively. CAMITAX determines the interval-union LCA (iuLCA) of gene-level assignments and places G on the lowest taxonomic node with at least 50% coverage. (D) Phylogenetic placement. CAMITAX uses Pplacer to place G onto a fixed reference tree, as implemented in CheckM, and estimates genome completeness and contamination using lineage-specific marker genes. (E) Classification algorithm. CAMITAX considers the lowest consistent assignment as the longest unambiguous root-to-node path in the taxonomic tree spanned by the five taxIDs derived in (A)–(D), i.e. it retains the most specific, yet consistent taxonomic label among all tools.

Requirements

All you need is Nextflow and Docker (or Singularity). This is the recommended way to run CAMITAX and (by default) CAMITAX requires 8 CPU cores and 24 GB of memory.

Plan B: You may run CAMITAX without software containers. However, this is not recommended and you have to install all software dependencies by yourself.

If you need any help or further guidance: Please get in touch!

User Guide

Installation

CAMITAX relies on multiple reference databases (which we do not bundle by default, due to their sheer size). You can either build them from scratch or simply use the latest of our "official" releases. To do so, please run:

nextflow pull CAMI-challenge/CAMITAX
nextflow run CAMI-challenge/CAMITAX/init.nf --db /path/to/db/folder

Warning: This will download ~30 GB of data, expect this to run a while! /path/to/db/folder should have >100 GB of available disk space. Note that you have to do this only once; specify the location in all future CAMITAX runs.

Warning: To foster reproducibility, we strongly recommend that you use our "official" releases and we will continue to provide stable and versioned updates in the future.

Input

CAMITAX expects all input genomes in (genomic/nucleotide multi-)FASTA format. If your input genomes are in the folder input/ with file extension .fasta, please run:

nextflow run CAMI-challenge/CAMITAX -profile docker --db /path/to/db/folder --i input --x fasta

If you want to use Singularity instead of Docker (without sudo), please replace docker with singularity as profile.

Output

CAMITAX outputs a tab-seperated file camitax.tsv containing the individual taxon assignments in the data folder.

Warning: While CAMITAX is built around computational reproducibility, results might sometimes be slightly different from run to run (but of comparable quality) because software used within (e.g. CheckM) are non-deterministic.

Again, if you need any help or further guidance: Please get in touch!

Citation

Bremges, Fritz & McHardy (2020). CAMITAX: Taxon labels for microbial genomes. GigaScience, 9, 1:1–7. doi:10.1093/gigascience/giz154
Sczyrba, Hofmann, Belmann, et al. (2017). Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software. Nature Methods, 14, 11:1063–1071. doi:10.1038/nmeth.4458

Name		Name	Last commit message	Last commit date
Latest commit History 260 Commits
bin		bin
db		db
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
checkm.nf		checkm.nf
init.nf		init.nf
main.nf		main.nf
nextflow.config		nextflow.config
requirements.txt		requirements.txt
workflow.png		workflow.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CAMITAX: Taxon labels for microbial genomes

Requirements

User Guide

Installation

Input

Output

Citation

About

Releases

Packages

Contributors 2

Languages

License

CAMI-challenge/CAMITAX

Folders and files

Latest commit

History

Repository files navigation

CAMITAX: Taxon labels for microbial genomes

Requirements

User Guide

Installation

Input

Output

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages