BatchBlaster is a bioinformatics pipeline that employs BLAST (Basic Local Alignment Search Tool), an essential algorithm for comparing primary biological sequence information, to perform efficient and high-throughput taxonomic identification searches.
BatchBlaster is built using the Nextflow workflow management system, ensuring portability and reproducibility across multiple platforms. The pipeline is primarily designed for use on High Performance Computing (HPC) clusters, including the capability to submit tasks to the SLURM job scheduling system.
The name 'BatchBlaster' reflects its ability to submit and process BLAST tasks in batches, which speeds up large-scale sequence analysis.
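To make the batching idea concrete, here is a minimal Python sketch of how a FASTA input could be split into fixed-size groups of sequences, each of which could then be searched as a separate task. This is only an illustration and not BatchBlaster's actual implementation: the function names `read_fasta` and `batches` and the `batch_size` parameter are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def batches(records: Iterable[Tuple[str, str]],
            batch_size: int) -> Iterator[List[Tuple[str, str]]]:
    """Group records into lists of at most batch_size items.

    batch_size is a hypothetical parameter used for illustration only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    example = [">seq1", "ACGT", "ACGT", ">seq2", "GGCC", ">seq3", "TTAA"]
    for number, batch in enumerate(batches(read_fasta(example), 2), start=1):
        print(number, [header for header, _ in batch])
```

Each batch is independent of the others, which is what allows a workflow manager to run the searches in parallel.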
- High-throughput BLAST search
- Scalable and reproducible analysis with Nextflow
- Multi-platform compatibility (Linux, macOS, Windows)
- Install Nextflow: `curl -s https://get.nextflow.io | bash`
- Run BatchBlaster: `nextflow run vmikk/BatchBlaster -r main --input 'path/to/your/input' ...`
- `--input`: Path to the input file containing the sequences (Required)
- `--outdir`: Path to the output directory (Default: `./results`)
- `--blast_taxdb`: Path to the BLAST database
- ...
The results will be saved in the specified output directory (`./results`, by default). Output includes:
- BLAST search results in tabular format (`m8`, a.k.a. `-outfmt 6`)
- A table with best BLAST hits reshaped into wide format
- Summary report
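For readers who want to post-process the tabular output themselves, the following Python sketch parses `-outfmt 6` lines using the default 12-column order of BLAST+ and lays out the top hits of each query side by side. The wide layout shown here (column names such as `hit1_sseqid`, ranking by bit score, the `top_n` parameter) is an assumption made for illustration; the table produced by BatchBlaster may be organized differently.

```python
import csv
from collections import defaultdict
from io import StringIO

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_outfmt6(handle):
    """Yield one dict per hit from a tab-separated outfmt 6 stream."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if len(row) < len(OUTFMT6_COLUMNS):
            raise ValueError(f"expected 12 columns, got {len(row)}")
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        yield hit


def best_hits_wide(hits, top_n=2):
    """Return one row per query with its top_n hits placed side by side.

    Assumption: hits are ranked by bit score, highest first.
    """
    by_query = defaultdict(list)
    for hit in hits:
        by_query[hit["qseqid"]].append(hit)

    rows = []
    for query, query_hits in by_query.items():
        ranked = sorted(query_hits, key=lambda h: h["bitscore"], reverse=True)
        row = {"qseqid": query}
        for rank, hit in enumerate(ranked[:top_n], start=1):
            row[f"hit{rank}_sseqid"] = hit["sseqid"]
            row[f"hit{rank}_pident"] = hit["pident"]
            row[f"hit{rank}_bitscore"] = hit["bitscore"]
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Made-up example records, not real search results.
    example = (
        "q1\tsA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t360\n"
        "q1\tsB\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-105\t370\n"
        "q2\tsC\t90.0\t150\t15\t0\t1\t150\t5\t154\t1e-50\t190\n"
    )
    for row in best_hits_wide(parse_outfmt6(StringIO(example))):
        print(row)
```

Queries with fewer than `top_n` hits simply have fewer columns filled in; a real table would likely pad these with empty values.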
- Nextflow (>=23.04.0)
- Singularity or Docker
- Integration of additional sequence analysis methods (e.g., MMSeqs2, SINTAX, etc.)
- Inclusion of Lowest Common Ancestor (LCA) estimation
- Implementation of domain-specific threshold filtering for taxonomic annotation (e.g., for fungal sequences)
- Addition of advanced machine learning algorithms for more accurate taxonomic classification (e.g., deep learning models that have been trained on the UNITE database)
- Implementation of a hybrid annotation approach (e.g., integration of classification results from various methods to enhance accuracy and reliability of taxonomic identification)
We are excited to share these enhancements in our forthcoming updates, so stay tuned!
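As a rough illustration of what Lowest Common Ancestor estimation means, the Python sketch below takes the lineages of several hits for one query and returns the deepest taxonomic path they all share. It is a generic textbook-style approach, not the method planned for BatchBlaster; the function name `lowest_common_ancestor` and the representation of lineages as root-to-tip lists are assumptions.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest shared prefix of root-to-tip lineages."""
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so ranks missing from any
    # lineage can never be part of the shared prefix.
    for names_at_rank in zip(*lineages):
        if all(name == names_at_rank[0] for name in names_at_rank):
            shared.append(names_at_rank[0])
        else:
            break
    return shared


if __name__ == "__main__":
    hits = [
        ["Fungi", "Basidiomycota", "Agaricomycetes", "Agaricales"],
        ["Fungi", "Basidiomycota", "Agaricomycetes", "Boletales"],
        ["Fungi", "Basidiomycota", "Tremellomycetes"],
    ]
    print(lowest_common_ancestor(hits))
```

When top hits disagree at lower ranks, reporting only the shared part of the lineage avoids assigning a query to a taxon that the evidence does not support.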
This project is licensed under the terms of the Apache-2.0 license.
Please feel free to submit issues and pull requests; your contributions are welcome!