BatchBlaster

BatchBlaster is a bioinformatics pipeline that uses BLAST (Basic Local Alignment Search Tool), a widely used algorithm for comparing biological sequences, to run high-throughput taxonomic identification searches.

BatchBlaster is built with the Nextflow workflow management system, which makes it portable and reproducible across platforms. The pipeline is designed primarily for High Performance Computing (HPC) clusters and can submit tasks to the SLURM job scheduler.

The name 'BatchBlaster' reflects how the pipeline submits and processes BLAST tasks in batches to speed up large-scale sequence analysis.

Features

High throughput BLAST search
Scalable and reproducible analysis with Nextflow
Multi-platform compatibility (Linux, macOS, Windows)

Quick Start

Install Nextflow
```
curl -s https://get.nextflow.io | bash
```

Run BatchBlaster

```
nextflow run vmikk/BatchBlaster -r main --input 'path/to/your/input' ...
```

Parameters

--input : Path to the input file containing the sequences (Required)
--outdir : Path to the output directory (Default: ./results)
--blast_taxdb : Path to the BLAST database
...
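
To illustrate how the three parameters listed above combine, here is a minimal sketch. `batchblaster_cmd` is a hypothetical helper, not part of the pipeline: it only prints a command line for review and does not run anything. The paths are placeholders, and any parameters not listed above are left out.

```shell
# Hypothetical helper: prints (does not run) a BatchBlaster command line
# built from the three parameters documented above.
# Assumes paths contain no spaces or shell metacharacters.
batchblaster_cmd() {
  input=$1                 # --input       (required)
  db=$2                    # --blast_taxdb
  outdir=${3:-./results}   # --outdir      (documented default: ./results)
  printf 'nextflow run vmikk/BatchBlaster -r main --input %s --blast_taxdb %s --outdir %s\n' \
    "$input" "$db" "$outdir"
}

# Print the command for review; both paths are placeholders.
batchblaster_cmd path/to/your/input path/to/blast/db
```

Printing the command first makes it easy to check the paths before submitting a long-running job.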

Output

The results will be saved in the specified output directory (./results, by default). Output includes:

BLAST search results in tabular format (m8 a.k.a. -outfmt 6)
A table with best BLAST hits reshaped into wide format
Summary report
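
The tabular results can also be inspected with standard Unix tools. The sketch below assumes the default 12-column `-outfmt 6` layout (query ID in column 1, bit score in column 12) and takes "best hit" to mean the highest bit score per query. The pipeline's own best-hit criterion may differ, and `blast_results.tsv` is a hypothetical file name.

```shell
# best_hits FILE: print the highest-bit-score hit for each query sequence.
# Assumes default 12-column BLAST tabular output (-outfmt 6):
#   column 1 = qseqid, column 12 = bitscore.
best_hits() {
  # Sort by query ID, then by bit score (descending, general numeric),
  # and keep only the first line seen for each query ID.
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k12,12gr "$1" | awk -F '\t' '!seen[$1]++'
}

# Hypothetical usage:
# best_hits blast_results.tsv > best_hits.tsv
```

When several hits for a query share the top bit score, this sketch keeps whichever one sorts first.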

Dependencies

Nextflow (>=23.04.0)
Singularity or Docker
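
A quick pre-flight check for these dependencies could look like the sketch below. `version_ge` and `check_deps` are hypothetical helpers, not part of the pipeline, and the parsing assumes that `nextflow -v` prints a line such as `nextflow version 23.10.0.5889`.

```shell
# version_ge A B: succeed if dotted version A >= version B (numeric, per field).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      if ((x[i] + 0) > (y[i] + 0)) exit 0
      if ((x[i] + 0) < (y[i] + 0)) exit 1
    }
    exit 0
  }'
}

# check_deps: report missing or outdated dependencies on stderr.
check_deps() {
  command -v nextflow >/dev/null 2>&1 \
    || { echo "nextflow not found" >&2; return 1; }
  # Assumption: the version string follows the word "version" in the output.
  nf_version=$(nextflow -v 2>/dev/null | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p')
  version_ge "$nf_version" 23.04.0 \
    || { echo "Nextflow >= 23.04.0 required, found: $nf_version" >&2; return 1; }
  command -v singularity >/dev/null 2>&1 || command -v docker >/dev/null 2>&1 \
    || { echo "neither singularity nor docker found" >&2; return 1; }
}

# Hypothetical usage:
# check_deps && echo "all dependencies found"
```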

Future Plans

Integration of additional sequence analysis methods (e.g., MMSeqs2, SINTAX, etc.)
Inclusion of Lowest Common Ancestor (LCA) estimation
Implementation of domain-specific threshold filtering for taxonomic annotation (e.g., for fungal sequences)
Addition of machine learning methods for more accurate taxonomic classification (e.g., deep learning models trained on the UNITE database)
Implementation of a hybrid annotation approach (e.g., integration of classification results from various methods to enhance accuracy and reliability of taxonomic identification)

We are excited to share these enhancements in our forthcoming updates, so stay tuned!

License

This project is licensed under the terms of the Apache-2.0 license.

Please feel free to submit issues and pull requests; contributions are welcome!
