PCJ-blast is a small piece of software that runs sequence alignment in parallel in a highly scalable manner. It reads input sequences and compares them with a reference database using NCBI-BLAST. Thanks to dynamic load balancing, PCJ-blast is a few times faster than solutions based on static partitioning of the input data and the reference database. Moreover, PCJ-blast can run efficiently without partitioning the reference database, which significantly simplifies installation and usage. PCJ-blast can run on a range of hardware, from workstations through Hadoop clusters up to large supercomputers with thousands of cores. The observed speedup is almost linear, which can reduce analysis time from weeks to hours.
PCJ-blast requires an installed NCBI-BLAST and the PCJ library. To obtain the library, visit the PCJ homepage or its GitHub repository. NCBI-BLAST can be obtained from the NCBI repository.
java <JVM_PARAMS> -jar PCJ-blast.jar <BLAST_PARAMS>
There are some parameters for PCJ-blast that can be set as JVM parameters (-D<parameter>=<value>):
- nodes=<path> - path to the nodes file with the description of nodes (and threads) to use. The file must contain at least 2 lines (the first for the dispatcher, the following ones for processors). Default: nodes.txt
- input=<path> - path to the FASTA input file. Default: blast-test.fasta
- output=<path> - path to the output directory. Default: .
- blast=<path> - path to the BLAST executable file. Default: blastn
- blastDb=<path> - path to the BLAST database file. Default: nt. Can be overridden by the BLAST -db parameter
- hdfsConf=<path>[:<path>...] - paths to HDFS configurations (separated by the path separator character, i.e. colon (:) on Linux). Default: none
- sequenceCount=<int> - number of sequences in one block submitted to processors. Default: 1
- blastThreads=<int> - number of BLAST threads. Default: 1. Can be overridden by the BLAST -num_threads parameter
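As a minimal sketch of how these parameters fit together, the snippet below writes a two-line nodes file and assembles a command line. The host name localhost, the input file query.fasta, the output directory results, and the BLAST -evalue value are illustrative assumptions, not values taken from PCJ-blast itself. The nodes file is assumed to list one host per line. The command is only printed, so it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical nodes file: the first line is for the dispatcher,
# the second line is for a single processor (both on localhost here).
printf 'localhost\nlocalhost\n' > nodes.txt

# JVM parameters (-D<parameter>=<value>) go before -jar;
# BLAST parameters follow the jar name.
# query.fasta, results and -evalue 1e-5 are placeholder values.
CMD="java -Dnodes=nodes.txt -Dinput=query.fasta -Doutput=results -Dblast=blastn -DblastDb=nt -DsequenceCount=10 -jar PCJ-blast.jar -evalue 1e-5"

# Print the assembled command instead of executing it.
echo "$CMD"
```

Once the paths are adjusted to the local installation, the printed command can be pasted into a terminal. A larger sequenceCount submits more sequences per block to the processors.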
If the BLAST -outfmt parameter is not set, PCJ-blast processes the BLAST output using its own output processor.
To compile PCJ-blast the Gradle Build Tool is required. It will download all necessary dependencies.
The Gradle wrapper is available as the gradlew (or gradlew.bat for Microsoft Windows systems) executable file.
To compile and build the jar file with PCJ-blast, execute the ./gradlew build command. The output is stored in the build/dist directory as the PCJ-blast.jar file, with all dependencies in the build/dist/libs directory.
To create a fat jar with PCJ-blast and all its dependencies in one file, execute the ./gradlew makeFatJar command. The output is stored in build/dist as the PCJ-blast-fatjar.jar file.
Use of PCJ-blast should be acknowledged by citing the following papers:
- Marek Nowicki, Davit Bzhalava, and Piotr Bała. "Massively Parallel Implementation of Sequence Alignment with Basic Local Alignment Search Tool Using Parallel Computing in Java Library." Journal of Computational Biology (2018).
- Marek Nowicki, Davit Bzhalava, and Piotr Bała. "Massively Parallel Sequence Alignment with BLAST Through Work Distribution Implemented using PCJ Library." International Conference on Algorithms and Architectures for Parallel Processing. Springer, Cham, 2017, p. 503-512.