
hpdcj/PCJ-blast


PCJ-blast

PCJ-blast is a small piece of software for running sequence alignment in parallel in a highly scalable manner. PCJ-blast reads input sequences and compares them with the reference database using NCBI-BLAST. Thanks to dynamic load balancing, PCJ-blast is a couple of times faster than solutions based on static partitioning of the input data and the reference database. Moreover, PCJ-blast runs efficiently without partitioning the reference database, which significantly simplifies installation and usage. PCJ-blast can run on a wide range of hardware, from a workstation, through Hadoop clusters, up to large supercomputers with thousands of cores. The observed speedup is almost linear, which can reduce analysis time from weeks to a few hours.
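The idea behind dynamic load balancing can be illustrated in plain Java. This is a minimal sketch, not PCJ-blast's code: the real dispatcher is built on the PCJ library, whereas here a thread pool stands in for the processors, a shared queue of block indices stands in for blocks of input sequences, and the hypothetical `simulateBlast` method fakes uneven BLAST run times. Each worker pulls a new block only when it is idle, so faster workers automatically take on more blocks instead of waiting on a fixed, precomputed share.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class DynamicBalancingSketch {
    // Processes blockCount blocks with the given number of workers and returns,
    // for each block, the id of the worker that processed it.
    static int[] runDynamic(int blockCount, int workers) {
        // Shared work queue: block indices stand in for blocks of FASTA sequences.
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < blockCount; i++) queue.add(i);

        AtomicIntegerArray processedBy = new AtomicIntegerArray(blockCount);
        for (int i = 0; i < blockCount; i++) processedBy.set(i, -1);

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            final int id = w;
            pool.submit(() -> {
                Integer block;
                // Pull the next block only when idle; poll() hands each block out once.
                while ((block = queue.poll()) != null) {
                    simulateBlast(block);
                    processedBy.set(block, id);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        int[] result = new int[blockCount];
        for (int i = 0; i < blockCount; i++) result[i] = processedBy.get(i);
        return result;
    }

    // Hypothetical stand-in for running BLAST on one block: cost varies per block.
    static void simulateBlast(int block) {
        try {
            Thread.sleep((block % 5) * 2L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int workers = 4;
        int[] owners = runDynamic(20, workers);
        int[] perWorker = new int[workers];
        for (int o : owners) perWorker[o]++;
        for (int w = 0; w < workers; w++) {
            System.out.println("worker " + w + ": " + perWorker[w] + " blocks");
        }
    }
}
```

The per-worker counts typically differ between runs, which is the point: work is distributed according to how quickly each worker finishes, not split up front.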

PCJ-blast requires an NCBI-BLAST installation and the PCJ library. To obtain the library, visit the PCJ homepage or its GitHub repository. NCBI-BLAST can be obtained from the NCBI repository.

Usage

java <JVM_PARAMS> -jar PCJ-blast.jar <BLAST_PARAMS>
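When launching PCJ-blast from another Java program, the command line can be assembled as below. This is a sketch: the input file name `queries.fasta` and the BLAST option `-evalue 1e-5` are illustrative values only. The key point is ordering: `-D` properties must appear before `-jar`, because everything after the jar name is passed to PCJ-blast as program arguments (here, the BLAST parameters).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LaunchCommandSketch {
    // Builds: java -D<key>=<value> ... -jar PCJ-blast.jar <BLAST_PARAMS>
    static List<String> buildCommand(Map<String, String> jvmParams, List<String> blastParams) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        // JVM system properties go before -jar; arguments after the jar go to the program.
        for (Map.Entry<String, String> e : jvmParams.entrySet()) {
            cmd.add("-D" + e.getKey() + "=" + e.getValue());
        }
        cmd.add("-jar");
        cmd.add("PCJ-blast.jar");
        cmd.addAll(blastParams);
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> jvm = new LinkedHashMap<>(); // keeps insertion order
        jvm.put("nodes", "nodes.txt");
        jvm.put("input", "queries.fasta"); // hypothetical input file
        jvm.put("blastDb", "nt");
        List<String> blast = Arrays.asList("-evalue", "1e-5"); // example BLAST option
        List<String> cmd = buildCommand(jvm, blast);
        System.out.println(String.join(" ", cmd));
        // To launch it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```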

PCJ-blast accepts the following parameters, passed as JVM system properties (-D<parameter>=<value>):

  • nodes=<path> - path to the nodes file describing the nodes (and threads) to use. The file must contain at least 2 lines: the first for the dispatcher, the following ones for the processors. Default: nodes.txt
  • input=<path> - path to the FASTA input file. Default: blast-test.fasta
  • output=<path> - path to the output directory. Default: .
  • blast=<path> - path to the BLAST executable. Default: blastn
  • blastDb=<path> - path to the BLAST database. Default: nt. Can be overridden by the BLAST -db parameter
  • hdfsConf=<path>[:<path>...] - paths to HDFS configuration files, separated by the platform path separator (e.g. a colon (:) on Linux). Default: none
  • sequenceCount=<int> - number of sequences in one block submitted to the processors. Default: 1
  • blastThreads=<int> - number of BLAST threads. Default: 1. Can be overridden by the BLAST -num_threads parameter
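The sequenceCount parameter controls how many FASTA records go into one block of work. The sketch below shows one way such blocking could be done; it is an assumption for illustration, not PCJ-blast's actual reader. It reads `sequenceCount` with its documented default of 1 via `System.getProperty`, treats every line starting with `>` as the header of a new record, and starts a new block once the current one holds `sequenceCount` records. The helper name `splitIntoBlocks` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FastaBlocksSketch {
    // Splits FASTA text into blocks of at most sequenceCount records each.
    static List<String> splitIntoBlocks(String fasta, int sequenceCount) {
        List<String> blocks = new ArrayList<>();
        StringBuilder block = new StringBuilder();
        int inBlock = 0; // records in the current block
        for (String line : fasta.split("\\R")) {
            if (line.isEmpty()) continue; // ignore blank lines
            if (line.startsWith(">")) {
                // A header starts a new record; flush the current block if it is full.
                if (inBlock == sequenceCount) {
                    blocks.add(block.toString());
                    block.setLength(0);
                    inBlock = 0;
                }
                inBlock++;
            }
            block.append(line).append('\n');
        }
        if (block.length() > 0) blocks.add(block.toString());
        return blocks;
    }

    public static void main(String[] args) {
        // Documented property name and default (-DsequenceCount=<int>, default 1).
        int sequenceCount = Integer.parseInt(System.getProperty("sequenceCount", "1"));
        String fasta = ">seq1\nACGT\n>seq2\nGGCC\nTTAA\n>seq3\nCATG\n";
        List<String> blocks = splitIntoBlocks(fasta, sequenceCount);
        System.out.println(blocks.size() + " block(s) of up to " + sequenceCount + " sequence(s)");
    }
}
```

Larger blocks reduce dispatch overhead per sequence; smaller blocks give the dispatcher finer-grained work to balance.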

If the BLAST -outfmt parameter is not set, PCJ-blast processes the BLAST output with its own output processor.
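The precedence rules above (BLAST -db over blastDb, BLAST -num_threads over blastThreads, and the -outfmt check) can be summarized in a short sketch. This is an illustration of the documented behaviour, not PCJ-blast's implementation; the helper names `optionValue` and `resolve` are hypothetical, and only the simple `-option value` form is handled.

```java
import java.util.Arrays;
import java.util.List;

public class BlastArgsSketch {
    // Returns the value following the given BLAST option, or null if it is absent.
    static String optionValue(List<String> blastArgs, String option) {
        int i = blastArgs.indexOf(option);
        return (i >= 0 && i + 1 < blastArgs.size()) ? blastArgs.get(i + 1) : null;
    }

    // A BLAST command-line value wins; otherwise the JVM property; otherwise the default.
    static String resolve(List<String> blastArgs, String option, String property, String def) {
        String fromBlast = optionValue(blastArgs, option);
        return fromBlast != null ? fromBlast : System.getProperty(property, def);
    }

    public static void main(String[] args) {
        List<String> blastArgs = Arrays.asList(args);
        String db = resolve(blastArgs, "-db", "blastDb", "nt");
        String threads = resolve(blastArgs, "-num_threads", "blastThreads", "1");
        // Without -outfmt, PCJ-blast's own output processor handles the results.
        boolean ownOutputProcessor = !blastArgs.contains("-outfmt");
        System.out.println("db=" + db + " threads=" + threads
                + " ownOutputProcessor=" + ownOutputProcessor);
    }
}
```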

Compilation

Compiling PCJ-blast requires the Gradle Build Tool, which downloads all necessary dependencies. The Gradle wrapper is provided as the gradlew executable (gradlew.bat on Microsoft Windows).

Create jar

To compile PCJ-blast and build its jar file, run the ./gradlew build command. The output is stored in the build/dist directory as PCJ-blast.jar, with all dependencies in the build/dist/libs directory.

Create fat jar

To create a fat jar containing PCJ-blast and all its dependencies as a single file, run the ./gradlew makeFatJar command. The output is stored in build/dist as PCJ-blast-fatjar.jar.

Reference

Use of PCJ-blast should be acknowledged by citing the following papers: