COMHIS/text-reuse-blast-custom

blast-custom-text-reuse

Custom version of BLAST for text reuse detection.

These are modifications to the BLAST source code, originally made by Aleksi Vesanto (https://github.com/avjves) for text reuse detection. They have been ported to a later version of BLAST (2.13.0) because newer GCC compilers refuse to compile the original modified source.

Installation

The source code has to be compiled for whichever environment you run it in, whether locally or on a cluster:

cd ncbi-blast-2.13.0+-src_modified/c++  ## Now inside the BLAST source directory
./configure       ## Run configure file
make              ## Compile the program

The binaries are now in ncbi-blast-2.13.0+-src_modified/c++/ReleaseMT/bin
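
To confirm that the build actually produced the programs you need, a small check like the one below can help. It is only a sketch: the function name check_blast_bins is hypothetical, and which binaries you need depends on your text reuse pipeline (blastp and makeblastdb in the usage line are just an example).

```shell
# Hypothetical helper: report any requested program that is missing
# (or not executable) in the given bin directory.
# Returns 0 if all are present, 1 otherwise.
check_blast_bins() {
    local bin_dir="$1"; shift
    local missing=0 name
    for name in "$@"; do
        if [ ! -x "$bin_dir/$name" ]; then
            echo "missing: $bin_dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Example: check_blast_bins ncbi-blast-2.13.0+-src_modified/c++/ReleaseMT/bin blastp makeblastdb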

The software expects the BLAST binaries to be on your PATH. To add them:

export PATH="/path/to/ncbi-blast-2.13.0+-src_modified/c++/ReleaseMT/bin:$PATH"

You will most likely want to add this line to your .bashrc file so that it persists across sessions.
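
If you prefer to script this step, the sketch below appends the export line to a shell startup file only when that exact line is not already there, so running it twice does not duplicate the entry. The helper name add_blast_to_path is hypothetical; the startup file defaults to ~/.bashrc.

```shell
# Hypothetical helper: append an `export PATH=...` line for the BLAST bin
# directory to a shell startup file, unless that exact line is already there.
add_blast_to_path() {
    local bin_dir="$1"                   # e.g. .../c++/ReleaseMT/bin
    local rc_file="${2:-$HOME/.bashrc}"  # defaults to ~/.bashrc
    # \$PATH stays literal so it is expanded when the rc file is sourced.
    local line="export PATH=\"${bin_dir}:\$PATH\""
    touch "$rc_file"
    # -q: no output, -x: whole-line match, -F: fixed string (no regex).
    if ! grep -qxF -- "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```

After running add_blast_to_path /path/to/ncbi-blast-2.13.0+-src_modified/c++/ReleaseMT/bin and opening a new shell (or running source ~/.bashrc), command -v blastp should print a path inside ReleaseMT/bin.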

Modifications

Compared to standard BLAST, the following two modifications have been made:

  • /src/algo/blast/core/blast_stat.c: one line of the BLOSUM62_VALUES_MAX table (line 270 in the 2.13.0 source) has been changed to match the gap_open and gap_extend values needed for ECCO. The new line is: {3, 11, (double) INT2_MAX, 0.201, 0.012, 0.061, 3.3, -58, 0.740802, 140.417000, 141.882000},
  • The BLOSUM scoring matrix in /src/util/tables/sm_blosum62.c has been rewritten to be uniform.
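
To check whether a given source tree already carries the modified gap-cost row, you can search blast_stat.c for it. In this sketch the function name has_ecco_row is hypothetical, and it matches only the leading fields of the new line quoted above, assuming the row sits on one line with the same spacing.

```shell
# Hypothetical helper: succeed (exit 0) if the given blast_stat.c contains
# the modified BLOSUM62 row; only the leading fields are compared.
has_ecco_row() {
    grep -qF '{3, 11, (double) INT2_MAX, 0.201, 0.012' "$1"
}
```

Example: has_ecco_row ncbi-blast-2.13.0+-src_modified/c++/src/algo/blast/core/blast_stat.c && echo "modified row present"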

The modified files can be found in /modifications.
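
If you want to apply the changes to a fresh, unmodified BLAST 2.13.0 source tree instead of using the bundled one, the two files can be copied into place. This is a hypothetical helper: it assumes /modifications contains blast_stat.c and sm_blosum62.c directly, under their original names, so check the actual layout of the directory and adjust the paths before relying on it.

```shell
# Hypothetical helper: copy the two modified files into a BLAST c++ source
# tree. Assumes mod_dir holds blast_stat.c and sm_blosum62.c directly.
apply_modifications() {
    local mod_dir="$1" cpp_dir="$2"   # cpp_dir is the tree's c++ directory
    cp "$mod_dir/blast_stat.c"  "$cpp_dir/src/algo/blast/core/blast_stat.c" &&
    cp "$mod_dir/sm_blosum62.c" "$cpp_dir/src/util/tables/sm_blosum62.c"
}
```

For example, apply_modifications modifications ncbi-blast-2.13.0+-src/c++ followed by the configure and make steps above.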

Other versions

GPU-accelerated BLAST might be one way around the speed problems of using BLAST with ECCO. A few implementations exist, but none has been updated recently:

Articles:
