#cp-blast README File

This is a BLAST wrapper for identifying novel proteins: it searches with a seed query and compares the hits against a database of existing intellectual property.

#Objective

In many cases, existing enzymes have sub-optimal characteristics for the desired final pathway. Mining existing databases for alternatives is an inexpensive way to generate new candidates that can be tested in vivo. However, standard command-line BLAST makes it difficult to determine which hits are closely related to existing, potentially patented homologs.

The direct_blast script runs a BLAST search and then screens the resulting hits against a secondary database, allowing quick-and-dirty identification of possible IP problems. Results are written to a TSV file.
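The screening step can be pictured roughly as follows. This is a minimal sketch, not the code in direct_blast: it assumes the secondary search was saved in BLAST+ default tabular format (`-outfmt 6`), and the function names, the output column names, and the 90% identity threshold are all hypothetical choices made for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_identity_per_query(tabular_text):
    """Return {query id: (subject id, percent identity)} for the best match."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(OUTFMT6_FIELDS, row))
        query = record["qseqid"]
        pident = float(record["pident"])
        if query not in best or pident > best[query][1]:
            best[query] = (record["sseqid"], pident)
    return best


def screen_hits(primary_hit_ids, secondary_tabular_text, threshold=90.0):
    """Flag primary hits whose closest secondary match meets the threshold.

    The threshold is an illustrative assumption, not a value from the script.
    """
    best = best_identity_per_query(secondary_tabular_text)
    rows = []
    for hit_id in primary_hit_ids:
        # Hits with no match in the secondary database get identity 0.0.
        subject, pident = best.get(hit_id, ("", 0.0))
        rows.append((hit_id, subject, pident, pident >= threshold))
    return rows


def to_tsv(rows):
    """Serialize screened rows as tab-separated text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["hit_id", "closest_ip_match", "percent_identity", "flagged"])
    writer.writerows(rows)
    return out.getvalue()
```

Here the primary hits are used as queries against the secondary (IP) database, so each hit's best `pident` is a rough measure of how close it sits to a known sequence. Percent identity alone ignores alignment coverage, so a real screen would probably also check the aligned length.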

The shift_blast script takes advantage of bacterial operons to identify potential pathway components: it BLASTs a seed sequence against a genomic DNA database and then identifies candidate ORFs near each hit. Like direct_blast, it can screen the resulting hits against a secondary database for quick-and-dirty IP analysis. Results are written to a TSV file.
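The "candidate ORFs nearby" idea can be illustrated with a small interval search. This is only a sketch under stated assumptions: the `Orf` type, the function names, the 1-based inclusive coordinates, and the 5000 bp window are hypothetical and are not taken from shift_blast itself, which may define "nearby" differently.

```python
from typing import List, NamedTuple, Tuple


class Orf(NamedTuple):
    name: str
    start: int  # 1-based, inclusive, start <= end
    end: int


def interval_gap(a_start, a_end, b_start, b_end):
    """Bases separating two inclusive intervals; 0 if they overlap or abut."""
    return max(0, b_start - a_end - 1, a_start - b_end - 1)


def orfs_near_hit(hit_start, hit_end, orfs, window=5000):
    """Return (orf, gap) pairs within `window` bases of the hit, nearest first.

    The window size is an illustrative assumption.
    """
    # Hit coordinates may arrive with start > end for minus-strand matches,
    # so normalize them before comparing.
    lo, hi = sorted((hit_start, hit_end))
    nearby: List[Tuple[Orf, int]] = []
    for orf in orfs:
        gap = interval_gap(lo, hi, orf.start, orf.end)
        if gap <= window:
            nearby.append((orf, gap))
    return sorted(nearby, key=lambda pair: pair[1])
```

An ORF that overlaps the hit has a gap of 0 and sorts first, which is usually the seed homolog itself. The remaining entries are the operon neighbours that would go on to the secondary screen.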

The pairwise_identity script performs pairwise alignments between all sequences in a file and writes a TSV file containing the percent identity for each unique pair.
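As a rough illustration of the all-against-all comparison, the sketch below aligns every unique pair with a plain Needleman-Wunsch global alignment and reports identity as matches divided by alignment length. These are assumptions for the example: the actual script may use a different aligner, scoring scheme, or definition of percent identity, and the function names are hypothetical.

```python
import itertools


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical columns as a percentage of the full alignment length."""
    aligned_a, aligned_b = global_align(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def pairwise_identities(sequences):
    """Return (name_1, name_2, percent identity) for each unique pair."""
    return [
        (n1, n2, percent_identity(s1, s2))
        for (n1, s1), (n2, s2) in itertools.combinations(sequences.items(), 2)
    ]
```

`itertools.combinations` yields each unordered pair once, so n sequences produce n(n-1)/2 rows with no self-comparisons. The quadratic-memory alignment here is fine for short sequences, but a dedicated alignment library would be a better fit for large protein sets.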

#Requirements