Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: blast_seq

Description

blast_seq uses NCBI's BLAST to BLAST all sequences in the stream against a specified database created with create_blast_index, or against a genome formatted with format_genome. The sequence type of the query sequences is guessed automatically, and the sequence type of the BLAST database is determined from the file suffix (.n* means nucleotide, .p* means protein). The BLAST program is guessed from the query sequence type and the database sequence type, but can be overridden using the -p switch (only necessary for tblastx).
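The guessing rules described above can be sketched in shell. This is only an illustration and not the actual blast_seq implementation: the helper names db_type and guess_program are hypothetical, and the suffix check assumes the index files written by formatdb sit next to the database path.

```shell
# Hypothetical helper: infer the database type from the index file suffixes.
# formatdb writes .nhr/.nin/.nsq for nucleotide and .phr/.pin/.psq for protein.
db_type() {
    db=$1
    if ls "$db".n* >/dev/null 2>&1; then
        echo nucleotide
    elif ls "$db".p* >/dev/null 2>&1; then
        echo protein
    else
        echo unknown
        return 1
    fi
}

# Hypothetical helper: choose a BLAST program from the query type ($1)
# and the database type ($2). tblastx also takes a nucleotide query and a
# nucleotide database, so it cannot be inferred and has to be requested with -p.
guess_program() {
    case "$1:$2" in
        nucleotide:nucleotide) echo blastn ;;
        protein:protein)       echo blastp ;;
        protein:nucleotide)    echo tblastn ;;
        nucleotide:protein)    echo blastx ;;
        *)                     echo unknown; return 1 ;;
    esac
}

guess_program nucleotide protein   # prints blastx
```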

blast_seq emits records like the example below:

STRAND: -
Q_ID: Eserichia_coli_plasmid_R6K
ALIGN_LEN: 14
S_ID: Eserichia_coli_plasmid_R6K
REC_TYPE: BLAST
IDENT: 100.00
E_VAL: 5.0
S_BEG: 37226
Q_BEG: 6939
MISMATCHES: 0
BIT_SCORE: 28.2
Q_END: 6952
GAPS: 0
S_END: 37239
---
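Records in this form are easy to post-process with standard tools. The sketch below assumes the stream reaches the script as plain text in exactly the layout shown above (one KEY: VALUE pair per line, records terminated by ---). The function name blast_records_to_tsv is hypothetical and not part of Biopieces.

```shell
# Hypothetical helper: read records on STDIN and print Q_ID, S_ID, E_VAL and
# BIT_SCORE as tab-separated columns for every BLAST record whose E_VAL is
# at most the cutoff given as the first argument.
blast_records_to_tsv() {
    awk -v max="$1" '
        /^---$/ {
            # End of record: emit it if it is a BLAST hit below the cutoff.
            if (rec["REC_TYPE"] == "BLAST" && rec["E_VAL"] + 0 <= max + 0)
                print rec["Q_ID"] "\t" rec["S_ID"] "\t" rec["E_VAL"] "\t" rec["BIT_SCORE"]
            split("", rec)   # portable way to clear the array
            next
        }
        {
            # Split "KEY: VALUE" at the first ": " only.
            i = index($0, ": ")
            if (i > 0) rec[substr($0, 1, i - 1)] = substr($0, i + 2)
        }
    '
}
```

Applied to the example record with a cutoff of 10, this would print the two IDs followed by 5.0 and 28.2. With a cutoff of 1 the record would be dropped.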

NCBI BLAST/formatdb must be installed in order for blast_seq to work.

Read more here:

ftp://ftp.ncbi.nih.gov/blast/

Usage

... | blast_seq [options] -d <database>

or

... | blast_seq [options] -g <genome>

Options

[-?           | --help]                    #  Print full usage description.
[-d <file!>   | --database=<file!>]        #  Path to database.
[-g <genome!> | --genome=<genome!>]        #  Choose genome instead of database.
[-p <string>  | --program=<string>]        #  blastn|blastp|tblastn|blastx|tblastx     -  Default=guessed!
[-e <float>   | --e_val=<float>]           #  Expectation value                        -  Default=10
[-f <string>  | --filter=<string>]         #  Filter low complexity sequence (yes|no)  -  Default=no
[-c <uint>    | --cpus=<uint>]             #  Number of CPUs to use                    -  Default=1
[-m           | --megablast]               #  Enable megablast.
[-G           | --no_gaps]                 #  Disable gapped BLAST.
[-E <uint>    | --extend_threshold=<uint>] #  Threshold for extending hits             -  Default=0
[-W <uint>    | --word_size=<uint>]        #  Use words of the specified size          -  Default=0
[-s           | --single_hit]              #  Single-hit mode (vs. multiple-hit mode)
[-I <file!>   | --stream_in=<file!>]       #  Read input from stream file              -  Default=STDIN
[-O <file>    | --stream_out=<file>]       #  Write output to stream file              -  Default=STDOUT
[-v           | --verbose]                 #  Verbose output.
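
Several of these options can be combined in one call. The wrapper below is a minimal sketch that uses only switches from the list above. The function name run_tblastx and its three arguments are hypothetical placeholders, and the E-value cutoff and CPU count are arbitrary example values.

```shell
# Hypothetical wrapper: run a tblastx search and write the records to a
# stream file.
run_tblastx() {
    query=$1   # FASTA file with nucleotide query sequences
    db=$2      # nucleotide index created with create_blast_index
    out=$3     # stream file that receives the resulting records

    # Fail early if Biopieces is not installed.
    if ! command -v blast_seq >/dev/null 2>&1; then
        echo "blast_seq not found in PATH" >&2
        return 127
    fi

    # -p forces tblastx, which cannot be guessed from the sequence types.
    read_fasta -i "$query" | blast_seq -d "$db" -p tblastx -e 1e-5 -c 4 -O "$out"
}
```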

Examples

To BLAST sequences in a FASTA file against a BLAST index previously created with create_blast_index, do:

read_fasta -i <FASTA file(s)> | blast_seq -d ~/my_blast_dir/my_blast_index

To BLAST sequences against a genome previously formatted with format_genome, do:

read_fasta -i <FASTA file(s)> | blast_seq -g <genome>

To list available genomes, use list_genomes.

See also

read_fasta

create_blast_index

format_genome

list_genomes

write_blast

read_blast_tab

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

blast_seq is part of the Biopieces framework.

http://www.biopieces.org
