blast_seq
blast_seq uses NCBI's BLAST to BLAST all sequences in the stream against a
specified database created with create_blast_index.
The sequence type of the query sequence is guessed automagically, and the
sequence type of the BLAST database is determined from the file suffix
(.n* means nucleotide, .p* means protein). Based on the query sequence type
and the index sequence type the BLAST program is guessed, but this can be
overridden using the -p switch (only necessary for tblastx).
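The guessing rule described above can be sketched in Python. This is only an illustration of the idea, not the blast_seq source: the function names `db_sequence_type` and `guess_program` are hypothetical, and the mapping table assumes the standard meaning of the BLAST programs (blastn, blastp, tblastn, blastx). tblastx is absent from the table because it uses the same query and database types as blastn, which is why it has to be requested with -p.

```python
from pathlib import Path


def db_sequence_type(db_files):
    """Return 'nucleotide' or 'protein' based on BLAST index file suffixes.

    Assumes index files end in .n* (e.g. .nhr, .nin, .nsq) or
    .p* (e.g. .phr, .pin, .psq).
    """
    for name in db_files:
        suffix = Path(name).suffix
        if suffix.startswith(".n"):
            return "nucleotide"
        if suffix.startswith(".p"):
            return "protein"
    raise ValueError("no .n* or .p* index files found")


def guess_program(query_type, db_type):
    """Pick a BLAST program from the query and database sequence types."""
    table = {
        ("nucleotide", "nucleotide"): "blastn",
        ("protein", "protein"): "blastp",
        ("protein", "nucleotide"): "tblastn",
        ("nucleotide", "protein"): "blastx",
    }
    return table[(query_type, db_type)]


# Hypothetical index files for a nucleotide database.
db_type = db_sequence_type(["my_blast_index.nhr", "my_blast_index.nin"])
program = guess_program("protein", db_type)
```

With a protein query and a nucleotide index, this sketch selects tblastn.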
blast_seq emits records like the example below:
STRAND: -
Q_ID: Eserichia_coli_plasmid_R6K
ALIGN_LEN: 14
S_ID: Eserichia_coli_plasmid_R6K
REC_TYPE: BLAST
IDENT: 100.00
E_VAL: 5.0
S_BEG: 37226
Q_BEG: 6939
MISMATCHES: 0
BIT_SCORE: 28.2
Q_END: 6952
GAPS: 0
S_END: 37239
---
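For downstream processing outside Biopieces, records in this layout can be read with a few lines of Python. The sketch below assumes that each line is a key and a value separated by ": " and that a line containing only "---" ends a record, as in the example above; the function name `parse_records` is hypothetical and all values are kept as strings.

```python
def parse_records(lines):
    """Yield one dict per record; a line holding only '---' ends a record."""
    record = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == "---":
            if record:
                yield record
            record = {}
            continue
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(": ")
        record[key] = value
    if record:
        yield record  # last record without a trailing separator


# A shortened copy of the example record above.
SAMPLE = """\
STRAND: -
Q_ID: Eserichia_coli_plasmid_R6K
ALIGN_LEN: 14
REC_TYPE: BLAST
IDENT: 100.00
E_VAL: 5.0
---
"""

hits = list(parse_records(SAMPLE.splitlines()))
# Keep only hits at or below an E-value cutoff (values are strings, so convert).
good_hits = [h for h in hits if float(h["E_VAL"]) <= 10.0]
```

Numeric fields such as E_VAL or BIT_SCORE need an explicit conversion, since the parser makes no assumption about field types.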
NCBI BLAST/formatdb must be installed in order for blast_seq to work.
Read more here:
... | blast_seq [options] -d <database>
or
... | blast_seq [options] -g <genome>
[-? | --help] # Print full usage description.
[-d <file!> | --database=<file!>] # Path to database.
[-g <genome!> | --genome=<genome!>] # Choose genome instead of database.
[-p <string> | --program=<string>] # blastn|blastp|tblastn|blastx|tblastx - Default=guessed!
[-e <float> | --e_val=<float>] # Expectation value - Default=10
[-f <string> | --filter=<string>] # Filter low complexity sequence (yes|no) - Default=no
[-c <uint> | --cpus=<uint>] # Number of CPUs to use - Default=1
[-m | --megablast] # Enable megablast.
[-G | --no_gaps] # Disable gapped BLAST.
[-E <uint> | --extend_threshold=<uint>] # Threshold for extending hits - Default=0
[-W <uint> | --word_size=<uint>] # Use words of the specified size - Default=0
[-s | --single_hit] # Single-hit mode (vs. multiple-hit mode)
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
To BLAST sequences in a FASTA file against a BLAST index previously created with create_blast_index, do:
read_fasta -i <FASTA file(s)> | blast_seq -d ~/my_blast_dir/my_blast_index
To BLAST sequences against a genome previously formatted with format_genome, do:
read_fasta -i <FASTA file(s)> | blast_seq -g <genome>
To list available genomes use list_genomes.
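When the pipeline is assembled from a script, the command line can be built programmatically. The following is a minimal sketch that uses only switches listed in the usage above (-d, -g, -e, -c) together with read_fasta -i; the helper name `blast_seq_pipeline` and the file names are hypothetical, and the function only builds the command string without running it.

```python
import shlex


def blast_seq_pipeline(fasta_file, database=None, genome=None, e_val=None, cpus=None):
    """Build a 'read_fasta | blast_seq' shell command string."""
    if (database is None) == (genome is None):
        raise ValueError("give exactly one of database or genome")
    read = ["read_fasta", "-i", fasta_file]
    blast = ["blast_seq"]
    blast += ["-d", database] if database is not None else ["-g", genome]
    if e_val is not None:
        blast += ["-e", str(e_val)]
    if cpus is not None:
        blast += ["-c", str(cpus)]
    # shlex.join quotes each argument so that odd file names stay intact.
    return " | ".join(shlex.join(cmd) for cmd in (read, blast))


command = blast_seq_pipeline(
    "query.fna", database="my_blast_dir/my_blast_index", e_val=0.001, cpus=4
)
```

Because the arguments are quoted, a leading ~ in a path would not be expanded by the shell; expand it first, for example with os.path.expanduser.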
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
blast_seq is part of the Biopieces framework.