blat_seq
Martin Asser Hansen edited this page Oct 2, 2015
blat_seq uses the UCSC Genome Browser's BLAT to search all sequences in the stream for matches in a specified database. The sequence type of the query sequences is guessed automagically.
Resulting records look like this:
S_BEGS: 83755,
Q_ID: 5_gECOjxwXsN1/1
S_LEN: 159109
Q_LEN: 35
REPMATCHES: 0
MATCHES: 34
S_ID: M1_c17
NCOUNT: 1
SPAN: 34
Q_END: 34
STRAND: -
SCORE: 34
BLOCK_LENS: 35,
REC_TYPE: PSL
QNUMINSERT: 0
Q_BEG: 0
S_BEG: 83755
MISMATCHES: 0
SBASEINSERT: 0
Q_BEGS: 0,
SNUMINSERT: 0
BLOCK_COUNT: 1
QBASEINSERT: 0
S_END: 83789
---
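For processing outside Biopieces, records like the one above can be read with a small parser. The following Python sketch assumes that each record is a run of `KEY: VALUE` lines terminated by a `---` line, as in the example. The function name `parse_records` is hypothetical and not part of Biopieces, and the identity figure is a simple ratio computed for illustration, not BLAT's own percent identity.

```python
from typing import Dict, Iterable, Iterator


def parse_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record; all values are kept as strings."""
    record: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == "---":
            # A '---' line closes the current record (assumed delimiter).
            if record:
                yield record
            record = {}
        elif ": " in line:
            # Split on the first ': ' only, so values may contain colons.
            key, value = line.split(": ", 1)
            record[key.strip()] = value.strip()
    if record:
        # Tolerate a final record that lacks a closing '---'.
        yield record


# A shortened copy of the record shown above.
example = """\
Q_ID: 5_gECOjxwXsN1/1
S_ID: M1_c17
MATCHES: 34
MISMATCHES: 0
REPMATCHES: 0
Q_LEN: 35
STRAND: -
---
"""

records = list(parse_records(example.splitlines()))
hit = records[0]

# Illustrative identity: matching bases over all aligned bases.
matched = int(hit["MATCHES"]) + int(hit["REPMATCHES"])
aligned = matched + int(hit["MISMATCHES"])
identity = 100.0 * matched / aligned

print(hit["S_ID"], hit["STRAND"], identity)
```

Values are left as strings because list-like keys such as S_BEGS and BLOCK_LENS carry a trailing comma; convert individual fields as needed.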
BLAT must be installed in order for blat_seq to work.
Read more here:
http://genome.ucsc.edu/FAQ/FAQblat.html#blat3
... | blat_seq [options] -d <database> | -g <genome>
[-? | --help] # Print full usage description.
[-d <file> | --database=<file>] # BLAT against FASTA file.
[-g <genome> | --genome=<genome>] # BLAT against genome.
[-f | --fast_map] # Fast DNA/DNA mapping with high %ID and without introns.
[-c | --ooc] # Use overused tile file (faster, but less sensitive).
[-i <uint> | --intron_max=<uint>] # Maximum intron size - Default=750000
[-t <uint> | --tile_size=<uint>] # Size of match that triggers an alignment - Default=11
[-s <uint> | --step_size=<uint>] # Spacing between tiles - Default=11
[-m <uint> | --min_identity=<uint>] # Minimum sequence identity in percent - Default=90
[-M <uint> | --min_score=<uint>] # Minimum score - Default=0
[-N | --allow_N_blocks] # Allow alignment extension through N blocks.
[-o <uint> | --one_off=<uint>] # Allows one mismatch in tile - Default=0
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
To BLAT sequences against a FASTA file do:
read_fasta -i query_sequences.fna | blat_seq -d subject_sequences.fna
To BLAT sequences against a genome previously formatted with format_genome, do:
read_fasta -i query_sequences.fna | blat_seq -g <genome>
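When these command lines are generated from a script, it can help to assemble the blat_seq arguments programmatically. The Python sketch below only builds the command string and does not run anything. The helper name `blat_seq_args` is hypothetical, it covers only a few of the options listed in the usage text, and the rule that exactly one of -d and -g is given is an assumption drawn from the usage line.

```python
import shlex
from typing import List, Optional


def blat_seq_args(database: Optional[str] = None,
                  genome: Optional[str] = None,
                  min_identity: Optional[int] = None,
                  min_score: Optional[int] = None,
                  fast_map: bool = False) -> List[str]:
    """Build an argument list for blat_seq from a subset of its options."""
    # Assumption: exactly one of -d (FASTA file) and -g (genome) is given.
    if (database is None) == (genome is None):
        raise ValueError("give exactly one of database or genome")
    args = ["blat_seq"]
    if database is not None:
        args += ["-d", database]
    else:
        args += ["-g", str(genome)]
    if fast_map:
        args.append("-f")
    if min_identity is not None:
        args += ["-m", str(min_identity)]
    if min_score is not None:
        args += ["-M", str(min_score)]
    return args


cmd = blat_seq_args(database="subject_sequences.fna", min_identity=95)
# shlex.join quotes arguments safely for a POSIX shell (Python 3.8+).
pipeline = "read_fasta -i query_sequences.fna | " + shlex.join(cmd)
print(pipeline)
```

Options that are left unset are omitted from the command, so blat_seq falls back to its documented defaults.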
Use write_psl to output data in BLAT's native format.
To list available genomes, use list_genomes.
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
blat_seq is part of the Biopieces framework.