read_kiss

Biopiece: read_kiss

Description

read_kiss reads records in KISS format from one or more files.

The KISS format (Keep It Simple Stupid) is a text-based format for describing generic feature information, with one feature per line in 12 tab-separated columns:

S_ID: Subject ID - e.g. chr12.
S_BEG: Begin position of a feature relating to the subject sequence. 0-based.
S_END: End position of a feature relating to the subject sequence.
Q_ID: Query ID - e.g. a Solexa read ID such as a3_2VCOjxwXsN1.
SCORE: A float that can describe e.g. a BLAT score.
STRAND: Denotes which strand a feature relates to. + or -.
HITS: Number of times a feature is found in the subject sequence.
ALIGN: Comma-separated list of alignment descriptors for mismatches, insertions, and deletions *).
BLOCK_COUNT: Number of blocks in a feature (e.g. exons).
BLOCK_BEGS: Comma-separated list of block begin positions. Offset is S_BEG.
BLOCK_LENS: Comma-separated list of block lengths.
BLOCK_TYPE: Comma-separated list of block types (0=Gap,1=Non-gap,2=CDS,3=5'UTR,4=3'UTR).

Values in fields 4-12 are optional; an empty field must contain a '.'.
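The column layout above can be illustrated with a minimal Python sketch. This is not part of Biopieces: the names `KISS_FIELDS`, `parse_kiss_line` and `block_ranges` are hypothetical, the example record is made up, and the sketch assumes that only fields 4-12 may hold a '.' and that block begin positions are added to S_BEG to obtain subject coordinates.

```python
# Column names in the order given by the format description.
KISS_FIELDS = [
    "S_ID", "S_BEG", "S_END", "Q_ID", "SCORE", "STRAND",
    "HITS", "ALIGN", "BLOCK_COUNT", "BLOCK_BEGS", "BLOCK_LENS", "BLOCK_TYPE",
]


def parse_kiss_line(line):
    """Split one KISS line into a dict; empty ('.') optional fields become None."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(KISS_FIELDS):
        raise ValueError(f"expected 12 tab-separated columns, got {len(values)}")
    record = dict(zip(KISS_FIELDS, values))
    # Fields 4-12 are optional; '.' marks an empty field.
    for name in KISS_FIELDS[3:]:
        if record[name] == ".":
            record[name] = None
    record["S_BEG"] = int(record["S_BEG"])
    record["S_END"] = int(record["S_END"])
    if record["SCORE"] is not None:
        record["SCORE"] = float(record["SCORE"])
    for name in ("HITS", "BLOCK_COUNT"):
        if record[name] is not None:
            record[name] = int(record[name])
    return record


def block_ranges(record):
    """Return (subject_begin, length) per block; BLOCK_BEGS are offsets from S_BEG."""
    if record["BLOCK_BEGS"] is None or record["BLOCK_LENS"] is None:
        return []
    begs = [int(x) for x in record["BLOCK_BEGS"].split(",")]
    lens = [int(x) for x in record["BLOCK_LENS"].split(",")]
    return [(record["S_BEG"] + beg, length) for beg, length in zip(begs, lens)]


# Made-up record: SCORE is left empty, two non-gap blocks are given.
example = "chr12\t100\t135\ta3_2VCOjxwXsN1\t.\t+\t1\t0:C>T,13:G>C\t2\t0,20\t10,16\t1,1"
record = parse_kiss_line(example)
print(record["S_ID"], record["S_BEG"], record["SCORE"])
print(block_ranges(record))
```

The sketch keeps ALIGN and BLOCK_TYPE as raw strings so that callers can decide how far to decode them.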

*) Alignment descriptors:

mismatch: (offset:S-base>Q-base) - e.g. 0:C>T,13:G>C
insertion: (offset:->Q-base) - e.g. 8:->G,18:->A
deletion: (offset:S-base>-) - e.g. 5:A>-,16:T>-

The offset position is relative to S_BEG and does not change with insertions or deletions. Alignment descriptors are based on the + strand.

Descriptors should be sorted by offset position.
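The descriptor syntax above can be decoded with a short Python sketch. It is an illustration only, not the Biopieces implementation: the names `DESCRIPTOR` and `parse_align` are hypothetical, and the sketch assumes single-letter bases and that a '.' in the ALIGN field means no descriptors.

```python
import re

# offset:S-base>Q-base, where '-' stands for a missing base on either side.
DESCRIPTOR = re.compile(r"^(\d+):([A-Za-z-])>([A-Za-z-])$")


def parse_align(align):
    """Return a list of (offset, kind, s_base, q_base) tuples from an ALIGN field."""
    if align == ".":
        return []
    events = []
    for item in align.split(","):
        match = DESCRIPTOR.match(item)
        if match is None:
            raise ValueError(f"malformed descriptor: {item!r}")
        offset = int(match.group(1))
        s_base, q_base = match.group(2), match.group(3)
        if s_base == "-" and q_base == "-":
            raise ValueError(f"descriptor has no base on either side: {item!r}")
        if s_base == "-":
            kind = "insertion"   # base present in the query only
        elif q_base == "-":
            kind = "deletion"    # base present in the subject only
        else:
            kind = "mismatch"
        events.append((offset, kind, s_base, q_base))
    return events


def is_sorted_by_offset(events):
    """Check the recommendation that descriptors are sorted by offset."""
    offsets = [event[0] for event in events]
    return offsets == sorted(offsets)


events = parse_align("0:C>T,8:->G,16:T>-")
for event in events:
    print(event)
print(is_sorted_by_offset(events))
```

Because offsets are relative to S_BEG, adding S_BEG to an offset gives the corresponding position on the subject sequence.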

Usage

read_kiss [options] -i <KISS file(s)>

Options

[-?          | --help]               #  Print full usage description.
[-i <files!> | --data_in=<files!>]   #  Comma-separated list of files or glob expression to read.
[-n <uint>   | --num=<uint>]         #  Limit number of records to read.
[-I <file!>  | --stream_in=<file!>]  #  Read input stream from file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output stream to file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

To read all KISS entries from a file:

read_kiss -i test.kiss

To read in only 10 records from a KISS file:

read_kiss -n 10 -i test.kiss

To read all KISS entries from multiple files:

read_kiss -i test1.kiss,test2.kiss

To read KISS entries from multiple files using a glob expression:

read_kiss -i '*.kiss'

Author

[email protected]

October 2009

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

read_kiss is part of the Biopieces framework.

http://www.biopieces.org
