Biopiece: read_psl


read_psl reads PSL data from one or more files. The PSL format consists of up to 21 columns:

  1. MATCHES - Number of non-repeat matches.
  2. MISMATCHES - Number of mismatches.
  3. REPMATCHES - Number of repeat matches.
  4. NCOUNT - Number of Ns.
  5. QNUMINSERT - Number of inserts in query.
  6. QBASEINSERT - Number of bases inserted in query.
  7. SNUMINSERT - Number of inserts in subject.
  8. SBASEINSERT - Number of bases inserted in subject.
  9. STRAND - Strand.
  10. Q_ID - Query ID.
  11. Q_LEN - Query length.
  12. Q_BEG - Query begin.
  13. Q_END - Query end.
  14. S_ID - Subject ID.
  15. S_LEN - Subject length.
  16. S_BEG - Subject begin.
  17. S_END - Subject end.
  18. BLOCKCOUNT - Block count.
  19. BLOCKSIZES - Block sizes.
  20. Q_BEGS - Query sequence blocks begins.
  21. S_BEGS - Subject sequence blocks begins.
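
As an illustration of the column layout above, the following is a minimal Python sketch that splits one tab-separated PSL line into a record keyed by the names in the list. It is not the read_psl implementation; the function name `parse_psl_line`, the example line, and the choice to leave the comma-separated block columns as plain strings are assumptions made for this sketch.

```python
# Key names follow the column list on this page, in column order.
PSL_KEYS = [
    "MATCHES", "MISMATCHES", "REPMATCHES", "NCOUNT",
    "QNUMINSERT", "QBASEINSERT", "SNUMINSERT", "SBASEINSERT",
    "STRAND", "Q_ID", "Q_LEN", "Q_BEG", "Q_END",
    "S_ID", "S_LEN", "S_BEG", "S_END",
    "BLOCKCOUNT", "BLOCKSIZES", "Q_BEGS", "S_BEGS",
]

# Columns that hold a single integer value.
INT_KEYS = {
    "MATCHES", "MISMATCHES", "REPMATCHES", "NCOUNT",
    "QNUMINSERT", "QBASEINSERT", "SNUMINSERT", "SBASEINSERT",
    "Q_LEN", "Q_BEG", "Q_END", "S_LEN", "S_BEG", "S_END", "BLOCKCOUNT",
}


def parse_psl_line(line):
    """Split one PSL line into a dict (hypothetical helper, assumes 21 columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(PSL_KEYS):
        raise ValueError(f"expected {len(PSL_KEYS)} columns, got {len(fields)}")
    record = dict(zip(PSL_KEYS, fields))
    for key in INT_KEYS:
        record[key] = int(record[key])
    # BLOCKSIZES, Q_BEGS and S_BEGS stay as comma-separated strings here.
    return record


# Made-up example line: a single 60-base block with one mismatch.
example = "\t".join([
    "59", "1", "0", "0", "0", "0", "0", "0", "+",
    "query1", "60", "0", "60",
    "chr1", "1000", "100", "160",
    "1", "60,", "0,", "100,",
])
record = parse_psl_line(example)
print(record["Q_ID"], record["S_ID"], record["S_BEG"], record["S_END"])
```

The sketch rejects lines that do not have exactly 21 columns; files with header lines or fewer columns would need extra handling.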

read_psl adds three additional keys:

  1. SCORE - Score calculated as in web BLAT results.
  2. SPAN - The span of the hit.
  3. REC_TYPE - Record type.
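
The derived keys can be sketched in Python as follows. This is a hedged illustration, not the read_psl source: the score uses the formula commonly given for nucleotide alignments in web BLAT results (matches plus half the repeat matches, minus mismatches and the insert counts in query and subject), the span is assumed to be S_END minus S_BEG, and the REC_TYPE value "PSL" is likewise an assumption. The helper names are hypothetical.

```python
def blat_style_score(record):
    """Score in the style of web BLAT results (nucleotide case, assumed formula)."""
    return (
        record["MATCHES"]
        + record["REPMATCHES"] // 2   # repeat matches count half, rounded down
        - record["MISMATCHES"]
        - record["QNUMINSERT"]
        - record["SNUMINSERT"]
    )


def subject_span(record):
    """Span of the hit on the subject (assumption: S_END - S_BEG)."""
    return record["S_END"] - record["S_BEG"]


# Made-up hit with only the keys the two helpers need.
hit = {
    "MATCHES": 59, "MISMATCHES": 1, "REPMATCHES": 4,
    "QNUMINSERT": 1, "SNUMINSERT": 0,
    "S_BEG": 100, "S_END": 161,
}

# Copy the hit and attach the three derived keys ("PSL" is an assumed value).
derived = dict(hit, SCORE=blat_style_score(hit), SPAN=subject_span(hit), REC_TYPE="PSL")
print(derived["SCORE"], derived["SPAN"], derived["REC_TYPE"])
```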

For more about the PSL format:


read_psl [options] -i <PSL file(s)>


[-?          | --help]               #  Print full usage description.
[-i <files!> | --data_in=<files!>]   #  Comma separated list of files or glob expression to read.
[-n <uint>   | --num=<uint>]         #  Limit number of records to read.
[-I <file!>  | --stream_in=<file!>]  #  Read input stream from file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.


To read all PSL entries from a file:

read_psl -i test.psl

To read in only 10 records from a PSL file:

read_psl -n 10 -i test.psl

To read all PSL entries from multiple files:

read_psl -i test1.psl,test2.psl

To read PSL entries from multiple files using a glob expression:

read_psl -i '*.psl'
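
The way a value given to -i could be turned into a list of files can be sketched in Python as below. This is only an assumption about the behaviour described in the option text (a comma-separated list in which each entry may be a glob expression); the function name `expand_data_in` is hypothetical, and keeping an unmatched entry as a literal file name is a design choice of this sketch, made so that a missing file can be reported when it is opened.

```python
import glob


def expand_data_in(argument):
    """Expand a comma-separated list of file names or glob expressions."""
    paths = []
    for pattern in argument.split(","):
        matches = sorted(glob.glob(pattern))
        # No match: keep the entry as given so the caller can report it later.
        paths.extend(matches if matches else [pattern])
    return paths


# With no matching files on disk, the entries are returned unchanged.
print(expand_data_in("test1.psl,test2.psl"))
```

Quoting the glob on the command line, as in the example above, keeps the shell from expanding it before the program sees it.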

Martin Asser Hansen - Copyright (C) - All rights reserved.

August 2007


GNU General Public License version 2


read_psl is part of the Biopieces framework.

