Martin Asser Hansen edited this page Oct 2, 2015 · 8 revisions

Biopiece: read_soft

Description

read_soft reads deep sequencing data from files in NCBI's SOFT (Simple Omnibus Format in Text) format. Parsing SOFT files with read_soft results in Biopiece records like the following:

SEQ: TGCTTGGACTACATATGGTTGAGGGTTGTA
SAMPLE_TITLE: 23-29 nucleotide RNAs from Drosophila melanogaster ovaries
SEQ_NAME: GPL4738_GSM154618_1_1250
---
  • SEQ is the sequence.
  • SAMPLE_TITLE is the title of the experiment.
  • SEQ_NAME is composed of the following four tokens joined by _:
  1. Platform ID (GPL), which indicates the type of instrument used.
  2. Sample ID (GSM), which indicates the experiment this sequence belongs to.
  3. The number (1-based) of the sequence in this experiment.
  4. The clone count or read count of the sequence.
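The record layout above is easy to post-process outside Biopieces. The following minimal Python sketch parses such a record stream and splits SEQ_NAME into its four tokens. It assumes, based only on the example record on this page, that each record is a series of KEY: value lines terminated by a --- line, and that the GPL and GSM IDs contain no underscores. The function names parse_records and split_seq_name are hypothetical and not part of Biopieces.

```python
def parse_records(text):
    """Parse Biopiece-style records: 'KEY: value' lines, each record ended by '---'.

    Assumption: layout inferred from the example record on this page.
    """
    records, current = [], {}
    for line in text.splitlines():
        if line.strip() == "---":
            if current:
                records.append(current)
            current = {}
        elif ": " in line:
            key, value = line.split(": ", 1)  # values may themselves contain ': '
            current[key] = value
    if current:  # tolerate a final record without a trailing '---'
        records.append(current)
    return records


def split_seq_name(seq_name):
    """Split SEQ_NAME into platform, sample, 1-based number and read count.

    Assumes exactly four '_'-separated tokens (no underscores inside the IDs).
    """
    platform, sample, number, count = seq_name.split("_")
    return {"platform": platform, "sample": sample,
            "number": int(number), "count": int(count)}


example = """SEQ: TGCTTGGACTACATATGGTTGAGGGTTGTA
SAMPLE_TITLE: 23-29 nucleotide RNAs from Drosophila melanogaster ovaries
SEQ_NAME: GPL4738_GSM154618_1_1250
---
"""

record = parse_records(example)[0]
print(split_seq_name(record["SEQ_NAME"]))
# {'platform': 'GPL4738', 'sample': 'GSM154618', 'number': 1, 'count': 1250}
```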

For more information about the SOFT format, see:

http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html
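SOFT is line-oriented: a line starting with ^ opens an entity such as a sample (e.g. ^SAMPLE = GSM154618), lines starting with ! carry that entity's attributes, and a sample's tab-delimited data table sits between !sample_table_begin and !sample_table_end, with column names in its first row. A rough Python sketch of turning such a file into records like the one shown earlier follows. It is not read_soft's actual implementation. In particular, it assumes that the sequence is in the first table column and the read count in the second, and the function name soft_to_records is hypothetical.

```python
def soft_to_records(text):
    """Convert the ^SAMPLE sections of a SOFT file into read_soft-style records.

    ASSUMPTION: column 1 of each sample table is the sequence and column 2 is
    the read count; the real read_soft may select columns differently.
    """
    records = []
    sample = title = platform = None
    in_table = header_seen = False
    number = 0
    for line in text.splitlines():
        if line.startswith("^SAMPLE"):
            # e.g. "^SAMPLE = GSM154618" starts a new sample entity
            sample = line.split("=", 1)[1].strip()
            title = platform = None
            number = 0
        elif line.startswith("!Sample_title"):
            title = line.split("=", 1)[1].strip()
        elif line.startswith("!Sample_platform_id"):
            platform = line.split("=", 1)[1].strip()
        elif line.startswith("!sample_table_begin"):
            in_table, header_seen = True, False
        elif line.startswith("!sample_table_end"):
            in_table = False
        elif in_table and line.strip():
            if not header_seen:
                header_seen = True  # first table row holds the column names
                continue
            fields = line.split("\t")
            number += 1  # 1-based position of the sequence in this sample
            records.append({
                "SEQ": fields[0],
                "SAMPLE_TITLE": title,
                "SEQ_NAME": f"{platform}_{sample}_{number}_{fields[1]}",
            })
    return records
```

Records are emitted in file order, and the sequence number restarts at 1 for each new ^SAMPLE entity, matching the description of SEQ_NAME above.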

Usage

read_soft [options] -i <SOFT file(s)>

Options

[-?          | --help]               #  Print full usage description.
[-i <files!> | --data_in=<files!>]   #  Comma separated list of files or glob expression to read.
[-s <list>   | --samples=<list>]     #  Comma separated list of samples to get  -  Default=all.
[-n <uint>   | --num=<uint>]         #  Limit number of records to read.
[-I <file!>  | --stream_in=<file!>]  #  Read input stream from file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output stream to file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.
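The effect of the -i, -s and -n options can be sketched in Python as follows. This is a hypothetical illustration, not the Biopieces code. The names expand_inputs and select_records are made up, and the sketch makes several assumptions: an -i token matching no file is kept as-is, a record's sample is read from the second SEQ_NAME token, and -n counts records after the -s filter has been applied.

```python
import glob


def expand_inputs(spec):
    """Sketch of -i: a comma-separated list whose items may be glob patterns.

    Hypothetical behaviour: a token matching no file is kept unchanged so a
    later open() fails with a clear error; duplicates are not removed.
    """
    files = []
    for token in spec.split(","):
        matches = sorted(glob.glob(token))
        files.extend(matches if matches else [token])
    return files


def select_records(records, samples=None, num=None):
    """Sketch of -s and -n applied to parsed records (dicts with SEQ_NAME).

    Hypothetical behaviour: the sample ID is the second '_' token of SEQ_NAME,
    and num limits the number of records kept after sample filtering.
    """
    wanted = set(samples.split(",")) if samples else None  # None means all samples
    selected = []
    for rec in records:
        if num is not None and len(selected) >= num:
            break
        sample = rec["SEQ_NAME"].split("_")[1]
        if wanted is None or sample in wanted:
            selected.append(rec)
    return selected
```

For instance, select_records(records, samples="GSM123123") corresponds to the -s example given below, and adding num=10 corresponds to combining it with -n 10.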

Examples

To read all SOFT entries from a file:

read_soft -i test.soft

To read in only 10 records from a SOFT file:

read_soft -n 10 -i test.soft

To read all SOFT entries from multiple files:

read_soft -i test1.soft,test2.soft

To read SOFT entries from multiple files using a glob expression:

read_soft -i '*.soft'

To read data from a single sample only, use the -s switch:

read_soft -i test1.soft -s GSM123123

See also

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

read_soft is part of the Biopieces framework.

http://www.biopieces.org
