read_soft
Martin Asser Hansen edited this page Oct 2, 2015 · 8 revisions
read_soft reads deep sequencing data in NCBI's SOFT format, yet another NCBI-style crap format. Parsing SOFT files with read_soft results in Biopiece records of the following type:
SEQ: TGCTTGGACTACATATGGTTGAGGGTTGTA
SAMPLE_TITLE: 23-29 nucleotide RNAs from Drosophila melanogaster ovaries
SEQ_NAME: GPL4738_GSM154618_1_1250
---
- SEQ is the sequence.
- SAMPLE_TITLE is the title of the experiment.
- SEQ_NAME is composed of the following four tokens joined by _:
  - Platform (GPL), which indicates what type of instrument was used.
  - Sample ID (GSM), indicating which experiment this sequence belongs to.
  - The number (1-based) of the sequence in this experiment.
  - The clone count or read count of the sequence.
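The four-token layout of SEQ_NAME described above can be unpacked with a few lines of code. The following is a minimal Python sketch, not part of Biopieces; the names `SeqName` and `parse_seq_name` are hypothetical, and it assumes that none of the four tokens contains an underscore itself.

```python
from typing import NamedTuple


class SeqName(NamedTuple):
    """Hypothetical container for the four SEQ_NAME tokens."""
    platform: str  # GPL accession, e.g. "GPL4738"
    sample: str    # GSM accession, e.g. "GSM154618"
    index: int     # 1-based number of the sequence in the experiment
    count: int     # clone count or read count of the sequence


def parse_seq_name(seq_name: str) -> SeqName:
    """Split a SEQ_NAME value on "_" into its four tokens.

    Assumption: no token contains an underscore, so a plain split is enough.
    """
    tokens = seq_name.split("_")
    if len(tokens) != 4:
        raise ValueError(
            f"expected 4 tokens in SEQ_NAME, got {len(tokens)}: {seq_name!r}"
        )
    platform, sample, index, count = tokens
    return SeqName(platform, sample, int(index), int(count))


# The SEQ_NAME from the example record above.
name = parse_seq_name("GPL4738_GSM154618_1_1250")
print(name.platform, name.sample, name.index, name.count)
# prints: GPL4738 GSM154618 1 1250
```

Converting the last two tokens to integers makes it straightforward to sort or sum by read count later on.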
For more about the SOFT format:
http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html
read_soft [options] -i <SOFT file(s)>
[-? | --help] # Print full usage description.
[-i <files!> | --data_in=<files!>] # Comma separated list of files or glob expression to read.
[-s <list> | --samples=<list>] # Comma separated list of samples to get - Default=all.
[-n <uint> | --num=<uint>] # Limit number of records to read.
[-I <file!> | --stream_in=<file!>] # Read input stream from file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output stream to file - Default=STDOUT
[-v | --verbose] # Verbose output.
To read all SOFT entries from a file:
read_soft -i test.soft
To read in only 10 records from a SOFT file:
read_soft -n 10 -i test.soft
To read all SOFT entries from multiple files:
read_soft -i test1.soft,test2.soft
To read SOFT entries from multiple files using a glob expression:
read_soft -i '*.soft'
To read only data from a single sample use the -s switch:
read_soft -i test1.soft -s GSM123123
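For downstream processing outside Biopieces, the record stream that read_soft writes can be parsed and filtered by sample in a similar spirit to the -s switch. The sketch below is an illustration in Python, not how read_soft itself is implemented. It assumes the stream consists of `KEY: value` lines with each record terminated by `---`, as in the example record on this page. The function names and the second record in `STREAM` are made up for the example.

```python
from typing import Dict, Iterable, Iterator


def parse_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record from "KEY: value" lines ending in "---"."""
    record: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line == "---":
            if record:
                yield record
            record = {}
        elif line:
            # Split on the first ": " only, so values may contain colons.
            key, sep, value = line.partition(": ")
            if sep:
                record[key] = value
    if record:
        yield record  # tolerate a missing final terminator


def records_for_sample(
    records: Iterable[Dict[str, str]], sample_id: str
) -> Iterator[Dict[str, str]]:
    """Keep records whose SEQ_NAME carries the given GSM sample ID."""
    for record in records:
        tokens = record.get("SEQ_NAME", "").split("_")
        if len(tokens) == 4 and tokens[1] == sample_id:
            yield record


# First record is the example from this page; the second is hypothetical.
STREAM = """\
SEQ: TGCTTGGACTACATATGGTTGAGGGTTGTA
SAMPLE_TITLE: 23-29 nucleotide RNAs from Drosophila melanogaster ovaries
SEQ_NAME: GPL4738_GSM154618_1_1250
---
SEQ: ACGTACGTACGTACGTACGTACG
SAMPLE_TITLE: hypothetical second sample
SEQ_NAME: GPL4738_GSM000000_1_7
---
"""

for rec in records_for_sample(parse_records(STREAM.splitlines()), "GSM154618"):
    print(rec["SEQ_NAME"], rec["SEQ"])
# prints: GPL4738_GSM154618_1_1250 TGCTTGGACTACATATGGTTGAGGGTTGTA
```

Because `parse_records` accepts any iterable of lines, passing `sys.stdin` instead of `STREAM.splitlines()` should allow the script to sit at the end of a pipe, given that read_soft writes its output stream to STDOUT by default.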
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
read_soft is part of the Biopieces framework.