Martin Asser Hansen edited this page Oct 1, 2015 · 5 revisions

Biopiece: oligo_freq

Description

Use [oligo_freq] to determine the frequencies of subsequences (oligos) in each sequence, or accumulated over all sequences in the stream. This is useful if you want to determine, for example, the di-nucleotide frequency or a codon usage frequency table.

Usage

... | oligo_freq [options]

Options

[-?         | --help]               #  Print full usage description.
[-w <uint>  | --word_size=<uint>]   #  Size of oligos                 -  Default=7
[-a         | --all]                #  Accumulate oligos for all sequences in stream.
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file    -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to stream file    -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.

Examples

Consider the following FASTA entries in the file test.fna:

>test1
AAATG
>test2
TGAAA

Read the sequences with [read_fasta] and pipe them to [oligo_freq], using the -w switch to choose a word size of 3:

read_fasta -i test.fna | oligo_freq -w 3  

OLIGO: AAA
COUNT: 1
FREQ: 0.3333
---
OLIGO: AAT
COUNT: 1
FREQ: 0.3333
---
OLIGO: ATG
COUNT: 1
FREQ: 0.3333
---
SEQ: AAATG
SEQ_NAME: test1
SEQ_LEN: 5
---
OLIGO: AAA
COUNT: 1
FREQ: 0.3333
---
OLIGO: GAA
COUNT: 1
FREQ: 0.3333
---
OLIGO: TGA
COUNT: 1
FREQ: 0.3333
---
SEQ: TGAAA
SEQ_NAME: test2
SEQ_LEN: 5
---
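The per-sequence counts above can be reproduced with a sliding window that advances one base at a time. The following is a minimal Python sketch, not the Biopieces implementation: the function name `oligo_freq` is only illustrative, and the assumption that FREQ is the count divided by the number of windows in the sequence is inferred from the output shown.

```python
from collections import Counter

def oligo_freq(seq, word_size):
    """Count overlapping oligos of length word_size in seq (illustrative helper)."""
    # Slide a window of word_size over the sequence, one base at a time.
    oligos = [seq[i:i + word_size] for i in range(len(seq) - word_size + 1)]
    counts = Counter(oligos)
    total = len(oligos)
    # Assumption: frequency = count / total number of windows in this sequence.
    return {oligo: (count, round(count / total, 4))
            for oligo, count in sorted(counts.items())}

for name, seq in [("test1", "AAATG"), ("test2", "TGAAA")]:
    print(name, oligo_freq(seq, 3))
```

A sequence of length 5 has three windows of size 3, which is why each oligo in the example gets a frequency of 1/3.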

The result is the frequency of each oligo found, reported separately for each sequence. To get a total frequency across all sequences instead, use the -a switch:

read_fasta -i test.fna | oligo_freq -w 3 -a

SEQ: AAATG
SEQ_NAME: test1
SEQ_LEN: 5
---
SEQ: TGAAA
SEQ_NAME: test2
SEQ_LEN: 5
---
OLIGO: AAA
COUNT: 2
FREQ: 0.3333
---
OLIGO: AAT
COUNT: 1
FREQ: 0.1667
---
OLIGO: ATG
COUNT: 1
FREQ: 0.1667
---
OLIGO: GAA
COUNT: 1
FREQ: 0.1667
---
OLIGO: TGA
COUNT: 1
FREQ: 0.1667
---
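The accumulated result can be sketched the same way, as a rough Python illustration rather than the actual Biopieces code. The helper name `accumulated_oligo_freq` is hypothetical, and it assumes that windows never span two sequences and that FREQ is the count divided by the total number of windows over all sequences, which matches the numbers above (6 windows in total).

```python
from collections import Counter

def accumulated_oligo_freq(seqs, word_size):
    """Accumulate overlapping oligo counts over several sequences (illustrative helper)."""
    counts = Counter()
    for seq in seqs:
        # Windows are taken within each sequence only, never across a boundary.
        for i in range(len(seq) - word_size + 1):
            counts[seq[i:i + word_size]] += 1
    total = sum(counts.values())
    # Assumption: frequency = count / total number of windows in all sequences.
    return {oligo: (count, round(count / total, 4))
            for oligo, count in sorted(counts.items())}

table = accumulated_oligo_freq(["AAATG", "TGAAA"], 3)
for oligo, (count, freq) in table.items():
    print(f"{oligo}\t{count}\t{freq:.4f}")
```

AAA occurs once in each sequence, so its accumulated count is 2 out of 6 windows, while every other oligo occurs once.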

Or, to get a nice table, first use [grab] to keep only the records with an OLIGO key, then pass them to [write_tab]:

read_fasta -i test.fna | oligo_freq -w 3 -a | grab -p OLIGO -K | write_tab -cx

#OLIGO  COUNT   FREQ
AAA     2       0.3333
AAT     1       0.1667
ATG     1       0.1667
GAA     1       0.1667
TGA     1       0.1667

See also

[read_fasta]

[grab]

[write_tab]

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

[oligo_freq] is part of the Biopieces framework.

http://www.biopieces.org
