
create_weight_matrix

Martin Asser Hansen edited this page Oct 2, 2015 · 5 revisions

Biopiece: create_weight_matrix

Description

create_weight_matrix calculates the frequency of each residue in every column of the aligned sequences in the stream, either as absolute residue counts or as percentages.

Usage

... | create_weight_matrix [options]

Options

[-?         | --help]               #  Print full usage description.
[-p         | --percent]            #  Output the result in percent  -  Default=absolute
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file   -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to stream file   -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.

Examples

Consider the following alignment in the file aln.fna in FASTA format:

>test5
---TAACAGGCACT
>test2
-----GAATCGACT
>test1
--CTAGCTTCGACT
>test3
ACGAAACTAGCATC
>test4
----AGCATCGACT

To create a weight matrix from the above alignment, read it in with read_fasta and pipe the stream through create_weight_matrix:

read_fasta -i aln.fna | create_weight_matrix

The result is five records, one per residue type. Each looks like the first one, shown below, which is not easy to read:

V13: 0
V11: 0
V7: 0
V4: 2
V3: 3
V9: 0
V0: -
V2: 4
V8: 0
V12: 0
V5: 1
V10: 0
V1: 4
V6: 0
V14: 0
---
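In the record above, V0 holds the residue type and V1 to V14 hold its count at each alignment column, but the keys are printed in no particular order. The following Python sketch shows one way to put such a record into column order. It only illustrates the record layout and is not how Biopieces itself is implemented; the names record and record_to_row are hypothetical.

```python
# The first record from the output above, with keys in arbitrary order.
record = {
    "V13": "0", "V11": "0", "V7": "0", "V4": "2", "V3": "3",
    "V9": "0", "V0": "-", "V2": "4", "V8": "0", "V12": "0",
    "V5": "1", "V10": "0", "V1": "4", "V6": "0", "V14": "0",
}


def record_to_row(rec):
    """Return the record's values ordered by the numeric part of the V-keys."""
    keys = sorted(rec, key=lambda k: int(k[1:]))  # "V10" sorts after "V9"
    return [rec[k] for k in keys]


row = record_to_row(record)
print("\t".join(row))
```

Sorting on the numeric suffix rather than on the key string avoids placing V10 before V2.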

To make sense of the output, pipe the result through write_tab like this:

read_fasta -i aln.fna | create_weight_matrix | write_tab -x

-   4   4   3   2   1   0   0   0   0   0   0   0   0   0
A   1   0   0   1   4   2   1   3   1   0   0   5   0   0
C   0   1   1   0   0   0   4   0   0   3   2   0   4   1
G   0   0   1   0   0   3   0   0   1   2   3   0   0   0
T   0   0   0   2   0   0   0   2   3   0   0   0   1   4

The above weight matrix shows, for each residue type (first column), the number of times it occurs at each position of the alignment.
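To make the counting explicit, here is a minimal Python sketch that reproduces the matrix above from the example alignment. It is an independent illustration, not the Biopieces implementation; the names ALIGNMENT and count_matrix are hypothetical.

```python
from collections import Counter

# The aligned sequences from aln.fna, in file order.
ALIGNMENT = [
    "---TAACAGGCACT",  # test5
    "-----GAATCGACT",  # test2
    "--CTAGCTTCGACT",  # test1
    "ACGAAACTAGCATC",  # test3
    "----AGCATCGACT",  # test4
]


def count_matrix(seqs):
    """Return {residue: [count at column 1, count at column 2, ...]}."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) yields one tuple of residues per alignment column.
    columns = [Counter(col) for col in zip(*seqs)]
    residues = sorted(set("".join(seqs)))  # "-" sorts before the letters
    return {r: [c[r] for c in columns] for r in residues}


matrix = count_matrix(ALIGNMENT)
for residue, counts in matrix.items():
    print(residue, *counts, sep="\t")
```

Each row of the printed table corresponds to one of the five records emitted by create_weight_matrix.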

To obtain the frequencies as percentages, use the -p switch to create_weight_matrix:

read_fasta -i aln.fna | create_weight_matrix -p | write_tab -x

-    80   80   60   40   20   0    0    0    0    0    0    0    0    0
A    20   0    0    20   80   40   20   60   20   0    0    100  0    0
C    0    20   20   0    0    0    80   0    0    60   40   0    80   20
G    0    0    20   0    0    60   0    0    20   40   60   0    0    0
T    0    0    0    40   0    0    0    40   60   0    0    0    20   80
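The percentages above are consistent with dividing each count by the column total (here the five sequences, gaps included). A short Python sketch of that conversion follows. The rounding to whole numbers is an assumption, since every value in this example happens to be an exact integer; the names COUNTS and to_percent are hypothetical.

```python
# Absolute counts, copied from the output of the run without -p.
COUNTS = {
    "-": [4, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "A": [1, 0, 0, 1, 4, 2, 1, 3, 1, 0, 0, 5, 0, 0],
    "C": [0, 1, 1, 0, 0, 0, 4, 0, 0, 3, 2, 0, 4, 1],
    "G": [0, 0, 1, 0, 0, 3, 0, 0, 1, 2, 3, 0, 0, 0],
    "T": [0, 0, 0, 2, 0, 0, 0, 2, 3, 0, 0, 0, 1, 4],
}


def to_percent(counts):
    """Convert per-column counts to percentages of the column total."""
    n_cols = len(next(iter(counts.values())))
    # Column total = number of sequences, because gaps are counted too.
    totals = [sum(row[i] for row in counts.values()) for i in range(n_cols)]
    return {
        residue: [round(100 * value / total) for value, total in zip(row, totals)]
        for residue, row in counts.items()
    }


percent = to_percent(COUNTS)
for residue, values in percent.items():
    print(residue, *values, sep="\t")
```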

See also

read_fasta

write_tab

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

create_weight_matrix is part of the Biopieces framework.

http://www.biopieces.org
