create_weight_matrix
create_weight_matrix calculates the frequency of all residues per column in aligned sequences from the stream - either as exact residue counts or percentages.
... | create_weight_matrix [options]
[-? | --help] # Print full usage description.
[-p | --percent] # Output the result in percent - Default=absolute
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following alignment in the file aln.fna in FASTA format:
>test5
---TAACAGGCACT
>test2
-----GAATCGACT
>test1
--CTAGCTTCGACT
>test3
ACGAAACTAGCATC
>test4
----AGCATCGACT
To create a weight matrix from the above alignment, read it in with read_fasta and pipe the stream through create_weight_matrix:
read_fasta -i aln.fna | create_weight_matrix
The result is five records, one per residue type. Each looks like the first one, shown below, which is hard to read in raw form:
V13: 0
V11: 0
V7: 0
V4: 2
V3: 3
V9: 0
V0: -
V2: 4
V8: 0
V12: 0
V5: 1
V10: 0
V1: 4
V6: 0
V14: 0
---
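The keys of the record above are printed in no particular order. Judging from the tabular output further down, V0 holds the residue type and V1 through V14 hold the count at each alignment position. Under that assumption, a minimal Python sketch of how such a record can be read as one matrix row follows. The function name record_to_row is hypothetical and is not part of Biopieces.

```python
# The record shown above, copied as a plain dict (values kept as strings).
record = {
    "V13": "0", "V11": "0", "V7": "0", "V4": "2", "V3": "3",
    "V9": "0", "V0": "-", "V2": "4", "V8": "0", "V12": "0",
    "V5": "1", "V10": "0", "V1": "4", "V6": "0", "V14": "0",
}


def record_to_row(rec):
    """Order the V<n> keys by their numeric suffix and return the values."""
    ordered_keys = sorted(rec, key=lambda key: int(key[1:]))
    return [rec[key] for key in ordered_keys]


row = record_to_row(record)
print("\t".join(row))
```

Sorting on the integer suffix rather than on the key string matters here, because a plain string sort would place V10 before V2.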
To make the output readable, pipe the result through write_tab like this:
read_fasta -i aln.fna | create_weight_matrix | write_tab -x
- 4 4 3 2 1 0 0 0 0 0 0 0 0 0
A 1 0 0 1 4 2 1 3 1 0 0 5 0 0
C 0 1 1 0 0 0 4 0 0 3 2 0 4 1
G 0 0 1 0 0 3 0 0 1 2 3 0 0 0
T 0 0 0 2 0 0 0 2 3 0 0 0 1 4
The above weight matrix shows, for each residue type (1st column), the number of times it occurs at each position of the alignment. Each position's counts sum to 5, the number of sequences.
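For readers who want to check the numbers, the per-column counting can be sketched in a few lines of Python. This is an illustration of the calculation only, not the Biopieces implementation. The function name weight_matrix and the hard-coded ALIGNMENT dict are assumptions made for the example.

```python
from collections import Counter

# The alignment from aln.fna, hard-coded so the sketch runs on its own.
ALIGNMENT = {
    "test5": "---TAACAGGCACT",
    "test2": "-----GAATCGACT",
    "test1": "--CTAGCTTCGACT",
    "test3": "ACGAAACTAGCATC",
    "test4": "----AGCATCGACT",
}


def weight_matrix(seqs):
    """Return {residue: [count at position 1, position 2, ...]}."""
    seqs = list(seqs)
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) walks the alignment column by column.
    columns = [Counter(column) for column in zip(*seqs)]
    residues = sorted(set("".join(seqs)))
    return {res: [col[res] for col in columns] for res in residues}


matrix = weight_matrix(ALIGNMENT.values())
for residue, counts in matrix.items():
    print(residue, *counts, sep="\t")
```

The rows come out in the order -, A, C, G, T because the residues are sorted by character code, which happens to match the table above.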
To obtain the frequencies as percentages, use the -p switch to create_weight_matrix:
read_fasta -i aln.fna | create_weight_matrix -p | write_tab -x
- 80 80 60 40 20 0 0 0 0 0 0 0 0 0
A 20 0 0 20 80 40 20 60 20 0 0 100 0 0
C 0 20 20 0 0 0 80 0 0 60 40 0 80 20
G 0 0 20 0 0 60 0 0 20 40 60 0 0 0
T 0 0 0 40 0 0 0 40 60 0 0 0 20 80
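Each percentage above is the absolute count divided by the number of sequences (5 here) and multiplied by 100. A minimal Python sketch of that conversion follows. The function name to_percent is hypothetical, and the use of round() is an assumption: this page does not say how create_weight_matrix handles percentages that are not whole numbers, and in this example every value is a multiple of 20.

```python
def to_percent(counts, n_seqs):
    """Convert per-position counts to whole-number percentages of n_seqs."""
    return [round(100 * count / n_seqs) for count in counts]


# Absolute counts for the gap and A rows of the weight matrix; aln.fna has 5 sequences.
gap_counts = [4, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
a_counts = [1, 0, 0, 1, 4, 2, 1, 3, 1, 0, 0, 5, 0, 0]

gap_percent = to_percent(gap_counts, 5)
a_percent = to_percent(a_counts, 5)
print("-", *gap_percent, sep="\t")
print("A", *a_percent, sep="\t")
```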
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
create_weight_matrix is part of the Biopieces framework.