Init
markusgumbel committed Jan 26, 2025
1 parent 000ac76 commit dee34ba
Showing 10 changed files with 276 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
.vscode
docs/build
Manifest.toml
16 changes: 16 additions & 0 deletions Project.toml
@@ -0,0 +1,16 @@
name = "GCATConductance"
uuid = "ac29e6a8-09a6-44c6-90e5-cfe97d9aefa1"
authors = ["Markus Gumbel <[email protected]>"]
version = "0.1.0"

[deps]
BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59"
BioSymbols = "3c28c6f8-a34d-59c4-9654-267d177fcfa9"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
NamedArrays = "86f7a689-2022-50b4-a561-43c23ac3c673"

[compat]
BioSequences = "3.4.1"
BioSymbols = "5.1.3"
LinearAlgebra = "1.11.0"
NamedArrays = "0.10.3"
5 changes: 5 additions & 0 deletions docs/Project.toml
@@ -0,0 +1,5 @@
[deps]
BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59"
BioSymbols = "3c28c6f8-a34d-59c4-9654-267d177fcfa9"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
GCATConductance = "ac29e6a8-09a6-44c6-90e5-cfe97d9aefa1"
4 changes: 4 additions & 0 deletions docs/make.jl
@@ -0,0 +1,4 @@
using Pkg; Pkg.activate("./docs")

using Documenter, GCATConductance
makedocs(modules = [GCATConductance], doctest = true, sitename = "GCAT-Conductance", remotes = nothing)
84 changes: 84 additions & 0 deletions docs/src/index.md
@@ -0,0 +1,84 @@
```@meta
# Information for Documenter
CurrentModule = GCATConductance
```

```@contents
Pages = ["index.md"]
```

# Introduction

## Definitions

Let $B$ be an alphabet and $n$ its size. Let $l$ denote the length of the words (tuples), and let $S \subseteq B^l$ be a set of tuples. For instance, $B = \{A, T, C, G\}$ denotes the bases of the genetic code.

## Weight matrix

A symmetric $n \times n$ matrix $W$ in which each row and each column represents a (unique) letter of the alphabet $B$ and whose diagonal elements are 0 is called a weight matrix (or transition weight matrix).
$W_{a,b}$ refers to the element of $W$ in row $a$ and column $b$. The functions below take a list of such matrices, one per tuple position.

The following example creates weight matrices for the alphabet $\{A, T\}$ and tuple size 2.
`ones_weights` returns one matrix per tuple position; every off-diagonal entry is 1.

```@example rt
using GCATConductance, BioSequences, BioSymbols
W = ones_weights([DNA_A, DNA_T], 2)
W[1]
```
```@example rt
W[2]
```
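
For the two-letter alphabet used above, `n = 2`, so each of the two returned matrices should be the all-ones matrix minus the identity, with rows and columns ordered A, T. The superscript ``(i)`` is notation introduced here only to label the matrix for position ``i``:

```math
W^{(1)} = W^{(2)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
```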

## Conductance and robustness (`set_conductance`)

The sum of the weights of all edges of a set $S$ is defined as:

``
E(S) =
\underset{\text{ every tuple }}{\underbrace{\sum_{t\in S}}}
\underset{\text{ every position }}{\underbrace{\sum_{i=1}^l}}
\underset{\text{ other letters }}{\underbrace{\sum_{b \in B} W_{t_i, b}}}
``

```@example rt
es = sum_all_wedges([dna"AA", dna"AT"], W, 2)
```
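
As a worked check, take ``S = \{AA, AT\}`` over ``B = \{A, T\}`` with unit weights. Every letter has exactly one other letter, so each position of each tuple contributes 1:

```math
E(S) = \underbrace{(1 + 1)}_{AA} + \underbrace{(1 + 1)}_{AT} = 4
```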

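Two tuples are connected by an edge if they differ in exactly one position. The sum of the weights of all __internal__ edges of a set $S$, i.e. edges whose two end points both lie in $S$, is defined as: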

``I(S) = \sum_{\text{all pairs with edges } (t,u) \in S \times S} \sum_{i=1}^l W_{t_i, u_i}``

```@example rt
is = sum_intern_wedges([dna"AA", dna"AT"], W)
```
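
For ``S = \{AA, AT\}`` with unit weights, the only ordered pairs that differ in exactly one position are ``(AA, AT)`` and ``(AT, AA)``. In each pair the first position contributes ``W_{A,A} = 0`` and the second position contributes 1:

```math
I(S) = \underbrace{(0 + 1)}_{(AA,\,AT)} + \underbrace{(0 + 1)}_{(AT,\,AA)} = 2
```

Because ``S \times S`` contains both orders of a pair, every undirected internal edge is counted twice.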

The (set) conductance is the ratio of the summed weights of the outgoing edges to the summed weights of all edges of ``S``.

``\varphi(S)=\frac{E(S)-I(S)}{E(S)}``

```@example rt
sc = set_conductance([dna"AA", dna"AT"], W, 2)
```

See [`GCATConductance.set_conductance`](@ref) for details.

The (set) robustness is the ratio of the summed weights of the internal edges to the summed weights of all edges of ``S``.

``\rho(S)=\frac{I(S)}{E(S)} = 1 - \varphi(S)``
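
For ``S = \{AA, AT\}`` over ``B = \{A, T\}`` with unit weights, ``E(S) = 4`` and ``I(S) = 2``, so both quantities are one half:

```math
\varphi(S) = \frac{4 - 2}{4} = 0.5, \qquad \rho(S) = \frac{2}{4} = 0.5 = 1 - \varphi(S)
```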

## Conductance for a partition (`part_conductance`)

```@example rt
p = [1, 1, 2, 2] # a partition for the tuples below
#p = [1, 1, 1, 2] # a partition for the tuples below
pc = part_conductance([dna"AA", dna"AT", dna"TA", dna"TT"], p, W, 2)
```
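
The definitions above can be cross-checked with a minimal sketch in plain Julia. It assumes unit weights (every off-diagonal weight is 1) and tuples given as ordinary strings. The helpers `all_edges`, `hamming`, `internal_edges`, `conductance` and `partition_conductance` are hypothetical names used for illustration only; they are not part of the package API.

```julia
# Hypothetical helpers, not part of GCATConductance. Unit weights are assumed.

# E(S): every tuple, every position, every other letter of an alphabet of size n.
all_edges(S::Vector{String}, n::Int) = sum(length(t) * (n - 1) for t in S)

# Number of positions in which two tuples of equal length differ.
hamming(t::String, u::String) = count(a != b for (a, b) in zip(t, u))

# I(S): ordered pairs (t, u) in S × S that differ in exactly one position.
internal_edges(S::Vector{String}) = count(hamming(t, u) == 1 for t in S for u in S)

# φ(S) = (E(S) - I(S)) / E(S)
conductance(S::Vector{String}, n::Int) =
    (all_edges(S, n) - internal_edges(S)) / all_edges(S, n)

# Conductance of every block of a partition; p[k] is the block label of S[k].
partition_conductance(S::Vector{String}, p::Vector{Int}, n::Int) =
    [conductance(S[p .== i], n) for i in unique(p)]

conductance(["AA", "AT"], 2)                                          # 0.5
conductance(["ATG"], 4)                                               # 1.0
partition_conductance(["AA", "AT", "TA", "TT"], [1, 1, 2, 2], 2)      # [0.5, 0.5]
```

The first two calls use the same inputs as the package's test set, so they can serve as a quick plausibility check of the results reported by `set_conductance`.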

# API

```@autodocs
Order = [:module, :type, :function]
Modules = [GCATConductance]
```
5 changes: 5 additions & 0 deletions src/GCATConductance.jl
@@ -0,0 +1,5 @@
module GCATConductance

include("conductance.jl")

end
139 changes: 139 additions & 0 deletions src/conductance.jl
@@ -0,0 +1,139 @@
# Copyright 2025 by the authors.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

using NamedArrays, BioSequences, BioSymbols, LinearAlgebra

export ones_weights, sum_all_wedges, sum_intern_wedges
export set_conductance, part_conductance

# TODO:
# wedges: make independent of numbers, use any symbol.

"""
    ones_weights(alphabet::Vector{<:BioSymbol}, t_size::Int)::Vector{<:NamedArray}

Weight matrices with all off-diagonal values set to 1. Returns a list of size `t_size`
where each entry is a matrix of dimension n × n (n is the alphabet size) and all
values except for the diagonal are set to 1.
Functions like `set_conductance` or `part_conductance` require matrices
with transition values. If the weights of the conductance graph
should all be set to 1, this function can be used.

Arguments
- `alphabet`: Alphabet as a vector of symbols, e.g. A, T, C, G.
- `t_size`: Tuple size
"""
function ones_weights(alphabet::Vector{<:BioSymbol}, t_size::Int)::Vector{<:NamedArray}
    n = length(alphabet)
    W = ones(n, n) - I(n) # all ones, zero diagonal
    W = NamedArray(W, (alphabet, alphabet)) # index rows and columns by symbol
    return [W for _ in 1:t_size] # note: every entry refers to the same matrix object
end

"""
    sum_all_wedges(tuples::Vector{<:BioSequence}, W::Vector, n::Int)::Number

Calculate the sum of weighted edges for a set of tuples.
This is a helper function used in `set_conductance` and `part_conductance`.

Arguments
- `tuples`: List of tuples represented by a vector of sequences.
- `W`: List of transition weight matrices. The size of the list
  must be the tuple size. Each list entry must be a matrix of dimension
  n × n (alphabet size).
- `n::Int`: Alphabet size, e.g. |{A, T, C, G}| = 4.
"""
function sum_all_wedges(tuples::Vector{<:BioSequence}, W::Vector, n::Int)::Number
    l = length(W) # Tuple length
    # For every position i of every tuple t, add up row t[i] of W[i].
    sum([sum([W[i][t[i], j] for j in 1:n]) for i in 1:l for t in tuples])
end

"""
    sum_intern_wedges(tuples::Vector{<:BioSequence}, W::Vector{<:AbstractMatrix{<:Number}})::Number

Calculate the sum of internal weighted edges for a set of tuples.
This is a helper function used in `set_conductance` and `part_conductance`.

Arguments
- `tuples`: List of tuples represented by a vector of sequences.
- `W`: List of transition weight matrices. The size of the list
  must be the tuple size. Each list entry must be a matrix of dimension
  n × n (alphabet size).
"""
function sum_intern_wedges(tuples::Vector{<:BioSequence}, W::Vector{<:AbstractMatrix{<:Number}})::Number
    if length(tuples) <= 1 # one node has no internal edges
        return 0
    else # more than one node
        S = to_matrix(tuples) # one column per tuple
        l = length(W) # Tuple size

        # to_matrix stores every tuple as a column, so iterate over columns.
        cols = eachcol(S)

        k = [(a, b) for a in cols for b in cols] # all combinations

        # idx marks the tuple pairs which differ in exactly one letter:
        idx = [sum([a[i] != b[i] for i in 1:l]) == 1 for (a, b) in k]
        edges = k[idx]
        if isempty(edges) # no two tuples differ in exactly one letter
            return 0 # nothing to do.
        end

        return sum([W[i][p1[i], p2[i]] for (p1, p2) in edges for i in 1:l])
    end
end

"""
    set_conductance(tuples::Vector{<:BioSequence}, W::Vector{<:AbstractMatrix{<:Number}}, n::Int)::Number

Calculate the conductance for a set of tuples.

Arguments
- `tuples`: List of tuples represented by a vector of sequences.
- `W`: List of transition weight matrices. The size of the list
  must be the tuple size. Each list entry must be a matrix of dimension
  n × n (alphabet size).
- `n`: Alphabet size, e.g. |{A, T, C, G}| = 4.
"""
function set_conductance(tuples::Vector{<:BioSequence}, W::Vector{<:AbstractMatrix{<:Number}}, n::Int)::Number
    i = sum_intern_wedges(tuples, W) # weights of internal edges, I(S)
    e = sum_all_wedges(tuples, W, n) # weights of all edges, E(S)
    return (e - i) / e
end

"""
    part_conductance(tuples::Vector{<:BioSequence}, p::Vector, W::Vector{<:AbstractMatrix{<:Number}}, n::Int)::Vector{<:Number}

Calculate the set conductance for every block of a partition of the tuples.

Arguments
- `tuples`: List of tuples represented by a vector of sequences.
- `p`: Block label for each tuple. The size of `p` must match the size of `tuples`.
- `W`: List of transition weight matrices. The size of the list
  must be the tuple size. Each list entry must be a matrix of dimension
  n × n (alphabet size).
- `n`: Alphabet size, e.g. |{A, T, C, G}| = 4.
"""
function part_conductance(tuples::Vector{<:BioSequence}, p::Vector, W::Vector{<:AbstractMatrix{<:Number}}, n::Int)::Vector{<:Number}
    # Group the tuples by block label, in order of first appearance in `p`.
    P = [tuples[p.==i] for i in unique(p)]
    return [set_conductance(S, W, n) for S in P]
end

# to base package?
# Convert a list of sequences into a matrix with one column per sequence.
function to_matrix(S)
    l = [collect(s) for s in S]
    return hcat(l...)
end
5 changes: 5 additions & 0 deletions test/Project.toml
@@ -0,0 +1,5 @@
[deps]
BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59"
BioSymbols = "3c28c6f8-a34d-59c4-9654-267d177fcfa9"
GCATConductance = "ac29e6a8-09a6-44c6-90e5-cfe97d9aefa1"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
13 changes: 13 additions & 0 deletions test/conductance.jl
@@ -0,0 +1,13 @@
# some tests

using GCATConductance, BioSequences, BioSymbols, Test

@testset "Conductance" begin
    w = ones_weights([DNA_A, DNA_T], 2)
    sc = set_conductance([dna"AA", dna"AT"], w, 2)
    @test sc == 0.5

    w = ones_weights([DNA_A, DNA_T, DNA_C, DNA_G], 3)
    sc = set_conductance([dna"ATG"], w, 4)
    @test sc == 1
end
2 changes: 2 additions & 0 deletions test/runtests.jl
@@ -0,0 +1,2 @@
println("Testing everything...")
include("conductance.jl")
