Warning: version 0.2.x is only a beta and not ready for public use.
This project contains the source code for the R version of the Genetic Code Analysis Toolkit (GCAT) project for circular codes. Please refer also to the cammbio homepage for more information.
gcatcirc is available for R version 4.1 and higher. It requires a Rust compiler (version 1.57 or later) installed on your machine. Furthermore, the current version of devtools has to be installed. If you are using Microsoft Windows, you also need to install Rtools.
A common error is that Rust does not have the required target installed. You can add it with:
rustup target add [YOUR_TARGET]
Start a new R console and run:
install.packages("devtools")
devtools::install_github("informatik-mannheim/gcatcirc")
- Download & Install:
  - R: https://cran.r-project.org/bin/windows/base/R-4.3.2-win.exe
  - RStudio: https://download1.rstudio.org/electron/windows/RStudio-2023.09.1-494.exe
  - Rust: https://static.rust-lang.org/rustup/dist/x86_64-pc-windows-msvc/rustup-init.exe
  - RTools: https://cran.r-project.org/bin/windows/Rtools/rtools43/files/rtools43-5863-5818.exe
- Reboot
- Open Command Prompt
  - rustup target add x86_64-pc-windows-gnu
- Open RStudio
  - install.packages("devtools")
  - devtools::install_github("informatik-mannheim/gcatbase")
  - devtools::install_github("informatik-mannheim/gcatcirc")
- Done
Once you have installed gcatcirc, you may read its help pages. The file ./example/Tutorial.Rmd is a good starting point: it serves as both an introduction and a tutorial. This R Markdown document can be executed. The executed tutorial is available online.
is_code
all_ambiguous_sequences
circular_shift
is_code_circular
is_code_cn_circular
is_code_comma_free
is_code_strong_comma_free
get_exact_k_circular
get_k_graph_circular
c3_codes
c3_code
c3_equiv_class
c3_equivmatrix
c3_in_class
get_representing_graph
get_component_of_representing_graph
plot_representing_graph
plot_component_of_representing_graph
get_cyclic_paths
get_longest_paths
is_code(tuples)
tuples A gcatbase::gcat.code object
A Boolean. True if the set of words is a code.
This function returns true if a set of words is, by definition, a code. A code X is a set of words such that any sequence has at most one decomposition into words of X.
code <- gcatbase::code(c("ACG", "CGG", "AC"))
is_code(code)
all_ambiguous_sequences(tuples)
tuples A gcatbase::gcat.code object
A String vector with all ambiguous sequences.
This function returns all ambiguous sequences. Such sequences exist only if the set of words X is, by definition, not a code. An ambiguous sequence has at least two different decompositions into words of X.
code <- gcatbase::code(c("ACG", "CGG", "AC"))
all_ambiguous_sequences(code)
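The notion of "at most one decomposition" can be made concrete with a small base R sketch. It does not use gcatcirc and is not how the package works internally; the helper name count_decompositions is hypothetical. It counts, by dynamic programming over prefixes, how many ways a sequence can be written as a concatenation of words from a set. A count above 1 means the sequence is ambiguous.

```r
# Hypothetical helper, not part of gcatcirc: count the ways `s` can be
# written as a concatenation of words taken from `words`.
# ways[p + 1] holds the number of decompositions of the first p characters.
count_decompositions <- function(s, words) {
  n <- nchar(s)
  ways <- numeric(n + 1)
  ways[1] <- 1  # the empty prefix has exactly one (empty) decomposition
  for (p in seq_len(n)) {
    for (w in words) {
      k <- nchar(w)
      # if `w` ends at position p, extend every decomposition of the rest
      if (k <= p && substr(s, p - k + 1, p) == w) {
        ways[p + 1] <- ways[p + 1] + ways[p - k + 1]
      }
    }
  }
  ways[n + 1]
}

# {"A", "AC", "C"} is not a code: "AC" reads as "A" + "C" or as "AC".
count_decompositions("AC", c("A", "AC", "C"))

# With the word set from the example above, this sequence has one reading.
count_decompositions("ACGCGG", c("ACG", "CGG", "AC"))
```

This only inspects one given sequence; deciding whether a whole set of words is a code needs a search over all sequences, which is what is_code is for.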
circular_shift(tuples, sh)
tuples A gcatbase::gcat.code object
sh An integer, the shift index, i.e. the number of shifts.
The code after the circular shift has been applied to every tuple.
A shift here means a circular permutation of each tuple. For example, let X = {123, 332}; then a shift by 2 results in {312, 233}.
code <- gcatbase::code(c("ACG", "CGG", "AC"))
circular_shift(code, 2)
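The shift described above can be reproduced on plain character vectors. The following is a minimal base R sketch, assuming nonempty words; the helper names shift_word and shift_words are hypothetical and are not part of gcatcirc.

```r
# Hypothetical helpers, not part of gcatcirc.
# Rotate one word to the left by `sh` positions (a circular permutation).
shift_word <- function(w, sh) {
  n <- nchar(w)
  sh <- sh %% n                 # shifting by the word length changes nothing
  if (sh == 0) return(w)
  paste0(substr(w, sh + 1, n), substr(w, 1, sh))
}

# Apply the same shift to every word of a set.
shift_words <- function(words, sh) {
  vapply(words, shift_word, character(1), sh = sh, USE.NAMES = FALSE)
}

# The example from the description: X = {123, 332}, shifted by 2.
shift_words(c("123", "332"), 2)
```

For the set {123, 332} this yields 312 and 233, matching the description.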
is_code_circular(tuples)
tuples A gcatbase::gcat.code object
Boolean value. True if the code is circular.
This function checks if a code is circular. Circular codes are sets X of
tuples, possibly of different tuple lengths, where
every concatenation of words from X written on a circle
has only a single decomposition into words from X.
For more info on this subject read:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5492142/,
http://dpt-info.u-strasbg.fr/~c.michel/Circular_Codes.pdf,
2007 Christian MICHEL. CIRCULAR CODES IN GENES
code <- gcatbase::code(c("ACG", "CGG", "AC"))
is_code_circular(code)
is_code_cn_circular(tuples)
tuples A gcatbase::gcat.code object
Boolean value. True if the code is Cn circular.
A code is Cn circular if all circular permutations of the code (applied to all tuples) are circular codes again. In total, this function checks 'x' circular permutations, where 'x' is the least common multiple of all tuple lengths used. This is an extended property of circular codes.
code <- gcatbase::code(c("ACG", "CGG", "AC"))
k <- is_code_cn_circular(code)
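The number of circular permutations mentioned above is the least common multiple of the tuple lengths. As a small worked example in base R (the helper names gcd, lcm and shift_count are hypothetical, not gcatcirc functions):

```r
# Hypothetical helpers, not part of gcatcirc.
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)  # Euclid's algorithm
lcm <- function(a, b) a / gcd(a, b) * b

# Least common multiple of all tuple lengths that occur in `words`.
shift_count <- function(words) Reduce(lcm, unique(nchar(words)))

# The example code has tuple lengths 3 and 2, so lcm(3, 2) = 6.
shift_count(c("ACG", "CGG", "AC"))
```

For a code with a single tuple length n, the value is simply n.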
is_code_comma_free(tuples)
tuples A gcatbase::gcat.code object
Boolean value. True if the code is comma free.
This function checks if a code is comma-free.
Comma-free codes are a more restrictive class of codes within the circular code family.
A comma-free code X is a code in which no concatenation of a
nonempty suffix of any word from X and a nonempty prefix of any word from X forms a word from X.
This property extends that of circular codes. See is_code_circular for more details.
For more info on this subject read:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5492142/,
http://dpt-info.u-strasbg.fr/~c.michel/Circular_Codes.pdf,
2007 Christian MICHEL. CIRCULAR CODES IN GENES
code <- gcatbase::code(c("ACG", "CGG", "AC"))
is_code_comma_free(code)
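The suffix/prefix condition can be checked by brute force on small word sets. The base R sketch below is an illustration only and is not the package's implementation. It assumes that "suffix" and "prefix" mean proper ones (nonempty and shorter than the word); all helper names are hypothetical.

```r
# Hypothetical helpers, not part of gcatcirc.
# Assumption: only proper (nonempty, shorter than the word) affixes count.
proper_suffixes <- function(w) {
  n <- nchar(w)
  if (n < 2) return(character(0))
  substring(w, 2:n, n)
}
proper_prefixes <- function(w) {
  n <- nchar(w)
  if (n < 2) return(character(0))
  substring(w, 1, 1:(n - 1))
}

# TRUE if no suffix + prefix concatenation is itself a word of the set.
is_comma_free_sketch <- function(words) {
  suf <- unique(unlist(lapply(words, proper_suffixes)))
  pre <- unique(unlist(lapply(words, proper_prefixes)))
  if (length(suf) == 0 || length(pre) == 0) return(TRUE)
  joined <- as.vector(outer(suf, pre, paste0))  # every suffix + prefix pair
  !any(joined %in% words)
}

is_comma_free_sketch(c("ACG", "CGG"))  # no pair rebuilds a word
is_comma_free_sketch(c("ACG", "GAC"))  # "AC" + "G" rebuilds "ACG"
```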
is_code_strong_comma_free(tuples)
tuples A gcatbase::gcat.code object
Boolean value. True if the code is strong comma free.
This function checks if a code is strong comma-free.
Strong comma-free codes are a more restrictive class of codes within the circular code family.
A strong comma-free code X is a code in which no nonempty suffix of any word from X
is a nonempty prefix of any word from X.
This property extends that of circular codes. See is_code_comma_free for more details.
For more info on this subject read:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5492142/,
http://dpt-info.u-strasbg.fr/~c.michel/Circular_Codes.pdf,
2007 Christian MICHEL. CIRCULAR CODES IN GENES
code <- gcatbase::code(c("ACG", "CGG", "AC"))
is_code_strong_comma_free(code)
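The strong comma-free condition compares the suffix set with the prefix set directly. A minimal base R sketch follows, assuming proper (nonempty, shorter than the word) suffixes and prefixes; the function name is hypothetical and this is not the package's implementation.

```r
# Hypothetical helper, not part of gcatcirc.
# Assumption: only proper (nonempty, shorter than the word) affixes count.
is_strong_comma_free_sketch <- function(words) {
  affixes <- function(w, suffix) {
    n <- nchar(w)
    if (n < 2) return(character(0))
    if (suffix) substring(w, 2:n, n) else substring(w, 1, 1:(n - 1))
  }
  suf <- unlist(lapply(words, affixes, suffix = TRUE))
  pre <- unlist(lapply(words, affixes, suffix = FALSE))
  # strong comma-free: no string is both a suffix and a prefix
  length(intersect(suf, pre)) == 0
}

is_strong_comma_free_sketch(c("ACG", "TCG"))  # suffixes CG, G never start a word
is_strong_comma_free_sketch(c("ACG", "CGG"))  # "CG" ends ACG and starts CGG
```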
get_exact_k_circular(tuples)
tuples A gcatbase::gcat.code object
Integer value, the exact k value of the k-circularity.
k-circular codes are a less restrictive class of codes from the family of circular codes. These codes only ensure that for every
concatenation of fewer than k tuples from X written on a circle, there is only one partition into tuples from X.
For more details see: https://link.springer.com/article/10.1007/s11538-020-00770-7
code <- gcatbase::code(c("ACG", "CGG", "AC"))
k <- get_exact_k_circular(code)
get_k_graph_circular(tuples)
tuples A gcatbase::gcat.code object
Integer value, the k of the k-graph circularity. If the code is not k-graph circular, it returns -1.
k-graph circular codes are more restrictive than k-circular codes.
These codes only ensure that all cycles in the representing graph have a common length.
If the code is circular or the code contains cycles of
different lengths, the function returns -1.
For more details see: https://link.springer.com/article/10.1007/s11538-020-00770-7
code <- gcatbase::code(c("ACG", "CGG", "AC"))
k <- get_k_graph_circular(code)
c3_codes(is = seq_along(all_c3_codes))
is A vector of integers. For all i in is: 1 <= i <= 216.
A list of C3 codes
Returns a list of maximal self-complementary C3 codes. Without a parameter it returns all C3 codes.
c3_code(i)
i Integer 1 <= i <= 216. The number of the C3 code
A C3 code
Returns the i-th maximal self-complementary C3 code. There are exactly 216 such codes.
c3_equiv_class(cid)
cid Integer 0 < cid < 217. The number of the C3 code.
Its equivalence class.
Equivalence class for a C3 code number.
c3_equivmatrix
-
First column: C3 code number, second column: equivalence class.
Table for mapping of code numbers to equivalence classes.
c3_in_class(eid)
eid Equivalence class.
List of C3 code numbers.
All code numbers for a given equivalence class.
get_representing_graph(code, show_cycles = F, show_longest_path = F)
code A gcatbase::gcat.code object.
show_cycles A Boolean value. If true, all edges that are part of a cycle are colored red.
show_longest_path A Boolean value. If true, all edges that are part of the longest path are colored blue.
An igraph (http://igraph.org/r/) object: a graph representing a circular code.
This function builds an igraph (http://igraph.org/r/) object of the representing graph of a circular code.
The following definition describes the directed graph associated with a code.
Recall from graph theory (Clark and Holton, 1991) that a graph G consists of
a finite set of vertices (nodes) V and a finite set of edges E. Here, an edge is a set {v,w} of vertices
from V. The graph is called oriented if the edges have an orientation, i.e. edges are considered to be
ordered pairs [v,w] in this case.
Definition: Let X be a code. We define a directed graph G(X) =
(V(X), E(X)) with set of vertices V(X) and set of edges E(X) as follows:
V(X) = {N1...Ni, Ni+1...Nn : N1N2N3...Nn in X, 0 < i < n}
E(X) = {[N1...Ni, Ni+1...Nn] : N1N2N3...Nn in X, 0 < i < n}
The graph G(X) is called the representing graph of X or the graph associated to X.
Basically, the graph G(X) associated to a code X interprets n-tuples from X in (n−1) ways
by pairs of i-tuples and (n-i)-tuples for 0 < i < n.
2007 E. FIMMEL, C. J. MICHEL, AND L. STRÜNGMANN. N-nucleotide circular codes in graph theory
code <- gcatbase::code(c("ACG", "CGG", "AC"))
G <- get_representing_graph(code, TRUE, TRUE)
igraph::tkplot(G)
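To make the definition concrete, the edge set E(X) can be listed with base R alone. This is a sketch under the definition above, not the package's implementation; the function name representing_edges is hypothetical. Every word of length n contributes one edge per split position i with 0 < i < n.

```r
# Hypothetical helper, not part of gcatcirc.
# One row per edge [N1...Ni, Ni+1...Nn] for every word and every 0 < i < n.
representing_edges <- function(words) {
  rows <- lapply(words, function(w) {
    n <- nchar(w)
    if (n < 2) return(NULL)          # a 1-letter word has no split position
    i <- seq_len(n - 1)
    data.frame(from = substring(w, 1, i),
               to = substring(w, i + 1, n),
               stringsAsFactors = FALSE)
  })
  rows <- Filter(Negate(is.null), rows)
  unique(do.call(rbind, rows))
}

edges <- representing_edges(c("ACG", "CGG", "AC"))
edges                                          # A->CG, AC->G, C->GG, CG->G, A->C
vertices <- unique(c(edges$from, edges$to))    # the vertex set V(X)
```

If igraph is installed, igraph::graph_from_data_frame(edges) should turn such an edge list into a directed graph object for further inspection.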
get_component_of_representing_graph(
code,
i,
show_cycles = F,
show_longest_path = F
)
code A gcatbase::gcat.code object.
i The component index.
show_cycles A Boolean value. If true, all edges that are part of a cycle are colored red.
show_longest_path A Boolean value. If true, all edges that are part of the longest path are colored blue.
An igraph (http://igraph.org/r/) object: a graph representing a circular code.
This function builds an igraph (http://igraph.org/r/) object of a component of the representing graph of a circular code.
The following definition describes the directed graph associated with a code.
Recall from graph theory (Clark and Holton, 1991) that a graph G consists of
a finite set of vertices (nodes) V and a finite set of edges E. Here, an edge is a set {v,w} of vertices
from V. The graph is called oriented if the edges have an orientation, i.e. edges are considered to be
ordered pairs [v,w] in this case.
Definition: Let X be a code and i be a positive integer, the component index.
We define a directed graph G(X) =
(V(X), E(X)) with set of vertices V(X) and set of edges E(X) as follows:
V(X) = {N1...Ni, Ni+1...Nn : N1N2N3...Nn in X} for a given i
E(X) = {[N1...Ni, Ni+1...Nn] : N1N2N3...Nn in X} for a given i
The graph G(X) is called the i-component of the representing graph of X or the graph associated to X.
2007 E. FIMMEL, C. J. MICHEL, AND L. STRÜNGMANN. N-nucleotide circular codes in graph theory
code <- gcatbase::code(c("ACG", "CGG", "AC"))
G <- get_component_of_representing_graph(code, 1, TRUE, TRUE)
igraph::tkplot(G)
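The i-component keeps only the edges obtained by splitting each word after its i-th letter. A minimal base R sketch of that edge set follows. The name component_edges is hypothetical, and skipping words that are too short to be split at position i is an assumption of this sketch, not documented package behavior.

```r
# Hypothetical helper, not part of gcatcirc.
# Edges [N1...Ni, Ni+1...Nn] for one fixed split position i.
# Assumption: words with nchar(w) <= i cannot be split at i and are skipped.
component_edges <- function(words, i) {
  keep <- words[nchar(words) > i]
  unique(data.frame(from = substr(keep, 1, i),
                    to = substring(keep, i + 1),
                    stringsAsFactors = FALSE))
}

# Splitting after the 2nd letter: ACGG -> AC|GG, CGGC -> CG|GC, AC is skipped.
comp <- component_edges(c("ACGG", "CGGC", "AC"), 2)
comp
```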
plot_representing_graph(code, show_cycles = F, show_longest_path = F)
code A gcatbase::gcat.code object.
show_cycles A Boolean value. If true, all edges that are part of a cycle are colored red.
show_longest_path A Boolean value. If true, all edges that are part of the longest path are colored blue.
An integer, the id of the plot, which can be used to manipulate the plot from the command line. tk_canvas returns a tkwin object, the Tk canvas.
This function plots an igraph (http://igraph.org/r/) object of the representing graph of a circular code.
The following definition describes the directed graph associated with a code.
Recall from graph theory (Clark and Holton, 1991) that a graph G consists of
a finite set of vertices (nodes) V and a finite set of edges E. Here, an edge is a set {v,w} of vertices
from V. The graph is called oriented if the edges have an orientation, i.e. edges are considered to be
ordered pairs [v,w] in this case.
Definition: Let X be a code. We define a directed graph G(X) =
(V(X), E(X)) with set of vertices V(X) and set of edges E(X) as follows:
V(X) = {N1...Ni, Ni+1...Nn : N1N2N3...Nn in X, 0 < i < n}
E(X) = {[N1...Ni, Ni+1...Nn] : N1N2N3...Nn in X, 0 < i < n}
The graph G(X) is called the representing graph of X or the graph associated to X.
Basically, the graph G(X) associated to a code X interprets n-tuples from X in (n−1) ways
by pairs of i-tuples and (n-i)-tuples for 0 < i < n.
2007 E. FIMMEL, C. J. MICHEL, AND L. STRÜNGMANN. N-nucleotide circular codes in graph theory
code <- gcatbase::code(c("ACG", "CGG", "AC"))
h <- plot_representing_graph(code, TRUE, TRUE)
plot_component_of_representing_graph(
code,
i,
show_cycles = F,
show_longest_path = F
)
code A gcatbase::gcat.code object.
i The component index.
show_cycles A Boolean value. If true, all edges that are part of a cycle are colored red.
show_longest_path A Boolean value. If true, all edges that are part of the longest path are colored blue.
An integer, the id of the plot, which can be used to manipulate the plot from the command line. tk_canvas returns a tkwin object, the Tk canvas.
This function plots an igraph (http://igraph.org/r/) object of a component of the representing graph of a circular code.
The following definition describes the directed graph associated with a code.
Recall from graph theory (Clark and Holton, 1991) that a graph G consists of
a finite set of vertices (nodes) V and a finite set of edges E. Here, an edge is a set {v,w} of vertices
from V. The graph is called oriented if the edges have an orientation, i.e. edges are considered to be
ordered pairs [v,w] in this case.
Definition: Let X be a code and i be a positive integer, the component index.
We define a directed graph G(X) =
(V(X), E(X)) with set of vertices V(X) and set of edges E(X) as follows:
V(X) = {N1...Ni, Ni+1...Nn : N1N2N3...Nn in X} for a given i
E(X) = {[N1...Ni, Ni+1...Nn] : N1N2N3...Nn in X} for a given i
The graph G(X) is called the i-component of the representing graph of X or the graph associated to X.
2007 E. FIMMEL, C. J. MICHEL, AND L. STRÜNGMANN. N-nucleotide circular codes in graph theory
code <- gcatbase::code(c("ACGG", "CGGC", "AC"))
h <- plot_component_of_representing_graph(code, 2, TRUE, TRUE)
get_cyclic_paths(tuples)
tuples A gcatbase::gcat.code object
A list of String vectors with all cyclic paths
This function returns all cyclic paths in the graph associated to a set of words X.
code <- gcatbase::code(c("ACG", "CGA", "CA"))
lp <- get_cyclic_paths(code)
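Whether the graph associated to a set of words contains any cycle at all can be tested in base R without igraph. The sketch below only detects the existence of a cycle; it does not enumerate the cyclic paths as get_cyclic_paths does, and it is not the package's implementation. Both helper names are hypothetical. It repeatedly removes edges that lead into vertices without outgoing edges; whatever survives must lie on or lead into a cycle.

```r
# Hypothetical helpers, not part of gcatcirc.
# Edge matrix of the graph associated to `words`: one row per split of a word.
split_edges <- function(words) {
  do.call(rbind, lapply(words, function(w) {
    n <- nchar(w)
    if (n < 2) return(NULL)
    i <- seq_len(n - 1)
    cbind(from = substring(w, 1, i), to = substring(w, i + 1, n))
  }))
}

# TRUE if the directed graph given by `edges` contains a cycle.
has_cycle <- function(edges) {
  repeat {
    # keep only edges whose target still has an outgoing edge
    keep <- edges[, "to"] %in% edges[, "from"]
    if (all(keep)) break
    edges <- edges[keep, , drop = FALSE]
  }
  nrow(edges) > 0
}

has_cycle(split_edges(c("ACG", "CGA", "CA")))  # A -> CG -> A is a cycle
has_cycle(split_edges(c("ACG", "CGG", "AC")))  # no cycle in this graph
```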
get_longest_paths(tuples)
tuples A gcatbase::gcat.code object
A list of String vectors with all longest paths.
This function returns all longest paths in the graph associated to a set of words X.
code <- gcatbase::code(c("ACG", "CGG", "AC"))
lp <- get_longest_paths(code)
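For an acyclic graph, the length of the longest path can be computed with a memoized depth-first search. The following base R sketch returns only that length (counted in edges), not the paths themselves, and it assumes the graph has no cycles; otherwise the recursion would not terminate. The helper names are hypothetical and this is not the package's implementation.

```r
# Hypothetical helpers, not part of gcatcirc.
# Edge matrix of the graph associated to `words`: one row per split of a word.
split_edges <- function(words) {
  do.call(rbind, lapply(words, function(w) {
    n <- nchar(w)
    if (n < 2) return(NULL)
    i <- seq_len(n - 1)
    cbind(from = substring(w, 1, i), to = substring(w, i + 1, n))
  }))
}

# Number of edges on a longest path. Assumption: the graph is acyclic.
longest_path_length <- function(edges) {
  memo <- new.env()                      # cache: vertex -> longest path from it
  depth <- function(v) {
    if (!is.null(memo[[v]])) return(memo[[v]])
    nxt <- edges[edges[, "from"] == v, "to"]
    d <- if (length(nxt) == 0) 0 else 1 + max(vapply(nxt, depth, numeric(1)))
    memo[[v]] <- d
    d
  }
  max(vapply(unique(edges[, "from"]), depth, numeric(1)))
}

# Longest paths here are A -> CG -> G and A -> C -> GG, each with 2 edges.
longest_path_length(split_edges(c("ACG", "CGG", "AC")))
```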
Code and documentation copyright 2018-2022 Mannheim University of Applied Sciences. Code released under the Apache License, Version 2.0.