Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test and document Santu Tikka's causaleffect R package #8

Open
wants to merge 42 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
9adcfab
Added testthat
hmhummel Jul 11, 2024
2839a57
Added documentation to simplify.R and made progress on test cases & u…
hmhummel Jul 15, 2024
99ae0ab
Documented simplify.R, made progress on test-simplify unit test for 3…
hmhummel Jul 18, 2024
9edc6c6
Update causal.effect.R
djinnome Jul 22, 2024
be4cff1
Documented/annotated the following functions: join, simplify, powerset.
hmhummel Jul 22, 2024
75831ee
Started documenting/annotating causal.effect
hmhummel Jul 22, 2024
6d743c2
Continuing to work on test case #2 unit tests for parse.expression & …
hmhummel Jul 23, 2024
933c0c6
Was able to get parse.expression unit test for test case #2 to pass!
hmhummel Jul 25, 2024
aa14320
Unit tests for parse.expression and simplify for test case #2 now pass!
hmhummel Jul 25, 2024
53815b3
Added 2 unit tests for test case #2 with causal.effect simp = TRUE
hmhummel Jul 25, 2024
35dc831
Separated 3 test cases' unit tests into 3 separate files
hmhummel Jul 26, 2024
127633a
Rename test-simplify.R to all_3_test_cases.R
hmhummel Jul 26, 2024
a792fb1
Formatted all 3 test cases and worked on test case #2 parse.expressio…
hmhummel Jul 26, 2024
852cca6
Update test_case_1.R
hmhummel Jul 26, 2024
5dc7b77
Update test_case_2.R
hmhummel Jul 26, 2024
bf5a1a9
Update test_case_3.R
hmhummel Jul 26, 2024
f420a39
Concluded merge and renamed test-simplify.R to all_3_test_cases.R
hmhummel Jul 26, 2024
3fa640c
Resolved merge conflict in test_case_2.R
hmhummel Jul 26, 2024
9195ed7
Completed test case files. All 17 total unit tests now pass.
hmhummel Jul 26, 2024
254f415
Updated to completed version of test cases, where all 17 unit tests pass
hmhummel Jul 26, 2024
2868d55
created R documentation file for the function simplify(), and comment…
hmhummel Jul 27, 2024
1724971
Added roxygen line to DESCRIPTION, began to create documentation for …
hmhummel Jul 30, 2024
c0c8a2c
Delete man/simplify.Rd
hmhummel Jul 30, 2024
ebde087
Updated DESCRIPTION file to include Roxygen: list(markdown = TRUE) an…
hmhummel Jul 31, 2024
55ef507
Adding .Rd documentation file for simplify function to man
hmhummel Jul 31, 2024
2a63864
Created working documentation for join.R using roxygen2, and updated …
hmhummel Jul 31, 2024
75b53b6
Started documenting insert function, worked on documentation for join…
hmhummel Jul 31, 2024
1ac3884
Added a unit test for join() to vignette #1, and updated join() funct…
hmhummel Aug 2, 2024
a142e08
Changed documentation for simplify - observed.graph and unobserved.gr…
hmhummel Aug 5, 2024
ce05b4b
Updated simplify documentation further regarding the G.unobs, G, and …
hmhummel Aug 5, 2024
41ed159
Completed documentation of insert.R and created insert.Rd documentati…
hmhummel Aug 6, 2024
3833a78
Fixed typo in insert documentation
hmhummel Aug 6, 2024
f9d47f4
Fixed typo in test case #1
hmhummel Aug 6, 2024
1069e99
Added join unit test (from causal.effect with simp = FALSE) to test c…
hmhummel Aug 6, 2024
a4609f1
Updated test case 3 with 2 new unit tests for join from simp = FALSE …
hmhummel Aug 7, 2024
1c5c9cd
Created join and insert unit tests for test cases #1 and #3. Made for…
hmhummel Aug 7, 2024
e8652e8
Created documentation for powerset function
hmhummel Aug 13, 2024
4a5cd78
Clarified documentation for simplify, join and insert (added dependen…
hmhummel Aug 16, 2024
0d0283d
Clarified that unit test for insert in test case 1 passes. Attempted …
hmhummel Aug 16, 2024
0c20235
Added simplify with breakpoints used to create join and insert unit t…
hmhummel Aug 20, 2024
a9dbd4e
Added documentation for parse.expression function. The simplify.Rd an…
hmhummel Aug 23, 2024
5de7924
Updated compiled file of all 3 test cases with 25 total unit tests. 2…
hmhummel Aug 23, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,4 @@
^graphs/.*$
^doc$
^Meta$
^causaleffect\.Rproj$
9 changes: 8 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,15 @@ URL: https://github.com/santikka/causaleffect/
Description: Functions for identification and transportation of causal effects. Provides a conditional causal effect identification algorithm (IDC) by Shpitser, I. and Pearl, J. (2006) <http://ftp.cs.ucla.edu/pub/stat_ser/r329-uai.pdf>, an algorithm for transportability from multiple domains with limited experiments by Bareinboim, E. and Pearl, J. (2014) <http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf>, and a selection bias recovery algorithm by Bareinboim, E. and Tian, J. (2015) <http://ftp.cs.ucla.edu/pub/stat_ser/r445.pdf>. All of the previously mentioned algorithms are based on a causal effect identification algorithm by Tian , J. (2002) <http://ftp.cs.ucla.edu/pub/stat_ser/r309.pdf>.
License: GPL (>= 2)
Imports: igraph
Suggests: R.rsp, XML
Suggests:
R.rsp,
testthat (>= 3.0.0),
XML
VignetteBuilder: R.rsp
NeedsCompilation: no
Author: Santtu Tikka [aut, cre] (<https://orcid.org/0000-0003-4039-4342>)
Maintainer: Santtu Tikka <[email protected]>
Config/testthat/edition: 3
RoxygenNote: 7.3.2
Roxygen: list(markdown = TRUE)
Encoding: UTF-8
4 changes: 3 additions & 1 deletion NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
export(aux.effect)
export(generalize)
export(causal.effect)
export(simplify)
export(join)
export(get.expression)
export(meta.transport)
export(parse.graphml)
Expand Down Expand Up @@ -28,4 +30,4 @@ importFrom(igraph, `%->%`)
importFrom(igraph, V)
importFrom(igraph, E)
importFrom(stats, setNames)
importFrom(utils, combn)
importFrom(utils, combn)
32 changes: 32 additions & 0 deletions R/causal.effect.R
Original file line number Diff line number Diff line change
@@ -1,11 +1,43 @@
# Causal Effect
#
# This function computes the causal effect of a set of variables x on another
# set of variables y given an optional set of variables z, using a given causal
# graph G.
#
#
# y:
# x:
# z:
# G:
# expr:
# simp:
# steps:
# primes:
# prune:
# stop_on_nonid:
#
# Returns:
#
#
# Causaleffect dependencies: probability, deconstruct, get.expression, pid,
# id(?),idc, observed.graph, unobserved.graph, parse.expression, set.primes(?)




causal.effect <- function(y, x, z = NULL, G, expr = TRUE, simp = FALSE, steps = FALSE, primes = FALSE, prune = FALSE, stop_on_nonid = TRUE) {
# if there aren't any attributes in the graph, then we need to create them.
if (length(igraph::edge.attributes(G)) == 0) {
G <- igraph::set.edge.attribute(G, "description", 1:length(igraph::E(G)), NA)
}
# G.obs is the version of G that has no unobserved variables or bidirected edges representing unobserved confounders
G.obs <- observed.graph(G)
if (!igraph::is.dag(G.obs)) stop("Graph 'G' is not a DAG")
# This is the topological sort of a graph's directed edges
topo <- igraph::topological.sort(G.obs)
# This labels the vertices of the topological sort to be the same as the names of the original graph.
topo <- igraph::get.vertex.attribute(G, "name")[topo]
# sanity checks to make sure that x, y and z are in the topological sort of the graph
if (length(setdiff(y, topo)) > 0) stop("Set 'y' contains variables not present in the graph.")
if (length(setdiff(x, topo)) > 0) stop("Set 'x' contains variables not present in the graph.")
if (length(z) > 0 && !identical(z, "")) {
Expand Down
108 changes: 106 additions & 2 deletions R/insert.R
Original file line number Diff line number Diff line change
@@ -1,24 +1,128 @@
#' Insert
#'
#' \code{Insert} will insert a missing variable into a joint distribution \eqn{P(J|D)}
#' using d-separation criteria in a given graph \code{G}.
#' It is called when there are variables without corresponding terms in the expression
#'
#' @param J character vector. The set of variables representing the joint distribution.
#' @param D character vector. The set of variables representing the conditioning set of the joint distribution.
#' @param M character vector. The variable to be inserted.
#' @param cond character vector. The set of conditioning variables.
#' @param S character vector. The current summation variable.
#' @param O character vector. The set of observed variables.
#' @param G.unobs igraph object created with \code{igraph::unobserved.graph(G)}. Separate graph that turns bidirected edges into explicit nodes for unobserved confounders.
#' @param G igraph object created with \code{igraph::graph.formula()}. Main graph G. Includes bidirected edges.
#' @param G.obs igraph object created with \code{igraph::observed.graph(G)}. Separate graph that does not contain bidirected edges (only contains the directed edges with observed nodes).
#' @param topo igraph list object created with \code{igraph::topological.sort} and \code{igraph::get.vertex.attribute}. The topological ordering of the vertices in graph G.
#'
#' @dependencies This function depends on several functions from the causaleffect package, including: \link{powerset} and \link{wrap.dSep}.
#'
#' @return \code{Insert} returns a list with:
#' \code{J.new}{character vector. An updated set of joint distribution variables.}
#' \code{D.new}{character vector. An updated set of conditioning variables.}
#' \code{M}{character vector. The inserted variable.}
#' \code{ds[[i]]}{character vector. The subset from the power set used in the insertion.}
#' However, if no conditions were met, \code{insert} will return the original \code{J} and \code{D}.
#'
#' @references Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. Journal of Machine Learning Research, 18(36), 1-30.
#'
#' @author Haley Hummel,
#' Psychology PhD student at Oregon State University
#'
#' @examples
#' \dontrun{
#' # defining graph information for G_1 using igraph
#' G_1 <- graph.formula(x -+ y, z -+ x, z -+ y , x -+ z, z -+ x, simplify = FALSE)
#' G_1 <- set.edge.attribute(graph = G_1, name = "description", index = c(4,5), value = "U")
#'
#' # defining observed nodes of graph G_1 using igraph
#' G_1.obs <- observed.graph(G_1)
#'
#' # defining unobserved nodes of graph G_1 using igraph
#' G_1.unobs <- unobserved.graph(G_1)
#'
#' # defining topological sort of graph G_1 using igraph
#' topo_1 <- igraph::topological.sort(G_1.obs)
#' topo_1 <- igraph::get.vertex.attribute(G_1, "name")[topo_1]
#'
#' # we can obtain the following from running simplify(P_1, topo_1, G_1.unobs, G_1,
#' # G_1.obs) with break points (the browser() function). I added print statements
#' # after step #5 in simplify():
#' # Step 6 - Inside nested while loop before join operation
#' # P$children[[k]]$var: y (this represents vari in simplify())
#' # P$children[[k]]$cond: z x (this represents cond in simplify())
#' # P$sumset[j]: z (this reprensents S in simplify())
#'
#' J_1 <- character()
#' D_1 <- character()
#' M_1 <- "x"
#' cond_1 <- c("z", "x")
#' S_1 <- "z"
#' O_1 <- c("z", "y")
#'
#' # we can obtain the following from the graph information:
#' # G.unobs = G_1.unobs
#' # G = G_1
#' # G.obs = G_1.obs
#' # topo = topo_1
#'
#' # we expect the output from this (representing J, D) to be:
#' # [1] character(0)
#' # [2] character(0)
#'
#' insert(J_1, D_1, M_1, cond_1, S_1, O_1, G_1.unobs, G_1, G_1.obs, topo_1)
#' }
#'
#' @seealso \code{\link{join}}, \code{\link{simplify}}, \code{\link{wrap.dSep}}, \code{\link{powerset}}
#'
#' @keywords models manip math utilities
#' @keywords graphs methods multivariate distribution probability
#' @concept probabilistic expressions
#' @concept graph theory
#' @concept joint distribution
#' @concept causal inference
#' @concept d-separation

insert <- function(J, D, M, cond, S, O, G.unobs, G, G.obs, topo) {
# Identify which elements of M are in D
mis.ind <- which(M %in% D)
# If no elements of M are in D, return original J and D
if (length(mis.ind) == 0) return(list(J, D))

# Identify the missing variable to be inserted
mis <- M[mis.ind]
M <- mis[length(mis)]
# If M is in cond, return original J and D
if (M %in% cond) return(list(J, D))

# Find the first element of J in the topological ordering 'topo'.
# Set V.prev to this element
# V.pi is set of vertices that are the ancestors of (precede) V.prev.
J.min <- min(which(J %in% topo))
V.prev <- J[J.min]
ind <- which(topo == V.prev)
# Get all vertices before V.prev in topological order
V.pi <- topo[0:(ind-1)]

# Compute the power set of V.pi excluding M.
# n is the numbere of subsets in the power set.
ds <- powerset(setdiff(V.pi, M))
n <- length(ds)
# Create the candidate set add
for (i in 1:n) {
add <- union(ds[[i]], M)
# Compute the set A
a.set <- union(setdiff(add, D), setdiff(D, add))
if (wrap.dSep(G.unobs, J, a.set, setdiff(D, a.set)) &&

# Check the d-separation criteria
if (wrap.dSep(G.unobs, J, a.set, setdiff(D, a.set)) &&
wrap.dSep(G.unobs, M, S, setdiff(ds[[i]], S))) {
# Update J.new and D.new
J.new <- union(J, M)
D.new <- ds[[i]]
return(list(J.new, D.new, M, ds[[i]]))
}
}
# If no conditions were met, return original J and D
return(list(J, D))
}
}
119 changes: 117 additions & 2 deletions R/join.R
Original file line number Diff line number Diff line change
@@ -1,33 +1,148 @@
#' Join
#'
#' \code{Join} and \code{insert} are essentially two variations of the underlying procedure of determining whether the terms of the atomic expression actually represent a joint distribution. \code{Join} is called when we are processing terms that already exist in the expression.
#' Attempts to combine two terms: the joint term \code{P(J|D)} obtained from \code{simplify()} and the
#' term \code{P(V|C) := P(Vk|Ck)} of the current iteration step. The goal is to
#' determine if these terms can be combined based on the d-separation criteria in the graph \code{G}.
#'
#'
#'
#' @param J character vector. Joint set \code{P(J|D)}; already processed and included in joint distribution
#' from previous \code{\link{simplify}} iteration. Initially, may be empty for the starting point of
#' the joint distribution. \code{vari} is added to expand it if d-separation conditions are met.
#' @param D character vector. Term \code{P(V|C) := P(Vk|Ck)}; set of variables that condition the joint distribution.
#' \code{Join} checks and updates \code{D} as necessary to maintain the validity of the joint distribution
#' when combined with \code{vari}.
#' @param vari character scalar. Current variable being considered for inclusion in the joint distribution.
#' @param cond character vector. Set of variables that condition the current variable \code{vari}. \code{Join} uses \code{cond}
#' to evaluate conditional independence and determine if \code{vari} can be added to \code{J}.
#' @param S likely a character vector. Not used directly in \code{join}. Current summation variable.
#' @param M character vector. Missing variables (variables not contained within the expression).
#' @param O character vector. Observed variables (variables contained within the expression).
#' @param G.unobs igraph object created with \code{igraph::unobserved.graph(G)}. Separate graph that turns bidirected edges into explicit nodes for unobserved confounders.
#' @param G igraph object created with \code{igraph::graph.formula()}. Main graph G. Includes bidirected edges.
#' @param G.obs igraph object created with \code{igraph::observed.graph(G)}. Separate graph that does not contain bidirected edges (only contains the directed edges with observed nodes).
#' @param topo igraph list object created with \code{igraph::topological.sort} and \code{igraph::get.vertex.attribute}. The topological ordering of the vertices in graph G.
#'
#' @dependencies This function depends on several functions from the causaleffect package, including: \link{powerset}, \link{wrap.dSep}, and \link{insert}.
#'
#' @return \code{Join} returns the joint result, or the original result if none of the conditions for joining were met.
#'
#' @references Tikka, S., & Karvanen, J. (2017). Simplifying probabilistic expressions in causal inference. Journal of Machine Learning Research, 18(36), 1-30.
#'
#' @author Haley Hummel,
#' Psychology PhD student at Oregon State University
#'
#' @seealso \code{\link{causal.effect}}, \code{\link{parse.expression}}, \code{\link{get.expression}}, \code{\link{probability}}
#'
#'
#' @examples
#' \dontrun{
#'
#' # defining graph information for G_1 using igraph
#' G_1 <- graph.formula(x -+ y, z -+ x, z -+ y , x -+ z, z -+ x, simplify = FALSE)
#' G_1 <- set.edge.attribute(graph = G_1, name = "description", index = c(4,5), value = "U")
#'
#' # defining observed nodes of graph G_1 using igraph
#' G_1.obs <- observed.graph(G_1)
#'
#' # defining unobserved nodes of graph G_1 using igraph
#' G_1.unobs <- unobserved.graph(G_1)
#'
#' # defining topological sort of graph G_1 using igraph
#' topo_1 <- igraph::topological.sort(G_1.obs)
#' topo_1 <- igraph::get.vertex.attribute(G_1, "name")[topo_1]
#'
#' # we can obtain the following from running simplify(P_1, topo_1, G_1.unobs, G_1,
#' # G_1.obs) with break points (the browser() function). I added print statements
#' # after step #5 in simplify():
#' # Step 6 - Inside nested while loop before join operation
#' # P$children[[k]]$var: y (this represents vari in simplify())
#' # P$children[[k]]$cond: z x (this represents cond in simplify())
#' # P$sumset[j]: z (this reprensents S in simplify())
#'
#' J_1 <- character()
#' D_1 <- character()
#' vari_1 <- "y"
#' cond_1 <- c("z", "x")
#' S_1 <- "z"
#' M_1 <- "x"
#' O_1 <- c("z", "y")
#'
#' # we can obtain the following from the graph information:
#' # G.unobs = G_1.unobs
#' # G = G_1
#' # G.obs = G_1.obs
#' # topo = topo_1
#'
#' # we expect the output from this to be:
#' # [1] "y"
#' # [2] "z" "x"
#'
#' join(J_1, D_1, vari_1, cond_1, S_1, M_1, O_1, G_1.unobs, G_1, G_1.obs, topo_1)
#' }
#'
#' @seealso \code{\link{simplify}}, \code{\link{wrap.dSep}}, \code{\link{insert}}
#'
#' @keywords models manip math utilities
#' @concept probabilistic expressions
#' @concept graph theory
#' @concept causal inference


join <- function(J, D, vari, cond, S, M, O, G.unobs, G, G.obs, topo) {
# initialize J and D as empty character vectors
J.new <- character()
D.new <- character()
# check if J is empty. If it is, set J.new (the joint subset) to vari and
# D.new (the conditioning subset) to cond, and then return these values.
# This represents the simplest case where the first variable forms the joint distribution alone.
if (length(J) == 0) {
J.new <- vari
D.new <- cond
return(list(J.new, D.new))
}
# Set up necessary variables for iteration. Calculate ancestors: find the
# of the first element of J in the topological order (the topo vector).
# V.prev is set to this element. Compute V.pi (aka: "G") as the set of
# vertices in topo that precede the first element of V.prev (aka: "J").
J.min <- min(which(J %in% topo))
V.prev <- J[J.min]
ind <- which(topo == V.prev)
V.pi <- topo[0:(ind-1)]
# The power set "ds" of V.pi, excluding elements in vari, is computed.
ds <- powerset(setdiff(V.pi, vari))
n <- length(ds)
# Iterate over the power set, forming candidate sets add, a.set, and b.set,
# sets used to characterize the changes needed in the conditioning sets to
# enable the combination of two probabilistic terms while preserving the
# required conditional independencies
# A represents necessary changes to the conditioning set D to combine the joint
# distribution term P(J|D) with the current term P(vari|cond)
# B represents necessary changes to the conditioning set cond to combine the
# joint term P(J|D) with the current P(vari|cond)
for (i in 1:n) {
add <- union(ds[[i]], vari)
a.set <- union(setdiff(add, D), setdiff(D, add))
a.set <- union(setdiff(add, D), setdiff(D, add))
b.set <- union(setdiff(ds[[i]], cond), setdiff(cond, ds[[i]]))
if (wrap.dSep(G.unobs, J, a.set, setdiff(D, a.set)) &&
# Check if they meet conditional independence (d-separation) conditions using the wrap.dSep function.
if (wrap.dSep(G.unobs, J, a.set, setdiff(D, a.set)) &&
wrap.dSep(G.unobs, vari, b.set, setdiff(cond, b.set))) {
# If conditions are satisfied, update J.new and D.new and return
J.new <- union(J, vari)
D.new <- ds[[i]]
return(list(J.new, D.new))
}
}
# If any element of M is in D, attempt to insert a missing variable from M into J
# and D using the insert and join functions.
if (any(M %in% D)) {
joint <- insert(J, D, M, cond, S, O, G.unobs, G, G.obs, topo)
# If the joint operation results in a larger J, return the joint result
if (length(joint[[1]]) > length(J)) {
return(joint)
}
}
# If no updates were made, return the original J and D
return(list(J, D))
}
Loading