This repository will contain the code for generating (`Java`) and analysing (`R` programming language) a simple multi-Agent Based Model (mABM) of patients in a 2D continuous space. The idea is to simulate (i) causal diagrams for interference, (ii) ABMs, and (iii) the contagion of a disease through a social network: human papillomavirus (HPV).
Note: This `.java` project can be run by cloning this repository and creating a project in your favourite Integrated Development Environment (IDE). The simulation relies on the `Mason` Java library and its dependencies, which are already included in the repository. If there are any problems, `Mason` and its dependencies are available at https://cs.gmu.edu/~eclab/projects/mason/
- Learn Java programming while creating an ABM
- Learn R programming while using the G-methods to analyse the results generated from the ABM
- Integrate ABMs and causal inference
- Integrate the spatial dimension with time-dependent confounders
- Violate the Stable Unit Treatment Value Assumption (SUTVA) of causal inference
Why this integration? ABMs are used to simulate individuals, their behaviours, and the consequences of their interactions. Nevertheless, the accuracy of these simulations relies heavily on capturing the complexity of the relationships between individuals across spatio-temporal scenarios. Current methodologies lack the sophistication to capture causal relationships. By integrating ABMs and causal inference:
- more complex and accurate simulations could be implemented
- we could better understand how populations react to interventions
Why the violation of SUTVA? SUTVA states that:
- Individuals do not interfere with each other (modelling such interference is precisely the strength of ABMs)
- Treatment assignment of one unit does not affect outcome of another unit
Traditionally, causal inference has relied on the assumption of no interference. Nonetheless, one individual's exposure may affect another individual's outcome. Two effects are described in the literature:
- Indirect effect: the effect of one individual's treatment on another individual's outcome.
- Direct effect: the effect of one individual's treatment on her/his own outcome.
VanderWeele et al. (2012b) demonstrated that the individual-level indirect effect of vaccination could be decomposed into two effects:
- Contagion is the indirect effect that vaccinating one individual may have on another by preventing the vaccinated individual from getting the disease and therefore from passing it on. E.g. vaccines for tetanus, hepatitis A and B, rabies, and measles reduce the susceptibility of treated individuals to the disease.
- Infectiousness is the indirect effect that vaccination might have if, instead of preventing the vaccinated individual from getting the disease, it renders the disease less infectious, thereby reducing the probability that the vaccinated individual transmits the disease even if infected. E.g. a malaria transmission-blocking vaccine prevents mosquitoes from acquiring, and thereby from transmitting, malaria parasites upon biting infected individuals.
Our example will combine all of these effects: the direct effect and the indirect effect, the latter as a combination of contagion and infectiousness.
A social network is a collection of individuals and the ties between them. The presence of a tie between two individuals indicates that the individuals share some kind of relationship: family, friendship, partnership, etc. We will assume that the ties between agents are undirected.
Vaccine programs do not in general target distant, independent pairs of individuals; they target villages, cities, communities in which individuals are interconnected and their outcomes correlated. Therefore, assessing the presence of vaccine effects in social network data may be informative for real-world applications.
Note!: This simulation is not going to be 100% realistic. It will be a proof of concept of how to integrate causal inference under interference and ABMs, as part of the ongoing endeavor to develop methods for valid causal inference using simulated data from a single network of agents.
HPV is the most common sexually transmitted infection. Most HPV infections cause no symptoms and resolve spontaneously. Nevertheless, HPV infection increases the risk of cancer of the cervix, vulva, vagina, penis, anus, mouth, or throat.
Risk factors include sexual intercourse, multiple partners, smoking, and a weakened immune system. HPV is typically transmitted by sustained direct skin-to-skin contact.
Once the HPV infects a person, an active infection occurs and the virus can be transmitted. Several months to years may elapse before the visible symptoms can be clinically detected in the form of intraepithelial lesions, making it difficult to know which partner was the source of infection.
HPV vaccines can prevent the most common types of infection. In women, HPV infection can cause cervical cancer, and women are more likely to get vaccinated with Gardasil, which prevents around 90% of infections. Vaccination is less common in men (here comes our simple confounder: sex).
Causal diagrams, or causal directed acyclic graphs (DAGs), consist of nodes, representing the variables in a study, and arrows, representing direct causal effects. A path on a DAG is any unbroken, non-repeating sequence of arrows connecting one variable to another.
- DAGs are directed because each arrow has a single direction, from tail (cause) to head (effect).
- DAGs are acyclic because they do not allow loops: no sequence of arrows leads back to the variable from which the first arrow emanated. Reference for more information about DAGs: [include]
Following the description from Ogburn and VanderWeele (2014), it is often reasonable to make a "partial interference" assumption where interference can only occur within subgroups or "blocks" of agents that are separated in time and/or space. The counterfactual notation for interference will follow Hudgens and Halloran (2008): suppose that `n` individuals fall into `N` blocks, indexed by `k`, with `m = n/N` individuals in each block. In this example, we will assume `N = 1`, so that interference may occur between any two agents in the population: full interference with no blocks.
Furthermore, we wish to estimate the average causal effect of a vaccine `A` on an outcome `Y`, infection, from simulation data on `n` individuals for whom we have also measured a vector of confounders `C`, in this case just one variable: the sex of the agents. For simplicity, we assume that both `A` and `Y` are dichotomous (binary) variables.
Let `A ≡ (A_1,...,A_n)` be the vector of vaccination assignments, under the assumption of a single version of treatment, for the agents at a given time `t`. Let `Y ≡ (Y_1,...,Y_n)` be the vector of outcomes, let `C ≡ (C_1,...,C_n)` be the array of covariates, and let `f(Y) ≡ (f(Y)_1,...,f(Y)_n)` be the vector of a distance function of the outcome for the `n` agents at a given time `T = t`. `Yi(a)`, `a = 0,1`, is defined as the counterfactual outcome we would have observed if, contrary to fact, agent `i` had received treatment `a`; that is, the outcome we would have observed for agent `i` under an intervention that set `A` to `a`. In an undirected network, the degree of a node or agent is the number of edges or, equivalently, the number of alters. Let `Ki` be the degree of agent `i`, that is, the number of individuals sharing a tie with individual `i`.
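The degree of an agent in an undirected network can be illustrated with a short Java sketch. The class `UndirectedNetwork` and its methods are hypothetical names chosen for this example; they are not part of the repository or of the Mason library.

```java
// Minimal sketch of an undirected social network stored as a symmetric
// adjacency matrix. All names here are illustrative assumptions.
class UndirectedNetwork {
    private final boolean[][] tie;

    UndirectedNetwork(int n) {
        tie = new boolean[n][n];
    }

    // A tie is undirected, so it is recorded in both directions.
    void addTie(int i, int j) {
        if (i != j) {
            tie[i][j] = true;
            tie[j][i] = true;
        }
    }

    // Degree of agent i: the number of agents sharing a tie with i.
    int degree(int i) {
        int k = 0;
        for (boolean t : tie[i]) {
            if (t) {
                k++;
            }
        }
        return k;
    }

    public static void main(String[] args) {
        UndirectedNetwork net = new UndirectedNetwork(4);
        net.addTie(0, 1);
        net.addTie(0, 2);
        System.out.println("Degree of agent 0: " + net.degree(0));
    }
}
```

Storing each tie in both directions keeps the matrix symmetric, so adding the same tie twice (in either order) does not change any degree.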
Table 1. Variables of the DAG

| Variable | Meaning | Type |
|---|---|---|
| `C` | Sex of the agent | Boolean |
| `A` | Has received vaccination? | Boolean |
| `Y` | Is infected? | Boolean |
| `f(Y)` | Interference of contagion, or inverse cumulative distance of infected | Double |
| `a` | Causal weight of `C` in `A` | Double |
| `b` | Causal weight of `C` in `Y` | Double |
| `c` | Causal weight of `A` in `Y` | Double |
| `d` | Causal weight of `f(Y)` in `Y` or contagion weight | Double |
The causal structure of the effect of `Ai` on `Yi` is straightforward: `Ai` has a direct protective effect on `Yi`, represented by a direct arrow from `Ai` to `Yi` on the DAG. The effect of `Ai` on `Yj` might be represented as a mediated effect through `Yi` and a function of the latter, `f(Yi)`. But this cannot be correct, since `Yi` and `Yj` are contemporaneous and therefore one cannot cause the other. Instead, the effect of `Ai` on `Yj` will be mediated through a distance function `f(Y)` of the evolution of the outcome of agent `i`. This assumption is represented in Figure 1, where each outcome node represents the outcome of individual `i` at time `t` and `T` is the time of the end of the simulation. The dashed arrows represent times `4` through `T-1`, which do not fit in the DAG (but which are observed in the simulation).
`f(Y)` will be a vector of length `n-1` comprising the values of the distance function `f(Y)` applied to the vector of outcomes, where the subindex `-i` denotes all agents except agent `i`, and `D` represents the distance between each agent in `-i` and agent `i` (Figure 1) at time `T = t`. `f(Y)` was defined at time `T = t` as:
In the extreme case where all infected agents in `-i` are at distance `D = 0` from the uninfected (`Yi = 0`) agent `i`, `f(Y)i` takes its maximum value. Conversely, in the opposite extreme case where all infected agents in `-i` are infinitely distant (`D = +∞`) from the uninfected (`Yi = 0`) agent `i`, the causal effect of `f(Y)` will be at its minimum.
Figure 1. Distance between `n = 6` agents at time `T = t`. (Note: the index in this figure is wrong and needs to be changed.)
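The "inverse cumulative distance of infected" described above can be sketched in Java. The exact formula used in this project is not shown in the text, so the form below, a sum of `1 / (1 + D)` over the infected agents other than `i`, is only an assumption chosen to match the two extreme cases: it is largest when every infected agent is at `D = 0` and falls to zero as `D` grows without bound. The class and method names are hypothetical.

```java
// Sketch of one possible inverse-cumulative-distance function f(Y)_i.
// ASSUMPTION: f(Y)_i = sum over infected j != i of 1 / (1 + D_ij).
// This functional form is illustrative, not the project's own definition.
class ContagionInterference {

    // x, y: agent coordinates in the 2D space; infected[j] is Y_j at time t.
    static double interference(int i, double[] x, double[] y, boolean[] infected) {
        double total = 0.0;
        for (int j = 0; j < x.length; j++) {
            if (j == i || !infected[j]) {
                continue; // only the other, infected agents contribute
            }
            double d = Math.hypot(x[j] - x[i], y[j] - y[i]); // Euclidean distance D_ij
            total += 1.0 / (1.0 + d);
        }
        return total;
    }

    public static void main(String[] args) {
        double[] x = {0.0, 0.0, 3.0};
        double[] y = {0.0, 0.0, 4.0};
        boolean[] infected = {false, true, true};
        System.out.println("f(Y)_0 = " + interference(0, x, y, infected));
    }
}
```

Under this assumed form, the maximum of `f(Y)i` equals the number of infected agents other than `i`, which makes the contagion weight `d` in Table 1 easy to interpret as a per-neighbour contribution at zero distance.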
We define the consistency assumption based on Ogburn and VanderWeele (2014) as:
The exchangeability assumption, also known as the "no unmeasured confounding" assumption, lets us account for causal effects under interference: we assume that we have measured a set of prevaccination covariates `C` for each agent such that:
and the positivity assumption:
for all `a` in the support of `A` and all `c` in the support of `C`.
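For reference, the three identifying assumptions named above can be written in the notation of this section. The block below is a reconstruction based on the usual statement of these assumptions under interference, where the whole treatment vector is set at once. It should be checked against Ogburn and VanderWeele (2014) before being relied on.

```latex
\begin{align*}
  % Consistency: the observed outcome equals the counterfactual
  % under the treatment vector actually received.
  Y_i(\mathbf{a}) &= Y_i \quad \text{when } \mathbf{A} = \mathbf{a} \\
  % Exchangeability (no unmeasured confounding) given the covariates C.
  Y_i(\mathbf{a}) &\perp\!\!\!\perp \mathbf{A} \mid \mathbf{C} \\
  % Positivity: every treatment vector has positive probability
  % at every covariate value.
  P(\mathbf{A} = \mathbf{a} \mid \mathbf{C} = \mathbf{c}) &> 0
\end{align*}
```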
The overall effect (OE) of intervention `a` compared to intervention `a'` on subject `i` is defined as:
where the index `i` indicates that the expectations do not average over individuals; the population-level effect is obtained by taking the empirical mean of the counterfactual outcomes at time `t`.
The unit level effect (UE) of treatment on agent `i` fixes the treatment assignments for all agents except agent `i`, and compares the counterfactual outcomes for agent `i` under two different treatment assignments. Let `a_-i` be a vector of length `n-1` of treatment values for all agents in the simulation except agent `i`. The counterfactual outcome of agent `i` is then taken under the intervention in which all agents except agent `i` receive treatment `a_-i` and agent `i` receives a given treatment value. Then, the UE will be defined as:
The spillover effect (SE) of intervention `a` compared to intervention `a'` on agent `i` fixes `i`'s treatment level and compares its counterfactual outcomes under the two different interventions. The SE will be defined as:
and the total effect can be decomposed into a sum of unit level and spillover effects:
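The effect definitions in this section can be collected in one place. The block below is a reconstruction in the style of Hudgens and Halloran (2008) and Ogburn and VanderWeele (2014). The symbols `\tilde{a}` and `\bar{a}` for the two treatment values of agent `i`, and the abbreviation TE for the total effect, are notation assumed here rather than taken from the text above.

```latex
\begin{align*}
  % Overall effect of intervention a versus a' on agent i
  OE_i(\mathbf{a}, \mathbf{a}')
    &= E[Y_i(\mathbf{a})] - E[Y_i(\mathbf{a}')] \\
  % Unit level effect: the other agents are fixed at a_{-i},
  % agent i's own treatment changes
  UE_i(\mathbf{a}; \tilde{a}, \bar{a})
    &= E[Y_i(\mathbf{a}_{-i}, \tilde{a})] - E[Y_i(\mathbf{a}_{-i}, \bar{a})] \\
  % Spillover effect: agent i's own treatment is fixed,
  % the other agents' treatments change
  SE_i(\mathbf{a}, \mathbf{a}'; \tilde{a})
    &= E[Y_i(\mathbf{a}_{-i}, \tilde{a})] - E[Y_i(\mathbf{a}'_{-i}, \tilde{a})] \\
  % Total effect and its decomposition
  TE_i(\mathbf{a}, \mathbf{a}'; \tilde{a}, \bar{a})
    &= E[Y_i(\mathbf{a}_{-i}, \tilde{a})] - E[Y_i(\mathbf{a}'_{-i}, \bar{a})] \\
    &= SE_i(\mathbf{a}, \mathbf{a}'; \tilde{a}) + UE_i(\mathbf{a}'; \tilde{a}, \bar{a})
\end{align*}
```

The decomposition follows by adding and subtracting the intermediate term `E[Y_i(a'_{-i}, \tilde{a})]`, which appears with opposite signs in the SE and the UE.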
The posterior probability of getting the vaccine depends only on the confounder sex (`C`), a time-independent variable, and on the baseline probability of getting the vaccine:
[I have to put this formula in proper mathematical terms!]
The posterior probability of infection `Y` at time `T = t` will depend on the confounder sex (`C`), the vaccination status (`A`), and the function of the outcome (`f(Y)`):
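Since the formulas for these two probabilities are not given in the text, the Java sketch below shows one way they could be computed. It assumes a logistic form in which each causal weight from Table 1 shifts the log-odds of the baseline probability. That functional form, the class `InfectionModel`, and the parameter names `weightCA`, `weightCY`, `weightAY` and `weightFY` (standing for `a`, `b`, `c` and `d`) are assumptions made for illustration only.

```java
// Sketch of the vaccination and infection probabilities.
// ASSUMPTION: logistic model on the log-odds scale; the project's actual
// formulas are not stated in the text and may differ.
class InfectionModel {

    static double logit(double p) {
        return Math.log(p / (1.0 - p));
    }

    static double expit(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // P(A = 1 | C): baseline probVaccine, shifted by weight a when C is true.
    static double probVaccinated(double probVaccine, double weightCA, boolean sex) {
        return expit(logit(probVaccine) + (sex ? weightCA : 0.0));
    }

    // P(Y = 1 | C, A, f(Y)) at time t: baseline probInfection, shifted by
    // b (sex), c (vaccination) and d times the interference term f(Y)_i.
    static double probInfected(double probInfection, double weightCY, double weightAY,
                               double weightFY, boolean sex, boolean vaccinated, double fY) {
        double logOdds = logit(probInfection)
                + (sex ? weightCY : 0.0)
                + (vaccinated ? weightAY : 0.0)
                + weightFY * fY;
        return expit(logOdds);
    }

    public static void main(String[] args) {
        // A negative weightAY encodes a protective vaccine.
        System.out.println(probVaccinated(0.3, 1.0, true));
        System.out.println(probInfected(0.1, 0.0, -2.0, 0.5, false, true, 1.5));
    }
}
```

Working on the log-odds scale keeps every probability strictly between 0 and 1 regardless of the weights, provided the baseline probabilities themselves lie strictly between 0 and 1.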
In this mABM, the agents will be created at the same location but there will be some forces that will control their movement in the 2D space:
- Partner force
- Central force
- Random force
At every step of the simulation, each agent will have at least one partner.
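The way the three forces might combine into one movement step can be sketched in plain Java, without the Mason classes. How the repository actually combines them is not described here, so the rules below are assumptions: the partner force pulls the agent toward its partner and is capped at `maxForce`, the central force pulls toward the centre scaled by `centralForce`, and the random force is a uniform jitter scaled by `randomForce`. The class `AgentMotion` and the method `step` are hypothetical names.

```java
import java.util.Random;

// Sketch of one movement step as the sum of three forces.
// ASSUMPTION: the combination rules are illustrative, not the project's own.
class AgentMotion {

    // Returns the displacement {dx, dy} for a single simulation step.
    static double[] step(double[] pos, double[] partnerPos, double[] center,
                         double maxForce, double centralForce, double randomForce,
                         Random rng) {
        // Partner force: vector toward the partner, capped at maxForce.
        double px = partnerPos[0] - pos[0];
        double py = partnerPos[1] - pos[1];
        double len = Math.hypot(px, py);
        if (len > maxForce) {
            px *= maxForce / len;
            py *= maxForce / len;
        }

        // Central force: vector toward the centre, scaled by centralForce.
        double cx = (center[0] - pos[0]) * centralForce;
        double cy = (center[1] - pos[1]) * centralForce;

        // Random force: uniform jitter in [-0.5, 0.5) scaled by randomForce.
        double rx = (rng.nextDouble() - 0.5) * randomForce;
        double ry = (rng.nextDouble() - 0.5) * randomForce;

        return new double[] {px + cx + rx, py + cy + ry};
    }

    public static void main(String[] args) {
        double[] move = step(new double[] {0, 0}, new double[] {3, 4}, new double[] {50, 50},
                             1.0, 0.01, 0.1, new Random(42));
        System.out.println("dx = " + move[0] + ", dy = " + move[1]);
    }
}
```

Passing the `Random` instance in as a parameter keeps the step reproducible for a given seed, which helps when comparing simulation runs during analysis.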
The following variables were defined for the ABM:
| Variable | Meaning | Type |
|---|---|---|
| `probInfection` | Baseline probability of infection | Double |
| `probVaccine` | Baseline probability of vaccination | Double |
| `maxForce` | Max value of partner forces | Double |
| `centralForce` | Joining force to keep agents in the center of the 2D space | Double |
| `randomForce` | Weight that controls the force that makes the agents wander randomly | Double |
| `promiscuityPopulation` | Probability of changing partners defined for the whole population | Double |
- E. L. Ogburn and T. J. VanderWeele: "Causal Diagrams for Interference", Statistical Science, Vol. 29, No. 4, Special Issue on Semiparametrics and Causal Inference (November 2014), pp. 559-578. https://www.jstor.org/stable/43288499. Accessed: 02-06-2018
- M. G. Hudgens and M. E. Halloran: "Toward Causal Inference With Interference", Journal of the American Statistical Association, Vol. 103, No. 482 (June 2008), pp. 832-842. https://amstat.tandfonline.com/doi/abs/10.1198/016214508000000292#.WxaoRRzTWGA. Accessed: 02-06-2018