-
Notifications
You must be signed in to change notification settings - Fork 1
/
Pierini_Mignella_Vestini.Rmd
482 lines (347 loc) · 17.8 KB
/
Pierini_Mignella_Vestini.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
---
title: "Pierini_Mignella_Vestini"
output: html_document
date: "2022-12-30"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library("rlist")
library("igraph")
```
# Loading the data
The first thing to do is load the data contained in the file "hw2_data.RData".
```{r}
load("hw2_data.RData")
```
We can now see that the file contained two different list of length 12, one for each group of patients:
- The data of the patients with the Autism Spectrum Disorder, are going to be contained in the "asd_sel" list.
- The data of the Typically Developed patients, are going to be contained in the "td_sel" list.
In both cases, each element of the lists contains a data frame (145x116): where 116 represent the different ROIs and 145 the number of observations in time.
# Pre-processing
After loading the data we can take a look inside it.
Here are the first 10 observations in time of the ROI 2001 of the patient caltech_0051472:
```{r, echo=FALSE}
asd_sel[[1]][[1]][1:10]
```
And the ones of the patient trinity_0050234:
```{r, echo=FALSE}
asd_sel[[2]][[1]][1:10]
```
We can notice that there is variability in the data.
So to deal with this we decided to standardize on a per-subject + per-ROI basis. To do so, for each subject we will subtract to their time series (all 116 of them) the arithmetic means and divide by the standard deviation, across time.
```{r}
for(i in (1:length(asd_sel))){ # Iterate over the patients
paziente <- asd_sel[[i]] # Get the patient data.frame
for(j in (1:116)){ # Iterate over the ROIs
m <- mean(paziente[[j]]) # Compute the mean
s <- sd(paziente[[j]]) # And the standard deviation
paziente[[j]] <- (paziente[[j]]-m)/s # Standardize
}
asd_sel[[i]] <- paziente # Save the new time series
}
for(i in (1:length(td_sel))){ # Iterate over the patients
paziente <- td_sel[[i]] # Get the patient data.frame
for(j in (1:116)){ # Iterate over the ROIs
m <- mean(paziente[[j]]) # Compute the mean
s <- sd(paziente[[j]]) # And the standard deviation
paziente[[j]] <- (paziente[[j]]-m)/s # Standardize
}
td_sel[[i]] <- paziente # Save the new time series
}
```
Let's look at the data after the standardization, again with the first 10 observations in time of the ROI 2001 of the patient caltech_0051472:
```{r, echo=FALSE}
asd_sel[[1]][[1]][1:10]
```
And the patient trinity_0050234:
```{r, echo=FALSE}
asd_sel[[2]][[1]][1:10]
```
# Pool the data together
We decided to find a representative for each of the groups of patients by using some kind of statistical summary.
So to pool together the data we will use the standard deviation over the patients, since we thought that it could be more relevant to study ROI co-variation in variability rather than average level. So we will consider the same ROI at the same time and compute it's standard deviation across the patients.
```{r}
# Get the names of the ROIs from the data.frame
names <- names(asd_sel[[1]])
# Create two data.frames one for each group.
# To build it we will use our chosen statistical summary
ASD <- data.frame() # Create the data.frame
for(j in (1:116)){ # Iterate over the ROIs
aux <- c()
for(i in (1:length(asd_sel))){ # Iterate over the patients
aux <- c(aux,asd_sel[[i]][[j]]) # Append the ROI of all the patients
}
aux <- matrix(aux, nrow = 145)
ASD <- rbind(ASD,round(apply(aux, 1, sd), 3)) # Compute the summary over the patients
}
ASD <- data.frame(t(ASD)) # Save it with the same dimensions of the originals data.frames
colnames(ASD) <- names # Add the names of the ROIs
TD <- data.frame() # Create the data.frame
for(j in (1:116)){ # Iterate over the ROIs
aux <- c()
for(i in (1:length(td_sel))){ # Iterate over the patients
aux <- c(aux,td_sel[[i]][[j]]) # Append the ROI of all the patients
}
aux <- matrix(aux, nrow = 145)
TD <- rbind(TD,round(apply(aux, 1, sd),3)) # Compute the summary over the patients
}
TD <- data.frame(t(TD)) # Save it with the same dimensions of the originals data.frames
colnames(TD) <- names # Add the names of the ROIs
```
The reason why we used the standard deviation is that we would like to give more importance to the differences between the patients in the same group. While the mean gives informations about what happen in the central part of the distribution, the standard deviation does a better job at giving informations about the data's variability.
# Threshold
To compute the threshold we will take a quantile of the correlation values found in both groups.
```{r}
Threshold <- function(asd_sel, td_sel, p){
# Find the correlation matrix for all the patients
a <- lapply(asd_sel,function(x) cor(x)) # Gives 12 matrices 116x116
t <- lapply(td_sel,function(x) cor(x)) # Gives 12 matrices 116x116
# Save all correlation values in the vector q
q <- c()
for(i in a){
q <- c(q, abs(i))
}
for(i in t){
q <- c(q, abs(i))
}
# Compute and return the wanted quantile
return(quantile(q, p))
}
```
# Find the correlation
We will use two different methods to find the correlation:
- For the first we will directly compute the correlation of the time-series between couples of ROIs.
- For the second we will compute the covariance matrix, of the time-series between couples of ROIs, and later find the matrix of partial correlations.
Here is the function that we will use to do so:
```{r}
correlation <- function(ASD, TD, partial = F){
# If we don't want to find the partial correlation coefficients
# we just compute it with the built in function
if(partial == F){
rhoA <- cor(ASD)
rhoT <- cor(TD)
}
# If we want to find the partial correlation coefficients
else{
Lambda_A <- solve(cov(ASD, ASD)) # Compute the covariance matrix and invert it
DIAG <- diag(Lambda_A) %*% t(diag(Lambda_A)) # Get the lambda_jj * lambda_kk
rhoA <- -Lambda_A/sqrt(DIAG) # And divide
Lambda_T <- solve(cov(TD, TD))
DIAG <- diag(Lambda_T) %*% t(diag(Lambda_T))
rhoT <- -Lambda_T/sqrt(DIAG)
}
return(list(rhoA, rhoT)) # Return both the correlation matrices
}
```
# Fisher Z-transform + Bonferroni correction
Now we have to find the 95% asymptotic Confidence Intervals, using the Fisher Z-transform and Bonferroni correction, so we will later be able to build the estimated graphs for each group (G_A and G_T).
We will apply the Z-transform by using the hyperbolic arctangent on the correlation values. On the values that we obtain we will build the confidence intervals, and than we will need to "go back" to the distribution of the correlations by applying the inverse of hyperbolic arctangent (hyperbolic tangent).
```{r}
Find_CIs <- function(ASD, TD, alpha, n, D, bonferroni = T, partial = F){
# Compute the correlations
Rho <- correlation(ASD, TD, partial)
rhoA <- Rho[[1]]
rhoT <- Rho[[2]]
# Apply Z-Transform, that is the same as applying the hyperbolic arctangent (atanh)
ZT_A <- atanh(rhoA)
ZT_T <- atanh(rhoT)
# Bonferroni correction
if(bonferroni){
alpha <- alpha/choose(D,2)
}
# Compute the asymptotic CIs + hyperbolic tangent
std <- 1/sqrt(n-3)
if(partial){
# In case of the
std <- 1/sqrt(n-D-1)
}
LA <- tanh(ZT_A - qnorm(1 - alpha/2)*std)
UA <- tanh(ZT_A + qnorm(1 - alpha/2)*std)
LT <- tanh(ZT_T - qnorm(1 - alpha/2)*std)
UT <- tanh(ZT_T + qnorm(1 - alpha/2)*std)
# Return all the matrices with the lower and upper bounds
return(list(LA, UA, LT, UT))
}
```
# Graphs
To build our graphs we will compute adjacency matrices.
Our adjacency matrix will be a 116x116 matrix with boolean values:
- TRUE, if there is a edge between the two ROIs, and that occurs when the intersection between the interval [-threshold, threshold] and the Confidence Interval, related to those ROIs, is empty.
- FALSE, otherwise.
```{r}
graph <- function(L, U, threshold){
# We have the intersection if the Lower bound is over the threshold or the Upper is under -threshold
adj <- L>threshold | U<(-threshold)
G <- graph_from_adjacency_matrix(
adjmatrix = adj,
mode = c("undirected"),
diag = F
)
return(G)
}
```
Here is an example of the graphs.
This was made:
1. Taking a 0.5 quantile as threshold, resulting in t = 0.18.
2. Using the Bonferroni correction.
```{r}
# Get the Confidence intervals
CIs <- Find_CIs(ASD, TD, 0.05, 145, 116)
# Define the threshold
t <- Threshold(asd_sel, td_sel, 0.5)
# Compute the graphs
G_A <- graph(CIs[[1]], CIs[[2]], t)
G_T <- graph(CIs[[3]], CIs[[4]], t)
```
You can see in green the graph for the ASD patients, in red the TD ones and in blue the graph that show the common connections.
```{r, echo=FALSE}
par(mfrow=c(1,3))
plot(G_A,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="darkgreen", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Autism Spectrum Disorder")
plot(G_T,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="darkred", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Typically Developed")
g <- graph.intersection(G_A, G_T, keep.all.vertices = TRUE)
plot(g,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="blue", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Common connections")
mtext(paste('Graphs based on 95% CIs, with Bonferroni correction. Threshold:', round(t,2)), side=3, outer=TRUE, line=-30)
```
From the graphs we can see that there are a few more connections in the case of the ASD patients, and that there are only a handfull of common connections.
# How it changes without Bonferroni
In the previous graph we applied the Bonferroni correction while defining the confidence intervals, here we will see how things change in case the correction is not applied.
We will be using the same data and threshold as before.
```{r}
# Get the Confidence intervals
CIs <- Find_CIs(ASD, TD, 0.05, 145, 116, bonferroni = F)
# Define the threshold
t <- Threshold(asd_sel, td_sel, 0.5)
# Compute the graphs
G_A <- graph(CIs[[1]], CIs[[2]], t)
G_T <- graph(CIs[[3]], CIs[[4]], t)
```
```{r, echo=FALSE}
par(mfrow=c(1,3))
plot(G_A,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="darkgreen", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Autism Spectrum Disorder")
plot(G_T,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="darkred", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Typically Developed")
g <- graph.intersection(G_A, G_T, keep.all.vertices = TRUE)
plot(g,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="blue", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Common connections")
mtext(paste('Graphs based on 95% CIs, without Bonferroni correction. Threshold:', round(t,2)), side=3, outer=TRUE, line=-30)
```
We can see that also in this case the number of connections for the ASD patients is higher that the one of the TD ones.
The biggest difference form the case with the Bonferroni correction is that there are many more edges in all the graphs. This can be explained by the fact that Bonferroni correction guarantees the 95% Confidence Intervals across all the elements. Without it the 95% CI is built for each of the elements, so it has a overall lower confidence level.
To quantify this, we have:
```{r, echo=FALSE}
# Get the Confidence intervals
CIsB <- Find_CIs(ASD, TD, 0.05, 145, 116, bonferroni = T)
CIs <- Find_CIs(ASD, TD, 0.05, 145, 116, bonferroni = F)
DifB <- CIsB[[2]]-CIsB[[1]]
Dif <- CIs[[2]]-CIs[[1]]
t <- Threshold(asd_sel, td_sel, 0.5)
G_AB <- graph(CIsB[[1]], CIsB[[2]], t)
G_A <- graph(CIs[[1]], CIs[[2]], t)
paste('The increment of the number of edges from the case without Bonferroni to the one with it:',
round(((length(E(G_AB))-length(E(G_A)))/length(E(G_A))) *100,3), '%')
```
This results are also confirmed if we look at how much the confidence intervals width increases between the two cases:
```{r, echo=FALSE}
paste('The increment of the confidence intervals width from the case without Bonferroni to the one with it:',
round(mean(replace(((DifB-Dif)/Dif) * 100 , is.na(((DifB-Dif)/Dif) * 100), 0)), 3), '%')
```
# Changing threshold
Now we want to look at how the results changes by varying the threshold t from small to large values.
To do so we will:
1. Use the Bonferroni correction.
2. Take different quantilies (0.0, 0.3, 0.6, 0.9) to obtain different thresholds (0, 0.1, 0.22, 0.44).
```{r, echo=FALSE}
CIs <- Find_CIs(ASD, TD, 0.05, 145, 116)
for(q in seq(from = 0, to = 0.9, length.out = 4)){
t <- Threshold(asd_sel, td_sel, q)
G_A <- graph(CIs[[1]], CIs[[2]], t)
G_T <- graph(CIs[[3]], CIs[[4]], t)
par(mfrow=c(1,3))
plot(G_A,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="darkgreen", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Autism Spectrum Disorder")
plot(G_T,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="darkred", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Typically Developed")
g <- graph.intersection(G_A, G_T, keep.all.vertices = TRUE)
plot(g,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="blue", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Common connections")
mtext(paste('Graphs based on 95% CIs, with Bonferroni correction. Threshold:', round(t,2)), side=3, outer=TRUE, line=-30)
}
```
As expected if we increase the threshold we always get less edges.
The previous pattern of the ASD group having more connections is still present, in particular we can see the last edge "to die" is in the ASD group and it's the edge:
```{r, echo=FALSE}
E(G_A)
```
Trying to find the "last connection to die" for the TD patients, we found that we can obtain it for the threshold 0.43 (0.89 quantile). In particular last ROIs connection is:
```{r, echo=FALSE}
t <- Threshold(asd_sel, td_sel, 0.89)
G_T <- graph(CIs[[3]], CIs[[4]], t)
E(G_T)
```
# Partial correlation coefficients
Now we want to look at the results using the partial correlation coefficients, so we will change the threshold t from small to large values, just like we did before.
So we will:
1. Use the Bonferroni correction.
2. Take different quantilies (0, 0.23, 0.47, 0.7) to obtain different thresholds (0, 0.08, 0.16, 0.28).
```{r, echo=FALSE}
CIs <- Find_CIs(ASD, TD, 0.05, 145, 116, partial = T)
for(q in seq(from = 0, to = 0.7, length.out = 4)){
t <- Threshold(asd_sel, td_sel, q)
G_A <- graph(CIs[[1]], CIs[[2]], t)
G_T <- graph(CIs[[3]], CIs[[4]], t)
par(mfrow=c(1,3))
plot(G_A,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="darkgreen", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Autism Spectrum Disorder")
plot(G_T,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="darkred", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Typically Developed")
g <- graph.intersection(G_A, G_T, keep.all.vertices = TRUE)
plot(g,layout=layout_on_sphere, vertex.size=6, edge.color = "black",
vertex.label.dist=1.3, vertex.color="blue", vertex.label.cex=0.5,
edge.width = 1.3, lwd=5,vertex.label.color =rgb(0,0,0,0.7),
main="Common connections")
mtext(paste('Graphs based on 95% CIs, with Bonferroni correction. Threshold:', round(t,2)), side=3, outer=TRUE, line=-30)
}
```
Again as expected if we increase the threshold we always get less edges.
The previous pattern of the ASD group having more connections is still present, even with a different method to compute the correlation, and in particular, we can see the last edge "to die" is in the ASD group and it's the edge:
```{r, echo=FALSE}
E(G_A)
```
Trying to find the "last connection to die" for the TD patients, as we can see from the third image, we found that we can obtain it for the threshold 0.16 (0.47 quantile). In particular last ROIs connection is:
```{r, echo=FALSE}
t <- Threshold(asd_sel, td_sel, 0.47)
G_T <- graph(CIs[[3]], CIs[[4]], t)
E(G_T)
```
The main differences that we can notice between the two methods is that:
1. Using the method of the partial correlations we obtain less connections between the ROIs.
2. The connection that we get, with this method, seems to be weaker, so they "die" with lower thresholds.
3. We seems to not obtain any common edges, with the second method.
The first and second points can be explained by the fact that, the asymptotic distribution of partial correlation coefficients has a higher variance, it's 0.19 = 1/n-g-3, with g = D-2, while it was 0.08 with the previous method, the higher variance gives us back wider Confidence Intervals.
The third point, makes us think that, for the goal of our analysis, the first method is the most efficient. The reason for that is: we are trying to find not only differences, but also common things between the two groups.