-
Notifications
You must be signed in to change notification settings - Fork 0
/
Gene Significance Markdown.Rmd
95 lines (74 loc) · 3.41 KB
/
Gene Significance Markdown.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
---
title: "BS9001 Research Experience: Gene Significance"
author: "Austin Chia Cheng En (U1740366A)"
output:
rmdformats::material:
highlight: kate
code_folding: "hide"
---
# 1) Determining Gene Significance
##### The rowSums function was used to determine the total sums of "1" values in the binary matrix of significant genes.
##### A histogram was plotted to view the distribution of significance of the genes.
```{r 1) Determining Gene Significance, collapse = TRUE, eval= FALSE, results= 'hide', message=FALSE}
row_total <- rowSums(boot_df, na.rm = FALSE, dims = 1)
hist(row_total)
mean(row_total) # 90.99891/1000
t.test(row_total)
```
# 2) Confusion Matrix
##### The package `caret` was installed and the library `caret` was imported.
##### Two factors were created, one for the entire data matrix and the other for the reference.
##### A user defined function `draw_confusion_matrix` was written to draw a confusion matrix. The function was used to plot a confusion matrix and some additional details. The function generated the sensitivity, specificity, precision, recall, F1, accuracy and Kappa of the t-test that was done on the dataset.
```{r 2) Confusion Matrix, eval=FALSE}
## Determining significance of genes
sig_hist <- hist(row_total)
# For data
row_total <- as.matrix(rowSums(boot_df))
sig_gene <- as.numeric(row_total>200)
dat_fac <- as.factor(sig_gene)
# For reference
ref_fac <- as.factor(boot_df[,1])
library(caret)
cm <- confusionMatrix(data = dat_fac, reference = ref_fac )
draw_confusion_matrix <- function(cm) {
layout(matrix(c(1,1,2)))
par(mar=c(2,2,2,2))
plot(c(100, 345), c(300, 450), type = "n", xlab="", ylab="", xaxt='n', yaxt='n')
title('CONFUSION MATRIX', cex.main=2)
# create the matrix
rect(150, 430, 240, 370, col='#3F97D0')
text(195, 435, 'Class1', cex=1.2)
rect(250, 430, 340, 370, col='#F7AD50')
text(295, 435, 'Class2', cex=1.2)
text(125, 370, 'Predicted', cex=1.3, srt=90, font=2)
text(245, 450, 'Actual', cex=1.3, font=2)
rect(150, 305, 240, 365, col='#F7AD50')
rect(250, 305, 340, 365, col='#3F97D0')
text(140, 400, 'Class1', cex=1.2, srt=90)
text(140, 335, 'Class2', cex=1.2, srt=90)
# add in the cm results
res <- as.numeric(cm$table)
text(195, 400, res[1], cex=1.6, font=2, col='white')
text(195, 335, res[2], cex=1.6, font=2, col='white')
text(295, 400, res[3], cex=1.6, font=2, col='white')
text(295, 335, res[4], cex=1.6, font=2, col='white')
# add in the specifics
plot(c(100, 0), c(100, 0), type = "n", xlab="", ylab="", main = "DETAILS", xaxt='n', yaxt='n')
text(10, 85, names(cm$byClass[1]), cex=1.2, font=2)
text(10, 70, round(as.numeric(cm$byClass[1]), 3), cex=1.2)
text(30, 85, names(cm$byClass[2]), cex=1.2, font=2)
text(30, 70, round(as.numeric(cm$byClass[2]), 3), cex=1.2)
text(50, 85, names(cm$byClass[5]), cex=1.2, font=2)
text(50, 70, round(as.numeric(cm$byClass[5]), 3), cex=1.2)
text(70, 85, names(cm$byClass[6]), cex=1.2, font=2)
text(70, 70, round(as.numeric(cm$byClass[6]), 3), cex=1.2)
text(90, 85, names(cm$byClass[7]), cex=1.2, font=2)
text(90, 70, round(as.numeric(cm$byClass[7]), 3), cex=1.2)
# add in the accuracy information
text(30, 35, names(cm$overall[1]), cex=1.5, font=2)
text(30, 20, round(as.numeric(cm$overall[1]), 3), cex=1.4)
text(70, 35, names(cm$overall[2]), cex=1.5, font=2)
text(70, 20, round(as.numeric(cm$overall[2]), 3), cex=1.4)
}
dat_cm <- draw_confusion_matrix(cm)
```