Gene Significance Markdown.Rmd

---
title: "BS9001 Research Experience: Gene Significance"
author: "Austin Chia Cheng En (U1740366A)"
output:
  rmdformats::material:
    highlight: kate
    code_folding: "hide"
---

# 1) Determining Gene Significance

##### The rowSums function was used to determine the total sums of "1" values in the binary matrix of significant genes. 

##### A histogram was plotted to view the distribution of significance of the genes. 

```{r 1) Determining Gene Significance, collapse = TRUE, eval= FALSE, results= 'hide', message=FALSE}
row_total <- rowSums(boot_df, na.rm = FALSE, dims = 1)

hist(row_total)
mean(row_total) # 90.99891/1000
t.test(row_total)
```

# 2) Confusion Matrix

##### The package `caret` was installed and the library `caret` was imported. 

##### Two factors were created, one for the entire data matrix and the other for the reference.

##### A user defined function `draw_confusion_matrix` was written to draw a confusion matrix. The function was used to plot a confusion matrix and some additional details. The function generated the sensitivity, specificity, precision, recall, F1, accuracy and Kappa of the t-test that was done on the dataset.

```{r 2) Confusion Matrix, eval=FALSE}
## Determining significance of genes

sig_hist <- hist(row_total)

# For data
row_total <- as.matrix(rowSums(boot_df))
sig_gene <- as.numeric(row_total>200)
dat_fac <- as.factor(sig_gene)

# For reference
ref_fac <- as.factor(boot_df[,1])

library(caret)
cm <- confusionMatrix(data = dat_fac, reference = ref_fac )

draw_confusion_matrix <- function(cm) {
  
  layout(matrix(c(1,1,2)))
  par(mar=c(2,2,2,2))
  plot(c(100, 345), c(300, 450), type = "n", xlab="", ylab="", xaxt='n', yaxt='n')
  title('CONFUSION MATRIX', cex.main=2)
  
  # create the matrix 
  rect(150, 430, 240, 370, col='#3F97D0')
  text(195, 435, 'Class1', cex=1.2)
  rect(250, 430, 340, 370, col='#F7AD50')
  text(295, 435, 'Class2', cex=1.2)
  text(125, 370, 'Predicted', cex=1.3, srt=90, font=2)
  text(245, 450, 'Actual', cex=1.3, font=2)
  rect(150, 305, 240, 365, col='#F7AD50')
  rect(250, 305, 340, 365, col='#3F97D0')
  text(140, 400, 'Class1', cex=1.2, srt=90)
  text(140, 335, 'Class2', cex=1.2, srt=90)
  
  # add in the cm results 
  res <- as.numeric(cm$table)
  text(195, 400, res[1], cex=1.6, font=2, col='white')
  text(195, 335, res[2], cex=1.6, font=2, col='white')
  text(295, 400, res[3], cex=1.6, font=2, col='white')
  text(295, 335, res[4], cex=1.6, font=2, col='white')
  
  # add in the specifics 
  plot(c(100, 0), c(100, 0), type = "n", xlab="", ylab="", main = "DETAILS", xaxt='n', yaxt='n')
  text(10, 85, names(cm$byClass[1]), cex=1.2, font=2)
  text(10, 70, round(as.numeric(cm$byClass[1]), 3), cex=1.2)
  text(30, 85, names(cm$byClass[2]), cex=1.2, font=2)
  text(30, 70, round(as.numeric(cm$byClass[2]), 3), cex=1.2)
  text(50, 85, names(cm$byClass[5]), cex=1.2, font=2)
  text(50, 70, round(as.numeric(cm$byClass[5]), 3), cex=1.2)
  text(70, 85, names(cm$byClass[6]), cex=1.2, font=2)
  text(70, 70, round(as.numeric(cm$byClass[6]), 3), cex=1.2)
  text(90, 85, names(cm$byClass[7]), cex=1.2, font=2)
  text(90, 70, round(as.numeric(cm$byClass[7]), 3), cex=1.2)
  
  # add in the accuracy information 
  text(30, 35, names(cm$overall[1]), cex=1.5, font=2)
  text(30, 20, round(as.numeric(cm$overall[1]), 3), cex=1.4)
  text(70, 35, names(cm$overall[2]), cex=1.5, font=2)
  text(70, 20, round(as.numeric(cm$overall[2]), 3), cex=1.4)
}  

dat_cm <- draw_confusion_matrix(cm)
```