-
Notifications
You must be signed in to change notification settings - Fork 0
/
2_clusterCells.Rmd
129 lines (92 loc) · 3.75 KB
/
2_clusterCells.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
---
title: "scRNA-seq Rat Metrial Glands"
author: "Ha T. H. Vu"
output: html_document
---
```{r setup, include=FALSE}
options(max.print = "75")
knitr::opts_chunk$set(
echo = TRUE,
collapse = TRUE,
comment = "#>",
fig.path = "Files/",
fig.width = 15,
prompt = FALSE,
tidy = FALSE,
message = FALSE,
warning = TRUE
)
knitr::opts_knit$set(width = 75)
```
This is a documentation for analyses of scRNA-seq data, generated from rat metrial gland tissues on gestational day (GD) 15.5 and 19.5. <br>
In this step, we will be clustering cells using a standard workflow to find groups of similar cells.
## 1. GD15.5 cells
```{r}
library(dplyr)
library(Seurat)
library(patchwork)
library(ggplot2)
set.seed(1234)
load("/work/LAS/geetu-lab/hhvu/project3_scATAC/scRNA-seq-analysis/2_mergedSamples/gd15.5.RNAmerged.rda")
data <- gd15.5.merged
#remove genes on chromosome MT
mart <- biomaRt::useMart("ensembl", host="http://sep2019.archive.ensembl.org", dataset = "rnorvegicus_gene_ensembl") #to make sure that we use the correct version of Ensembl
rnor6 <- biomaRt::getBM(attributes = c("ensembl_gene_id", "external_gene_name", "chromosome_name", "start_position", "end_position", "strand", "gene_biotype"), mart = mart) #build attribute table
mt.genes <- subset(rnor6, rnor6$chromosome_name == "MT")$external_gene_name
counts <- GetAssayData(data, assay = "RNA")
counts <- counts[-(which(rownames(counts) %in% mt.genes)),]
data <- subset(data, features = rownames(counts))
# Normalize data
data <- NormalizeData(data, normalization.method = "LogNormalize", scale.factor = 10000)
data <- FindVariableFeatures(data, selection.method = "vst", nfeatures = 2000)
# Scale data
data <- ScaleData(data)
# PCA
data <- RunPCA(data, features = VariableFeatures(object = data), npcs = 100)
# Determining dimensions
data <- JackStraw(data, num.replicate = 100, dims = 100)
data <- ScoreJackStraw(data, dims = 1:100)
JackStrawPlot(data, dims = 1:100)
ElbowPlot(data, ndims = 100)
```
From the Jack Straw plot and elbow plot, we can see component 72 was the last significant one in the first 100 PCs, so we use `dim = 72` for the finding clusters step.
```{r}
# Cluster cells + UMAP
dim <- 72
data <- FindNeighbors(data, dims = 1:dim)
data <- FindClusters(data, resolution = 0.8)
data <- RunUMAP(data, dims = 1:dim)
DimPlot(data, reduction = "umap")
table(Idents(data))
#save(data, file = "/work/LAS/geetu-lab/hhvu/project3_scATAC/scRNA-seq-analysis/3_cluster_DEG/rmChrMTgenes/GD15.5/gd15.5_res0.8.rda")
```
## 2. GD19.5 cells
First, we determine the demensions to keep of the dataset. Since our data had been normalized and scaled from the integration step, we will calculate and generate the Jack Straw plot and elbow plot directly.
```{r}
set.seed(1234)
load("/work/LAS/geetu-lab/hhvu/project3_scATAC/scRNA-seq-analysis/2_mergedSamples/integrationGD19.5/rmChrMTgenes/gd19.5-4-5-6-7.integrated.rda")
data <- gd19.5.combined
DefaultAssay(data) <- "integrated"
# PCA
data <- RunPCA(data, features = VariableFeatures(object = data), npcs = 100)
# Determining dimensions
data <- JackStraw(data, num.replicate = 100, dims = 100)
data <- ScoreJackStraw(data, dims = 1:100)
JackStrawPlot(data, dims = 1:100)
ElbowPlot(data, ndims = 100)
```
From the Jack Straw plot and elbow plot, we can see component 77 was the last significant one in the first 100 PCs, so we use `dim = 77` for the finding clusters step.
```{r}
# Cluster cells + UMAP
dim <- 77
data <- FindNeighbors(data, dims = 1:dim)
#test resolutions
data2 <- FindClusters(data, resolution = 0.8)
data2 <- RunUMAP(data2, dims = 1:dim)
DimPlot(data2, reduction = "umap")
table(Idents(data2))
#save(data2, file = "/work/LAS/geetu-lab/hhvu/project3_scATAC/scRNA-seq-analysis/3_cluster_DEG/rmChrMTgenes/GD19.5/gd19.5-4-5-6-7_res0.8.rda")
```
```{r}
sessionInfo()
```