Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different results by running clustering steps (FindNeighbors+FindClusters+RunUMAP) before/after Harmony #9673

Open
Mar10L opened this issue Feb 7, 2025 · 0 comments

Comments

@Mar10L
Copy link

Mar10L commented Feb 7, 2025

Dear Seurat team,

I am having some discrepancies in the results generated from the interaction (and different combination) of the clustering steps (FindNeighbors+FindClusters+RunUMAP) and harmony.

I am starting from a Seurat object which presents the pca reduction field.

seurat.norm <- seurat.filt %>%
  NormalizeData() %>%
  FindVariableFeatures() %>%
  ScaleData()

seurat.norm <- RunPCA(seurat.norm, seed.use = 39, npcs = 50)

CASE 1

On this object I then tested the presence of batch effect by running the clusterization as follows:


#  Section - 1

#checking of batch effect by running clusterization and UMAP on non-harmonized data
seurat.norm.noharm.test <- FindNeighbors(seurat.norm, dims = 1:40, k.param = 23, reduction="pca")
seurat.norm.noharm.test <- FindClusters(seurat.norm.noharm.test,verbose = TRUE, resolution=1.2, group.singletons = TRUE, random.seed = 17, do.sparse = T, save.SNN = T, pc.use =1:40, reduction="pca")
seurat.norm.noharm.test <- RunUMAP(seurat.norm.noharm.test, dims = 1:40, seed.use = 1843, reduction="pca")
#plotting
p_no_harm.clusters <- DimPlot(seurat.norm.noharm.test, label=T, reduction = "umap", group.by = "seurat_clusters", alpha = 0.5) + ggtitle("spots grouped by clusters")
p_no_harm.sample <- DimPlot(seurat.norm.noharm.test, label=F, reduction = "umap", group.by = "sample_name", alpha = 0.5) + ggtitle("spots grouped by sample")
p_no_harm.slide <- DimPlot(seurat.norm.noharm.test, label=F, reduction = "umap", group.by = "slide_id", alpha = 0.5) + ggtitle("spots grouped by slide_id")
p_no_harm.array <- DimPlot(seurat.norm.noharm.test, label=F, reduction = "umap", group.by = "sub_array", alpha = 0.5) + ggtitle("spots grouped by sub array")
p1 <- (p_no_harm.clusters - p_no_harm.sample) / (p_no_harm.slide - p_no_harm.array)
p1

Image

I then ran Harmony, followed by clusterization and UMAP


#  Section - 2.1

#correctin batch effect by running Harmony
seurat.norm.harm <- RunHarmony(object = seurat.norm, group.by.vars = c("sample_name"), theta = c(2), plot_convergence = T, assay.use = "Spatial", reduction = "pca", dims.use = 1:40, verbose = T, reduction.save = "harmony_1")


# Section - 2.2

#running clusterization and UMAP on harmonized data
seurat.norm.harm.test <- FindNeighbors(seurat.norm.harm, dims = 1:40, k.param = 23, reduction="harmony_1")
seurat.norm.harm.test <- FindClusters(seurat.norm.harm.test,verbose = TRUE, resolution=1.2, group.singletons = TRUE, random.seed = 17, do.sparse = T, save.SNN = T, pc.use =1:40, reduction="harmony_1")
seurat.norm.harm.test <- RunUMAP(seurat.norm.harm.test, dims = 1:40, seed.use = 1843, reduction="harmony_1")
p_harm.clusters <- DimPlot(seurat.norm.harm.test, label=T, reduction = "umap", group.by = "seurat_clusters", alpha = 0.5) + ggtitle("spots grouped by clusters")
p_harm.sample <- DimPlot(seurat.norm.harm.test, label=F, reduction = "umap", group.by = "sample_name", alpha = 0.5) + ggtitle("spots grouped by sample")
p_harm.slide <- DimPlot(seurat.norm.harm.test, label=F, reduction = "umap", group.by = "slide_id", alpha = 0.5) + ggtitle("spots grouped by slide_id")
p_harm.array <- DimPlot(seurat.norm.harm.test, label=F, reduction = "umap", group.by = "sub_array", alpha = 0.5) + ggtitle("spots grouped by sub array")
p2 <- (p_harm.clusters - p_harm.sample) / (p_harm.slide - p_harm.array)
p2

Image

CASE 2

After starting again from scratch, but skipping the estimation of the batch effect (skipping part 1) I ran the Harmony+clusterinzation, but obtained different clusterization results:

# normalization, feature selection, scaling, and PCA
seurat.norm <- seurat.filt %>%
  NormalizeData() %>%
  FindVariableFeatures() %>%
  ScaleData()

seurat.norm <- RunPCA(seurat.norm, seed.use = 39, npcs = 50)


#  Section - 2.1

#correctin batch effect by running Harmony
seurat.norm.harm <- RunHarmony(object = seurat.norm, group.by.vars = c("sample_name"), theta = c(2), plot_convergence = T, assay.use = "Spatial", reduction = "pca", dims.use = 1:40, verbose = T, reduction.save = "harmony_1")


#  Section - 2.2

#running clusterization and UMAP on harmonized data
seurat.norm.harm.test <- FindNeighbors(seurat.norm.harm, dims = 1:40, k.param = 23, reduction="harmony_1")
seurat.norm.harm.test <- FindClusters(seurat.norm.harm.test,verbose = TRUE, resolution=1.2, group.singletons = TRUE, random.seed = 17, do.sparse = T, save.SNN = T, pc.use =1:40, reduction="harmony_1")
seurat.norm.harm.test <- RunUMAP(seurat.norm.harm.test, dims = 1:40, seed.use = 1843, reduction="harmony_1")
p_harm.clusters <- DimPlot(seurat.norm.harm.test, label=T, reduction = "umap", group.by = "seurat_clusters", alpha = 0.5) + ggtitle("spots grouped by clusters")
p_harm.sample <- DimPlot(seurat.norm.harm.test, label=F, reduction = "umap", group.by = "sample_name", alpha = 0.5) + ggtitle("spots grouped by sample")
p_harm.slide <- DimPlot(seurat.norm.harm.test, label=F, reduction = "umap", group.by = "slide_id", alpha = 0.5) + ggtitle("spots grouped by slide_id")
p_harm.array <- DimPlot(seurat.norm.harm.test, label=F, reduction = "umap", group.by = "sub_array", alpha = 0.5) + ggtitle("spots grouped by sub array")
p3 <- (p_harm.clusters - p_harm.sample) / (p_harm.slide - p_harm.array)
p3

Image

CASE 3

I then started from scratch, as in CASE 2, but ran two times the section 2. What I get now are the same results as for CASE 1

# normalization, feature selection, scaling, and PCA
seurat.norm <- seurat.filt %>%
  NormalizeData() %>%
  FindVariableFeatures() %>%
  ScaleData()

seurat.norm <- RunPCA(seurat.norm, seed.use = 39, npcs = 50)


#  Section - 2.1

#correctin batch effect by running Harmony
seurat.norm.harm <- RunHarmony(object = seurat.norm, group.by.vars = c("sample_name"), theta = c(2), plot_convergence = T, assay.use = "Spatial", reduction = "pca", dims.use = 1:40, verbose = T, reduction.save = "harmony_1")


#  Section - 2.2

#running clusterization and UMAP on harmonized data
seurat.norm.harm.test <- FindNeighbors(seurat.norm.harm, dims = 1:40, k.param = 23, reduction="harmony_1")
seurat.norm.harm.test <- FindClusters(seurat.norm.harm.test,verbose = TRUE, resolution=1.2, group.singletons = TRUE, random.seed = 17, do.sparse = T, save.SNN = T, pc.use =1:40, reduction="harmony_1")
seurat.norm.harm.test <- RunUMAP(seurat.norm.harm.test, dims = 1:40, seed.use = 1843, reduction="harmony_1")


#  Section - 2.1 (second time)

#correctin batch effect by running Harmony
seurat.norm.harm2 <- RunHarmony(object = seurat.norm, group.by.vars = c("sample_name"), theta = c(2), plot_convergence = T, assay.use = "Spatial", reduction = "pca", dims.use = 1:40, verbose = T, reduction.save = "harmony")


#  Section - 2.2 (second time)

#running clusterization and UMAP on harmonized data
seurat.norm.harm2.test <- FindNeighbors(seurat.norm.harm2, dims = 1:40, k.param = 23, reduction="harmony")
seurat.norm.harm2.test <- FindClusters(seurat.norm.harm2.test,verbose = TRUE, resolution=1.2, group.singletons = TRUE, random.seed = 17, do.sparse = T, save.SNN = T, pc.use =1:40, reduction="harmony")
seurat.norm.harm2.test <- RunUMAP(seurat.norm.harm2.test, dims = 1:40, seed.use = 1843, reduction="harmony")
p_harm.clusters <- DimPlot(seurat.norm.harm2.test, label=T, reduction = "umap", group.by = "seurat_clusters", alpha = 0.5) + ggtitle("spots grouped by clusters")
p_harm.sample <- DimPlot(seurat.norm.harm2.test, label=F, reduction = "umap", group.by = "sample_name", alpha = 0.5) + ggtitle("spots grouped by sample")
p_harm.slide <- DimPlot(seurat.norm.harm2.test, label=F, reduction = "umap", group.by = "slide_id", alpha = 0.5) + ggtitle("spots grouped by slide_id")
p_harm.array <- DimPlot(seurat.norm.harm2.test, label=F, reduction = "umap", group.by = "sub_array", alpha = 0.5) + ggtitle("spots grouped by sub array")
p4 <- (p_harm.clusters - p_harm.sample) / (p_harm.slide - p_harm.array)
p4

Image

I am wondering why this is happening. Apparently something happens if I run Harmony AFTER the clustering step (even though the results have been generated and saved in different objects).

Hopefully somebody can help me.

Mario

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant