Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue: ScaleData Not Working with BPCells #9676

Open
EliseCoopman opened this issue Feb 10, 2025 · 9 comments
Open

Issue: ScaleData Not Working with BPCells #9676

EliseCoopman opened this issue Feb 10, 2025 · 9 comments

Comments

@EliseCoopman
Copy link

Hi,

I'm working with Seurat v5.1.0, and I encountered an issue when attempting to regress out nFeature_RNA during the data scaling step in my workflow. My code:

seuratobj_splitted <- SplitObject(seuratobj, split.by = "sample_id")

# Remove lowly expressed genes
seuratobj_splitted <- lapply(seuratobj_splitted, function(x) {
    expressed_genes <- rowSums(x[["RNA"]]$counts > 0) >= 10 
    expressed_genes <- names(expressed_genes)[expressed_genes] 
    subset(x, features = expressed_genes)
})

for (lib in names(seuratobj_splitted)) {
    seuratobj_splitted[[lib]]@project.name <- lib
}

# Merge into a single object
seuratobj <- merge(x = seuratobj_splitted[[1]], y = seuratobj_splitted[-1])

seuratobj <- NormalizeData(seuratobj)
seuratobj <- FindVariableFeatures(seuratobj, selection.method = "vst", nfeatures = 2000)
seuratobj <- ScaleData(seuratobj, vars.to.regress = "nFeature_RNA")

seuratobj <- RunPCA(seuratobj)
seuratobj <- FindNeighbors(seuratobj, dims = 1:25, reduction = "pca")
seuratobj <- FindClusters(seuratobj, resolution = 1, cluster.name = "cluster.unintegrated")
seuratobj <- RunUMAP(seuratobj, dims = 1:25, reduction = "pca", reduction.name = "umap.unintegrated")

# Harmony integration
seuratobj <- IntegrateLayers(object = seuratobj, 
                             method = HarmonyIntegration, 
                             orig.reduction = "pca", 
                             new.reduction = "harmony", 
                             verbose = TRUE, 
                             npcs = 25)

seuratobj <- FindNeighbors(seuratobj, reduction = "harmony", dims = 1:25)
seuratobj <- FindClusters(seuratobj, resolution = 1, cluster.name = "cluster.integrated")
seuratobj <- RunUMAP(seuratobj, reduction = "harmony", dims = 1:25, reduction.name = "umap.integrated")

Despite specifying vars.to.regress = "nFeature_RNA" in the ScaleData step, I observed no changes in the scale.data layer or the created dimplots compared to not regressing out any source of variance.

  • The scaled values remain exactly the same whether or not regression is performed.
  • I clearly see the same nFeature_RNA gradient in my dimplots, regardless of regressing for it.
    I also attempted to regress out multiple sources of variance (overfitting intentionally), but don't observe any changes.

In seurat version 4, regressing out any sources of variance was perfomed after integration, but this doesn't seem to work anymore in version 5 (there is no integrated assay anymore?).

Questions:

  • Is this the correct approach for regressing out any sources of variance in seurat version 5?
  • Could there be an issue in how Seurat v5.1.0 handles regression during ScaleData?

Thank you!

@BaukjeBij
Copy link

@EliseCoopman same problem! Would be nice to have some clarification regarding why this happens as well as a resolution!

@samuel-marsh
Copy link
Collaborator

Hi @EliseCoopman @BaukjeBij,

Not member of dev team but hopefully can be helpful. Can you provide reproducible example using ifnb SeuratData datasets?

I'm unable to replicate issues with scale data being identical and resulting plots being the same using the ifnb dataset. See reprex below.

Best,
Sam

library(tidyverse)
#> Warning: package 'lubridate' was built under R version 4.4.1
library(Seurat)
#> Warning: package 'Seurat' was built under R version 4.4.1
#> Loading required package: SeuratObject
#> Loading required package: sp
#> Warning: package 'sp' was built under R version 4.4.1
#> 'SeuratObject' was built with package 'Matrix' 1.7.0 but the current
#> version is 1.7.2; it is recomended that you reinstall 'SeuratObject' as
#> the ABI for 'Matrix' may have changed
#> 
#> Attaching package: 'SeuratObject'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, t
library(scCustomize)
#> scCustomize v3.0.1.8
#> If you find the scCustomize useful please cite.
#> See 'samuel-marsh.github.io/scCustomize/articles/FAQ.html' for citation info.

ifnb <- ifnb.SeuratData::ifnb
ifnb <- UpdateSeuratObject(ifnb)
#> Validating object structure
#> Updating object slots
#> Ensuring keys are in the proper structure
#> Warning: Assay RNA changing from Assay to Assay
#> Ensuring keys are in the proper structure
#> Ensuring feature names don't have underscores or pipes
#> Updating slots in RNA
#> Validating object structure for Assay 'RNA'
#> Object representation is consistent with the most current Seurat version
ifnb <- Split_Layers(ifnb, split.by = "stim")
#> • Splitting layers within assay: RNA into 2 parts by "stim"
#> ℹ RNA is not Assay5, converting to Assay5 before splitting.

ifnb <- NormalizeData(ifnb)
#> Normalizing layer: counts.CTRL
#> Normalizing layer: counts.STIM
ifnb <- FindVariableFeatures(ifnb)
#> Finding variable features for layer counts.CTRL
#> Finding variable features for layer counts.STIM
ifnb <- ScaleData(ifnb)
#> Centering and scaling data matrix
ifnb <- RunPCA(ifnb)
#> PC_ 1 
#> Positive:  TYROBP, C15orf48, FCER1G, CST3, SOD2, ANXA5, FTL, TYMP, TIMP1, CD63 
#>     LGALS1, CTSB, S100A4, LGALS3, KYNU, PSAP, FCN1, NPC2, ANXA2, S100A11 
#>     IGSF6, LYZ, SPI1, APOBEC3A, CD68, CTSL, SDCBP, NINJ1, HLA-DRA, CCL2 
#> Negative:  NPM1, CCR7, GIMAP7, LTB, CD7, SELL, CD2, TMSB4X, TRAT1, IL7R 
#>     IL32, RHOH, ITM2A, RGCC, LEF1, CD3G, ALOX5AP, CREM, NHP2, PASK 
#>     MYC, SNHG8, TSC22D3, GPR171, BIRC3, NOP58, RARRES3, CD27, SRM, CD8B 
#> PC_ 2 
#> Positive:  ISG15, ISG20, IFIT3, IFIT1, LY6E, TNFSF10, IFIT2, MX1, IFI6, RSAD2 
#>     CXCL10, OAS1, CXCL11, IFITM3, MT2A, OASL, TNFSF13B, IDO1, IL1RN, APOBEC3A 
#>     GBP1, CCL8, HERC5, FAM26F, GBP4, HES4, WARS, VAMP5, DEFB1, XAF1 
#> Negative:  IL8, CLEC5A, CD14, VCAN, S100A8, IER3, MARCKSL1, IL1B, PID1, CD9 
#>     GPX1, PHLDA1, INSIG1, PLAUR, PPIF, THBS1, OSM, SLC7A11, GAPDH, CTB-61M7.2 
#>     LIMS1, S100A9, GAPT, CXCL3, ACTB, C19orf59, CEBPB, OLR1, MGST1, FTH1 
#> PC_ 3 
#> Positive:  HLA-DQA1, CD83, HLA-DQB1, CD74, HLA-DPA1, HLA-DRA, HLA-DRB1, HLA-DPB1, SYNGR2, IRF8 
#>     CD79A, MIR155HG, HERPUD1, REL, HLA-DMA, MS4A1, HSP90AB1, FABP5, TVP23A, ID3 
#>     CCL22, EBI3, TSPAN13, PMAIP1, TCF4, PRMT1, NME1, HSPE1, HSPD1, CD70 
#> Negative:  ANXA1, GIMAP7, TMSB4X, CD7, CD2, RARRES3, MT2A, IL32, GNLY, PRF1 
#>     NKG7, CCL5, TRAT1, RGCC, S100A9, KLRD1, CCL2, GZMH, GZMA, CD3G 
#>     S100A8, CTSW, CCL7, ITM2A, HPSE, FGFBP2, CTSL, GPR171, CCL8, OASL 
#> PC_ 4 
#> Positive:  NKG7, GZMB, GNLY, CST7, CCL5, PRF1, CLIC3, KLRD1, GZMH, GZMA 
#>     APOBEC3G, CTSW, FGFBP2, KLRC1, FASLG, C1orf21, HOPX, CXCR3, SH2D1B, LINC00996 
#>     TNFRSF18, SPON2, RARRES3, GCHFR, SH2D2A, IGFBP7, ID2, C12orf75, XCL2, RAMP1 
#> Negative:  CCR7, LTB, SELL, LEF1, IL7R, ADTRP, TRAT1, PASK, MYC, NPM1 
#>     SOCS3, TSC22D3, TSHZ2, HSP90AB1, SNHG8, GIMAP7, PIM2, HSPD1, CD3G, TXNIP 
#>     RHOH, GBP1, C12orf57, CA6, PNRC1, CMSS1, CD27, SESN3, NHP2, BIRC3 
#> PC_ 5 
#> Positive:  CCL2, CCL7, CCL8, PLA2G7, LMNA, S100A9, SDS, TXN, CSTB, ATP6V1F 
#>     CCR1, EMP1, CAPG, CCR5, TPM4, IDO1, MGST1, HPSE, CTSB, LILRB4 
#>     RSAD2, HSPA1A, VIM, CCNA1, CTSL, GCLM, PDE4DIP, SGTB, SLC7A11, FABP5 
#> Negative:  VMO1, FCGR3A, MS4A4A, CXCL16, MS4A7, PPM1N, HN1, LST1, SMPDL3A, ATP1B3 
#>     CASP5, CDKN1C, CH25H, AIF1, PLAC8, SERPINA1, LRRC25, CD86, HCAR3, GBP5 
#>     TMSB4X, RP11-290F20.3, RGS19, VNN2, ADA, LILRA5, STXBP2, C3AR1, PILRA, FCGR3B
ifnb <- suppressMessages(IntegrateLayers(object = ifnb, 
                             method = HarmonyIntegration, 
                             orig.reduction = "pca", 
                             new.reduction = "harmony", 
                             verbose = TRUE, 
                             npcs = 25))
#> Warning in harmony::HarmonyMatrix(data_mat = Embeddings(object = orig), :
#> HarmonyMatrix is deprecated and will be removed in the future from the API in
#> the future
#> Warning: Warning: The parameters do_pca and npcs are deprecated. They will be ignored for this function call and please remove parameters do_pca and npcs and pass to harmony cell_embeddings directly.
#> This warning is displayed once per session.
#> Warning: Warning: The parameter tau is deprecated. It will be ignored for this function call and please remove parameter tau in future function calls. Advanced users can set value of parameter tau by using parameter .options and function harmony_options().
#> This warning is displayed once per session.
#> Warning: Warning: The parameter block.size is deprecated. It will be ignored for this function call and please remove parameter block.size in future function calls. Advanced users can set value of parameter block.size by using parameter .options and function harmony_options().
#> This warning is displayed once per session.
#> Warning: Warning: The parameter max.iter.harmony is replaced with parameter max_iter. It will be ignored for this function call and please use parameter max_iter in future function calls.
#> This warning is displayed once per session.
#> Warning: Warning: The parameter max.iter.cluster is deprecated. It will be ignored for this function call and please remove parameter max.iter.cluster in future function calls. Advanced users can set value of parameter max.iter.cluster by using parameter .options and function harmony_options().
#> This warning is displayed once per session.
#> Warning: Warning: The parameter epsilon.cluster is deprecated. It will be ignored for this function call and please remove parameter epsilon.cluster in future function calls. Advanced users can set value of parameter epsilon.cluster by using parameter .options and function harmony_options().
#> This warning is displayed once per session.
#> Warning: Warning: The parameter epsilon.harmony is deprecated. It will be ignored for this function call and please remove parameter epsilon.harmony in future function calls. If users want to control if harmony would stop early or not, use parameter early_stop. Advanced users can set value of parameter epsilon.harmony by using parameter .options and function harmony_options().
#> This warning is displayed once per session.
ifnb <- JoinLayers(ifnb)

ifnb <- FindNeighbors(ifnb, reduction = "harmony", dims = 1:30)
#> Computing nearest neighbor graph
#> Computing SNN
ifnb <- FindClusters(ifnb, resolution = 1)
#> Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
#> 
#> Number of nodes: 13999
#> Number of edges: 532566
#> 
#> Running Louvain algorithm...
#> Maximum modularity in 10 random starts: 0.8594
#> Number of communities: 17
#> Elapsed time: 2 seconds
ifnb <- RunUMAP(ifnb, dims = 1:30, reduction = "harmony")
#> Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
#> To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
#> This message will be shown once per session
#> 08:51:19 UMAP embedding parameters a = 0.9922 b = 1.112
#> 08:51:19 Read 13999 rows and found 30 numeric columns
#> 08:51:19 Using Annoy for neighbor search, n_neighbors = 30
#> 08:51:19 Building Annoy index with metric = cosine, n_trees = 50
#> 0%   10   20   30   40   50   60   70   80   90   100%
#> [----|----|----|----|----|----|----|----|----|----|
#> **************************************************|
#> 08:51:21 Writing NN index file to temp file /var/folders/92/rwb8659d3jl658jf5nmmvw580000gn/T//RtmplRJKQj/file1623b4b580368
#> 08:51:21 Searching Annoy index using 1 thread, search_k = 3000
#> 08:51:26 Annoy recall = 100%
#> 08:51:26 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
#> 08:51:27 Initializing from normalized Laplacian + noise (using RSpectra)
#> 08:51:27 Commencing optimization for 200 epochs, with 620186 positive edges
#> 08:51:38 Optimization finished

DimPlot(ifnb)

ifnb2 <- ifnb.SeuratData::ifnb
ifnb2 <- UpdateSeuratObject(ifnb2)
#> Validating object structure
#> Updating object slots
#> Ensuring keys are in the proper structure
#> Warning: Assay RNA changing from Assay to Assay
#> Ensuring keys are in the proper structure
#> Ensuring feature names don't have underscores or pipes
#> Updating slots in RNA
#> Validating object structure for Assay 'RNA'
#> Object representation is consistent with the most current Seurat version
ifnb2 <- Split_Layers(ifnb2, split.by = "stim")
#> • Splitting layers within assay: RNA into 2 parts by "stim"
#> ℹ RNA is not Assay5, converting to Assay5 before splitting.

ifnb2 <- NormalizeData(ifnb2)
#> Normalizing layer: counts.CTRL
#> Normalizing layer: counts.STIM
ifnb2 <- FindVariableFeatures(ifnb2)
#> Finding variable features for layer counts.CTRL
#> Finding variable features for layer counts.STIM
ifnb2 <- ScaleData(ifnb2, vars.to.regress = "nFeature_RNA")
#> Regressing out nFeature_RNA
#> Centering and scaling data matrix
ifnb2 <- RunPCA(ifnb2)
#> PC_ 1 
#> Positive:  TYROBP, C15orf48, FCER1G, SOD2, CST3, ANXA5, FTL, TYMP, TIMP1, CD63 
#>     CTSB, FCN1, LGALS1, S100A4, LGALS3, NPC2, KYNU, PSAP, APOBEC3A, IGSF6 
#>     S100A11, ANXA2, CCL2, LYZ, CTSL, CD68, SPI1, NINJ1, FTH1, S100A10 
#> Negative:  NPM1, CCR7, LTB, GIMAP7, SELL, CD7, TMSB4X, CD2, IL7R, TRAT1 
#>     RHOH, ITM2A, IL32, NHP2, CREM, RGCC, BIRC3, MYC, NOP58, SNHG8 
#>     GPR183, LEF1, SRM, ALOX5AP, TSC22D3, CD3G, SRSF2, HSPD1, PRMT1, NME1 
#> PC_ 2 
#> Positive:  ISG15, ISG20, IFIT3, IFIT1, LY6E, TNFSF10, IFIT2, MX1, RSAD2, CXCL10 
#>     IFI6, OAS1, CXCL11, IFITM3, TNFSF13B, MT2A, OASL, IDO1, IL1RN, APOBEC3A 
#>     GBP1, FAM26F, HERC5, CCL8, GBP4, WARS, HES4, VAMP5, SLAMF7, DEFB1 
#> Negative:  IL8, CLEC5A, CD14, VCAN, S100A8, IER3, MARCKSL1, IL1B, PID1, CD9 
#>     GPX1, PLAUR, INSIG1, PHLDA1, PPIF, THBS1, OSM, GAPDH, SLC7A11, S100A9 
#>     LIMS1, CTB-61M7.2, ACTB, GAPT, CXCL3, CEBPB, OLR1, C19orf59, MGST1, FTH1 
#> PC_ 3 
#> Positive:  HLA-DQA1, HLA-DRA, HLA-DRB1, CD74, HLA-DQB1, CD83, CD79A, HLA-DPA1, HLA-DPB1, MS4A1 
#>     IRF8, MIR155HG, SYNGR2, ID3, HLA-DMA, HERPUD1, REL, BANK1, HSP90AB1, TVP23A 
#>     CCR7, HVCN1, FABP5, KIAA0226L, NME1, TCF4, TNFRSF13B, CD70, TSPAN13, BASP1 
#> Negative:  NKG7, GNLY, ANXA1, TMSB4X, PRF1, CCL5, GIMAP7, GZMB, CD7, GZMH 
#>     RARRES3, GZMA, KLRD1, CTSW, FGFBP2, CD2, CLIC3, CST7, FASLG, IL32 
#>     KLRC1, SH2D2A, APOBEC3G, MT2A, CLEC2B, CXCR3, RGCC, CD8A, GCHFR, C1orf21 
#> PC_ 4 
#> Positive:  NKG7, GZMB, GNLY, CST7, CCL5, PRF1, CLIC3, KLRD1, APOBEC3G, GZMH 
#>     GZMA, CTSW, FGFBP2, KLRC1, FASLG, C1orf21, CD74, HLA-DPB1, HOPX, TNFRSF18 
#>     CXCR3, HLA-DPA1, SH2D1B, HLA-DQA1, LINC00996, RAMP1, CD38, HLA-DQB1, SPON2, HLA-DRB1 
#> Negative:  LTB, SELL, GIMAP7, LEF1, IL7R, TRAT1, CCR7, ADTRP, PASK, TSC22D3 
#>     CD3G, TSHZ2, SOCS3, SNHG8, PIM2, NPM1, TXNIP, ICOS, SCML1, MYC 
#>     PNRC1, C12orf57, SNHG12, GBP1, RHOH, AIF1, CD27, IL23A, MS4A4A, GPR171 
#> PC_ 5 
#> Positive:  CCL2, CCL7, CCL8, PLA2G7, LMNA, SDS, S100A9, CSTB, TXN, EMP1 
#>     CCR1, ATP6V1F, HSPA1A, CTSL, TPM4, CCR5, CTSB, RSAD2, LILRB4, HPSE 
#>     MGST1, VIM, CAPG, IDO1, SGTB, CCNA1, PDE4DIP, GCLM, CREG1, CD163 
#> Negative:  VMO1, FCGR3A, MS4A4A, CXCL16, MS4A7, PPM1N, HN1, LST1, ATP1B3, SMPDL3A 
#>     CASP5, CDKN1C, AIF1, CD86, CH25H, SERPINA1, PLAC8, LRRC25, HCAR3, GBP5 
#>     HLA-DPA1, RGS19, RP11-290F20.3, TMSB4X, ADA, STXBP2, VNN2, PILRA, C3AR1, LILRA5
ifnb2 <- suppressMessages(IntegrateLayers(object = ifnb2, 
                        method = HarmonyIntegration, 
                        orig.reduction = "pca", 
                        new.reduction = "harmony", 
                        verbose = TRUE, 
                        npcs = 25))
#> Warning in harmony::HarmonyMatrix(data_mat = Embeddings(object = orig), :
#> HarmonyMatrix is deprecated and will be removed in the future from the API in
#> the future
ifnb2 <- JoinLayers(ifnb2)

ifnb2 <- FindNeighbors(ifnb2, reduction = "harmony", dims = 1:30)
#> Computing nearest neighbor graph
#> Computing SNN
ifnb2 <- FindClusters(ifnb2, resolution = 1)
#> Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
#> 
#> Number of nodes: 13999
#> Number of edges: 541750
#> 
#> Running Louvain algorithm...
#> Maximum modularity in 10 random starts: 0.8574
#> Number of communities: 16
#> Elapsed time: 2 seconds
ifnb2 <- RunUMAP(ifnb2, dims = 1:30, reduction = "harmony")
#> 08:53:30 UMAP embedding parameters a = 0.9922 b = 1.112
#> 08:53:30 Read 13999 rows and found 30 numeric columns
#> 08:53:30 Using Annoy for neighbor search, n_neighbors = 30
#> 08:53:30 Building Annoy index with metric = cosine, n_trees = 50
#> 0%   10   20   30   40   50   60   70   80   90   100%
#> [----|----|----|----|----|----|----|----|----|----|
#> **************************************************|
#> 08:53:32 Writing NN index file to temp file /var/folders/92/rwb8659d3jl658jf5nmmvw580000gn/T//RtmplRJKQj/file1623b7f47c77c
#> 08:53:32 Searching Annoy index using 1 thread, search_k = 3000
#> 08:53:37 Annoy recall = 100%
#> 08:53:38 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
#> 08:53:39 Initializing from normalized Laplacian + noise (using RSpectra)
#> 08:53:39 Commencing optimization for 200 epochs, with 627624 positive edges
#> 08:53:50 Optimization finished

DimPlot(ifnb2)

# scale.data not identical
identical(ifnb@assays$RNA@layers$scale.data, ifnb2@assays$RNA@layers$scale.data)
#> [1] FALSE

Created on 2025-02-11 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24)
#>  os       macOS Monterey 12.7.6
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2025-02-11
#>  pandoc   3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
#>  quarto   1.5.57 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package          * version    date (UTC) lib source
#>  abind              1.4-8      2024-09-12 [1] CRAN (R 4.4.1)
#>  beeswarm           0.4.0      2021-06-01 [1] CRAN (R 4.4.0)
#>  circlize           0.4.16     2024-02-20 [1] CRAN (R 4.4.0)
#>  cli                3.6.3      2024-06-21 [1] CRAN (R 4.4.0)
#>  cluster            2.1.8      2024-12-11 [1] CRAN (R 4.4.1)
#>  codetools          0.2-20     2024-03-31 [1] CRAN (R 4.4.0)
#>  colorspace         2.1-1      2024-07-26 [1] CRAN (R 4.4.0)
#>  cowplot            1.1.3      2024-01-22 [1] CRAN (R 4.4.0)
#>  curl               6.2.0      2025-01-23 [1] CRAN (R 4.4.1)
#>  data.table         1.16.4     2024-12-06 [1] CRAN (R 4.4.1)
#>  deldir             2.0-4      2024-02-28 [1] CRAN (R 4.4.0)
#>  digest             0.6.37     2024-08-19 [1] CRAN (R 4.4.1)
#>  dotCall64          1.2        2024-10-04 [1] CRAN (R 4.4.1)
#>  dplyr            * 1.1.4      2023-11-17 [1] CRAN (R 4.4.0)
#>  evaluate           1.0.3      2025-01-10 [1] CRAN (R 4.4.1)
#>  farver             2.1.2      2024-05-13 [1] CRAN (R 4.4.0)
#>  fastDummies        1.7.5      2025-01-20 [1] CRAN (R 4.4.1)
#>  fastmap            1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
#>  fitdistrplus       1.2-2      2025-01-07 [1] CRAN (R 4.4.1)
#>  forcats          * 1.0.0      2023-01-29 [1] CRAN (R 4.4.0)
#>  fs                 1.6.5      2024-10-30 [1] CRAN (R 4.4.1)
#>  future             1.34.0     2024-07-29 [1] CRAN (R 4.4.0)
#>  future.apply       1.11.3     2024-10-27 [1] CRAN (R 4.4.1)
#>  generics           0.1.3      2022-07-05 [1] CRAN (R 4.4.0)
#>  ggbeeswarm         0.7.2      2023-04-29 [1] CRAN (R 4.4.0)
#>  ggplot2          * 3.5.1      2024-04-23 [1] CRAN (R 4.4.0)
#>  ggprism            1.0.5      2024-03-21 [1] CRAN (R 4.4.0)
#>  ggrastr            1.0.2      2023-06-01 [1] CRAN (R 4.4.0)
#>  ggrepel            0.9.6      2024-09-07 [1] CRAN (R 4.4.1)
#>  ggridges           0.5.6      2024-01-23 [1] CRAN (R 4.4.0)
#>  GlobalOptions      0.1.2      2020-06-10 [1] CRAN (R 4.4.0)
#>  globals            0.16.3     2024-03-08 [1] CRAN (R 4.4.0)
#>  glue               1.8.0      2024-09-30 [1] CRAN (R 4.4.1)
#>  goftest            1.2-3      2021-10-07 [1] CRAN (R 4.4.0)
#>  gridExtra          2.3        2017-09-09 [1] CRAN (R 4.4.0)
#>  gtable             0.3.6      2024-10-25 [1] CRAN (R 4.4.1)
#>  harmony            1.2.3      2024-11-27 [1] CRAN (R 4.4.1)
#>  hms                1.1.3      2023-03-21 [1] CRAN (R 4.4.0)
#>  htmltools          0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
#>  htmlwidgets        1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
#>  httpuv             1.6.15     2024-03-26 [1] CRAN (R 4.4.0)
#>  httr               1.4.7      2023-08-15 [1] CRAN (R 4.4.0)
#>  ica                1.0-3      2022-07-08 [1] CRAN (R 4.4.0)
#>  ifnb.SeuratData    3.1.0      2024-05-01 [1] local
#>  igraph             2.1.4      2025-01-23 [1] CRAN (R 4.4.1)
#>  irlba              2.3.5.1    2022-10-03 [1] CRAN (R 4.4.0)
#>  janitor            2.2.1      2024-12-22 [1] CRAN (R 4.4.1)
#>  jsonlite           1.8.9      2024-09-20 [1] CRAN (R 4.4.1)
#>  KernSmooth         2.23-26    2025-01-01 [1] CRAN (R 4.4.1)
#>  knitr              1.49       2024-11-08 [1] CRAN (R 4.4.1)
#>  labeling           0.4.3      2023-08-29 [1] CRAN (R 4.4.0)
#>  later              1.4.1      2024-11-27 [1] CRAN (R 4.4.1)
#>  lattice            0.22-6     2024-03-20 [1] CRAN (R 4.4.0)
#>  lazyeval           0.2.2      2019-03-15 [1] CRAN (R 4.4.0)
#>  lifecycle          1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
#>  listenv            0.9.1      2024-01-29 [1] CRAN (R 4.4.0)
#>  lmtest             0.9-40     2022-03-21 [1] CRAN (R 4.4.0)
#>  lubridate        * 1.9.4      2024-12-08 [1] CRAN (R 4.4.1)
#>  magrittr           2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
#>  MASS               7.3-64     2025-01-04 [1] CRAN (R 4.4.1)
#>  Matrix             1.7-2      2025-01-23 [1] CRAN (R 4.4.1)
#>  matrixStats        1.5.0      2025-01-07 [1] CRAN (R 4.4.1)
#>  mime               0.12       2021-09-28 [1] CRAN (R 4.4.0)
#>  miniUI             0.1.1.1    2018-05-18 [1] CRAN (R 4.4.0)
#>  munsell            0.5.1      2024-04-01 [1] CRAN (R 4.4.0)
#>  nlme               3.1-167    2025-01-27 [1] CRAN (R 4.4.1)
#>  paletteer          1.6.0      2024-01-21 [1] CRAN (R 4.4.0)
#>  parallelly         1.42.0     2025-01-30 [1] CRAN (R 4.4.1)
#>  patchwork          1.3.0      2024-09-16 [1] CRAN (R 4.4.1)
#>  pbapply            1.7-2      2023-06-27 [1] CRAN (R 4.4.0)
#>  pillar             1.10.1     2025-01-07 [1] CRAN (R 4.4.1)
#>  pkgconfig          2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
#>  plotly             4.10.4     2024-01-13 [1] CRAN (R 4.4.0)
#>  plyr               1.8.9      2023-10-02 [1] CRAN (R 4.4.0)
#>  png                0.1-8      2022-11-29 [1] CRAN (R 4.4.0)
#>  polyclip           1.10-7     2024-07-23 [1] CRAN (R 4.4.0)
#>  progressr          0.15.1     2024-11-22 [1] CRAN (R 4.4.1)
#>  promises           1.3.2      2024-11-28 [1] CRAN (R 4.4.1)
#>  purrr            * 1.0.4      2025-02-05 [1] CRAN (R 4.4.0)
#>  R6                 2.5.1      2021-08-19 [1] CRAN (R 4.4.0)
#>  RANN               2.6.2      2024-08-25 [1] CRAN (R 4.4.1)
#>  RColorBrewer       1.1-3      2022-04-03 [1] CRAN (R 4.4.0)
#>  Rcpp               1.0.14     2025-01-12 [1] CRAN (R 4.4.1)
#>  RcppAnnoy          0.0.22     2024-01-23 [1] CRAN (R 4.4.0)
#>  RcppHNSW           0.6.0      2024-02-04 [1] CRAN (R 4.4.0)
#>  readr            * 2.1.5      2024-01-10 [1] CRAN (R 4.4.0)
#>  rematch2           2.1.2      2020-05-01 [1] CRAN (R 4.4.0)
#>  reprex             2.1.1      2024-07-06 [1] CRAN (R 4.4.0)
#>  reshape2           1.4.4      2020-04-09 [1] CRAN (R 4.4.0)
#>  reticulate         1.40.0     2024-11-15 [1] CRAN (R 4.4.1)
#>  RhpcBLASctl        0.23-42    2023-02-11 [1] CRAN (R 4.4.0)
#>  rlang              1.1.5      2025-01-17 [1] CRAN (R 4.4.1)
#>  rmarkdown          2.29       2024-11-04 [1] CRAN (R 4.4.1)
#>  ROCR               1.0-11     2020-05-02 [1] CRAN (R 4.4.0)
#>  RSpectra           0.16-2     2024-07-18 [1] CRAN (R 4.4.0)
#>  rstudioapi         0.17.1     2024-10-22 [1] CRAN (R 4.4.1)
#>  Rtsne              0.17       2023-12-07 [1] CRAN (R 4.4.0)
#>  scales             1.3.0      2023-11-28 [1] CRAN (R 4.4.0)
#>  scattermore        1.2        2023-06-12 [1] CRAN (R 4.4.0)
#>  scCustomize      * 3.0.1.0008 2025-02-06 [1] Github (samuel-marsh/scCustomize@18ed2a4)
#>  sctransform        0.4.1.9001 2025-02-06 [1] Github (satijalab/sctransform@515beac)
#>  sessioninfo        1.2.3      2025-02-05 [1] CRAN (R 4.4.1)
#>  Seurat           * 5.2.1      2025-01-24 [1] CRAN (R 4.4.1)
#>  SeuratObject     * 5.0.2      2024-05-08 [1] CRAN (R 4.4.0)
#>  shape              1.4.6.1    2024-02-23 [1] CRAN (R 4.4.0)
#>  shiny              1.10.0     2024-12-14 [1] CRAN (R 4.4.1)
#>  snakecase          0.11.1     2023-08-27 [1] CRAN (R 4.4.0)
#>  sp               * 2.2-0      2025-02-01 [1] CRAN (R 4.4.1)
#>  spam               2.11-1     2025-01-20 [1] CRAN (R 4.4.1)
#>  spatstat.data      3.1-4      2024-11-15 [1] CRAN (R 4.4.1)
#>  spatstat.explore   3.3-4      2025-01-08 [1] CRAN (R 4.4.1)
#>  spatstat.geom      3.3-5      2025-01-18 [1] CRAN (R 4.4.1)
#>  spatstat.random    3.3-2      2024-09-18 [1] CRAN (R 4.4.1)
#>  spatstat.sparse    3.1-0      2024-06-21 [1] CRAN (R 4.4.0)
#>  spatstat.univar    3.1-1      2024-11-05 [1] CRAN (R 4.4.1)
#>  spatstat.utils     3.1-2      2025-01-08 [1] CRAN (R 4.4.1)
#>  stringi            1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
#>  stringr          * 1.5.1      2023-11-14 [1] CRAN (R 4.4.0)
#>  survival           3.8-3      2024-12-17 [1] CRAN (R 4.4.1)
#>  tensor             1.5        2012-05-05 [1] CRAN (R 4.4.0)
#>  tibble           * 3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyr            * 1.3.1      2024-01-24 [1] CRAN (R 4.4.0)
#>  tidyselect         1.2.1      2024-03-11 [1] CRAN (R 4.4.0)
#>  tidyverse        * 2.0.0      2023-02-22 [1] CRAN (R 4.4.0)
#>  timechange         0.3.0      2024-01-18 [1] CRAN (R 4.4.0)
#>  tzdb               0.4.0      2023-05-12 [1] CRAN (R 4.4.0)
#>  uwot               0.2.2      2024-04-21 [1] CRAN (R 4.4.0)
#>  vctrs              0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
#>  vipor              0.4.7      2023-12-18 [1] CRAN (R 4.4.0)
#>  viridisLite        0.4.2      2023-05-02 [1] CRAN (R 4.4.0)
#>  withr              3.0.2      2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun               0.50.5     2025-01-15 [1] https://yihui.r-universe.dev (R 4.4.2)
#>  xml2               1.3.6      2023-12-04 [1] CRAN (R 4.4.0)
#>  xtable             1.8-4      2019-04-21 [1] CRAN (R 4.4.0)
#>  yaml               2.3.10     2024-07-26 [1] CRAN (R 4.4.0)
#>  zoo                1.8-12     2023-04-13 [1] CRAN (R 4.4.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@JABioinf
Copy link

Hi, I haven't look at a reproducible example, but I have updated to Seurat v5.2 recently and observed the same problem with ScaleData.
3 notable problems:

  1. ScaleData doesn't print anything even with verbose on
  2. doesn't raise an error when selecting a wrong metadata column to regress,
  3. doesn't seem to regress anything when looking at the resulting UMAP (e.g. a mitochondrial-gene cluster appears even after asking for percent.mito to be regressed, same with cell cycle). The function is actually so fast that it most likely isn't doing any regression.

Note that using the SCT processing do regress out the metadata columns, so it's specific to ScaleData.

Will try to do some debugging on my side why or if another package than Seurat is involved, but that's concerning enough to see if other people figured what the problem was?

SessionInfo():

R version 4.2.3 (2023-03-15)

Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.9 (Ootpa)

Matrix products: default
BLAS/LAPACK: ~/conda_envs/lib/libopenblasp-r0.3.25.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8      
 [8] LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] stringr_1.5.1         tibble_3.2.1          tidyr_1.3.0           ggplot2_3.5.1         dplyr_1.1.4           cowplot_1.1.2         patchwork_1.2.0      
 [8] harmony_1.0.1         Rcpp_1.0.13           SeuratDisk_0.0.0.9021 SeuratData_0.2.2.9001 Seurat_5.2.1          SeuratObject_5.0.2    sp_2.1-4             
[15] BPCells_0.3.0        

loaded via a namespace (and not attached):
  [1] spam_2.10-0                 plyr_1.8.9                  igraph_1.6.0                lazyeval_0.2.2              splines_4.2.3              
  [6] RcppHNSW_0.5.0              BiocParallel_1.32.6         listenv_0.9.1               scattermore_1.2             GenomeInfoDb_1.34.9        
 [11] digest_0.6.36               htmltools_0.5.7             fansi_1.0.6                 magrittr_2.0.3              tensor_1.5                 
 [16] cluster_2.1.6               ROCR_1.0-11                 Biostrings_2.66.0           globals_0.16.3              matrixStats_1.1.0          
 [21] R.utils_2.12.3              spatstat.sparse_3.0-3       colorspace_2.1-0            rappdirs_0.3.3              ggrepel_0.9.5              
 [26] xfun_0.49                   crayon_1.5.3                RCurl_1.98-1.14             jsonlite_1.8.8              progressr_0.14.0           
 [31] spatstat.data_3.0-3         survival_3.5-7              zoo_1.8-12                  glue_1.7.0                  polyclip_1.10-6            
 [36] gtable_0.3.4                zlibbioc_1.44.0             XVector_0.38.0              DelayedArray_0.24.0         future.apply_1.11.2        
 [41] BiocGenerics_0.44.0         abind_1.4-5                 scales_1.3.0                spatstat.random_3.2-2       miniUI_0.1.1.1             
 [46] viridisLite_0.4.2           xtable_1.8-4                reticulate_1.40.0           bit_4.0.5                   dotCall64_1.1-1            
 [51] stats4_4.2.3                htmlwidgets_1.6.4           httr_1.4.7                  RColorBrewer_1.1-3          ellipsis_0.3.2             
 [56] ica_1.0-3                   XML_3.99-0.15               pkgconfig_2.0.3             R.methodsS3_1.8.2           farver_2.1.1               
 [61] uwot_0.1.16                 deldir_2.0-2                utf8_1.2.4                  tidyselect_1.2.0            labeling_0.4.3             
 [66] rlang_1.1.4                 reshape2_1.4.4              later_1.3.2                 munsell_0.5.0               tools_4.2.3                
 [71] cli_3.6.3                   generics_0.1.3              ggridges_0.5.5              evaluate_0.23               fastmap_1.1.1              
 [76] yaml_2.3.8                  goftest_1.2-3               RhpcBLASctl_0.23-42         knitr_1.45                  bit64_4.0.5                
 [81] fitdistrplus_1.1-11         purrr_1.0.2                 RANN_2.6.1                  sparseMatrixStats_1.10.0    pbapply_1.7-2              
 [86] future_1.34.0               nlme_3.1-164                mime_0.12                   R.oo_1.25.0                 hdf5r_1.3.11               
 [91] compiler_4.2.3              rstudioapi_0.15.0           plotly_4.10.4               png_0.1-8                   spatstat.utils_3.1-0       
 [96] glmGamPoi_1.10.2            stringi_1.8.3               RSpectra_0.16-1             lattice_0.22-5              Matrix_1.6-5               
[101] vctrs_0.6.5                 pillar_1.9.0                lifecycle_1.0.4             spatstat.geom_3.2-7         lmtest_0.9-40              
[106] RcppAnnoy_0.0.21            data.table_1.14.10          bitops_1.0-7                irlba_2.3.5.1               rtracklayer_1.58.0         
[111] httpuv_1.6.13               GenomicRanges_1.50.2        BiocIO_1.8.0                R6_2.5.1                    promises_1.2.1             
[116] KernSmooth_2.23-22          gridExtra_2.3               IRanges_2.32.0              parallelly_1.38.0           codetools_0.2-19           
[121] fastDummies_1.7.3           MASS_7.3-60                 SummarizedExperiment_1.28.0 rjson_0.2.21                withr_2.5.2                
[126] GenomicAlignments_1.34.1    Rsamtools_2.14.0            sctransform_0.4.1           S4Vectors_0.36.2            GenomeInfoDbData_1.2.9     
[131] parallel_4.2.3              grid_4.2.3                  DelayedMatrixStats_1.20.0   rmarkdown_2.25              MatrixGenerics_1.10.0      
[136] Rtsne_0.17                  spatstat.explore_3.2-5      Biobase_2.58.0              shiny_1.8.0                 restfulr_0.0.15 `

@samuel-marsh
Copy link
Collaborator

Hi @JABioinf,

If you can provide reproducible example that would be great. My reprex above was with Seurat 5.2.1 and could not replicate the issue. I can look into the other issues you raised though.

Best,
Sam

@samuel-marsh
Copy link
Collaborator

Hi @JABioinf,

I cannot replicate either of the other issues that you mentioned. The function properly prints progress (progress bars don't show in reprex but do in console) and errors when specifying a variable that isn't present in meta.data. See reprex below:

Best,
Sam

library(tidyverse)
#> Warning: package 'lubridate' was built under R version 4.4.1
library(Seurat)
#> Warning: package 'Seurat' was built under R version 4.4.1
#> Loading required package: SeuratObject
#> Loading required package: sp
#> Warning: package 'sp' was built under R version 4.4.1
#> 'SeuratObject' was built with package 'Matrix' 1.7.0 but the current
#> version is 1.7.2; it is recomended that you reinstall 'SeuratObject' as
#> the ABI for 'Matrix' may have changed
#> 
#> Attaching package: 'SeuratObject'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, t


pbmc <- pbmc3k.SeuratData::pbmc3k
pbmc <- UpdateSeuratObject(pbmc)
#> Validating object structure
#> Updating object slots
#> Ensuring keys are in the proper structure
#> Warning: Assay RNA changing from Assay to Assay
#> Ensuring keys are in the proper structure
#> Ensuring feature names don't have underscores or pipes
#> Updating slots in RNA
#> Validating object structure for Assay 'RNA'
#> Object representation is consistent with the most current Seurat version

pbmc <- NormalizeData(pbmc)
pbmc <- FindVariableFeatures(pbmc)
# Prints output
pbmc <- ScaleData(pbmc, vars.to.regress = "nFeature_RNA")
#> Regressing out nFeature_RNA
#> Centering and scaling data matrix

# Fails when column not present
pbmc <- ScaleData(pbmc, vars.to.regress = "random")
#> Error: None of the requested variables to regress are present in the object.

Created on 2025-02-13 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24)
#>  os       macOS Monterey 12.7.6
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2025-02-13
#>  pandoc   3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
#>  quarto   1.5.57 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package           * version    date (UTC) lib source
#>  abind               1.4-8      2024-09-12 [1] CRAN (R 4.4.1)
#>  cli                 3.6.3      2024-06-21 [1] CRAN (R 4.4.0)
#>  cluster             2.1.8      2024-12-11 [1] CRAN (R 4.4.1)
#>  codetools           0.2-20     2024-03-31 [1] CRAN (R 4.4.0)
#>  colorspace          2.1-1      2024-07-26 [1] CRAN (R 4.4.0)
#>  cowplot             1.1.3      2024-01-22 [1] CRAN (R 4.4.0)
#>  data.table          1.16.4     2024-12-06 [1] CRAN (R 4.4.1)
#>  deldir              2.0-4      2024-02-28 [1] CRAN (R 4.4.0)
#>  digest              0.6.37     2024-08-19 [1] CRAN (R 4.4.1)
#>  dotCall64           1.2        2024-10-04 [1] CRAN (R 4.4.1)
#>  dplyr             * 1.1.4      2023-11-17 [1] CRAN (R 4.4.0)
#>  evaluate            1.0.3      2025-01-10 [1] CRAN (R 4.4.1)
#>  farver              2.1.2      2024-05-13 [1] CRAN (R 4.4.0)
#>  fastDummies         1.7.5      2025-01-20 [1] CRAN (R 4.4.1)
#>  fastmap             1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
#>  fitdistrplus        1.2-2      2025-01-07 [1] CRAN (R 4.4.1)
#>  forcats           * 1.0.0      2023-01-29 [1] CRAN (R 4.4.0)
#>  fs                  1.6.5      2024-10-30 [1] CRAN (R 4.4.1)
#>  future              1.34.0     2024-07-29 [1] CRAN (R 4.4.0)
#>  future.apply        1.11.3     2024-10-27 [1] CRAN (R 4.4.1)
#>  generics            0.1.3      2022-07-05 [1] CRAN (R 4.4.0)
#>  ggplot2           * 3.5.1      2024-04-23 [1] CRAN (R 4.4.0)
#>  ggrepel             0.9.6      2024-09-07 [1] CRAN (R 4.4.1)
#>  ggridges            0.5.6      2024-01-23 [1] CRAN (R 4.4.0)
#>  globals             0.16.3     2024-03-08 [1] CRAN (R 4.4.0)
#>  glue                1.8.0      2024-09-30 [1] CRAN (R 4.4.1)
#>  goftest             1.2-3      2021-10-07 [1] CRAN (R 4.4.0)
#>  gridExtra           2.3        2017-09-09 [1] CRAN (R 4.4.0)
#>  gtable              0.3.6      2024-10-25 [1] CRAN (R 4.4.1)
#>  hms                 1.1.3      2023-03-21 [1] CRAN (R 4.4.0)
#>  htmltools           0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
#>  htmlwidgets         1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
#>  httpuv              1.6.15     2024-03-26 [1] CRAN (R 4.4.0)
#>  httr                1.4.7      2023-08-15 [1] CRAN (R 4.4.0)
#>  ica                 1.0-3      2022-07-08 [1] CRAN (R 4.4.0)
#>  igraph              2.1.4      2025-01-23 [1] CRAN (R 4.4.1)
#>  irlba               2.3.5.1    2022-10-03 [1] CRAN (R 4.4.0)
#>  jsonlite            1.8.9      2024-09-20 [1] CRAN (R 4.4.1)
#>  KernSmooth          2.23-26    2025-01-01 [1] CRAN (R 4.4.1)
#>  knitr               1.49       2024-11-08 [1] CRAN (R 4.4.1)
#>  later               1.4.1      2024-11-27 [1] CRAN (R 4.4.1)
#>  lattice             0.22-6     2024-03-20 [1] CRAN (R 4.4.0)
#>  lazyeval            0.2.2      2019-03-15 [1] CRAN (R 4.4.0)
#>  lifecycle           1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
#>  listenv             0.9.1      2024-01-29 [1] CRAN (R 4.4.0)
#>  lmtest              0.9-40     2022-03-21 [1] CRAN (R 4.4.0)
#>  lubridate         * 1.9.4      2024-12-08 [1] CRAN (R 4.4.1)
#>  magrittr            2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
#>  MASS                7.3-64     2025-01-04 [1] CRAN (R 4.4.1)
#>  Matrix              1.7-2      2025-01-23 [1] CRAN (R 4.4.1)
#>  matrixStats         1.5.0      2025-01-07 [1] CRAN (R 4.4.1)
#>  mime                0.12       2021-09-28 [1] CRAN (R 4.4.0)
#>  miniUI              0.1.1.1    2018-05-18 [1] CRAN (R 4.4.0)
#>  munsell             0.5.1      2024-04-01 [1] CRAN (R 4.4.0)
#>  nlme                3.1-167    2025-01-27 [1] CRAN (R 4.4.1)
#>  parallelly          1.42.0     2025-01-30 [1] CRAN (R 4.4.1)
#>  patchwork           1.3.0      2024-09-16 [1] CRAN (R 4.4.1)
#>  pbapply             1.7-2      2023-06-27 [1] CRAN (R 4.4.0)
#>  pbmc3k.SeuratData   3.1.4      2024-05-01 [1] local
#>  pillar              1.10.1     2025-01-07 [1] CRAN (R 4.4.1)
#>  pkgconfig           2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
#>  plotly              4.10.4     2024-01-13 [1] CRAN (R 4.4.0)
#>  plyr                1.8.9      2023-10-02 [1] CRAN (R 4.4.0)
#>  png                 0.1-8      2022-11-29 [1] CRAN (R 4.4.0)
#>  polyclip            1.10-7     2024-07-23 [1] CRAN (R 4.4.0)
#>  progressr           0.15.1     2024-11-22 [1] CRAN (R 4.4.1)
#>  promises            1.3.2      2024-11-28 [1] CRAN (R 4.4.1)
#>  purrr             * 1.0.4      2025-02-05 [1] CRAN (R 4.4.0)
#>  R6                  2.5.1      2021-08-19 [1] CRAN (R 4.4.0)
#>  RANN                2.6.2      2024-08-25 [1] CRAN (R 4.4.1)
#>  RColorBrewer        1.1-3      2022-04-03 [1] CRAN (R 4.4.0)
#>  Rcpp                1.0.14     2025-01-12 [1] CRAN (R 4.4.1)
#>  RcppAnnoy           0.0.22     2024-01-23 [1] CRAN (R 4.4.0)
#>  RcppHNSW            0.6.0      2024-02-04 [1] CRAN (R 4.4.0)
#>  readr             * 2.1.5      2024-01-10 [1] CRAN (R 4.4.0)
#>  reprex              2.1.1      2024-07-06 [1] CRAN (R 4.4.0)
#>  reshape2            1.4.4      2020-04-09 [1] CRAN (R 4.4.0)
#>  reticulate          1.40.0     2024-11-15 [1] CRAN (R 4.4.1)
#>  rlang               1.1.5      2025-01-17 [1] CRAN (R 4.4.1)
#>  rmarkdown           2.29       2024-11-04 [1] CRAN (R 4.4.1)
#>  ROCR                1.0-11     2020-05-02 [1] CRAN (R 4.4.0)
#>  RSpectra            0.16-2     2024-07-18 [1] CRAN (R 4.4.0)
#>  rstudioapi          0.17.1     2024-10-22 [1] CRAN (R 4.4.1)
#>  Rtsne               0.17       2023-12-07 [1] CRAN (R 4.4.0)
#>  scales              1.3.0      2023-11-28 [1] CRAN (R 4.4.0)
#>  scattermore         1.2        2023-06-12 [1] CRAN (R 4.4.0)
#>  sctransform         0.4.1.9001 2025-02-06 [1] Github (satijalab/sctransform@515beac)
#>  sessioninfo         1.2.3      2025-02-05 [1] CRAN (R 4.4.1)
#>  Seurat            * 5.2.1      2025-01-24 [1] CRAN (R 4.4.1)
#>  SeuratObject      * 5.0.2      2024-05-08 [1] CRAN (R 4.4.0)
#>  shiny               1.10.0     2024-12-14 [1] CRAN (R 4.4.1)
#>  sp                * 2.2-0      2025-02-01 [1] CRAN (R 4.4.1)
#>  spam                2.11-1     2025-01-20 [1] CRAN (R 4.4.1)
#>  spatstat.data       3.1-4      2024-11-15 [1] CRAN (R 4.4.1)
#>  spatstat.explore    3.3-4      2025-01-08 [1] CRAN (R 4.4.1)
#>  spatstat.geom       3.3-5      2025-01-18 [1] CRAN (R 4.4.1)
#>  spatstat.random     3.3-2      2024-09-18 [1] CRAN (R 4.4.1)
#>  spatstat.sparse     3.1-0      2024-06-21 [1] CRAN (R 4.4.0)
#>  spatstat.univar     3.1-1      2024-11-05 [1] CRAN (R 4.4.1)
#>  spatstat.utils      3.1-2      2025-01-08 [1] CRAN (R 4.4.1)
#>  stringi             1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
#>  stringr           * 1.5.1      2023-11-14 [1] CRAN (R 4.4.0)
#>  survival            3.8-3      2024-12-17 [1] CRAN (R 4.4.1)
#>  tensor              1.5        2012-05-05 [1] CRAN (R 4.4.0)
#>  tibble            * 3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyr             * 1.3.1      2024-01-24 [1] CRAN (R 4.4.0)
#>  tidyselect          1.2.1      2024-03-11 [1] CRAN (R 4.4.0)
#>  tidyverse         * 2.0.0      2023-02-22 [1] CRAN (R 4.4.0)
#>  timechange          0.3.0      2024-01-18 [1] CRAN (R 4.4.0)
#>  tzdb                0.4.0      2023-05-12 [1] CRAN (R 4.4.0)
#>  uwot                0.2.2      2024-04-21 [1] CRAN (R 4.4.0)
#>  vctrs               0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
#>  viridisLite         0.4.2      2023-05-02 [1] CRAN (R 4.4.0)
#>  withr               3.0.2      2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun                0.50.5     2025-01-15 [1] https://yihui.r-universe.dev (R 4.4.2)
#>  xtable              1.8-4      2019-04-21 [1] CRAN (R 4.4.0)
#>  yaml                2.3.10     2024-07-26 [1] CRAN (R 4.4.0)
#>  zoo                 1.8-12     2023-04-13 [1] CRAN (R 4.4.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@EliseCoopman
Copy link
Author

EliseCoopman commented Feb 14, 2025

Hi @samuel-marsh @JABioinf,

Thank you for your reactions!
I can replicate your example @samuel-marsh, so I do think it is something more related to my seuratobject specifically.

I also have the same issues as @JABioinf;

  • no printing, even with verbose on (i also have no printing when using ScaleData without the vars.to.regress argument)
  • no error when I give a non-existing metadata column name

After running ScaleData, there is a scale.data layer created in my object. However, I noticed that the dimensions are 656x43802.
For running FindVariableFeatures, I have default parameters (= 2000 features), and since ScaleData uses this as input, I expected the dimensions to be 2000x43802 -> it seems that ScaleData is doing something, but not everything as I don't get the 'Centering and scaling data matrix' printed...

seuratobj <- FindVariableFeatures(seuratobj, selection.method = "vst", nfeatures = 2000)
seuratobj <- ScaleData(seuratobj)
Image

(I didn't mention this before, but maybe important; I'm using BPCells for efficient storage.)

Do you have a scale.data layer created @JABioinf? And what are its dimensions?

Thank you!!

All the best,

Elise

@JABioinf
Copy link

Hello @EliseCoopman & @samuel-marsh
This is definitely related to BPCells. Not the library itself but the Seurat object create after a open_matrix_dir for RNA counts.
Below is a reproducible error. Please let me know if you have the same thing and agree with my pipeline for BPCells.

library(BPCells) # BP cells version 0.3.0
library(Seurat) # Seurat version 5.2.1 (SeuratObject 5.0.2)

#SeuratData::AvailableData()
#SeuratData::InstallData("pbmc3k.SeuratData")

pbmc <- SeuratData::LoadData("pbmc3k.SeuratData")
pbmc <- UpdateSeuratObject(pbmc)

# Create a splitting variable
set.seed(123)
pbmc$condition <- sample(c("a","b","c"),size =  length(Cells(pbmc)) ,replace = T)

# Save for BPCells and open
mat <- GetAssayData(pbmc,assay = "RNA",layer = "counts")
matrixdir <- "~/tempdir/misc"
write_matrix_dir(
  mat = convert_matrix_type((mat),type="uint32_t") ,
  dir = file.path(matrixdir,"temp"),
  overwrite = T
)
mat <- open_matrix_dir(file.path(matrixdir,"temp"))
pbmc2 <- CreateSeuratObject(counts = mat, project = "pbmc_BPcell",meta.data = [email protected])

# Process both
pbmc[["RNA"]] <- split(pbmc[["RNA"]], f = pbmc$condition)
pbmc2[["RNA"]] <- split(pbmc2[["RNA"]], f = pbmc2$condition)
print(pbmc)
print(pbmc2)

pbmc <- NormalizeData(pbmc)
pbmc <- FindVariableFeatures(pbmc)
# This Prints output:
pbmc <- ScaleData(pbmc, vars.to.regress = "nFeature_RNA")
# This throws an error:
pbmc <- ScaleData(pbmc, vars.to.regress = "random") 

pbmc2 <- NormalizeData(pbmc2)
pbmc2 <- FindVariableFeatures(pbmc2)
# Doesn't Print outputs 
pbmc2 <- ScaleData(pbmc2, vars.to.regress = "nFeature_RNA")
# Doesn't throw an or error
pbmc2 <- ScaleData(pbmc2, vars.to.regress = "random")  

pbmc2
An object of class Seurat
13714 features across 2700 samples within 1 assay
Active assay: RNA (13714 features, 2000 variable features)
7 layers present: counts.c, counts.b, counts.a, data.c, data.b, data.a, scale.data
pbmc
An object of class Seurat
13714 features across 2700 samples within 1 assay
Active assay: RNA (13714 features, 2000 variable features)
7 layers present: counts.c, counts.b, counts.a, data.c, data.b, data.a, scale.data

It is hard to judge, but processing with both doesn't generate similar UMAP plots or PCA components. In theory shouldn't both pipelines generate exactly the same results?

Side note, splitting by pbmc$annotation succeeded with classic pipeline but failed at the FindVariableFeatures(pbmc2) showing that the 2 pipelines are definitely not equivalent.

@EliseCoopman
Copy link
Author

Hi @JABioinf,

I haven't tried the reproducible data yet, but I was able to identify the same issue in my own data; ScaleData does not perform properly when using BPCells (or at least different compared to an object that doesn't make use of BPCells). I was able to get ScaleData to work when using a 'non BPCells object'.

I would indeed expect both pipelines to generate exactly the same results...

For now, I don't know how to fix this. I will also put this issue on the BPCells github page. Let me know if you found a way to solve this problem!

For clarity reasons, I will change the title of this issue :).

All the best!

Elise

@EliseCoopman EliseCoopman changed the title Issue: ScaleData Not Regressing Out nFeature_RNA in Seurat v5.1.0 Issue: ScaleData Not Working with BPCells Feb 18, 2025
@bnprks
Copy link
Contributor

bnprks commented Feb 22, 2025

Just chiming in here as a BPCells author: the functionality to regress out data with a linear model does exist in BPCells via the function BPCells::regress_out(). So if Seurat's ScaleData.IterableMatrix function were updated to call BPCells::regress_out() when vars.to.regress is passed, that could likely address this issue (at least when model.use = "linear")

See my comment in Elise's related BPCells issue for a quick example of how this might look to use as a workaround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants