-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed issue with SingleR v2.8.0 in BioC 3.20 #292
Comments
OK, so I decided to divide the I am using the same As you can see below, the new v2.8.0 is faster in this step. The I'll dig a little into it and see what I can find. library(BiocParallel)
library(DelayedArray)
library(SingleR)
bpen <- celldex::BlueprintEncodeData()
sce <- readRDS("sce.rds") For R 4.3, v2.4.1prep.input <- function (test, ref,
assay.type.test = "logcounts", assay.type.ref = "logcounts",
check.missing = TRUE, num.threads = bpnworkers(BPPARAM),
BNPARAM = NULL, BPPARAM = SerialParam()) {
if (!bpisup(BPPARAM) && !is(BPPARAM, "MulticoreParam")) {
bpstart(BPPARAM)
on.exit(bpstop(BPPARAM))
}
test <- SingleR:::.to_clean_matrix(test, assay.type.test, check.missing,
msg = "test", BPPARAM = BPPARAM)
if (single.ref <- !SingleR:::.is_list(ref)) {
ref <- list(ref)
}
ref <- lapply(ref, FUN = SingleR:::.to_clean_matrix, assay.type = assay.type.ref,
check.missing = check.missing, msg = "ref", BPPARAM = BPPARAM)
refnames <- Reduce(intersect, lapply(ref, rownames))
keep <- intersect(rownames(test), refnames)
if (length(keep) == 0) {
stop("no common genes between 'test' and 'ref'")
}
if (!identical(keep, rownames(test))) {
test <- test[keep, ]
}
for (i in seq_along(ref)) {
if (!identical(keep, rownames(ref[[i]]))) {
ref[[i]] <- ref[[i]][keep, , drop = FALSE]
}
}
if (single.ref) {
ref <- ref[[1]]
}
return(list(ref, test))
} system.time({
x <- prep.input(test = sce, ref = list(BP = bpen), assay.type.test = "logcounts", num.threads = 2)
})
system.time({
trained <- trainSingleR(x[[1]], labels = list(bpen$label.main), genes = "de", sd.thresh = 1,
de.method = "classic", de.n = NULL,, de.args = list(),
aggr.ref = FALSE, aggr.args = list(), recompute = TRUE,
restrict = NULL, check.missing = TRUE,
BNPARAM = NULL, num.threads = 2, BPPARAM = SerialParam())
})
system.time({
class <- classifySingleR(x[[2]], trained, quantile = 0.8, fine.tune = TRUE,
tune.thresh = 0.05, prune = TRUE, check.missing = FALSE,
num.threads = 2, BPPARAM = SerialParam())
})
For R 4.4, v2.8.0prep.input <- function (test, ref,
assay.type.test = "logcounts", assay.type.ref = "logcounts",
check.missing = TRUE, num.threads = bpnworkers(BPPARAM),
BNPARAM = NULL, BPPARAM = SerialParam()) {
if (!bpisup(BPPARAM) && !is(BPPARAM, "MulticoreParam")) {
bpstart(BPPARAM)
on.exit(bpstop(BPPARAM))
}
test <- SingleR:::.to_clean_matrix(test, assay.type.test, check.missing,
msg = "test", BPPARAM = BPPARAM)
tmp.ref <- ref
if (!is.list(tmp.ref) || is.data.frame(tmp.ref)) {
tmp.ref <- list(ref)
}
for (rr in tmp.ref) {
keep <- rownames(test) %in% rownames(rr)
if (!all(keep)) {
test <- DelayedArray(test)[keep, , drop = FALSE]
}
}
if (nrow(test) == 0) {
stop("no common genes between 'test' and 'ref")
}
return(list(ref, test))
} system.time({
x <- prep.input(test = sce, ref = list(BP = bpen), assay.type.test = "logcounts", num.threads = 2)
})
system.time({
trained <- trainSingleR(x[[1]], labels = list(bpen$label.main), genes = "de", sd.thresh = 1,
de.method = "classic", de.n = NULL,, de.args = list(),
aggr.ref = FALSE, aggr.args = list(), recompute = TRUE,
restrict = NULL, test.genes = rownames(x[[2]]), check.missing = TRUE,
BNPARAM = NULL, num.threads = 2, BPPARAM = SerialParam())
})
system.time({
class <- classifySingleR(x[[2]], trained, quantile = 0.8, fine.tune = TRUE,
tune.thresh = 0.05, prune = TRUE, check.missing = FALSE,
num.threads = 2, BPPARAM = SerialParam())
})
|
OK, I've looked into the |
For me: R 4.4, BioC 3.20library(SingleR)
library(celldex)
bp <- BlueprintEncodeData()
# Using Baron as it's human and it is large enough for some decent testing (~8k cells).
library(scRNAseq)
sce <- BaronPancreasData('human')
system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test=1))
## user system elapsed
## 29.179 0.081 29.259
system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test=1, num.threads=2))
## user system elapsed
## 30.008 0.141 15.756
system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test=1, num.threads=4))
## user system elapsed
## 30.981 0.051 8.795 Session information
R 4.3, BioC 3.18# Omitting the identical setup code here...
system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test=1))
## user system elapsed
## 37.935 0.156 38.186
system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test=1, num.threads=2))
## user system elapsed
## 38.296 0.135 19.808
system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test=1, num.threads=4))
## user system elapsed
## 39.354 0.120 11.058 Session information
As we can see, SingleR 2.8.0 is a little bit faster than 2.4.1 on my machine (Mac M2). If you can't share your dataset, try to reproduce your timings with a dataset in scRNAseq. If you're still getting worse timings for 2.8.0 on Another observation is that, for me, SingleR scales nicely as we double the number of threads from 1 to 2, and then 2 to 4. If you're not observing this, it suggests that there is a serial processing bottleneck - specifically related to your test dataset, given that I can say that the slight differences in results are expected and are discussed in the NEWS for 2.8.0, see Lines 34 to 38 in c2922b3
|
Hi @LTLA Thanks for your explanation. I was indeed using an object that has its assays stored in Here I am using library(SingleR)
library(scran)
library(scuttle)
library(TENxPBMCData)
sce <- TENxPBMCData(dataset = "pbmc33k")
rownames(sce) <- uniquifyFeatureNames(rowData(sce)$ENSEMBL_ID, rowData(sce)$Symbol_TENx)
colnames(sce) <- sce$Barcode
set.seed(1000)
clusters <- quickCluster(sce)
sce <- computeSumFactors(sce, cluster=clusters)
sce <- logNormCounts(sce)
bp <- celldex::BlueprintEncodeData() R 4.4, BioC 3.20system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test='logcounts'))
## user system elapsed
##126.988 0.317 127.297
system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test='logcounts', num.threads=2))
## user system elapsed
##169.576 0.241 107.422
system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test='logcounts', num.threads=4))
## user system elapsed
##169.480 0.185 79.117 R 4.3, BioC 3.18system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test='logcounts'))
## user system elapsed
##158.698 0.121 158.803
system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test='logcounts', num.threads=2))
## user system elapsed
##170.560 0.148 101.268
system.time(res <- SingleR(sce, bp, labels=bp$label.main, assay.type.test='logcounts', num.threads=4))
## user system elapsed
##181.603 0.191 72.212 I am now tweaking my workflow to create |
Hi,
I am testing my single-cell workflow using the new R-4.4 (v4.4.2) these couple of days and noticed SingleR (v2.8.0) is running a lot slower compared with my other environment that has R-4.3 and SingleR v2.4.1.
From the commit history, I don't see what might have caused the drop in speed between the two versions.
I am wondering if anyone has any insight into this and how I can run the new version of
SingleR
as fast as before (is there any parameter I can tweak)?Below is how I run SingleR and my
sce
has 9,188 genes and 16,601 cells.With R-4.3 and SingleR v2.4.1, I have
and with R-4.4 and SingleR v2.8.0, I have
When I change num.threads to
4
(no speed gain in both, new version still runs slower):With R-4.3 and SingleR v2.4.1, I have
and with R-4.4 and SingleR v2.8.0, I have
p.s. The predicted cell type labels are a little different, but I am guessing its due to updates with the
celldex
package. Although, I did notice theassays
structure of the reference I am using (BlueprintEncodeData
) is a little different.This is
celldex_1.12.0
(loaded and attached):And this is
celldex_1.16.0
(loaded, not attached):R v4.3.3 and v4.4.2 session info
The text was updated successfully, but these errors were encountered: