emptyDrops BioC 3.11 vs 3.12: different results with the OSCA pbmc 4k dataset #67
FWIW I'm pretty certain figures in my slides are just copy+pasted from the version of OSCA at the time I made the slides. |
Thanks for the info, Pete! Here's a verification that the t-SNE shape issue is due to DropletUtils::emptyDrops(). The annotated clusters and so on are different, likely due to other changes.
# Let's use the pbmc4k data
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
"http://cf.10xgenomics.com/samples",
"cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)
library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)
# Detection of droplets containing cells
set.seed(100)
# e.out <- emptyDrops(counts(sce.pbmc))
e.out <- readRDS("e.out_BioC3.11.rds")
table(e.out$FDR < 0.001, useNA = "ifany")
library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86,
keys = rowData(sce.pbmc)$ID,
column = "SEQNAME", keytype = "GENEID"
)
sce.pbmc <- sce.pbmc[, which(e.out$FDR <= 0.001)]
# Quality control
stats <- perCellQCMetrics(sce.pbmc,
subsets = list(Mito = which(location == "MT"))
)
high.mito <- isOutlier(stats$subsets_Mito_percent,
type = "higher"
)
sce.pbmc <- sce.pbmc[, !high.mito]
# Data normalization
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster = clusters)
sce.pbmc <- logNormCounts(sce.pbmc)
## Identification of highly variable genes
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop = 0.1)
## Dimensionality reduction
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc,
subset.row = top.pbmc,
technical = dec.pbmc
)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, dimred = "PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, dimred = "PCA")
plotTSNE(sce.pbmc)
# clustering
g <- buildSNNGraph(sce.pbmc, k = 10, use.dimred = "PCA")
clust <- igraph::cluster_walktrap(g)$membership
sce.pbmc$cluster <- factor(clust)
library(celldex)
ref <- celldex::BlueprintEncodeData()
library(SingleR)
pred <- SingleR(
test = sce.pbmc, ref = ref,
labels = ref$label.main
)
sce.pbmc$labels <- pred$labels
plotTSNE(sce.pbmc, colour_by = "labels", text_by = "labels")
Left: BioC 3.13 with |
Actually, the difference in colors was due to
plotTSNE(sce.pbmc, colour_by = "labels", text_by = "labels")
vs
plotTSNE(sce.pbmc, colour_by = "cluster", text_by = "labels")
[image: Screen Shot 2021-08-10 at 11 10 30 PM]
<https://user-images.githubusercontent.com/2288213/128963828-314c8272-bc68-488a-877b-1356e2aa7a44.png>
Left: BioC 3.13 with e.out from BioC 3.11
Right: Pete's slides
Shown below is the difference between using BioC 3.13 with the old output from DropletUtils::emptyDrops(), which matches the BioC 3.11 output, and the new BioC 3.13 output. In terms of explaining methods, I don't think that the BioC 3.13 output is bad in any way, so we could potentially close this issue. Reproducing the BioC 3.11 DropletUtils::emptyDrops() output with newer versions is a different story, though. My guess is that it simply can't be done, like I said earlier.
[image: Screen Shot 2021-08-10 at 11 20 16 PM]
<https://user-images.githubusercontent.com/2288213/128964570-6873f9c5-af6e-41bf-b326-23224e5b0302.png>
Left: BioC 3.13 with e.out from BioC 3.11
Middle: Pete's slides with BioC 3.11
Right: BioC 3.13 only |
Ohh thanks!!
|
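One way to quantify how much two emptyDrops() results differ is to cross-tabulate the cell calls from both versions. The sketch below uses small made-up FDR vectors: fdr_old and fdr_new are hypothetical stand-ins for e.out$FDR from BioC 3.11 and BioC 3.13, not the real pbmc4k output.

```r
## Hypothetical FDR values standing in for e.out$FDR from two
## Bioconductor releases; NA marks barcodes that were not tested
fdr_old <- c(0.0001, 0.5, NA, 0.0005, 0.2, NA)
fdr_new <- c(0.0001, 0.0002, NA, 0.0005, 0.2, 0.9)

## Cell calls at the same threshold used for filtering in this thread
call_old <- fdr_old <= 0.001
call_new <- fdr_new <= 0.001

## Cross-tabulate the calls, keeping NAs visible
comparison <- table(old = call_old, new = call_new, useNA = "ifany")
print(comparison)

## Barcodes whose call changed between versions (NA counts as its own state)
changed <- which(!mapply(identical, call_old, call_new))
print(changed)
```

With real data, the same table() call on the two e.out objects would show how many barcodes moved between the FALSE, TRUE and NA groups.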
Well, if your scope is "somewhere between April and Oct 2020", then the more impactful change is that of #48, which adjusts the spline fit and presumably shifts the |
I see, thanks Aaron. I wasn't aware of that change, though with BioC 3.13 I get the same results with and without barcode.args=list(exclude.from=0).
> e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
> table(e.out$FDR < 0.001, useNA = "ifany")
FALSE TRUE <NA>
989 4300 731991
> library(scran)
Loading required package: SingleCellExperiment
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats
Attaching package: ‘MatrixGenerics’
The following objects are masked from ‘package:matrixStats’:
colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
colWeightedMeans, colWeightedMedians, colWeightedSds,
colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
rowWeightedSds, rowWeightedVars
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
expand.grid, I, unname
Loading required package: IRanges
Attaching package: ‘IRanges’
The following object is masked from ‘package:grDevices’:
windows
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: ‘Biobase’
The following object is masked from ‘package:MatrixGenerics’:
rowMedians
The following objects are masked from ‘package:matrixStats’:
anyMissing, rowMedians
Loading required package: scuttle
> library(DropletUtils)
> ?emptyDrops
> # Let's use the pbmc4k data
> library(BiocFileCache)
Loading required package: dbplyr
> bfc <- BiocFileCache()
> raw.path <- bfcrpath(bfc, file.path(
+ "http://cf.10xgenomics.com/samples",
+ "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
+ ))
> untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
>
> library(DropletUtils)
> library(Matrix)
Attaching package: ‘Matrix’
The following object is masked from ‘package:S4Vectors’:
expand
> fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
> sce.pbmc <- read10xCounts(fname, col.names = TRUE)
> # Gene annotation
> library(scater)
Loading required package: ggplot2
> rownames(sce.pbmc) <- uniquifyFeatureNames(
+ rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
+ )
> library(EnsDb.Hsapiens.v86)
Loading required package: ensembldb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: AnnotationFilter
Attaching package: ‘AnnotationFilter’
The following object is masked from ‘package:testthat’:
not
Attaching package: 'ensembldb'
The following object is masked from 'package:stats':
filter
> location <- mapIds(EnsDb.Hsapiens.v86,
+ keys = rowData(sce.pbmc)$ID,
+ column = "SEQNAME", keytype = "GENEID"
+ )
Warning message:
Unable to map 144 of 33694 requested IDs.
> # Detection of droplets containing cells
> set.seed(100)
> e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
> table(e.out$FDR < 0.001, useNA = "ifany")
FALSE TRUE <NA>
989 4300 731991
> options(width = 120)
> sessioninfo::session_info()
- Session info -------------------------------------------------------------------------------------------------------
setting value
version R version 4.1.0 alpha (2021-04-20 r80202)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/New_York
date 2021-08-11
- Packages -----------------------------------------------------------------------------------------------------------
! package * version date lib source
AnnotationDbi * 1.54.1 2021-06-08 [1] Bioconductor
AnnotationFilter * 1.16.0 2021-05-19 [1] Bioconductor
AnnotationHub 3.0.1 2021-06-20 [1] Bioconductor
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
beachmat 2.8.0 2021-05-19 [1] Bioconductor
beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.1.0)
Biobase * 2.52.0 2021-05-19 [1] Bioconductor
BiocFileCache * 2.0.0 2021-05-19 [1] Bioconductor
BiocGenerics * 0.38.0 2021-05-19 [1] Bioconductor
BiocIO 1.2.0 2021-05-19 [1] Bioconductor
BiocManager 1.30.16 2021-06-15 [1] CRAN (R 4.1.0)
BiocNeighbors 1.10.0 2021-05-19 [1] Bioconductor
BiocParallel 1.26.1 2021-07-04 [1] Bioconductor
BiocSingular 1.8.1 2021-06-08 [1] Bioconductor
biocthis 1.2.0 2021-05-19 [1] Bioconductor
BiocVersion 3.13.1 2021-03-08 [1] Bioconductor
biomaRt 2.48.2 2021-07-01 [1] Bioconductor
Biostrings 2.60.2 2021-08-05 [1] Bioconductor
bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.0)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.0)
blob 1.2.2 2021-07-23 [1] CRAN (R 4.1.0)
bluster 1.2.1 2021-05-27 [1] Bioconductor
bookdown 0.22 2021-04-22 [1] CRAN (R 4.1.0)
bslib 0.2.5.1 2021-05-18 [1] CRAN (R 4.1.0)
cachem 1.0.5 2021-05-15 [1] CRAN (R 4.1.0)
Cairo 1.5-12.2 2020-07-07 [1] CRAN (R 4.1.0)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0)
celldex 1.2.0 2021-05-20 [1] Bioconductor
circlize 0.4.13 2021-06-09 [1] CRAN (R 4.1.0)
cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.0)
clue 0.3-59 2021-04-16 [1] CRAN (R 4.1.0)
cluster 2.1.2 2021-04-17 [1] CRAN (R 4.1.0)
codetools 0.2-18 2020-11-04 [1] CRAN (R 4.1.0)
colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.0)
colourpicker 1.1.0 2020-09-14 [1] CRAN (R 4.1.0)
ComplexHeatmap 2.8.0 2021-05-19 [1] Bioconductor
cowplot 1.1.1 2020-12-30 [1] CRAN (R 4.1.0)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
curl 4.3.2 2021-06-23 [1] CRAN (R 4.1.0)
data.table 1.14.0 2021-02-21 [1] CRAN (R 4.1.0)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
dbplyr * 2.1.1 2021-04-06 [1] CRAN (R 4.1.0)
DelayedArray 0.18.0 2021-05-19 [1] Bioconductor
DelayedMatrixStats 1.14.2 2021-08-08 [1] Bioconductor
desc 1.3.0 2021-03-05 [1] CRAN (R 4.1.0)
devtools * 2.4.2 2021-06-07 [1] CRAN (R 4.1.0)
digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0)
doParallel 1.0.16 2020-10-16 [1] CRAN (R 4.1.0)
dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
dqrng 0.3.0 2021-05-01 [1] CRAN (R 4.1.0)
DropletUtils * 1.12.2 2021-07-22 [1] Bioconductor
DT 0.18 2021-04-14 [1] CRAN (R 4.1.0)
edgeR 3.34.0 2021-05-19 [1] Bioconductor
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
EnsDb.Hsapiens.v86 * 2.99.0 2021-08-08 [1] Bioconductor
ensembldb * 2.16.4 2021-08-05 [1] Bioconductor
evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
ExperimentHub 2.0.0 2021-05-19 [1] Bioconductor
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
filelock 1.0.2 2018-10-05 [1] CRAN (R 4.1.0)
foreach 1.5.1 2020-10-15 [1] CRAN (R 4.1.0)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
GenomeInfoDb * 1.28.1 2021-07-01 [1] Bioconductor
GenomeInfoDbData 1.2.6 2021-08-07 [1] Bioconductor
GenomicAlignments 1.28.0 2021-05-19 [1] Bioconductor
GenomicFeatures * 1.44.0 2021-05-19 [1] Bioconductor
GenomicRanges * 1.44.0 2021-05-19 [1] Bioconductor
GetoptLong 1.0.5 2020-12-15 [1] CRAN (R 4.1.0)
ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.1.0)
ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.1.0)
GlobalOptions 0.1.2 2020-06-10 [1] CRAN (R 4.1.0)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
HDF5Array 1.20.0 2021-05-19 [1] Bioconductor
here 1.0.1 2020-12-13 [1] CRAN (R 4.1.0)
hms 1.1.0 2021-05-17 [1] CRAN (R 4.1.0)
htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.1.0)
httpuv 1.6.1 2021-05-07 [1] CRAN (R 4.1.0)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
igraph 1.2.6 2020-10-06 [1] CRAN (R 4.1.0)
interactiveDisplayBase 1.30.0 2021-05-19 [1] Bioconductor
IRanges * 2.26.0 2021-05-19 [1] Bioconductor
irlba 2.3.3 2019-02-05 [1] CRAN (R 4.1.0)
iSEE 2.4.0 2021-05-19 [1] Bioconductor
iterators 1.0.13 2020-10-15 [1] CRAN (R 4.1.0)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.1.0)
KEGGREST 1.32.0 2021-05-19 [1] Bioconductor
knitr 1.33 2021-04-24 [1] CRAN (R 4.1.0)
later 1.2.0 2021-04-23 [1] CRAN (R 4.1.0)
lattice 0.20-44 2021-05-02 [1] CRAN (R 4.1.0)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.1.0)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
limma 3.48.2 2021-08-08 [1] Bioconductor
locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.1.0)
lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.1.0)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
Matrix * 1.3-4 2021-06-01 [1] CRAN (R 4.1.0)
MatrixGenerics * 1.4.2 2021-08-08 [1] Bioconductor
matrixStats * 0.60.0 2021-07-26 [1] CRAN (R 4.1.0)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.1.0)
metapod 1.0.0 2021-05-19 [1] Bioconductor
mgcv 1.8-36 2021-06-01 [1] CRAN (R 4.1.0)
mime 0.11 2021-06-23 [1] CRAN (R 4.1.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.1.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
nlme 3.1-152 2021-02-04 [1] CRAN (R 4.1.0)
PCAtools 2.4.0 2021-05-19 [1] Bioconductor
pillar 1.6.2 2021-07-29 [1] CRAN (R 4.1.0)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.1.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
pkgload 1.2.1 2021-04-06 [1] CRAN (R 4.1.0)
plyr 1.8.6 2020-03-03 [1] CRAN (R 4.1.0)
png 0.1-7 2013-12-03 [1] CRAN (R 4.1.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.0)
progress 1.2.2 2019-05-16 [1] CRAN (R 4.1.0)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.0)
ProtGenerics 1.24.0 2021-05-19 [1] Bioconductor
ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.1.0)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.1.0)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.1.0)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.1.0)
Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
RCurl 1.98-1.3 2021-03-16 [1] CRAN (R 4.1.0)
remotes 2.4.0 2021-06-02 [1] CRAN (R 4.1.0)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.1.0)
restfulr 0.0.13 2017-08-06 [1] CRAN (R 4.1.0)
rhdf5 2.36.0 2021-05-19 [1] Bioconductor
D rhdf5filters 1.4.0 2021-05-19 [1] Bioconductor
Rhdf5lib 1.14.2 2021-07-06 [1] Bioconductor
rintrojs 0.3.0 2021-06-06 [1] CRAN (R 4.1.0)
rjson 0.2.20 2018-06-08 [1] CRAN (R 4.1.0)
rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
rmarkdown 2.10 2021-08-06 [1] CRAN (R 4.1.0)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
Rsamtools 2.8.0 2021-05-19 [1] Bioconductor
RSQLite 2.2.7 2021-04-22 [1] CRAN (R 4.1.0)
rsthemes 0.2.1.9000 2021-04-22 [1] Github (gadenbuie/rsthemes@19299e5)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
rsvd 1.0.5 2021-04-16 [1] CRAN (R 4.1.0)
rtracklayer 1.52.0 2021-05-19 [1] Bioconductor
S4Vectors * 0.30.0 2021-05-19 [1] Bioconductor
sass 0.4.0 2021-05-12 [1] CRAN (R 4.1.0)
ScaledMatrix 1.0.0 2021-05-19 [1] Bioconductor
scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
scater * 1.20.1 2021-06-15 [1] Bioconductor
scran * 1.20.1 2021-05-24 [1] Bioconductor
scuttle * 1.2.1 2021-08-05 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
shape 1.4.6 2021-05-19 [1] CRAN (R 4.1.0)
shiny 1.6.0 2021-01-25 [1] CRAN (R 4.1.0)
shinyAce 0.4.1 2019-09-24 [1] CRAN (R 4.1.0)
shinydashboard 0.7.1 2018-10-17 [1] CRAN (R 4.1.0)
shinyjs 2.0.0 2020-09-09 [1] CRAN (R 4.1.0)
shinyWidgets 0.6.0 2021-03-15 [1] CRAN (R 4.1.0)
SingleCellExperiment * 1.14.1 2021-05-21 [1] Bioconductor
sparseMatrixStats 1.4.2 2021-08-08 [1] Bioconductor
statmod 1.4.36 2021-05-10 [1] CRAN (R 4.1.0)
stringi 1.7.3 2021-07-16 [1] CRAN (R 4.1.0)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
styler 1.5.1 2021-07-13 [1] CRAN (R 4.1.0)
SummarizedExperiment * 1.22.0 2021-05-19 [1] Bioconductor
suncalc 0.5.0 2019-04-03 [1] CRAN (R 4.1.0)
testthat * 3.0.4 2021-07-01 [1] CRAN (R 4.1.0)
tibble 3.1.3 2021-07-23 [1] CRAN (R 4.1.0)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
usethis * 2.0.1 2021-02-10 [1] CRAN (R 4.1.0)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
vipor 0.4.5 2017-03-22 [1] CRAN (R 4.1.0)
viridis 0.6.1 2021-05-11 [1] CRAN (R 4.1.0)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.1.0)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
xfun 0.25 2021-08-06 [1] CRAN (R 4.1.0)
XML 3.99-0.6 2021-03-16 [1] CRAN (R 4.1.0)
xml2 1.3.2 2020-04-23 [1] CRAN (R 4.1.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.0)
XVector 0.32.0 2021-05-19 [1] Bioconductor
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
zlibbioc 1.38.0 2021-05-19 [1] Bioconductor
[1] C:/R/R-4.1.0alpha/library
D -- DLL MD5 mismatch, broken installation. |
On first inspection, the following commit doesn't seem to be the problem either, based on the test below and on looking at https://github.com/LTLA/beachmat/blob/d3e4bafbc840df2d106197d7163c3f3aa6fb1fb1/R/whichNonZero.R#L42-L44: 71b161e#diff-cf7a6dd4088835e532ce821c61ceb633ccc99ee3ffe8a812ab3c0a7f46dcaa8b
set.seed(100)
p.n0 <- runif(100)
j <- sample(1:10, 100, TRUE)
mat <- matrix(ncol = 10)
dim(mat)
#> [1] 1 10
max(j)
#> [1] 10
by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
obs.P
#> [1] 3.998302 4.518524 2.379189 5.237262 6.077561 4.126877 4.634267
#> [8] 4.671363 4.275538 12.068117
j_new <- factor(j, levels=seq_len(ncol(mat)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
obs.P_new
#> [1] 3.998302 4.518524 2.379189 5.237262 6.077561 4.126877 4.634267
#> [8] 4.671363 4.275538 12.068117
testthat::expect_equal(obs.P, obs.P_new)
testthat::expect_equivalent(obs.P, obs.P_new)
Created on 2021-08-11 by the reprex package (v2.0.1)
However, in that case, the full matrix
More complicated case
However, let's say that the full matrix has 15 columns, with columns 6 to 10 being all zeros.
## 100 random p-values for 100 random j columns
## in a sparse matrix
set.seed(100)
p.n0 <- runif(100)
## Let's say that we have empty columns 6 through 10
j <- sample(c(1:5, 11:15), 100, TRUE)
mat <- matrix(ncol = max(j))
dim(mat)
#> [1] 1 15
by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
obs.P
#> [1] 3.998302 4.518524 2.379189 5.237262 6.077561 0.000000 0.000000
#> [8] 0.000000 0.000000 0.000000 4.126877 4.634267 4.671363 4.275538
#> [15] 12.068117
j_new <- factor(j, levels=seq_len(ncol(mat)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
obs.P_new
#> [1] 3.998302 4.518524 2.379189 5.237262 6.077561 NA NA
#> [8] NA NA NA 4.126877 4.634267 4.671363 4.275538
#> [15] 12.068117
testthat::expect_equal(obs.P, obs.P_new)
#> Error: `obs.P` not equal to `obs.P_new`.
#> 5/15 mismatches (average diff: NaN)
#> [6] 0 - NA == NA
#> [7] 0 - NA == NA
#> [8] 0 - NA == NA
#> [9] 0 - NA == NA
#> [10] 0 - NA == NA
testthat::expect_equivalent(obs.P, obs.P_new)
#> Error: `obs.P` not equivalent to `obs.P_new`.
#> 5/15 mismatches (average diff: NaN)
#> [6] 0 - NA == NA
#> [7] 0 - NA == NA
#> [8] 0 - NA == NA
#> [9] 0 - NA == NA
#> [10] 0 - NA == NA
Created on 2021-08-11 by the reprex package (v2.0.1)
In that case, we end up having NA values instead of 0 for the empty columns.
With the data used for this issue, I see that we do have lots of non-zero columns, as expected.
# Let's use the pbmc4k data
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
"http://cf.10xgenomics.com/samples",
"cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)
# Gene annotation
library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)
library(beachmat)
mat <- counts(sce.pbmc)
dim(mat)
nonzero <- whichNonZero(mat)
length(unique(nonzero$j))
max(nonzero$j)
where
> library(beachmat)
>
> mat <- counts(sce.pbmc)
> dim(mat)
[1] 33694 737280
> nonzero <- whichNonZero(mat)
> length(unique(nonzero$j))
[1] 272442
> max(nonzero$j)
[1] 737280 |
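The check above, where the largest non-zero column index equals the number of columns even though many columns are empty, can be reproduced on a toy matrix. This is a base R sketch that uses which(..., arr.ind = TRUE) as a stand-in for beachmat's whichNonZero(); the matrix and the empty_cols name are made up for illustration.

```r
## Toy count matrix: 3 genes x 6 barcodes; barcodes 2, 4 and 5 are empty
mat <- matrix(0, nrow = 3, ncol = 6)
mat[1, 1] <- 2
mat[3, 1] <- 1
mat[2, 3] <- 5
mat[1, 6] <- 1

## Row/column positions of the non-zero entries, similar in spirit
## to the i and j used in this thread
nz <- which(mat != 0, arr.ind = TRUE)
j <- nz[, "col"]

## The largest index can equal ncol(mat) even when many columns are empty
print(max(j))
print(length(unique(j)))

## Columns that never appear in j have no counts at all
empty_cols <- setdiff(seq_len(ncol(mat)), j)
print(empty_cols)
```

Any per-column summary built only from j therefore has to decide explicitly what value the columns in empty_cols should receive.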
pressed the wrong button while drafting my message.... |
Ok, this does seem to be the issue with the actual data, as shown in the
I imagine that it doesn't matter for most droplets since, well, they already had
# Let's use the pbmc4k data
library(BiocFileCache)
#> Loading required package: dbplyr
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
"http://cf.10xgenomics.com/samples",
"cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)
# Gene annotation
library(scater)
#> Loading required package: scuttle
#> Loading required package: ggplot2
rownames(sce.pbmc) <- uniquifyFeatureNames(
rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)
library(beachmat)
mat <- counts(sce.pbmc)
dim(mat)
#> [1] 33694 737280
nonzero <- whichNonZero(mat)
length(unique(nonzero$j))
#> [1] 272442
max(nonzero$j)
#> [1] 737280
m <- DropletUtils:::.rounded_to_integer(mat, 0)
nonzero <- whichNonZero(m)
i <- nonzero$i
j <- nonzero$j
x <- nonzero$x
set.seed(100)
p.n0 <- runif(length(j))
by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
summary(obs.P)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.0000 0.0000 0.0000 5.7792 0.7371 2622.1900
j_new <- factor(j, levels=seq_len(ncol(m)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
summary(obs.P_new)
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 0.0 0.6 1.3 15.6 9.1 2622.2 464838
testthat::expect_equal(obs.P, obs.P_new)
#> Error: `obs.P` not equal to `obs.P_new`.
#> 464838/737280 mismatches (average diff: NaN)
#> [2] 0 - NA == NA
#> [4] 0 - NA == NA
#> [7] 0 - NA == NA
#> [8] 0 - NA == NA
#> [9] 0 - NA == NA
#> [11] 0 - NA == NA
#> [16] 0 - NA == NA
#> [17] 0 - NA == NA
#> [18] 0 - NA == NA
#> ...
testthat::expect_equivalent(obs.P, obs.P_new)
#> Error: `obs.P` not equivalent to `obs.P_new`.
#> 464838/737280 mismatches (average diff: NaN)
#> [2] 0 - NA == NA
#> [4] 0 - NA == NA
#> [7] 0 - NA == NA
#> [8] 0 - NA == NA
#> [9] 0 - NA == NA
#> [11] 0 - NA == NA
#> [16] 0 - NA == NA
#> [17] 0 - NA == NA
#> [18] 0 - NA == NA
#> ...
Created on 2021-08-11 by the reprex package (v2.0.1) |
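The NA values come from how tapply() treats factor levels that have no observations. As a possible alternative to patching the result afterwards, tapply() accepts a default argument (R >= 3.4.0) that sets the value for empty cells. The sketch below reuses the toy setup from the reprex above; with_na and with_zero are illustrative names, and this is not necessarily what DropletUtils does.

```r
## Toy setup: 15 columns, columns 6 to 10 never receive an entry
set.seed(100)
p.n0 <- runif(100)
j <- sample(c(1:5, 11:15), 100, TRUE)
n_col <- 15

## Empty factor levels give NA by default ...
j_f <- factor(j, levels = seq_len(n_col))
with_na <- as.numeric(tapply(p.n0, INDEX = j_f, FUN = sum))

## ... but the 'default' argument sets the value used for empty cells
with_zero <- as.numeric(tapply(p.n0, INDEX = j_f, FUN = sum, default = 0))

print(with_na[6:10])
print(with_zero[6:10])
```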
Adding
## 100 random p-values for 100 random j columns
## in a sparse matrix
set.seed(100)
p.n0 <- runif(100)
## Let's say that we have empty columns 6 through 10
j <- sample(c(1:5, 11:15), 100, TRUE)
mat <- matrix(ncol = max(j))
dim(mat)
#> [1] 1 15
by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
j_new <- factor(j, levels=seq_len(ncol(mat)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
## This works
# obs.P_new[is.na(obs.P_new)] <- 0
## But well, maybe there are NAs in the original p.n0 already
## Here's a more robust fix to simply add 0
## to the columns with no data
obs.P_new[!seq_len(ncol(mat)) %in% unique(j)] <- 0
data.frame(obs.P, obs.P_new)
#> obs.P obs.P_new
#> 1 3.998302 3.998302
#> 2 4.518524 4.518524
#> 3 2.379189 2.379189
#> 4 5.237262 5.237262
#> 5 6.077561 6.077561
#> 6 0.000000 0.000000
#> 7 0.000000 0.000000
#> 8 0.000000 0.000000
#> 9 0.000000 0.000000
#> 10 0.000000 0.000000
#> 11 4.126877 4.126877
#> 12 4.634267 4.634267
#> 13 4.671363 4.671363
#> 14 4.275538 4.275538
#> 15 12.068117 12.068117
testthat::expect_equal(obs.P, obs.P_new)
testthat::expect_equivalent(obs.P, obs.P_new)
Created on 2021-08-11 by the reprex package (v2.0.1)
This is true with the full data too.
# Let's use the pbmc4k data
library(BiocFileCache)
#> Loading required package: dbplyr
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
"http://cf.10xgenomics.com/samples",
"cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
library(DropletUtils)
library(Matrix)
#>
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:S4Vectors':
#>
#> expand
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)
# Gene annotation
library(scater)
#> Loading required package: scuttle
#> Loading required package: ggplot2
rownames(sce.pbmc) <- uniquifyFeatureNames(
rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)
library(beachmat)
mat <- counts(sce.pbmc)
dim(mat)
#> [1] 33694 737280
nonzero <- whichNonZero(mat)
length(unique(nonzero$j))
#> [1] 272442
max(nonzero$j)
#> [1] 737280
m <- DropletUtils:::.rounded_to_integer(mat, 0)
nonzero <- whichNonZero(m)
i <- nonzero$i
j <- nonzero$j
x <- nonzero$x
set.seed(100)
p.n0 <- runif(length(j))
by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
summary(obs.P)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.0000 0.0000 0.0000 5.7792 0.7371 2622.1900
j_new <- factor(j, levels=seq_len(ncol(m)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
obs.P_new[!seq_len(ncol(m)) %in% unique(j)] <- 0
summary(obs.P_new)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.0000 0.0000 0.0000 5.7792 0.7371 2622.1900
testthat::expect_equal(obs.P, obs.P_new)
testthat::expect_equivalent(obs.P, obs.P_new)
Created on 2021-08-11 by the reprex package (v2.0.1)
Next, I'll try on a fresh clone of |
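For anyone who wants to check the aggregate() vs tapply() comparison above without loading the full 737,280-column matrix, here is a minimal base-R sketch on a toy example. The column indices, the number of columns, and the variable names are made up for illustration; they only mimic the (j, value) pairs that whichNonZero() returned in the reprex.

```r
# Toy stand-in for the column indices of the non-zero entries of a sparse matrix.
# Columns 2, 4 and 5 have no entries at all (hypothetical data).
n_col <- 6L
j <- c(1L, 1L, 3L, 6L, 6L, 6L)
set.seed(100)
p <- runif(length(j))

# Approach 1: aggregate() only returns groups that are present,
# so the sums are written into a zero-filled vector by column index.
by.col <- aggregate(p, list(Col = j), sum)
sums_agg <- numeric(n_col)
sums_agg[by.col$Col] <- by.col$x

# Approach 2: tapply() over a factor carrying every column as a level.
# Empty levels come back as NA and are set to zero afterwards.
j_f <- factor(j, levels = seq_len(n_col))
sums_tap <- as.numeric(tapply(p, INDEX = j_f, FUN = sum))
sums_tap[is.na(sums_tap)] <- 0

# Both approaches should agree, including zeros for the empty columns.
stopifnot(isTRUE(all.equal(sums_agg, sums_tap)))
```

The only real difference between the two is how empty columns are handled: aggregate() drops them, while tapply() keeps them as NA when the factor levels cover every column.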
Bah, I still can't get the BioC 3.11 results even with these changes :/
> set.seed(100)
> e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
> table(e.out$FDR < 0.001, useNA = "ifany")
FALSE TRUE <NA>
989 4300 731991
It didn't work with lcolladotor@790d099
> table(e.out$FDR < 0.001, useNA = "ifany")
FALSE TRUE <NA>
989 4300 731991
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > options(width = 120)
fix_emptyDrops_bioc3.11_vs_3.12 > sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.1.0 (2021-05-18)
os macOS Big Sur 10.16
system x86_64, darwin17.0
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2021-08-11
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
beachmat 2.9.0 2021-05-19 [1] Bioconductor
beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.1.0)
Biobase * 2.53.0 2021-05-19 [1] Bioconductor
BiocFileCache * 2.1.1 2021-06-23 [1] Bioconductor
BiocGenerics * 0.39.1 2021-06-08 [1] Bioconductor
BiocNeighbors 1.11.0 2021-05-19 [1] Bioconductor
BiocParallel 1.27.2 2021-07-12 [1] Bioconductor
BiocSingular 1.9.1 2021-06-08 [1] Bioconductor
bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.0)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.0)
blob 1.2.2 2021-07-23 [1] CRAN (R 4.1.0)
cachem 1.0.5 2021-05-15 [1] CRAN (R 4.1.0)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0)
cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.0)
colorout 1.2-2 2021-05-24 [1] Github (jalvesaq/colorout@79931fd)
colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.0)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
curl 4.3.2 2021-06-23 [1] CRAN (R 4.1.0)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
dbplyr * 2.1.1 2021-04-06 [1] CRAN (R 4.1.0)
DelayedArray 0.19.1 2021-06-25 [1] Bioconductor
DelayedMatrixStats 1.15.2 2021-08-05 [1] Bioconductor
desc 1.3.0 2021-03-05 [1] CRAN (R 4.1.0)
devtools * 2.4.2 2021-06-07 [1] CRAN (R 4.1.0)
dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
dqrng 0.3.0 2021-05-01 [1] CRAN (R 4.1.0)
DropletUtils * 1.13.3 2021-08-11 [1] Github (lcolladotor/DropletUtils@790d099)
edgeR 3.35.0 2021-05-19 [1] Bioconductor
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
filelock 1.0.2 2018-10-05 [1] CRAN (R 4.1.0)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
GenomeInfoDb * 1.29.3 2021-07-01 [1] Bioconductor
GenomeInfoDbData 1.2.6 2021-05-24 [1] Bioconductor
GenomicRanges * 1.45.0 2021-05-19 [1] Bioconductor
ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.1.0)
ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.1.0)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
HDF5Array 1.21.0 2021-05-19 [1] Bioconductor
httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
IRanges * 2.27.0 2021-05-19 [1] Bioconductor
irlba 2.3.3 2019-02-05 [1] CRAN (R 4.1.0)
lattice 0.20-44 2021-05-02 [1] CRAN (R 4.1.0)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
limma 3.49.4 2021-08-08 [1] Bioconductor
locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.1.0)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
Matrix * 1.3-4 2021-06-01 [1] CRAN (R 4.1.0)
MatrixGenerics * 1.5.3 2021-08-05 [1] Bioconductor
matrixStats * 0.60.0 2021-07-26 [1] CRAN (R 4.1.0)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.1.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
pillar 1.6.2 2021-07-29 [1] CRAN (R 4.1.0)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.1.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
pkgload 1.2.1 2021-04-06 [1] CRAN (R 4.1.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.0)
prompt 1.0.1 2021-08-03 [1] Github (gaborcsardi/prompt@fc2ac94)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.1.0)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.1.0)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.1.0)
Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
RCurl 1.98-1.3 2021-03-16 [1] CRAN (R 4.1.0)
remotes 2.4.0 2021-06-02 [1] CRAN (R 4.1.0)
rhdf5 2.37.0 2021-05-19 [1] Bioconductor
rhdf5filters 1.5.0 2021-05-19 [1] Bioconductor
Rhdf5lib 1.15.2 2021-07-01 [1] Bioconductor
rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
RSQLite 2.2.7 2021-04-22 [1] CRAN (R 4.1.0)
rsthemes 0.2.1.9000 2021-05-24 [1] Github (gadenbuie/rsthemes@19299e5)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
rsvd 1.0.5 2021-04-16 [1] CRAN (R 4.1.0)
S4Vectors * 0.31.0 2021-05-19 [1] Bioconductor
ScaledMatrix 1.1.0 2021-05-19 [1] Bioconductor
scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
scater * 1.21.3 2021-08-01 [1] Bioconductor
scuttle * 1.3.1 2021-08-05 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
SingleCellExperiment * 1.15.1 2021-05-21 [1] Bioconductor
sparseMatrixStats 1.5.2 2021-08-05 [1] Bioconductor
SummarizedExperiment * 1.23.1 2021-06-24 [1] Bioconductor
testthat * 3.0.4 2021-07-01 [1] CRAN (R 4.1.0)
tibble 3.1.3 2021-07-23 [1] CRAN (R 4.1.0)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
usethis * 2.0.1 2021-02-10 [1] CRAN (R 4.1.0)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
vipor 0.4.5 2017-03-22 [1] CRAN (R 4.1.0)
viridis 0.6.1 2021-05-11 [1] CRAN (R 4.1.0)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.1.0)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
XVector 0.33.0 2021-05-19 [1] Bioconductor
zlibbioc 1.39.0 2021-05-19 [1] Bioconductor
[1] /Library/Frameworks/R.framework/Versions/4.1devel/Resources/library
nor lcolladotor@2b0bb4b, both with BioC 3.14 (bioc-devel).
> library(BiocFileCache)
Loading required package: dbplyr
fix_emptyDrops_bioc3.11_vs_3.12 > bfc <- BiocFileCache()
fix_emptyDrops_bioc3.11_vs_3.12 > raw.path <- bfcrpath(bfc, file.path(
+ "http://cf.10xgenomics.com/samples",
+ "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
+ ))
fix_emptyDrops_bioc3.11_vs_3.12 > untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > library(DropletUtils)
Loading required package: SingleCellExperiment
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats
Attaching package: ‘MatrixGenerics’
The following objects are masked from ‘package:matrixStats’:
colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
colWeightedMeans, colWeightedMedians, colWeightedSds,
colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
rowWeightedSds, rowWeightedVars
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
expand.grid, I, unname
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: ‘Biobase’
The following object is masked from ‘package:MatrixGenerics’:
rowMedians
The following objects are masked from ‘package:matrixStats’:
anyMissing, rowMedians
fix_emptyDrops_bioc3.11_vs_3.12 > library(Matrix)
Attaching package: ‘Matrix’
The following object is masked from ‘package:S4Vectors’:
expand
fix_emptyDrops_bioc3.11_vs_3.12 > fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
fix_emptyDrops_bioc3.11_vs_3.12 > sce.pbmc <- read10xCounts(fname, col.names = TRUE)
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > library(scater)
Loading required package: scuttle
Loading required package: ggplot2
fix_emptyDrops_bioc3.11_vs_3.12 > rownames(sce.pbmc) <- uniquifyFeatureNames(
+ rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
+ )
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > # Detect droplets that contain cells
fix_emptyDrops_bioc3.11_vs_3.12 > set.seed(100)
fix_emptyDrops_bioc3.11_vs_3.12 > #e.out <- readRDS("e.out_BioC3.11.rds")
fix_emptyDrops_bioc3.11_vs_3.12 > e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
fix_emptyDrops_bioc3.11_vs_3.12 > table(e.out$FDR < 0.001, useNA = "ifany")
FALSE TRUE <NA>
989 4300 731991
fix_emptyDrops_bioc3.11_vs_3.12 > options(width = 120)
fix_emptyDrops_bioc3.11_vs_3.12 > sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.1.0 (2021-05-18)
os macOS Big Sur 10.16
system x86_64, darwin17.0
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2021-08-11
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
beachmat 2.9.0 2021-05-19 [1] Bioconductor
beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.1.0)
Biobase * 2.53.0 2021-05-19 [1] Bioconductor
BiocFileCache * 2.1.1 2021-06-23 [1] Bioconductor
BiocGenerics * 0.39.1 2021-06-08 [1] Bioconductor
BiocNeighbors 1.11.0 2021-05-19 [1] Bioconductor
BiocParallel 1.27.2 2021-07-12 [1] Bioconductor
BiocSingular 1.9.1 2021-06-08 [1] Bioconductor
bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.0)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.0)
blob 1.2.2 2021-07-23 [1] CRAN (R 4.1.0)
cachem 1.0.5 2021-05-15 [1] CRAN (R 4.1.0)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0)
cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.0)
colorout 1.2-2 2021-05-24 [1] Github (jalvesaq/colorout@79931fd)
colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.0)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
curl 4.3.2 2021-06-23 [1] CRAN (R 4.1.0)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
dbplyr * 2.1.1 2021-04-06 [1] CRAN (R 4.1.0)
DelayedArray 0.19.1 2021-06-25 [1] Bioconductor
DelayedMatrixStats 1.15.2 2021-08-05 [1] Bioconductor
desc 1.3.0 2021-03-05 [1] CRAN (R 4.1.0)
devtools * 2.4.2 2021-06-07 [1] CRAN (R 4.1.0)
dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
dqrng 0.3.0 2021-05-01 [1] CRAN (R 4.1.0)
DropletUtils * 1.13.3 2021-08-11 [1] Github (lcolladotor/DropletUtils@2b0bb4b)
edgeR 3.35.0 2021-05-19 [1] Bioconductor
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
filelock 1.0.2 2018-10-05 [1] CRAN (R 4.1.0)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
GenomeInfoDb * 1.29.3 2021-07-01 [1] Bioconductor
GenomeInfoDbData 1.2.6 2021-05-24 [1] Bioconductor
GenomicRanges * 1.45.0 2021-05-19 [1] Bioconductor
ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.1.0)
ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.1.0)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
HDF5Array 1.21.0 2021-05-19 [1] Bioconductor
httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
IRanges * 2.27.0 2021-05-19 [1] Bioconductor
irlba 2.3.3 2019-02-05 [1] CRAN (R 4.1.0)
lattice 0.20-44 2021-05-02 [1] CRAN (R 4.1.0)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
limma 3.49.4 2021-08-08 [1] Bioconductor
locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.1.0)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
Matrix * 1.3-4 2021-06-01 [1] CRAN (R 4.1.0)
MatrixGenerics * 1.5.3 2021-08-05 [1] Bioconductor
matrixStats * 0.60.0 2021-07-26 [1] CRAN (R 4.1.0)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.1.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
pillar 1.6.2 2021-07-29 [1] CRAN (R 4.1.0)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.1.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
pkgload 1.2.1 2021-04-06 [1] CRAN (R 4.1.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.0)
prompt 1.0.1 2021-08-03 [1] Github (gaborcsardi/prompt@fc2ac94)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.1.0)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.1.0)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.1.0)
Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
RCurl 1.98-1.3 2021-03-16 [1] CRAN (R 4.1.0)
remotes 2.4.0 2021-06-02 [1] CRAN (R 4.1.0)
rhdf5 2.37.0 2021-05-19 [1] Bioconductor
rhdf5filters 1.5.0 2021-05-19 [1] Bioconductor
Rhdf5lib 1.15.2 2021-07-01 [1] Bioconductor
rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
RSQLite 2.2.7 2021-04-22 [1] CRAN (R 4.1.0)
rsthemes 0.2.1.9000 2021-05-24 [1] Github (gadenbuie/rsthemes@19299e5)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
rsvd 1.0.5 2021-04-16 [1] CRAN (R 4.1.0)
S4Vectors * 0.31.0 2021-05-19 [1] Bioconductor
ScaledMatrix 1.1.0 2021-05-19 [1] Bioconductor
scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
scater * 1.21.3 2021-08-01 [1] Bioconductor
scuttle * 1.3.1 2021-08-05 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
SingleCellExperiment * 1.15.1 2021-05-21 [1] Bioconductor
sparseMatrixStats 1.5.2 2021-08-05 [1] Bioconductor
SummarizedExperiment * 1.23.1 2021-06-24 [1] Bioconductor
testthat * 3.0.4 2021-07-01 [1] CRAN (R 4.1.0)
tibble 3.1.3 2021-07-23 [1] CRAN (R 4.1.0)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
usethis * 2.0.1 2021-02-10 [1] CRAN (R 4.1.0)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
vipor 0.4.5 2017-03-22 [1] CRAN (R 4.1.0)
viridis 0.6.1 2021-05-11 [1] CRAN (R 4.1.0)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.1.0)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
XVector 0.33.0 2021-05-19 [1] Bioconductor
zlibbioc 1.39.0 2021-05-19 [1] Bioconductor
[1] /Library/Frameworks/R.framework/Versions/4.1devel/Resources/library |
I would suggest looking at 584320d. Reverting that change and running:
This is mentioned briefly in the |
Oh interesting, thanks Aaron! I'll do that tomorrow morning. Thanks! |
Hi Aaron,
Reverting the bugfix reproduces prior results
Using lcolladotor@eb8eb45, which reverts 584320d on BioC 3.14, I do indeed get the same results I get with BioC 3.13 using the e.out_BioC3.14_lcolladotor_eb8eb45.rds.zip
# Let's use the pbmc4k data
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
"http://cf.10xgenomics.com/samples",
"cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)
library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)
# Detect droplets that contain cells
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
#e.out <- readRDS("~/Desktop/e.out_BioC3.11.rds")
table(e.out$FDR < 0.001, useNA = "ifany")
# FALSE TRUE <NA>
# 1056 4233 731991
saveRDS(e.out, file = "~/Desktop/e.out_BioC3.14_lcolladotor_eb8eb45.rds")
library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86,
keys = rowData(sce.pbmc)$ID,
column = "SEQNAME", keytype = "GENEID"
)
sce.pbmc <- sce.pbmc[, which(e.out$FDR <= 0.001)]
# Quality control
stats <- perCellQCMetrics(sce.pbmc,
subsets = list(Mito = which(location == "MT"))
)
high.mito <- isOutlier(stats$subsets_Mito_percent,
type = "higher"
)
sce.pbmc <- sce.pbmc[, !high.mito]
# Data normalization
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster = clusters)
sce.pbmc <- logNormCounts(sce.pbmc)
## Identify highly variable genes
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop = 0.1)
## Dimensionality reduction
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc,
subset.row = top.pbmc,
technical = dec.pbmc
)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, dimred = "PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, dimred = "PCA")
plotTSNE(sce.pbmc)
# clustering
g <- buildSNNGraph(sce.pbmc, k = 10, use.dimred = "PCA")
clust <- igraph::cluster_walktrap(g)$membership
sce.pbmc$cluster <- factor(clust)
library(celldex)
ref <- celldex::BlueprintEncodeData()
library(SingleR)
pred <- SingleR(
test = sce.pbmc, ref = ref,
labels = ref$label.main
)
sce.pbmc$labels <- pred$labels
plotTSNE(sce.pbmc, colour_by = "labels", text_by = "labels")
plotTSNE(sce.pbmc, colour_by = "cluster", text_by = "labels")
I did notice a few small differences vs Pete's slides, like the few red cells to the right of the monocytes in his version and the shape of the B-cell cluster, but those are likely due to other changes.
Wrapping up
So indeed, 584320d was the main source of the discrepancy. Since it's a bug fix, as you noted above and at 24e8073, we can close this issue on the reproducibility of BioC 3.11 results.
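In case it helps anyone comparing saved emptyDrops() outputs across versions, one simple way to see how many barcodes change calls is to cross-tabulate the two sets of calls. This is only a sketch in base R: the FDR vectors below are made-up placeholders standing in for e.out$FDR from each version, and the variable names are hypothetical. As far as I understand, NA corresponds to barcodes that were not tested, which is why they are kept visible in the table.

```r
# Made-up FDR vectors standing in for e.out$FDR from two DropletUtils versions.
# NA marks barcodes that were not tested (hypothetical values throughout).
fdr_old <- c(0.0001, 0.0005, 0.2000, NA, 0.0009, NA)
fdr_new <- c(0.0001, 0.0020, 0.2000, NA, 0.0002, NA)

# Same threshold as the subsetting step in the workflow above
is_cell_old <- fdr_old <= 0.001
is_cell_new <- fdr_new <= 0.001

# Cross-tabulate the calls, keeping the untested (NA) barcodes visible
calls <- table(old = is_cell_old, new = is_cell_new, useNA = "ifany")
print(calls)

# Positions of the barcodes whose call flipped between the two versions
flipped <- which(is_cell_old != is_cell_new)
print(flipped)
```

With real results, the two vectors would come from something like readRDS() on each saved e.out object, after confirming that both have their barcodes in the same order.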
|
What's wrong with the |
Hi Aaron et al,
We (@Yalbibalderas @AnaBVA) noticed that the following code from OSCA returns different t-SNE plots. Eventually we narrowed down the issue to BioC 3.11 vs 3.12 (3.13 is the same as 3.12). In particular, the issue is with changes between those versions for DropletUtils.
The path to DropletUtils is documented at https://twitter.com/lcolladotor/status/1425242252872409092?s=20.
You can reproduce this with:
where BioC 3.11 returns
and later versions return
The full code for getting to the t-SNE (adapted from https://comunidadbioinfo.github.io/cdsb2021_scRNAseq/anotaci%C3%B3n-de-clusters-de-c%C3%A9lulas.html and ultimately from OSCA at https://bioconductor.org/books/release/OSCA/unfiltered-human-pbmcs-10x-genomics.html) is below.
We think that it comes down to DropletUtils::emptyDrops(). Between April 27 and October 27, 2020 (when BioC 3.12 was bioc-devel), two commits altered it; in particular, we think that 71b161e might be the root of the difference.
Based on the history, we see that commit was early in the BioC 3.12 devel cycle. With that in mind, I did a chimera installation with BioC 3.12 and the BioC 3.11 version of DropletUtils, specifically 51d00b0 (remotes::install_github("MarioniLab/DropletUtils@51d00b0")). That version reproduced the t-SNE and DropletUtils::emptyDrops() results we see with BioC 3.11.
I tried installing DropletUtils at 71b161e, but I couldn't due to some compilation errors related to scuttle, which makes sense since many packages changed in BioC 3.12. I also tried a chimera of BioC 3.11 with DropletUtils upgraded to BioC 3.12 versions (with remotes::install_github()) around the time of the suspected root commit, along with related packages, but I couldn't get it to work tonight.
In any case, do you think that there's a way to get the same emptyDrops() results with current versions of DropletUtils as those we were getting back in BioC 3.11? My sense is that the answer is no, since the internals of the function changed. That is, it's not as if a parameter changed and we could just switch it from, say, FALSE to TRUE.
All of this arose because in BioC 3.11 the clusters of CD4 and CD8 T-cells looked nicely separated in the t-SNE, and they don't with current versions, as shown below.
And well, we wanted to be able to answer that question when we teach that part of OSCA on Friday morning.
Best,
Leo