emptyDrops BioC 3.11 vs 3.12: different results with the OSCA pbmc 4k dataset #67

lcolladotor · 2021-08-11T02:27:56Z

Hi Aaron et al,

We (@Yalbibalderas @AnaBVA) noticed that the following code from OSCA returns different t-SNE plots. Eventually we narrowed down the issue to BioC 3.11 vs 3.12 (3.13 is the same as 3.12). In particular, the issue is with changes between those versions for DropletUtils.

The path to DropletUtils is documented at https://twitter.com/lcolladotor/status/1425242252872409092?s=20.

You can reproduce this with:

# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

# Detección de _droplets_ con células
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc))

table(e.out$FDR < 0.001, useNA = "ifany")

where BioC 3.11 returns

> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
  1056   4233 731991

and later versions return

> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
   989   4300 731991

The full code is below for getting to the tSNE (adapted from https://comunidadbioinfo.github.io/cdsb2021_scRNAseq/anotaci%C3%B3n-de-clusters-de-c%C3%A9lulas.html and ultimately from OSCA at https://bioconductor.org/books/release/OSCA/unfiltered-human-pbmcs-10x-genomics.html) is below.

# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

# Detección de _droplets_ con células
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc))

table(e.out$FDR < 0.001, useNA = "ifany")





library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86,
    keys = rowData(sce.pbmc)$ID,
    column = "SEQNAME", keytype = "GENEID"
)

sce.pbmc <- sce.pbmc[, which(e.out$FDR <= 0.001)]

# Control de calidad
stats <- perCellQCMetrics(sce.pbmc,
    subsets = list(Mito = which(location == "MT"))
)
high.mito <- isOutlier(stats$subsets_Mito_percent,
    type = "higher"
)
sce.pbmc <- sce.pbmc[, !high.mito]

# Normalización de los datos
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster = clusters)
sce.pbmc <- logNormCounts(sce.pbmc)

## Identificación de genes altamente variables
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop = 0.1)

## Reducción de dimensiones
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc,
    subset.row = top.pbmc,
    technical = dec.pbmc
)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, dimred = "PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, dimred = "PCA")

plotTSNE(sce.pbmc)



options(width = 120)
sessioninfo::session_info()

We think that it comes down to DropletUtils::emptyDrops() and between April 27 and October 27 2020 (when BioC 3.12 was bioc-devel), there's 2 commits that altered it, in particular we think that 71b161e might be the root of the difference.

Based on the history, we see that commit was early in the BioC 3.12 devel cycle. With that in mind, I did a chimera installation with BioC 3.12 and the BioC 3.11 version of DropletUtils, specifically 51d00b0 (remotes::install_github("MarioniLab/DropletUtils@51d00b0")). That version reproduced the t-SNE and DropletUtils::emptyDrops() we see with BioC 3.11.

> options(width = 120)
main* > sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                                      
 version  R version 4.0.5 Patched (2021-03-31 r80247)
 os       macOS Big Sur 11.5.1                       
 system   x86_64, darwin17.0                         
 ui       RStudio                                    
 language (EN)                                       
 collate  en_US.UTF-8                                
 ctype    en_US.UTF-8                                
 tz       America/New_York                           
 date     2021-08-10                                 

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version  date       lib source                                  
 AnnotationDbi        * 1.52.0   2020-10-27 [1] Bioconductor                            
 AnnotationFilter     * 1.14.0   2020-10-27 [1] Bioconductor                            
 askpass                1.1      2019-01-13 [1] CRAN (R 4.0.2)                          
 assertthat             0.2.1    2019-03-21 [1] CRAN (R 4.0.2)                          
 backports              1.2.1    2020-12-09 [1] CRAN (R 4.0.2)                          
 beachmat               2.6.4    2020-12-20 [1] Bioconductor                            
 beeswarm               0.3.1    2021-03-07 [1] CRAN (R 4.0.2)                          
 Biobase              * 2.50.0   2020-10-27 [1] Bioconductor                            
 BiocFileCache        * 1.14.0   2020-10-27 [1] Bioconductor                            
 BiocGenerics         * 0.36.1   2021-04-16 [1] Bioconductor                            
 BiocNeighbors          1.8.2    2020-12-07 [1] Bioconductor                            
 BiocParallel           1.24.1   2020-11-06 [1] Bioconductor                            
 BiocSingular           1.6.0    2020-10-27 [1] Bioconductor                            
 biocthis               1.0.10   2021-02-26 [1] Bioconductor                            
 biomaRt                2.46.3   2021-02-09 [1] Bioconductor                            
 Biostrings             2.58.0   2020-10-27 [1] Bioconductor                            
 bit                    4.0.4    2020-08-04 [1] CRAN (R 4.0.2)                          
 bit64                  4.0.5    2020-08-30 [1] CRAN (R 4.0.2)                          
 bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.0.2)                          
 blob                   1.2.1    2020-01-20 [1] CRAN (R 4.0.2)                          
 bluster                1.0.0    2020-10-27 [1] Bioconductor                            
 bookdown               0.22     2021-04-22 [1] CRAN (R 4.0.2)                          
 cachem                 1.0.4    2021-02-13 [1] CRAN (R 4.0.2)                          
 callr                  3.7.0    2021-04-20 [1] CRAN (R 4.0.2)                          
 cli                    2.5.0    2021-04-26 [1] CRAN (R 4.0.5)                          
 colorout               1.2-2    2020-11-03 [1] Github (jalvesaq/colorout@726d681)      
 colorspace             2.0-0    2020-11-11 [1] CRAN (R 4.0.2)                          
 cowplot                1.1.1    2020-12-30 [1] CRAN (R 4.0.2)                          
 crayon                 1.4.1    2021-02-08 [1] CRAN (R 4.0.2)                          
 curl                   4.3.1    2021-04-30 [1] CRAN (R 4.0.2)                          
 data.table             1.14.0   2021-02-21 [1] CRAN (R 4.0.2)                          
 DBI                    1.1.1    2021-01-15 [1] CRAN (R 4.0.2)                          
 dbplyr               * 2.1.1    2021-04-06 [1] CRAN (R 4.0.2)                          
 DelayedArray           0.16.3   2021-03-24 [1] Bioconductor                            
 DelayedMatrixStats     1.12.3   2021-02-03 [1] Bioconductor                            
 desc                   1.3.0    2021-03-05 [1] CRAN (R 4.0.2)                          
 devtools             * 2.4.0    2021-04-07 [1] CRAN (R 4.0.2)                          
 digest                 0.6.27   2020-10-24 [1] CRAN (R 4.0.2)                          
 dplyr                  1.0.5    2021-03-05 [1] CRAN (R 4.0.2)                          
 dqrng                  0.3.0    2021-05-01 [1] CRAN (R 4.0.5)                          
 DropletUtils         * 1.9.0    2021-08-11 [1] Github (MarioniLab/DropletUtils@51d00b0)
 edgeR                  3.32.1   2021-01-14 [1] Bioconductor                            
 ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.0.2)                          
 EnsDb.Hsapiens.v86   * 2.99.0   2021-08-11 [1] Bioconductor                            
 ensembldb            * 2.14.1   2021-04-19 [1] Bioconductor                            
 evaluate               0.14     2019-05-28 [1] CRAN (R 4.0.1)                          
 fansi                  0.4.2    2021-01-15 [1] CRAN (R 4.0.2)                          
 farver                 2.1.0    2021-02-28 [1] CRAN (R 4.0.2)                          
 fastmap                1.1.0    2021-01-25 [1] CRAN (R 4.0.2)                          
 FNN                    1.1.3    2019-02-15 [1] CRAN (R 4.0.2)                          
 fs                     1.5.0    2020-07-31 [1] CRAN (R 4.0.2)                          
 generics               0.1.0    2020-10-31 [1] CRAN (R 4.0.2)                          
 GenomeInfoDb         * 1.26.7   2021-04-08 [1] Bioconductor                            
 GenomeInfoDbData       1.2.4    2020-11-03 [1] Bioconductor                            
 GenomicAlignments      1.26.0   2020-10-27 [1] Bioconductor                            
 GenomicFeatures      * 1.42.3   2021-04-04 [1] Bioconductor                            
 GenomicRanges        * 1.42.0   2020-10-27 [1] Bioconductor                            
 ggbeeswarm             0.6.0    2017-08-07 [1] CRAN (R 4.0.2)                          
 ggplot2              * 3.3.3    2020-12-30 [1] CRAN (R 4.0.2)                          
 glue                   1.4.2    2020-08-27 [1] CRAN (R 4.0.2)                          
 gridExtra              2.3      2017-09-09 [1] CRAN (R 4.0.2)                          
 gtable                 0.3.0    2019-03-25 [1] CRAN (R 4.0.2)                          
 HDF5Array              1.18.1   2021-02-04 [1] Bioconductor                            
 hms                    1.0.0    2021-01-13 [1] CRAN (R 4.0.2)                          
 htmltools              0.5.1.1  2021-01-22 [1] CRAN (R 4.0.2)                          
 httr                   1.4.2    2020-07-20 [1] CRAN (R 4.0.2)                          
 igraph                 1.2.6    2020-10-06 [1] CRAN (R 4.0.2)                          
 IRanges              * 2.24.1   2020-12-12 [1] Bioconductor                            
 irlba                  2.3.3    2019-02-05 [1] CRAN (R 4.0.2)                          
 knitr                  1.33     2021-04-24 [1] CRAN (R 4.0.2)                          
 labeling               0.4.2    2020-10-20 [1] CRAN (R 4.0.2)                          
 lattice                0.20-44  2021-05-02 [1] CRAN (R 4.0.5)                          
 lazyeval               0.2.2    2019-03-15 [1] CRAN (R 4.0.2)                          
 lifecycle              1.0.0    2021-02-15 [1] CRAN (R 4.0.2)                          
 limma                  3.46.0   2020-10-27 [1] Bioconductor                            
 locfit                 1.5-9.4  2020-03-25 [1] CRAN (R 4.0.2)                          
 lubridate              1.7.10   2021-02-26 [1] CRAN (R 4.0.2)                          
 magrittr               2.0.1    2020-11-17 [1] CRAN (R 4.0.2)                          
 Matrix               * 1.3-2    2021-01-06 [1] CRAN (R 4.0.5)                          
 MatrixGenerics       * 1.2.1    2021-01-30 [1] Bioconductor                            
 matrixStats          * 0.58.0   2021-01-29 [1] CRAN (R 4.0.2)                          
 memoise                2.0.0    2021-01-26 [1] CRAN (R 4.0.3)                          
 munsell                0.5.0    2018-06-12 [1] CRAN (R 4.0.2)                          
 openssl                1.4.4    2021-04-30 [1] CRAN (R 4.0.2)                          
 pillar                 1.6.0    2021-04-13 [1] CRAN (R 4.0.2)                          
 pkgbuild               1.2.0    2020-12-15 [1] CRAN (R 4.0.3)                          
 pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.0.2)                          
 pkgload                1.2.1    2021-04-06 [1] CRAN (R 4.0.2)                          
 prettyunits            1.1.1    2020-01-24 [1] CRAN (R 4.0.2)                          
 processx               3.5.2    2021-04-30 [1] CRAN (R 4.0.2)                          
 progress               1.2.2    2019-05-16 [1] CRAN (R 4.0.2)                          
 prompt                 1.0.1    2021-03-12 [1] CRAN (R 4.0.2)                          
 ProtGenerics           1.22.0   2020-10-27 [1] Bioconductor                            
 ps                     1.6.0    2021-02-28 [1] CRAN (R 4.0.2)                          
 purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.0.2)                          
 R.methodsS3            1.8.1    2020-08-26 [1] CRAN (R 4.0.2)                          
 R.oo                   1.24.0   2020-08-26 [1] CRAN (R 4.0.2)                          
 R.utils                2.10.1   2020-08-26 [1] CRAN (R 4.0.2)                          
 R6                     2.5.0    2020-10-28 [1] CRAN (R 4.0.2)                          
 rappdirs               0.3.3    2021-01-31 [1] CRAN (R 4.0.2)                          
 Rcpp                   1.0.6    2021-01-15 [1] CRAN (R 4.0.2)                          
 RCurl                  1.98-1.3 2021-03-16 [1] CRAN (R 4.0.2)                          
 remotes                2.3.0    2021-04-01 [1] CRAN (R 4.0.2)                          
 rhdf5                  2.34.0   2020-10-27 [1] Bioconductor                            
 rhdf5filters           1.2.0    2020-10-27 [1] Bioconductor                            
 Rhdf5lib               1.12.1   2021-01-26 [1] Bioconductor                            
 rlang                  0.4.11   2021-04-30 [1] CRAN (R 4.0.2)                          
 rmarkdown              2.7      2021-02-19 [1] CRAN (R 4.0.4)                          
 rprojroot              2.0.2    2020-11-15 [1] CRAN (R 4.0.2)                          
 Rsamtools              2.6.0    2020-10-27 [1] Bioconductor                            
 RSpectra               0.16-0   2019-12-01 [1] CRAN (R 4.0.2)                          
 RSQLite                2.2.7    2021-04-22 [1] CRAN (R 4.0.2)                          
 rsthemes               0.1.0    2020-11-03 [1] Github (gadenbuie/rsthemes@6391fe5)     
 rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.0.2)                          
 rsvd                   1.0.5    2021-04-16 [1] CRAN (R 4.0.2)                          
 rtracklayer            1.50.0   2020-10-27 [1] Bioconductor                            
 Rtsne                  0.15     2018-11-10 [1] CRAN (R 4.0.2)                          
 S4Vectors            * 0.28.1   2020-12-09 [1] Bioconductor                            
 scales                 1.1.1    2020-05-11 [1] CRAN (R 4.0.2)                          
 scater               * 1.18.6   2021-02-26 [1] Bioconductor                            
 scran                * 1.18.7   2021-04-16 [1] Bioconductor                            
 scuttle                1.0.4    2020-12-17 [1] Bioconductor                            
 sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.0.2)                          
 SingleCellExperiment * 1.12.0   2020-10-27 [1] Bioconductor                            
 sparseMatrixStats      1.2.1    2021-02-02 [1] Bioconductor                            
 statmod                1.4.35   2020-10-19 [1] CRAN (R 4.0.2)                          
 stringi                1.5.3    2020-09-09 [1] CRAN (R 4.0.2)                          
 stringr                1.4.0    2019-02-10 [1] CRAN (R 4.0.2)                          
 styler                 1.4.1    2021-03-30 [1] CRAN (R 4.0.2)                          
 SummarizedExperiment * 1.20.0   2020-10-27 [1] Bioconductor                            
 suncalc                0.5.0    2019-04-03 [1] CRAN (R 4.0.2)                          
 testthat             * 3.0.2    2021-02-14 [1] CRAN (R 4.0.2)                          
 tibble                 3.1.1    2021-04-18 [1] CRAN (R 4.0.2)                          
 tidyselect             1.1.1    2021-04-30 [1] CRAN (R 4.0.2)                          
 usethis              * 2.0.1    2021-02-10 [1] CRAN (R 4.0.2)                          
 utf8                   1.2.1    2021-03-12 [1] CRAN (R 4.0.2)                          
 uwot                   0.1.10   2020-12-15 [1] CRAN (R 4.0.2)                          
 vctrs                  0.3.8    2021-04-29 [1] CRAN (R 4.0.2)                          
 vipor                  0.4.5    2017-03-22 [1] CRAN (R 4.0.2)                          
 viridis                0.6.0    2021-04-15 [1] CRAN (R 4.0.2)                          
 viridisLite            0.4.0    2021-04-13 [1] CRAN (R 4.0.2)                          
 withr                  2.4.2    2021-04-18 [1] CRAN (R 4.0.2)                          
 xfun                   0.22     2021-03-11 [1] CRAN (R 4.0.2)                          
 XML                    3.99-0.6 2021-03-16 [1] CRAN (R 4.0.2)                          
 xml2                   1.3.2    2020-04-23 [1] CRAN (R 4.0.2)                          
 XVector                0.30.0   2020-10-28 [1] Bioconductor                            
 yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.0.2)                          
 zlibbioc               1.36.0   2020-10-28 [1] Bioconductor                            

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
main* >

I tried installing this version of DropletUtils 71b161e but I can't due some compilation errors related to scuttle, which makes sense since in BioC 3.12 you changed many packages. I also tried a chimera of BioC 3.11 and upgrading DropletUtils to BioC 3.12 versions (with remotes::install_github()) around the time of the suspected root commit and related packages, but well, I couldn't get it to work tonight.

In any case, do you think that there's a way to get the same emptyDrops() results with current versions of DropletUtils as those we were getting back in BioC 3.11? My sense is that the answer is no since the internals of the function changed. Aka, it's not like a parameter changed and we can just change it from say FALSE to TRUE, etc.

All of this arose due to how in BioC 3.11, the clusters of CD4 and CD8 T-cells looked nicely separated in the t-SNE and don't with current versions as shown below.

Hmm. We can't figure out why these two plots are different with the same code, though different R versions.

BioC 3.11? @PeteHaitch https://t.co/8DJM6pagdj

BioC 3.13https://t.co/t8KiPTyU9D @Bioconductor #rstats @yalbi_ibm @AnaBetty2304 pic.twitter.com/P65u7dnVCo
— 🇲🇽 Leonardo Collado-Torres (@lcolladotor) August 10, 2021

And well, we wanted to be able that question when we teach that part of OSCA on Friday morning.

Best,
Leo

The text was updated successfully, but these errors were encountered:

lcolladotor · 2021-08-11T02:34:02Z

From my embedded tweet that doesn't show up properly on GitHub

Peter Hickey's slides, likely made with BioC 3.11

BioC 3.13

I haven't tried the whole thing up to annotating the cell clusters with BioC 3.12, but the shape of the t-SNE is the same as in BioC 3.13.

PeteHaitch · 2021-08-11T02:55:08Z

FWIW I'm pretty certain figures in my slides are just copy+pasted from the version of OSCA at the time I made the slides.

lcolladotor · 2021-08-11T03:06:27Z

Thanks for the info Pete!

Here's a verification that the t-SNE shape issue is due to DropletUtils::emptyDrops() since saving the e.out <- emptyDrops() output in BioC 3.11 and using it with 3.13 returns the same shape of the t-SNE.

The annotated clusters and all that are different, likely due to other changes.

e.out_BioC3.11.rds.zip

# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

# Detección de _droplets_ con células
set.seed(100)
# e.out <- emptyDrops(counts(sce.pbmc))
e.out <- readRDS("e.out_BioC3.11.rds")

table(e.out$FDR < 0.001, useNA = "ifany")





library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86,
    keys = rowData(sce.pbmc)$ID,
    column = "SEQNAME", keytype = "GENEID"
)

sce.pbmc <- sce.pbmc[, which(e.out$FDR <= 0.001)]

# Control de calidad
stats <- perCellQCMetrics(sce.pbmc,
    subsets = list(Mito = which(location == "MT"))
)
high.mito <- isOutlier(stats$subsets_Mito_percent,
    type = "higher"
)
sce.pbmc <- sce.pbmc[, !high.mito]

# Normalización de los datos
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster = clusters)
sce.pbmc <- logNormCounts(sce.pbmc)

## Identificación de genes altamente variables
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop = 0.1)

## Reducción de dimensiones
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc,
    subset.row = top.pbmc,
    technical = dec.pbmc
)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, dimred = "PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, dimred = "PCA")

plotTSNE(sce.pbmc)



# clustering
g <- buildSNNGraph(sce.pbmc, k = 10, use.dimred = "PCA")
clust <- igraph::cluster_walktrap(g)$membership
sce.pbmc$cluster <- factor(clust)

library(celldex)
ref <- celldex::BlueprintEncodeData()

library(SingleR)
pred <- SingleR(
    test = sce.pbmc, ref = ref,
    labels = ref$label.main
)

sce.pbmc$labels <- pred$labels
plotTSNE(sce.pbmc, colour_by = "labels", text_by = "labels")

Left: BioC 3.13 with e.out from BioC 3.11
Middle: Pete's slides, likely with BioC 3.11
Right: BioC 3.13

lcolladotor · 2021-08-11T03:25:23Z

Actually, the colors was due to

plotTSNE(sce.pbmc, colour_by = "labels", text_by = "labels")

vs

plotTSNE(sce.pbmc, colour_by = "cluster", text_by = "labels")

Left: BioC 3.13 with e.out from BioC 3.11
Right: Pete's slides

Show below is the difference between using BioC 3.13 with the old output from DropletUtils::emptyDrops(), which matches the BioC 3.11 output, and the new BioC 3.13 output. In terms of explaining methods, I don't think that the BioC 3.13 output is bad in any way. So we could potentially close this issue. Though in terms of reproducing DropletUtils::emptyDrops() output from BioC 3.11 with newer versions, well, that's a different story. My guess is that the answer is that it simply can't, like I said earlier.

Left: BioC 3.13 with e.out from BioC 3.111
Middle: Pete's slides with BioC 3.11
Right: BioC 3.13 only

Yalbibalderas · 2021-08-11T03:35:46Z

Ohh thanks!! El mar., 10 de agosto de 2021 22:25, Leonardo Collado-Torres < ***@***.***> escribió:

…

Actually, the colors was due to plotTSNE(sce.pbmc, colour_by = "labels", text_by = "labels") vs plotTSNE(sce.pbmc, colour_by = "cluster", text_by = "labels") [image: Screen Shot 2021-08-10 at 11 10 30 PM] <https://user-images.githubusercontent.com/2288213/128963828-314c8272-bc68-488a-877b-1356e2aa7a44.png> Left: BioC 3.13 with e.out from BioC 3.11 Right: Pete's slides Show below is the difference between using BioC 3.13 with the old output from DropletUtils::emptyDrops(), which matches the BioC 3.11 output, and the new BioC 3.13 output. In terms of explaining methods, I don't think that the BioC 3.13 output is bad in any way. So we could potentially close this issue. Though in terms of reproducing DropletUtils::emptyDrops() output from BioC 3.11 with newer versions, well, that's a different story. My guess is that the answer is that it simply can't, like I said earlier. [image: Screen Shot 2021-08-10 at 11 20 16 PM] <https://user-images.githubusercontent.com/2288213/128964570-6873f9c5-af6e-41bf-b326-23224e5b0302.png> Left: BioC 3.13 with e.out from BioC 3.111 Middle: Pete's slides with BioC 3.11 Right: BioC 3.13 only — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#67 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABUHQZXGRU6IS2FGRQOTYIDT4HUS5ANCNFSM5B5IWV7A> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email> .

LTLA · 2021-08-11T06:52:27Z

Well, if your scope is "somewhere between April and Oct 2020", then the more impactful change is that of #48, which adjusts the spline fit and presumably shifts the retain threshold a bit. You could attempt reverting this change by setting barcode.args=list(exclude.from=0), see the argument description in ?barcodeRanks.

lcolladotor · 2021-08-11T14:21:04Z

I see, thanks Aaron. I wasn't aware of that change, though well, with BioC 3.13 I get the same results with and without barcode.args=list(exclude.from=0). That is:

> e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
   989   4300 731991

> library(scran)
Loading required package: SingleCellExperiment
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: ‘MatrixGenerics’

The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:grDevices’:

    windows

Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: ‘Biobase’

The following object is masked from ‘package:MatrixGenerics’:

    rowMedians

The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians

Loading required package: scuttle
> library(DropletUtils)
> ?emptyDrops
> # Usemos datos de pbmc4k
> library(BiocFileCache)
Loading required package: dbplyr
> bfc <- BiocFileCache()
> raw.path <- bfcrpath(bfc, file.path(
+     "http://cf.10xgenomics.com/samples",
+     "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
+ ))
> untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
> 
> library(DropletUtils)
> library(Matrix)

Attaching package: ‘Matrix’

The following object is masked from ‘package:S4Vectors’:

    expand

> fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
> sce.pbmc <- read10xCounts(fname, col.names = TRUE)
> # Anotación de los genes
> library(scater)
Loading required package: ggplot2
> rownames(sce.pbmc) <- uniquifyFeatureNames(
+     rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
+ )
> library(EnsDb.Hsapiens.v86)
Loading required package: ensembldb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: AnnotationFilter

Attaching package: ‘AnnotationFilter’

The following object is masked from ‘package:testthat’:

    not


Attaching package: 'ensembldb'

The following object is masked from 'package:stats':

    filter

> location <- mapIds(EnsDb.Hsapiens.v86,
+     keys = rowData(sce.pbmc)$ID,
+     column = "SEQNAME", keytype = "GENEID"
+ )
Warning message:
Unable to map 144 of 33694 requested IDs. 
> # Detección de _droplets_ con células
> set.seed(100)
> e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
   989   4300 731991 
> options(width = 120)
> sessioninfo::session_info()
- Session info -------------------------------------------------------------------------------------------------------
 setting  value                                    
 version  R version 4.1.0 alpha (2021-04-20 r80202)
 os       Windows 10 x64                           
 system   x86_64, mingw32                          
 ui       RStudio                                  
 language (EN)                                     
 collate  English_United States.1252               
 ctype    English_United States.1252               
 tz       America/New_York                         
 date     2021-08-11                               

- Packages -----------------------------------------------------------------------------------------------------------
 ! package                * version    date       lib source                             
   AnnotationDbi          * 1.54.1     2021-06-08 [1] Bioconductor                       
   AnnotationFilter       * 1.16.0     2021-05-19 [1] Bioconductor                       
   AnnotationHub            3.0.1      2021-06-20 [1] Bioconductor                       
   assertthat               0.2.1      2019-03-21 [1] CRAN (R 4.1.0)                     
   backports                1.2.1      2020-12-09 [1] CRAN (R 4.1.0)                     
   beachmat                 2.8.0      2021-05-19 [1] Bioconductor                       
   beeswarm                 0.4.0      2021-06-01 [1] CRAN (R 4.1.0)                     
   Biobase                * 2.52.0     2021-05-19 [1] Bioconductor                       
   BiocFileCache          * 2.0.0      2021-05-19 [1] Bioconductor                       
   BiocGenerics           * 0.38.0     2021-05-19 [1] Bioconductor                       
   BiocIO                   1.2.0      2021-05-19 [1] Bioconductor                       
   BiocManager              1.30.16    2021-06-15 [1] CRAN (R 4.1.0)                     
   BiocNeighbors            1.10.0     2021-05-19 [1] Bioconductor                       
   BiocParallel             1.26.1     2021-07-04 [1] Bioconductor                       
   BiocSingular             1.8.1      2021-06-08 [1] Bioconductor                       
   biocthis                 1.2.0      2021-05-19 [1] Bioconductor                       
   BiocVersion              3.13.1     2021-03-08 [1] Bioconductor                       
   biomaRt                  2.48.2     2021-07-01 [1] Bioconductor                       
   Biostrings               2.60.2     2021-08-05 [1] Bioconductor                       
   bit                      4.0.4      2020-08-04 [1] CRAN (R 4.1.0)                     
   bit64                    4.0.5      2020-08-30 [1] CRAN (R 4.1.0)                     
   bitops                   1.0-7      2021-04-24 [1] CRAN (R 4.1.0)                     
   blob                     1.2.2      2021-07-23 [1] CRAN (R 4.1.0)                     
   bluster                  1.2.1      2021-05-27 [1] Bioconductor                       
   bookdown                 0.22       2021-04-22 [1] CRAN (R 4.1.0)                     
   bslib                    0.2.5.1    2021-05-18 [1] CRAN (R 4.1.0)                     
   cachem                   1.0.5      2021-05-15 [1] CRAN (R 4.1.0)                     
   Cairo                    1.5-12.2   2020-07-07 [1] CRAN (R 4.1.0)                     
   callr                    3.7.0      2021-04-20 [1] CRAN (R 4.1.0)                     
   celldex                  1.2.0      2021-05-20 [1] Bioconductor                       
   circlize                 0.4.13     2021-06-09 [1] CRAN (R 4.1.0)                     
   cli                      3.0.1      2021-07-17 [1] CRAN (R 4.1.0)                     
   clue                     0.3-59     2021-04-16 [1] CRAN (R 4.1.0)                     
   cluster                  2.1.2      2021-04-17 [1] CRAN (R 4.1.0)                     
   codetools                0.2-18     2020-11-04 [1] CRAN (R 4.1.0)                     
   colorspace               2.0-2      2021-06-24 [1] CRAN (R 4.1.0)                     
   colourpicker             1.1.0      2020-09-14 [1] CRAN (R 4.1.0)                     
   ComplexHeatmap           2.8.0      2021-05-19 [1] Bioconductor                       
   cowplot                  1.1.1      2020-12-30 [1] CRAN (R 4.1.0)                     
   crayon                   1.4.1      2021-02-08 [1] CRAN (R 4.1.0)                     
   curl                     4.3.2      2021-06-23 [1] CRAN (R 4.1.0)                     
   data.table               1.14.0     2021-02-21 [1] CRAN (R 4.1.0)                     
   DBI                      1.1.1      2021-01-15 [1] CRAN (R 4.1.0)                     
   dbplyr                 * 2.1.1      2021-04-06 [1] CRAN (R 4.1.0)                     
   DelayedArray             0.18.0     2021-05-19 [1] Bioconductor                       
   DelayedMatrixStats       1.14.2     2021-08-08 [1] Bioconductor                       
   desc                     1.3.0      2021-03-05 [1] CRAN (R 4.1.0)                     
   devtools               * 2.4.2      2021-06-07 [1] CRAN (R 4.1.0)                     
   digest                   0.6.27     2020-10-24 [1] CRAN (R 4.1.0)                     
   doParallel               1.0.16     2020-10-16 [1] CRAN (R 4.1.0)                     
   dplyr                    1.0.7      2021-06-18 [1] CRAN (R 4.1.0)                     
   dqrng                    0.3.0      2021-05-01 [1] CRAN (R 4.1.0)                     
   DropletUtils           * 1.12.2     2021-07-22 [1] Bioconductor                       
   DT                       0.18       2021-04-14 [1] CRAN (R 4.1.0)                     
   edgeR                    3.34.0     2021-05-19 [1] Bioconductor                       
   ellipsis                 0.3.2      2021-04-29 [1] CRAN (R 4.1.0)                     
   EnsDb.Hsapiens.v86     * 2.99.0     2021-08-08 [1] Bioconductor                       
   ensembldb              * 2.16.4     2021-08-05 [1] Bioconductor                       
   evaluate                 0.14       2019-05-28 [1] CRAN (R 4.1.0)                     
   ExperimentHub            2.0.0      2021-05-19 [1] Bioconductor                       
   fansi                    0.5.0      2021-05-25 [1] CRAN (R 4.1.0)                     
   fastmap                  1.1.0      2021-01-25 [1] CRAN (R 4.1.0)                     
   filelock                 1.0.2      2018-10-05 [1] CRAN (R 4.1.0)                     
   foreach                  1.5.1      2020-10-15 [1] CRAN (R 4.1.0)                     
   fs                       1.5.0      2020-07-31 [1] CRAN (R 4.1.0)                     
   generics                 0.1.0      2020-10-31 [1] CRAN (R 4.1.0)                     
   GenomeInfoDb           * 1.28.1     2021-07-01 [1] Bioconductor                       
   GenomeInfoDbData         1.2.6      2021-08-07 [1] Bioconductor                       
   GenomicAlignments        1.28.0     2021-05-19 [1] Bioconductor                       
   GenomicFeatures        * 1.44.0     2021-05-19 [1] Bioconductor                       
   GenomicRanges          * 1.44.0     2021-05-19 [1] Bioconductor                       
   GetoptLong               1.0.5      2020-12-15 [1] CRAN (R 4.1.0)                     
   ggbeeswarm               0.6.0      2017-08-07 [1] CRAN (R 4.1.0)                     
   ggplot2                * 3.3.5      2021-06-25 [1] CRAN (R 4.1.0)                     
   ggrepel                  0.9.1      2021-01-15 [1] CRAN (R 4.1.0)                     
   GlobalOptions            0.1.2      2020-06-10 [1] CRAN (R 4.1.0)                     
   glue                     1.4.2      2020-08-27 [1] CRAN (R 4.1.0)                     
   gridExtra                2.3        2017-09-09 [1] CRAN (R 4.1.0)                     
   gtable                   0.3.0      2019-03-25 [1] CRAN (R 4.1.0)                     
   HDF5Array                1.20.0     2021-05-19 [1] Bioconductor                       
   here                     1.0.1      2020-12-13 [1] CRAN (R 4.1.0)                     
   hms                      1.1.0      2021-05-17 [1] CRAN (R 4.1.0)                     
   htmltools                0.5.1.1    2021-01-22 [1] CRAN (R 4.1.0)                     
   htmlwidgets              1.5.3      2020-12-10 [1] CRAN (R 4.1.0)                     
   httpuv                   1.6.1      2021-05-07 [1] CRAN (R 4.1.0)                     
   httr                     1.4.2      2020-07-20 [1] CRAN (R 4.1.0)                     
   igraph                   1.2.6      2020-10-06 [1] CRAN (R 4.1.0)                     
   interactiveDisplayBase   1.30.0     2021-05-19 [1] Bioconductor                       
   IRanges                * 2.26.0     2021-05-19 [1] Bioconductor                       
   irlba                    2.3.3      2019-02-05 [1] CRAN (R 4.1.0)                     
   iSEE                     2.4.0      2021-05-19 [1] Bioconductor                       
   iterators                1.0.13     2020-10-15 [1] CRAN (R 4.1.0)                     
   jquerylib                0.1.4      2021-04-26 [1] CRAN (R 4.1.0)                     
   jsonlite                 1.7.2      2020-12-09 [1] CRAN (R 4.1.0)                     
   KEGGREST                 1.32.0     2021-05-19 [1] Bioconductor                       
   knitr                    1.33       2021-04-24 [1] CRAN (R 4.1.0)                     
   later                    1.2.0      2021-04-23 [1] CRAN (R 4.1.0)                     
   lattice                  0.20-44    2021-05-02 [1] CRAN (R 4.1.0)                     
   lazyeval                 0.2.2      2019-03-15 [1] CRAN (R 4.1.0)                     
   lifecycle                1.0.0      2021-02-15 [1] CRAN (R 4.1.0)                     
   limma                    3.48.2     2021-08-08 [1] Bioconductor                       
   locfit                   1.5-9.4    2020-03-25 [1] CRAN (R 4.1.0)                     
   lubridate                1.7.10     2021-02-26 [1] CRAN (R 4.1.0)                     
   magrittr                 2.0.1      2020-11-17 [1] CRAN (R 4.1.0)                     
   Matrix                 * 1.3-4      2021-06-01 [1] CRAN (R 4.1.0)                     
   MatrixGenerics         * 1.4.2      2021-08-08 [1] Bioconductor                       
   matrixStats            * 0.60.0     2021-07-26 [1] CRAN (R 4.1.0)                     
   memoise                  2.0.0      2021-01-26 [1] CRAN (R 4.1.0)                     
   metapod                  1.0.0      2021-05-19 [1] Bioconductor                       
   mgcv                     1.8-36     2021-06-01 [1] CRAN (R 4.1.0)                     
   mime                     0.11       2021-06-23 [1] CRAN (R 4.1.0)                     
   miniUI                   0.1.1.1    2018-05-18 [1] CRAN (R 4.1.0)                     
   munsell                  0.5.0      2018-06-12 [1] CRAN (R 4.1.0)                     
   nlme                     3.1-152    2021-02-04 [1] CRAN (R 4.1.0)                     
   PCAtools                 2.4.0      2021-05-19 [1] Bioconductor                       
   pillar                   1.6.2      2021-07-29 [1] CRAN (R 4.1.0)                     
   pkgbuild                 1.2.0      2020-12-15 [1] CRAN (R 4.1.0)                     
   pkgconfig                2.0.3      2019-09-22 [1] CRAN (R 4.1.0)                     
   pkgload                  1.2.1      2021-04-06 [1] CRAN (R 4.1.0)                     
   plyr                     1.8.6      2020-03-03 [1] CRAN (R 4.1.0)                     
   png                      0.1-7      2013-12-03 [1] CRAN (R 4.1.0)                     
   prettyunits              1.1.1      2020-01-24 [1] CRAN (R 4.1.0)                     
   processx                 3.5.2      2021-04-30 [1] CRAN (R 4.1.0)                     
   progress                 1.2.2      2019-05-16 [1] CRAN (R 4.1.0)                     
   promises                 1.2.0.1    2021-02-11 [1] CRAN (R 4.1.0)                     
   ProtGenerics             1.24.0     2021-05-19 [1] Bioconductor                       
   ps                       1.6.0      2021-02-28 [1] CRAN (R 4.1.0)                     
   purrr                    0.3.4      2020-04-17 [1] CRAN (R 4.1.0)                     
   R.methodsS3              1.8.1      2020-08-26 [1] CRAN (R 4.1.0)                     
   R.oo                     1.24.0     2020-08-26 [1] CRAN (R 4.1.0)                     
   R.utils                  2.10.1     2020-08-26 [1] CRAN (R 4.1.0)                     
   R6                       2.5.0      2020-10-28 [1] CRAN (R 4.1.0)                     
   rappdirs                 0.3.3      2021-01-31 [1] CRAN (R 4.1.0)                     
   RColorBrewer             1.1-2      2014-12-07 [1] CRAN (R 4.1.0)                     
   Rcpp                     1.0.7      2021-07-07 [1] CRAN (R 4.1.0)                     
   RCurl                    1.98-1.3   2021-03-16 [1] CRAN (R 4.1.0)                     
   remotes                  2.4.0      2021-06-02 [1] CRAN (R 4.1.0)                     
   reshape2                 1.4.4      2020-04-09 [1] CRAN (R 4.1.0)                     
   restfulr                 0.0.13     2017-08-06 [1] CRAN (R 4.1.0)                     
   rhdf5                    2.36.0     2021-05-19 [1] Bioconductor                       
 D rhdf5filters             1.4.0      2021-05-19 [1] Bioconductor                       
   Rhdf5lib                 1.14.2     2021-07-06 [1] Bioconductor                       
   rintrojs                 0.3.0      2021-06-06 [1] CRAN (R 4.1.0)                     
   rjson                    0.2.20     2018-06-08 [1] CRAN (R 4.1.0)                     
   rlang                    0.4.11     2021-04-30 [1] CRAN (R 4.1.0)                     
   rmarkdown                2.10       2021-08-06 [1] CRAN (R 4.1.0)                     
   rprojroot                2.0.2      2020-11-15 [1] CRAN (R 4.1.0)                     
   Rsamtools                2.8.0      2021-05-19 [1] Bioconductor                       
   RSQLite                  2.2.7      2021-04-22 [1] CRAN (R 4.1.0)                     
   rsthemes                 0.2.1.9000 2021-04-22 [1] Github (gadenbuie/rsthemes@19299e5)
   rstudioapi               0.13       2020-11-12 [1] CRAN (R 4.1.0)                     
   rsvd                     1.0.5      2021-04-16 [1] CRAN (R 4.1.0)                     
   rtracklayer              1.52.0     2021-05-19 [1] Bioconductor                       
   S4Vectors              * 0.30.0     2021-05-19 [1] Bioconductor                       
   sass                     0.4.0      2021-05-12 [1] CRAN (R 4.1.0)                     
   ScaledMatrix             1.0.0      2021-05-19 [1] Bioconductor                       
   scales                   1.1.1      2020-05-11 [1] CRAN (R 4.1.0)                     
   scater                 * 1.20.1     2021-06-15 [1] Bioconductor                       
   scran                  * 1.20.1     2021-05-24 [1] Bioconductor                       
   scuttle                * 1.2.1      2021-08-05 [1] Bioconductor                       
   sessioninfo              1.1.1      2018-11-05 [1] CRAN (R 4.1.0)                     
   shape                    1.4.6      2021-05-19 [1] CRAN (R 4.1.0)                     
   shiny                    1.6.0      2021-01-25 [1] CRAN (R 4.1.0)                     
   shinyAce                 0.4.1      2019-09-24 [1] CRAN (R 4.1.0)                     
   shinydashboard           0.7.1      2018-10-17 [1] CRAN (R 4.1.0)                     
   shinyjs                  2.0.0      2020-09-09 [1] CRAN (R 4.1.0)                     
   shinyWidgets             0.6.0      2021-03-15 [1] CRAN (R 4.1.0)                     
   SingleCellExperiment   * 1.14.1     2021-05-21 [1] Bioconductor                       
   sparseMatrixStats        1.4.2      2021-08-08 [1] Bioconductor                       
   statmod                  1.4.36     2021-05-10 [1] CRAN (R 4.1.0)                     
   stringi                  1.7.3      2021-07-16 [1] CRAN (R 4.1.0)                     
   stringr                  1.4.0      2019-02-10 [1] CRAN (R 4.1.0)                     
   styler                   1.5.1      2021-07-13 [1] CRAN (R 4.1.0)                     
   SummarizedExperiment   * 1.22.0     2021-05-19 [1] Bioconductor                       
   suncalc                  0.5.0      2019-04-03 [1] CRAN (R 4.1.0)                     
   testthat               * 3.0.4      2021-07-01 [1] CRAN (R 4.1.0)                     
   tibble                   3.1.3      2021-07-23 [1] CRAN (R 4.1.0)                     
   tidyselect               1.1.1      2021-04-30 [1] CRAN (R 4.1.0)                     
   usethis                * 2.0.1      2021-02-10 [1] CRAN (R 4.1.0)                     
   utf8                     1.2.2      2021-07-24 [1] CRAN (R 4.1.0)                     
   vctrs                    0.3.8      2021-04-29 [1] CRAN (R 4.1.0)                     
   vipor                    0.4.5      2017-03-22 [1] CRAN (R 4.1.0)                     
   viridis                  0.6.1      2021-05-11 [1] CRAN (R 4.1.0)                     
   viridisLite              0.4.0      2021-04-13 [1] CRAN (R 4.1.0)                     
   withr                    2.4.2      2021-04-18 [1] CRAN (R 4.1.0)                     
   xfun                     0.25       2021-08-06 [1] CRAN (R 4.1.0)                     
   XML                      3.99-0.6   2021-03-16 [1] CRAN (R 4.1.0)                     
   xml2                     1.3.2      2020-04-23 [1] CRAN (R 4.1.0)                     
   xtable                   1.8-4      2019-04-21 [1] CRAN (R 4.1.0)                     
   XVector                  0.32.0     2021-05-19 [1] Bioconductor                       
   yaml                     2.2.1      2020-02-01 [1] CRAN (R 4.1.0)                     
   zlibbioc                 1.38.0     2021-05-19 [1] Bioconductor                       

[1] C:/R/R-4.1.0alpha/library

 D -- DLL MD5 mismatch, broken installation.

lcolladotor · 2021-08-11T15:25:56Z

Under a first inspection, it doesn't seem to be that the following commit is the problem 71b161e#diff-cf7a6dd4088835e532ce821c61ceb633ccc99ee3ffe8a812ab3c0a7f46dcaa8b either based on the test below and from looking at https://github.com/LTLA/beachmat/blob/d3e4bafbc840df2d106197d7163c3f3aa6fb1fb1/R/whichNonZero.R#L42-L44.

set.seed(100)
p.n0 <- runif(100)
j <- sample(1:10, 100, TRUE)
mat <- matrix(ncol = 10)
dim(mat)
#> [1]  1 10
max(j)
#> [1] 10

by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
obs.P
#>  [1]  3.998302  4.518524  2.379189  5.237262  6.077561  4.126877  4.634267
#>  [8]  4.671363  4.275538 12.068117

j_new <- factor(j, levels=seq_len(ncol(mat)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
obs.P_new
#>  [1]  3.998302  4.518524  2.379189  5.237262  6.077561  4.126877  4.634267
#>  [8]  4.671363  4.275538 12.068117

testthat::expect_equal(obs.P, obs.P_new)
testthat::expect_equivalent(obs.P, obs.P_new)

^{Created on 2021-08-11 by the reprex package (v2.0.1)}

However, in that case, the full matrix mat has an equal number of columns as the max value of j and all columns of mat have j values.

More complicated case

However, let's say that the full matrix has 15 columns with columns 6 to 10 being all zeros.

## 100 random p-values for 100 random j columns
## in a sparse matrix
set.seed(100)
p.n0 <- runif(100)
## Let's say that we have empty colums 6 through 10
j <- sample(c(1:5, 11:15), 100, TRUE)
mat <- matrix(ncol = max(j))
dim(mat)
#> [1]  1 15

by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
obs.P
#>  [1]  3.998302  4.518524  2.379189  5.237262  6.077561  0.000000  0.000000
#>  [8]  0.000000  0.000000  0.000000  4.126877  4.634267  4.671363  4.275538
#> [15] 12.068117

j_new <- factor(j, levels=seq_len(ncol(mat)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
obs.P_new
#>  [1]  3.998302  4.518524  2.379189  5.237262  6.077561        NA        NA
#>  [8]        NA        NA        NA  4.126877  4.634267  4.671363  4.275538
#> [15] 12.068117

testthat::expect_equal(obs.P, obs.P_new)
#> Error: `obs.P` not equal to `obs.P_new`.
#> 5/15 mismatches (average diff: NaN)
#> [6]  0 - NA == NA
#> [7]  0 - NA == NA
#> [8]  0 - NA == NA
#> [9]  0 - NA == NA
#> [10] 0 - NA == NA
testthat::expect_equivalent(obs.P, obs.P_new)
#> Error: `obs.P` not equivalent to `obs.P_new`.
#> 5/15 mismatches (average diff: NaN)
#> [6]  0 - NA == NA
#> [7]  0 - NA == NA
#> [8]  0 - NA == NA
#> [9]  0 - NA == NA
#> [10] 0 - NA == NA

^{Created on 2021-08-11 by the reprex package (v2.0.1)}

In that case, we end up having NAs in the obs.P_new object. The current version of the code would be https://github.com/MarioniLab/DropletUtils/blob/master/R/emptyDrops.R#L311-L327.

With the data used for this issue, I see that well, we do have lots of non-zero columns as expected.

# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

# Anotación de los genes
library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

library(beachmat)

mat <- counts(sce.pbmc)
dim(mat)
nonzero <- whichNonZero(mat)
length(unique(nonzero$j))
max(nonzero$j)

where max(j) corresponds to ncol(mat).

> library(beachmat)
> 
> mat <- counts(sce.pbmc)
> dim(mat)
[1]  33694 737280
> nonzero <- whichNonZero(mat)
> length(unique(nonzero$j))
[1] 272442
> max(nonzero$j)
[1] 737280

lcolladotor · 2021-08-11T15:26:18Z

pressed the wrong button while drafting my message....

lcolladotor · 2021-08-11T16:08:50Z

Ok, this does seem to be the issue with the actual data as shown in the reprex below with BioC 3.13.

I imagine that it doesn't matter for most droplets since well, they already had NA p-values in the past.

# Usemos datos de pbmc4k
library(BiocFileCache)
#> Loading required package: dbplyr
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#>     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#>     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#>     union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> 
#> Attaching package: 'IRanges'
#> The following object is masked from 'package:grDevices':
#> 
#>     windows
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
library(Matrix)
#> 
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:S4Vectors':
#> 
#>     expand
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

# Anotación de los genes
library(scater)
#> Loading required package: scuttle
#> Loading required package: ggplot2
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

library(beachmat)

mat <- counts(sce.pbmc)
dim(mat)
#> [1]  33694 737280
nonzero <- whichNonZero(mat)
length(unique(nonzero$j))
#> [1] 272442
max(nonzero$j)
#> [1] 737280


m <- DropletUtils:::.rounded_to_integer(mat, 0)
nonzero <- whichNonZero(m)

i <- nonzero$i
j <- nonzero$j
x <- nonzero$x


set.seed(100)
p.n0 <- runif(length(j))


by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
summary(obs.P)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#>    0.0000    0.0000    0.0000    5.7792    0.7371 2622.1900

j_new <- factor(j, levels=seq_len(ncol(m)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
summary(obs.P_new)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#>     0.0     0.6     1.3    15.6     9.1  2622.2  464838

testthat::expect_equal(obs.P, obs.P_new)
#> Error: `obs.P` not equal to `obs.P_new`.
#> 464838/737280 mismatches (average diff: NaN)
#> [2]  0 - NA == NA
#> [4]  0 - NA == NA
#> [7]  0 - NA == NA
#> [8]  0 - NA == NA
#> [9]  0 - NA == NA
#> [11] 0 - NA == NA
#> [16] 0 - NA == NA
#> [17] 0 - NA == NA
#> [18] 0 - NA == NA
#> ...
testthat::expect_equivalent(obs.P, obs.P_new)
#> Error: `obs.P` not equivalent to `obs.P_new`.
#> 464838/737280 mismatches (average diff: NaN)
#> [2]  0 - NA == NA
#> [4]  0 - NA == NA
#> [7]  0 - NA == NA
#> [8]  0 - NA == NA
#> [9]  0 - NA == NA
#> [11] 0 - NA == NA
#> [16] 0 - NA == NA
#> [17] 0 - NA == NA
#> [18] 0 - NA == NA
#> ...

^{Created on 2021-08-11 by the reprex package (v2.0.1)}

lcolladotor · 2021-08-11T16:23:47Z

Adding obs.P[is.na(obs.P)] <- 0 or obs.P[!seq_len(ncol(block)) %in% unique(j)] <- 0 resolves the issue (with object names from https://github.com/MarioniLab/DropletUtils/blob/master/R/emptyDrops.R#L311-L327) in the reprex.

## 100 random p-values for 100 random j columns
## in a sparse matrix
set.seed(100)
p.n0 <- runif(100)

## Let's say that we have empty colums 6 through 10
j <- sample(c(1:5, 11:15), 100, TRUE)
mat <- matrix(ncol = max(j))
dim(mat)
#> [1]  1 15

by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x


j_new <- factor(j, levels=seq_len(ncol(mat)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)


## This works
# obs.P_new[is.na(obs.P_new)] <- 0
## But well, maybe there are NAs in the original p.n0 already

## Here's a more robust fix to simply add 0
## to the columns with no data
obs.P_new[!seq_len(ncol(mat)) %in% unique(j)] <- 0

data.frame(obs.P, obs.P_new)
#>        obs.P obs.P_new
#> 1   3.998302  3.998302
#> 2   4.518524  4.518524
#> 3   2.379189  2.379189
#> 4   5.237262  5.237262
#> 5   6.077561  6.077561
#> 6   0.000000  0.000000
#> 7   0.000000  0.000000
#> 8   0.000000  0.000000
#> 9   0.000000  0.000000
#> 10  0.000000  0.000000
#> 11  4.126877  4.126877
#> 12  4.634267  4.634267
#> 13  4.671363  4.671363
#> 14  4.275538  4.275538
#> 15 12.068117 12.068117

testthat::expect_equal(obs.P, obs.P_new)
testthat::expect_equivalent(obs.P, obs.P_new)

^{Created on 2021-08-11 by the reprex package (v2.0.1)}

This is true with the full data too.

# Usemos datos de pbmc4k
library(BiocFileCache)
#> Loading required package: dbplyr
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#>     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#>     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#>     union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> 
#> Attaching package: 'IRanges'
#> The following object is masked from 'package:grDevices':
#> 
#>     windows
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
library(Matrix)
#> 
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:S4Vectors':
#> 
#>     expand
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

# Anotación de los genes
library(scater)
#> Loading required package: scuttle
#> Loading required package: ggplot2
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

library(beachmat)

mat <- counts(sce.pbmc)
dim(mat)
#> [1]  33694 737280
nonzero <- whichNonZero(mat)
length(unique(nonzero$j))
#> [1] 272442
max(nonzero$j)
#> [1] 737280


m <- DropletUtils:::.rounded_to_integer(mat, 0)
nonzero <- whichNonZero(m)

i <- nonzero$i
j <- nonzero$j
x <- nonzero$x


set.seed(100)
p.n0 <- runif(length(j))


by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
summary(obs.P)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#>    0.0000    0.0000    0.0000    5.7792    0.7371 2622.1900

j_new <- factor(j, levels=seq_len(ncol(m)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
obs.P_new[!seq_len(ncol(m)) %in% unique(j)] <- 0
summary(obs.P_new)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#>    0.0000    0.0000    0.0000    5.7792    0.7371 2622.1900

testthat::expect_equal(obs.P, obs.P_new)
testthat::expect_equivalent(obs.P, obs.P_new)

^{Created on 2021-08-11 by the reprex package (v2.0.1)}

Next, I'll try on a fresh clone of DropletUtils with the above fix and see if it works.

lcolladotor · 2021-08-11T17:02:05Z

Bah, I still can't get the BioC 3.11 results even with these changes :/

> set.seed(100)
> e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA>
   989   4300 731991

It didn't work with lcolladotor@790d099

> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA>
   989   4300 731991
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > options(width = 120)
fix_emptyDrops_bioc3.11_vs_3.12 > sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.1.0 (2021-05-18)
 os       macOS Big Sur 10.16
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2021-08-11

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version    date       lib source
 assertthat             0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
 beachmat               2.9.0      2021-05-19 [1] Bioconductor
 beeswarm               0.4.0      2021-06-01 [1] CRAN (R 4.1.0)
 Biobase              * 2.53.0     2021-05-19 [1] Bioconductor
 BiocFileCache        * 2.1.1      2021-06-23 [1] Bioconductor
 BiocGenerics         * 0.39.1     2021-06-08 [1] Bioconductor
 BiocNeighbors          1.11.0     2021-05-19 [1] Bioconductor
 BiocParallel           1.27.2     2021-07-12 [1] Bioconductor
 BiocSingular           1.9.1      2021-06-08 [1] Bioconductor
 bit                    4.0.4      2020-08-04 [1] CRAN (R 4.1.0)
 bit64                  4.0.5      2020-08-30 [1] CRAN (R 4.1.0)
 bitops                 1.0-7      2021-04-24 [1] CRAN (R 4.1.0)
 blob                   1.2.2      2021-07-23 [1] CRAN (R 4.1.0)
 cachem                 1.0.5      2021-05-15 [1] CRAN (R 4.1.0)
 callr                  3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
 cli                    3.0.1      2021-07-17 [1] CRAN (R 4.1.0)
 colorout               1.2-2      2021-05-24 [1] Github (jalvesaq/colorout@79931fd)
 colorspace             2.0-2      2021-06-24 [1] CRAN (R 4.1.0)
 crayon                 1.4.1      2021-02-08 [1] CRAN (R 4.1.0)
 curl                   4.3.2      2021-06-23 [1] CRAN (R 4.1.0)
 DBI                    1.1.1      2021-01-15 [1] CRAN (R 4.1.0)
 dbplyr               * 2.1.1      2021-04-06 [1] CRAN (R 4.1.0)
 DelayedArray           0.19.1     2021-06-25 [1] Bioconductor
 DelayedMatrixStats     1.15.2     2021-08-05 [1] Bioconductor
 desc                   1.3.0      2021-03-05 [1] CRAN (R 4.1.0)
 devtools             * 2.4.2      2021-06-07 [1] CRAN (R 4.1.0)
 dplyr                  1.0.7      2021-06-18 [1] CRAN (R 4.1.0)
 dqrng                  0.3.0      2021-05-01 [1] CRAN (R 4.1.0)
 DropletUtils         * 1.13.3     2021-08-11 [1] Github (lcolladotor/DropletUtils@790d099)
 edgeR                  3.35.0     2021-05-19 [1] Bioconductor
 ellipsis               0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
 fansi                  0.5.0      2021-05-25 [1] CRAN (R 4.1.0)
 fastmap                1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
 filelock               1.0.2      2018-10-05 [1] CRAN (R 4.1.0)
 fs                     1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
 generics               0.1.0      2020-10-31 [1] CRAN (R 4.1.0)
 GenomeInfoDb         * 1.29.3     2021-07-01 [1] Bioconductor
 GenomeInfoDbData       1.2.6      2021-05-24 [1] Bioconductor
 GenomicRanges        * 1.45.0     2021-05-19 [1] Bioconductor
 ggbeeswarm             0.6.0      2017-08-07 [1] CRAN (R 4.1.0)
 ggplot2              * 3.3.5      2021-06-25 [1] CRAN (R 4.1.0)
 ggrepel                0.9.1      2021-01-15 [1] CRAN (R 4.1.0)
 glue                   1.4.2      2020-08-27 [1] CRAN (R 4.1.0)
 gridExtra              2.3        2017-09-09 [1] CRAN (R 4.1.0)
 gtable                 0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
 HDF5Array              1.21.0     2021-05-19 [1] Bioconductor
 httr                   1.4.2      2020-07-20 [1] CRAN (R 4.1.0)
 IRanges              * 2.27.0     2021-05-19 [1] Bioconductor
 irlba                  2.3.3      2019-02-05 [1] CRAN (R 4.1.0)
 lattice                0.20-44    2021-05-02 [1] CRAN (R 4.1.0)
 lifecycle              1.0.0      2021-02-15 [1] CRAN (R 4.1.0)
 limma                  3.49.4     2021-08-08 [1] Bioconductor
 locfit                 1.5-9.4    2020-03-25 [1] CRAN (R 4.1.0)
 magrittr               2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
 Matrix               * 1.3-4      2021-06-01 [1] CRAN (R 4.1.0)
 MatrixGenerics       * 1.5.3      2021-08-05 [1] Bioconductor
 matrixStats          * 0.60.0     2021-07-26 [1] CRAN (R 4.1.0)
 memoise                2.0.0      2021-01-26 [1] CRAN (R 4.1.0)
 munsell                0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
 pillar                 1.6.2      2021-07-29 [1] CRAN (R 4.1.0)
 pkgbuild               1.2.0      2020-12-15 [1] CRAN (R 4.1.0)
 pkgconfig              2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
 pkgload                1.2.1      2021-04-06 [1] CRAN (R 4.1.0)
 prettyunits            1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
 processx               3.5.2      2021-04-30 [1] CRAN (R 4.1.0)
 prompt                 1.0.1      2021-08-03 [1] Github (gaborcsardi/prompt@fc2ac94)
 ps                     1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
 purrr                  0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
 R.methodsS3            1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
 R.oo                   1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
 R.utils                2.10.1     2020-08-26 [1] CRAN (R 4.1.0)
 R6                     2.5.0      2020-10-28 [1] CRAN (R 4.1.0)
 rappdirs               0.3.3      2021-01-31 [1] CRAN (R 4.1.0)
 Rcpp                   1.0.7      2021-07-07 [1] CRAN (R 4.1.0)
 RCurl                  1.98-1.3   2021-03-16 [1] CRAN (R 4.1.0)
 remotes                2.4.0      2021-06-02 [1] CRAN (R 4.1.0)
 rhdf5                  2.37.0     2021-05-19 [1] Bioconductor
 rhdf5filters           1.5.0      2021-05-19 [1] Bioconductor
 Rhdf5lib               1.15.2     2021-07-01 [1] Bioconductor
 rlang                  0.4.11     2021-04-30 [1] CRAN (R 4.1.0)
 rprojroot              2.0.2      2020-11-15 [1] CRAN (R 4.1.0)
 RSQLite                2.2.7      2021-04-22 [1] CRAN (R 4.1.0)
 rsthemes               0.2.1.9000 2021-05-24 [1] Github (gadenbuie/rsthemes@19299e5)
 rstudioapi             0.13       2020-11-12 [1] CRAN (R 4.1.0)
 rsvd                   1.0.5      2021-04-16 [1] CRAN (R 4.1.0)
 S4Vectors            * 0.31.0     2021-05-19 [1] Bioconductor
 ScaledMatrix           1.1.0      2021-05-19 [1] Bioconductor
 scales                 1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
 scater               * 1.21.3     2021-08-01 [1] Bioconductor
 scuttle              * 1.3.1      2021-08-05 [1] Bioconductor
 sessioninfo            1.1.1      2018-11-05 [1] CRAN (R 4.1.0)
 SingleCellExperiment * 1.15.1     2021-05-21 [1] Bioconductor
 sparseMatrixStats      1.5.2      2021-08-05 [1] Bioconductor
 SummarizedExperiment * 1.23.1     2021-06-24 [1] Bioconductor
 testthat             * 3.0.4      2021-07-01 [1] CRAN (R 4.1.0)
 tibble                 3.1.3      2021-07-23 [1] CRAN (R 4.1.0)
 tidyselect             1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
 usethis              * 2.0.1      2021-02-10 [1] CRAN (R 4.1.0)
 utf8                   1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
 vctrs                  0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
 vipor                  0.4.5      2017-03-22 [1] CRAN (R 4.1.0)
 viridis                0.6.1      2021-05-11 [1] CRAN (R 4.1.0)
 viridisLite            0.4.0      2021-04-13 [1] CRAN (R 4.1.0)
 withr                  2.4.2      2021-04-18 [1] CRAN (R 4.1.0)
 XVector                0.33.0     2021-05-19 [1] Bioconductor
 zlibbioc               1.39.0     2021-05-19 [1] Bioconductor

[1] /Library/Frameworks/R.framework/Versions/4.1devel/Resources/library

nor lcolladotor@2b0bb4b, both with BioC 3.14 (bioc-devel).

> library(BiocFileCache)
Loading required package: dbplyr
fix_emptyDrops_bioc3.11_vs_3.12 > bfc <- BiocFileCache()
fix_emptyDrops_bioc3.11_vs_3.12 > raw.path <- bfcrpath(bfc, file.path(
+     "http://cf.10xgenomics.com/samples",
+     "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
+ ))
fix_emptyDrops_bioc3.11_vs_3.12 > untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > library(DropletUtils)
Loading required package: SingleCellExperiment
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: ‘MatrixGenerics’

The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: ‘Biobase’

The following object is masked from ‘package:MatrixGenerics’:

    rowMedians

The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians

fix_emptyDrops_bioc3.11_vs_3.12 > library(Matrix)

Attaching package: ‘Matrix’

The following object is masked from ‘package:S4Vectors’:

    expand

fix_emptyDrops_bioc3.11_vs_3.12 > fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
fix_emptyDrops_bioc3.11_vs_3.12 > sce.pbmc <- read10xCounts(fname, col.names = TRUE)
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > library(scater)
Loading required package: scuttle
Loading required package: ggplot2
rownames(sce.pbmc) <- uniquifyFeatureNames(
fix_emptyDrops_bioc3.11_vs_3.12 > rownames(sce.pbmc) <- uniquifyFeatureNames(
+     rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
+ )
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > # Detección de _droplets_ con células
fix_emptyDrops_bioc3.11_vs_3.12 > set.seed(100)
fix_emptyDrops_bioc3.11_vs_3.12 > #e.out <- readRDS("e.out_BioC3.11.rds")
fix_emptyDrops_bioc3.11_vs_3.12 > e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
fix_emptyDrops_bioc3.11_vs_3.12 > table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA>
   989   4300 731991
fix_emptyDrops_bioc3.11_vs_3.12 > options(width = 120)
fix_emptyDrops_bioc3.11_vs_3.12 > sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.1.0 (2021-05-18)
 os       macOS Big Sur 10.16
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2021-08-11

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version    date       lib source
 assertthat             0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
 beachmat               2.9.0      2021-05-19 [1] Bioconductor
 beeswarm               0.4.0      2021-06-01 [1] CRAN (R 4.1.0)
 Biobase              * 2.53.0     2021-05-19 [1] Bioconductor
 BiocFileCache        * 2.1.1      2021-06-23 [1] Bioconductor
 BiocGenerics         * 0.39.1     2021-06-08 [1] Bioconductor
 BiocNeighbors          1.11.0     2021-05-19 [1] Bioconductor
 BiocParallel           1.27.2     2021-07-12 [1] Bioconductor
 BiocSingular           1.9.1      2021-06-08 [1] Bioconductor
 bit                    4.0.4      2020-08-04 [1] CRAN (R 4.1.0)
 bit64                  4.0.5      2020-08-30 [1] CRAN (R 4.1.0)
 bitops                 1.0-7      2021-04-24 [1] CRAN (R 4.1.0)
 blob                   1.2.2      2021-07-23 [1] CRAN (R 4.1.0)
 cachem                 1.0.5      2021-05-15 [1] CRAN (R 4.1.0)
 callr                  3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
 cli                    3.0.1      2021-07-17 [1] CRAN (R 4.1.0)
 colorout               1.2-2      2021-05-24 [1] Github (jalvesaq/colorout@79931fd)
 colorspace             2.0-2      2021-06-24 [1] CRAN (R 4.1.0)
 crayon                 1.4.1      2021-02-08 [1] CRAN (R 4.1.0)
 curl                   4.3.2      2021-06-23 [1] CRAN (R 4.1.0)
 DBI                    1.1.1      2021-01-15 [1] CRAN (R 4.1.0)
 dbplyr               * 2.1.1      2021-04-06 [1] CRAN (R 4.1.0)
 DelayedArray           0.19.1     2021-06-25 [1] Bioconductor
 DelayedMatrixStats     1.15.2     2021-08-05 [1] Bioconductor
 desc                   1.3.0      2021-03-05 [1] CRAN (R 4.1.0)
 devtools             * 2.4.2      2021-06-07 [1] CRAN (R 4.1.0)
 dplyr                  1.0.7      2021-06-18 [1] CRAN (R 4.1.0)
 dqrng                  0.3.0      2021-05-01 [1] CRAN (R 4.1.0)
 DropletUtils         * 1.13.3     2021-08-11 [1] Github (lcolladotor/DropletUtils@2b0bb4b)
 edgeR                  3.35.0     2021-05-19 [1] Bioconductor
 ellipsis               0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
 fansi                  0.5.0      2021-05-25 [1] CRAN (R 4.1.0)
 fastmap                1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
 filelock               1.0.2      2018-10-05 [1] CRAN (R 4.1.0)
 fs                     1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
 generics               0.1.0      2020-10-31 [1] CRAN (R 4.1.0)
 GenomeInfoDb         * 1.29.3     2021-07-01 [1] Bioconductor
 GenomeInfoDbData       1.2.6      2021-05-24 [1] Bioconductor
 GenomicRanges        * 1.45.0     2021-05-19 [1] Bioconductor
 ggbeeswarm             0.6.0      2017-08-07 [1] CRAN (R 4.1.0)
 ggplot2              * 3.3.5      2021-06-25 [1] CRAN (R 4.1.0)
 ggrepel                0.9.1      2021-01-15 [1] CRAN (R 4.1.0)
 glue                   1.4.2      2020-08-27 [1] CRAN (R 4.1.0)
 gridExtra              2.3        2017-09-09 [1] CRAN (R 4.1.0)
 gtable                 0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
 HDF5Array              1.21.0     2021-05-19 [1] Bioconductor
 httr                   1.4.2      2020-07-20 [1] CRAN (R 4.1.0)
 IRanges              * 2.27.0     2021-05-19 [1] Bioconductor
 irlba                  2.3.3      2019-02-05 [1] CRAN (R 4.1.0)
 lattice                0.20-44    2021-05-02 [1] CRAN (R 4.1.0)
 lifecycle              1.0.0      2021-02-15 [1] CRAN (R 4.1.0)
 limma                  3.49.4     2021-08-08 [1] Bioconductor
 locfit                 1.5-9.4    2020-03-25 [1] CRAN (R 4.1.0)
 magrittr               2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
 Matrix               * 1.3-4      2021-06-01 [1] CRAN (R 4.1.0)
 MatrixGenerics       * 1.5.3      2021-08-05 [1] Bioconductor
 matrixStats          * 0.60.0     2021-07-26 [1] CRAN (R 4.1.0)
 memoise                2.0.0      2021-01-26 [1] CRAN (R 4.1.0)
 munsell                0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
 pillar                 1.6.2      2021-07-29 [1] CRAN (R 4.1.0)
 pkgbuild               1.2.0      2020-12-15 [1] CRAN (R 4.1.0)
 pkgconfig              2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
 pkgload                1.2.1      2021-04-06 [1] CRAN (R 4.1.0)
 prettyunits            1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
 processx               3.5.2      2021-04-30 [1] CRAN (R 4.1.0)
 prompt                 1.0.1      2021-08-03 [1] Github (gaborcsardi/prompt@fc2ac94)
 ps                     1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
 purrr                  0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
 R.methodsS3            1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
 R.oo                   1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
 R.utils                2.10.1     2020-08-26 [1] CRAN (R 4.1.0)
 R6                     2.5.0      2020-10-28 [1] CRAN (R 4.1.0)
 rappdirs               0.3.3      2021-01-31 [1] CRAN (R 4.1.0)
 Rcpp                   1.0.7      2021-07-07 [1] CRAN (R 4.1.0)
 RCurl                  1.98-1.3   2021-03-16 [1] CRAN (R 4.1.0)
 remotes                2.4.0      2021-06-02 [1] CRAN (R 4.1.0)
 rhdf5                  2.37.0     2021-05-19 [1] Bioconductor
 rhdf5filters           1.5.0      2021-05-19 [1] Bioconductor
 Rhdf5lib               1.15.2     2021-07-01 [1] Bioconductor
 rlang                  0.4.11     2021-04-30 [1] CRAN (R 4.1.0)
 rprojroot              2.0.2      2020-11-15 [1] CRAN (R 4.1.0)
 RSQLite                2.2.7      2021-04-22 [1] CRAN (R 4.1.0)
 rsthemes               0.2.1.9000 2021-05-24 [1] Github (gadenbuie/rsthemes@19299e5)
 rstudioapi             0.13       2020-11-12 [1] CRAN (R 4.1.0)
 rsvd                   1.0.5      2021-04-16 [1] CRAN (R 4.1.0)
 S4Vectors            * 0.31.0     2021-05-19 [1] Bioconductor
 ScaledMatrix           1.1.0      2021-05-19 [1] Bioconductor
 scales                 1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
 scater               * 1.21.3     2021-08-01 [1] Bioconductor
 scuttle              * 1.3.1      2021-08-05 [1] Bioconductor
 sessioninfo            1.1.1      2018-11-05 [1] CRAN (R 4.1.0)
 SingleCellExperiment * 1.15.1     2021-05-21 [1] Bioconductor
 sparseMatrixStats      1.5.2      2021-08-05 [1] Bioconductor
 SummarizedExperiment * 1.23.1     2021-06-24 [1] Bioconductor
 testthat             * 3.0.4      2021-07-01 [1] CRAN (R 4.1.0)
 tibble                 3.1.3      2021-07-23 [1] CRAN (R 4.1.0)
 tidyselect             1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
 usethis              * 2.0.1      2021-02-10 [1] CRAN (R 4.1.0)
 utf8                   1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
 vctrs                  0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
 vipor                  0.4.5      2017-03-22 [1] CRAN (R 4.1.0)
 viridis                0.6.1      2021-05-11 [1] CRAN (R 4.1.0)
 viridisLite            0.4.0      2021-04-13 [1] CRAN (R 4.1.0)
 withr                  2.4.2      2021-04-18 [1] CRAN (R 4.1.0)
 XVector                0.33.0     2021-05-19 [1] Bioconductor
 zlibbioc               1.39.0     2021-05-19 [1] Bioconductor

[1] /Library/Frameworks/R.framework/Versions/4.1devel/Resources/library

LTLA · 2021-08-12T07:44:53Z

I would suggest looking at 584320d. Reverting that change and running:

> set.seed(100)
> e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
  1056   4233 731991

This is mentioned briefly in the NEWS.

lcolladotor · 2021-08-12T07:57:56Z

Oh interesting, thanks Aaron! I'll do that tomorrow morning. Thanks!

lcolladotor · 2021-08-17T16:41:38Z

Hi Aaron,

Reverting the bugfix reproduces prior results

Using lcolladotor@eb8eb45 which reverts 584320d on BioC 3.14 I do indeed get the same results I get with BioC 3.13 using the emptyDrops() output from BioC 3.11. Thanks!

e.out_BioC3.14_lcolladotor_eb8eb45.rds.zip

# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

# Detección de _droplets_ con células
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
#e.out <- readRDS("~/Desktop/e.out_BioC3.11.rds")
table(e.out$FDR < 0.001, useNA = "ifany")
# FALSE   TRUE   <NA>
#  1056   4233 731991

saveRDS(e.out, file = "~/Desktop/e.out_BioC3.14_lcolladotor_eb8eb45.rds")


library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86,
    keys = rowData(sce.pbmc)$ID,
    column = "SEQNAME", keytype = "GENEID"
)

sce.pbmc <- sce.pbmc[, which(e.out$FDR <= 0.001)]

# Control de calidad
stats <- perCellQCMetrics(sce.pbmc,
    subsets = list(Mito = which(location == "MT"))
)
high.mito <- isOutlier(stats$subsets_Mito_percent,
    type = "higher"
)
sce.pbmc <- sce.pbmc[, !high.mito]

# Normalización de los datos
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster = clusters)
sce.pbmc <- logNormCounts(sce.pbmc)

## Identificación de genes altamente variables
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop = 0.1)

## Reducción de dimensiones
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc,
    subset.row = top.pbmc,
    technical = dec.pbmc
)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, dimred = "PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, dimred = "PCA")

plotTSNE(sce.pbmc)



# clustering
g <- buildSNNGraph(sce.pbmc, k = 10, use.dimred = "PCA")
clust <- igraph::cluster_walktrap(g)$membership
sce.pbmc$cluster <- factor(clust)

library(celldex)
ref <- celldex::BlueprintEncodeData()

library(SingleR)
pred <- SingleR(
    test = sce.pbmc, ref = ref,
    labels = ref$label.main
)

sce.pbmc$labels <- pred$labels
plotTSNE(sce.pbmc, colour_by = "labels", text_by = "labels")


plotTSNE(sce.pbmc, colour_by = "cluster", text_by = "labels")

I did notice a few small differences vs Pete's slides, like those few red cells to the right of the monocytes on his version and the shape of the B-cells cluster, but well, that's likely to other changes.

Wrapping up

So indeed, 584320d was the main source of the discrepancy. Since it's a bug fix, like you noted above and at 24e8073, we can close this issue on the reproducibility of BioC 3.11 results.

`obs.P` calculation

However, regarding the obs.P calculation, while it didn't affect this issue, due to #67 (comment) and #67 (comment), should I send a (clean) PR for either lcolladotor@790d099 or lcolladotor@2b0bb4b (whichever version you prefer). Or is the current obs.P behavior what you expect?

Best,
Leo

LTLA · 2021-08-23T07:16:42Z

What's wrong with the obs.P?

lcolladotor added a commit to ComunidadBioInfo/cdsb2021_scRNAseq that referenced this issue Aug 11, 2021

Agregar tweet e historia relacionada a MarioniLab/DropletUtils#67

cafba4e

lcolladotor closed this as completed Aug 11, 2021

lcolladotor reopened this Aug 11, 2021

lcolladotor added a commit to lcolladotor/DropletUtils that referenced this issue Aug 11, 2021

v1.13.3 -- attempt to fix MarioniLab#67

790d099

lcolladotor added a commit to lcolladotor/DropletUtils that referenced this issue Aug 17, 2021

Revert 584320d as suggested at MarioniLab#67 (comment)

eb8eb45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

emptyDrops BioC 3.11 vs 3.12: different results with the OSCA pbmc 4k dataset #67

emptyDrops BioC 3.11 vs 3.12: different results with the OSCA pbmc 4k dataset #67

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

PeteHaitch commented Aug 11, 2021 •

edited

Loading

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

Yalbibalderas commented Aug 11, 2021 via email

LTLA commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021 •

edited

Loading

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

LTLA commented Aug 12, 2021

lcolladotor commented Aug 12, 2021

lcolladotor commented Aug 17, 2021

LTLA commented Aug 23, 2021

emptyDrops BioC 3.11 vs 3.12: different results with the OSCA pbmc 4k dataset #67

emptyDrops BioC 3.11 vs 3.12: different results with the OSCA pbmc 4k dataset #67

Comments

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

PeteHaitch commented Aug 11, 2021 • edited Loading

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

Yalbibalderas commented Aug 11, 2021 via email

LTLA commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021 • edited Loading

More complicated case

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

lcolladotor commented Aug 11, 2021

LTLA commented Aug 12, 2021

lcolladotor commented Aug 12, 2021

lcolladotor commented Aug 17, 2021

Reverting the bugfix reproduces prior results

Wrapping up

obs.P calculation

LTLA commented Aug 23, 2021

PeteHaitch commented Aug 11, 2021 •

edited

Loading

lcolladotor commented Aug 11, 2021 •

edited

Loading

`obs.P` calculation