Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

emptyDrops BioC 3.11 vs 3.12: different results with the OSCA pbmc 4k dataset #67

Open
lcolladotor opened this issue Aug 11, 2021 · 16 comments

Comments

@lcolladotor
Copy link

Hi Aaron et al,

We (@Yalbibalderas @AnaBVA) noticed that the following code from OSCA returns different t-SNE plots. Eventually we narrowed down the issue to BioC 3.11 vs 3.12 (3.13 is the same as 3.12). In particular, the issue is with changes between those versions for DropletUtils.

The path to DropletUtils is documented at https://twitter.com/lcolladotor/status/1425242252872409092?s=20.

You can reproduce this with:

# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

# Detección de _droplets_ con células
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc))

table(e.out$FDR < 0.001, useNA = "ifany")

where BioC 3.11 returns

> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
  1056   4233 731991 

and later versions return

> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
   989   4300 731991 

The full code is below for getting to the tSNE (adapted from https://comunidadbioinfo.github.io/cdsb2021_scRNAseq/anotaci%C3%B3n-de-clusters-de-c%C3%A9lulas.html and ultimately from OSCA at https://bioconductor.org/books/release/OSCA/unfiltered-human-pbmcs-10x-genomics.html) is below.

# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

# Detección de _droplets_ con células
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc))

table(e.out$FDR < 0.001, useNA = "ifany")





library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86,
    keys = rowData(sce.pbmc)$ID,
    column = "SEQNAME", keytype = "GENEID"
)

sce.pbmc <- sce.pbmc[, which(e.out$FDR <= 0.001)]

# Control de calidad
stats <- perCellQCMetrics(sce.pbmc,
    subsets = list(Mito = which(location == "MT"))
)
high.mito <- isOutlier(stats$subsets_Mito_percent,
    type = "higher"
)
sce.pbmc <- sce.pbmc[, !high.mito]

# Normalización de los datos
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster = clusters)
sce.pbmc <- logNormCounts(sce.pbmc)

## Identificación de genes altamente variables
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop = 0.1)

## Reducción de dimensiones
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc,
    subset.row = top.pbmc,
    technical = dec.pbmc
)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, dimred = "PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, dimred = "PCA")

plotTSNE(sce.pbmc)



options(width = 120)
sessioninfo::session_info()

We think that it comes down to DropletUtils::emptyDrops() and between April 27 and October 27 2020 (when BioC 3.12 was bioc-devel), there's 2 commits that altered it, in particular we think that 71b161e might be the root of the difference.

Screen Shot 2021-08-10 at 10 06 04 PM

Based on the history, we see that commit was early in the BioC 3.12 devel cycle. With that in mind, I did a chimera installation with BioC 3.12 and the BioC 3.11 version of DropletUtils, specifically 51d00b0 (remotes::install_github("MarioniLab/DropletUtils@51d00b0")). That version reproduced the t-SNE and DropletUtils::emptyDrops() we see with BioC 3.11.

> options(width = 120)
main* > sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                                      
 version  R version 4.0.5 Patched (2021-03-31 r80247)
 os       macOS Big Sur 11.5.1                       
 system   x86_64, darwin17.0                         
 ui       RStudio                                    
 language (EN)                                       
 collate  en_US.UTF-8                                
 ctype    en_US.UTF-8                                
 tz       America/New_York                           
 date     2021-08-10Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version  date       lib source                                  
 AnnotationDbi        * 1.52.0   2020-10-27 [1] Bioconductor                            
 AnnotationFilter     * 1.14.0   2020-10-27 [1] Bioconductor                            
 askpass                1.1      2019-01-13 [1] CRAN (R 4.0.2)                          
 assertthat             0.2.1    2019-03-21 [1] CRAN (R 4.0.2)                          
 backports              1.2.1    2020-12-09 [1] CRAN (R 4.0.2)                          
 beachmat               2.6.4    2020-12-20 [1] Bioconductor                            
 beeswarm               0.3.1    2021-03-07 [1] CRAN (R 4.0.2)                          
 Biobase              * 2.50.0   2020-10-27 [1] Bioconductor                            
 BiocFileCache        * 1.14.0   2020-10-27 [1] Bioconductor                            
 BiocGenerics         * 0.36.1   2021-04-16 [1] Bioconductor                            
 BiocNeighbors          1.8.2    2020-12-07 [1] Bioconductor                            
 BiocParallel           1.24.1   2020-11-06 [1] Bioconductor                            
 BiocSingular           1.6.0    2020-10-27 [1] Bioconductor                            
 biocthis               1.0.10   2021-02-26 [1] Bioconductor                            
 biomaRt                2.46.3   2021-02-09 [1] Bioconductor                            
 Biostrings             2.58.0   2020-10-27 [1] Bioconductor                            
 bit                    4.0.4    2020-08-04 [1] CRAN (R 4.0.2)                          
 bit64                  4.0.5    2020-08-30 [1] CRAN (R 4.0.2)                          
 bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.0.2)                          
 blob                   1.2.1    2020-01-20 [1] CRAN (R 4.0.2)                          
 bluster                1.0.0    2020-10-27 [1] Bioconductor                            
 bookdown               0.22     2021-04-22 [1] CRAN (R 4.0.2)                          
 cachem                 1.0.4    2021-02-13 [1] CRAN (R 4.0.2)                          
 callr                  3.7.0    2021-04-20 [1] CRAN (R 4.0.2)                          
 cli                    2.5.0    2021-04-26 [1] CRAN (R 4.0.5)                          
 colorout               1.2-2    2020-11-03 [1] Github (jalvesaq/colorout@726d681)      
 colorspace             2.0-0    2020-11-11 [1] CRAN (R 4.0.2)                          
 cowplot                1.1.1    2020-12-30 [1] CRAN (R 4.0.2)                          
 crayon                 1.4.1    2021-02-08 [1] CRAN (R 4.0.2)                          
 curl                   4.3.1    2021-04-30 [1] CRAN (R 4.0.2)                          
 data.table             1.14.0   2021-02-21 [1] CRAN (R 4.0.2)                          
 DBI                    1.1.1    2021-01-15 [1] CRAN (R 4.0.2)                          
 dbplyr               * 2.1.1    2021-04-06 [1] CRAN (R 4.0.2)                          
 DelayedArray           0.16.3   2021-03-24 [1] Bioconductor                            
 DelayedMatrixStats     1.12.3   2021-02-03 [1] Bioconductor                            
 desc                   1.3.0    2021-03-05 [1] CRAN (R 4.0.2)                          
 devtools             * 2.4.0    2021-04-07 [1] CRAN (R 4.0.2)                          
 digest                 0.6.27   2020-10-24 [1] CRAN (R 4.0.2)                          
 dplyr                  1.0.5    2021-03-05 [1] CRAN (R 4.0.2)                          
 dqrng                  0.3.0    2021-05-01 [1] CRAN (R 4.0.5)                          
 DropletUtils         * 1.9.0    2021-08-11 [1] Github (MarioniLab/DropletUtils@51d00b0)
 edgeR                  3.32.1   2021-01-14 [1] Bioconductor                            
 ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.0.2)                          
 EnsDb.Hsapiens.v86   * 2.99.0   2021-08-11 [1] Bioconductor                            
 ensembldb            * 2.14.1   2021-04-19 [1] Bioconductor                            
 evaluate               0.14     2019-05-28 [1] CRAN (R 4.0.1)                          
 fansi                  0.4.2    2021-01-15 [1] CRAN (R 4.0.2)                          
 farver                 2.1.0    2021-02-28 [1] CRAN (R 4.0.2)                          
 fastmap                1.1.0    2021-01-25 [1] CRAN (R 4.0.2)                          
 FNN                    1.1.3    2019-02-15 [1] CRAN (R 4.0.2)                          
 fs                     1.5.0    2020-07-31 [1] CRAN (R 4.0.2)                          
 generics               0.1.0    2020-10-31 [1] CRAN (R 4.0.2)                          
 GenomeInfoDb         * 1.26.7   2021-04-08 [1] Bioconductor                            
 GenomeInfoDbData       1.2.4    2020-11-03 [1] Bioconductor                            
 GenomicAlignments      1.26.0   2020-10-27 [1] Bioconductor                            
 GenomicFeatures      * 1.42.3   2021-04-04 [1] Bioconductor                            
 GenomicRanges        * 1.42.0   2020-10-27 [1] Bioconductor                            
 ggbeeswarm             0.6.0    2017-08-07 [1] CRAN (R 4.0.2)                          
 ggplot2              * 3.3.3    2020-12-30 [1] CRAN (R 4.0.2)                          
 glue                   1.4.2    2020-08-27 [1] CRAN (R 4.0.2)                          
 gridExtra              2.3      2017-09-09 [1] CRAN (R 4.0.2)                          
 gtable                 0.3.0    2019-03-25 [1] CRAN (R 4.0.2)                          
 HDF5Array              1.18.1   2021-02-04 [1] Bioconductor                            
 hms                    1.0.0    2021-01-13 [1] CRAN (R 4.0.2)                          
 htmltools              0.5.1.1  2021-01-22 [1] CRAN (R 4.0.2)                          
 httr                   1.4.2    2020-07-20 [1] CRAN (R 4.0.2)                          
 igraph                 1.2.6    2020-10-06 [1] CRAN (R 4.0.2)                          
 IRanges              * 2.24.1   2020-12-12 [1] Bioconductor                            
 irlba                  2.3.3    2019-02-05 [1] CRAN (R 4.0.2)                          
 knitr                  1.33     2021-04-24 [1] CRAN (R 4.0.2)                          
 labeling               0.4.2    2020-10-20 [1] CRAN (R 4.0.2)                          
 lattice                0.20-44  2021-05-02 [1] CRAN (R 4.0.5)                          
 lazyeval               0.2.2    2019-03-15 [1] CRAN (R 4.0.2)                          
 lifecycle              1.0.0    2021-02-15 [1] CRAN (R 4.0.2)                          
 limma                  3.46.0   2020-10-27 [1] Bioconductor                            
 locfit                 1.5-9.4  2020-03-25 [1] CRAN (R 4.0.2)                          
 lubridate              1.7.10   2021-02-26 [1] CRAN (R 4.0.2)                          
 magrittr               2.0.1    2020-11-17 [1] CRAN (R 4.0.2)                          
 Matrix               * 1.3-2    2021-01-06 [1] CRAN (R 4.0.5)                          
 MatrixGenerics       * 1.2.1    2021-01-30 [1] Bioconductor                            
 matrixStats          * 0.58.0   2021-01-29 [1] CRAN (R 4.0.2)                          
 memoise                2.0.0    2021-01-26 [1] CRAN (R 4.0.3)                          
 munsell                0.5.0    2018-06-12 [1] CRAN (R 4.0.2)                          
 openssl                1.4.4    2021-04-30 [1] CRAN (R 4.0.2)                          
 pillar                 1.6.0    2021-04-13 [1] CRAN (R 4.0.2)                          
 pkgbuild               1.2.0    2020-12-15 [1] CRAN (R 4.0.3)                          
 pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.0.2)                          
 pkgload                1.2.1    2021-04-06 [1] CRAN (R 4.0.2)                          
 prettyunits            1.1.1    2020-01-24 [1] CRAN (R 4.0.2)                          
 processx               3.5.2    2021-04-30 [1] CRAN (R 4.0.2)                          
 progress               1.2.2    2019-05-16 [1] CRAN (R 4.0.2)                          
 prompt                 1.0.1    2021-03-12 [1] CRAN (R 4.0.2)                          
 ProtGenerics           1.22.0   2020-10-27 [1] Bioconductor                            
 ps                     1.6.0    2021-02-28 [1] CRAN (R 4.0.2)                          
 purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.0.2)                          
 R.methodsS3            1.8.1    2020-08-26 [1] CRAN (R 4.0.2)                          
 R.oo                   1.24.0   2020-08-26 [1] CRAN (R 4.0.2)                          
 R.utils                2.10.1   2020-08-26 [1] CRAN (R 4.0.2)                          
 R6                     2.5.0    2020-10-28 [1] CRAN (R 4.0.2)                          
 rappdirs               0.3.3    2021-01-31 [1] CRAN (R 4.0.2)                          
 Rcpp                   1.0.6    2021-01-15 [1] CRAN (R 4.0.2)                          
 RCurl                  1.98-1.3 2021-03-16 [1] CRAN (R 4.0.2)                          
 remotes                2.3.0    2021-04-01 [1] CRAN (R 4.0.2)                          
 rhdf5                  2.34.0   2020-10-27 [1] Bioconductor                            
 rhdf5filters           1.2.0    2020-10-27 [1] Bioconductor                            
 Rhdf5lib               1.12.1   2021-01-26 [1] Bioconductor                            
 rlang                  0.4.11   2021-04-30 [1] CRAN (R 4.0.2)                          
 rmarkdown              2.7      2021-02-19 [1] CRAN (R 4.0.4)                          
 rprojroot              2.0.2    2020-11-15 [1] CRAN (R 4.0.2)                          
 Rsamtools              2.6.0    2020-10-27 [1] Bioconductor                            
 RSpectra               0.16-0   2019-12-01 [1] CRAN (R 4.0.2)                          
 RSQLite                2.2.7    2021-04-22 [1] CRAN (R 4.0.2)                          
 rsthemes               0.1.0    2020-11-03 [1] Github (gadenbuie/rsthemes@6391fe5)     
 rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.0.2)                          
 rsvd                   1.0.5    2021-04-16 [1] CRAN (R 4.0.2)                          
 rtracklayer            1.50.0   2020-10-27 [1] Bioconductor                            
 Rtsne                  0.15     2018-11-10 [1] CRAN (R 4.0.2)                          
 S4Vectors            * 0.28.1   2020-12-09 [1] Bioconductor                            
 scales                 1.1.1    2020-05-11 [1] CRAN (R 4.0.2)                          
 scater               * 1.18.6   2021-02-26 [1] Bioconductor                            
 scran                * 1.18.7   2021-04-16 [1] Bioconductor                            
 scuttle                1.0.4    2020-12-17 [1] Bioconductor                            
 sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.0.2)                          
 SingleCellExperiment * 1.12.0   2020-10-27 [1] Bioconductor                            
 sparseMatrixStats      1.2.1    2021-02-02 [1] Bioconductor                            
 statmod                1.4.35   2020-10-19 [1] CRAN (R 4.0.2)                          
 stringi                1.5.3    2020-09-09 [1] CRAN (R 4.0.2)                          
 stringr                1.4.0    2019-02-10 [1] CRAN (R 4.0.2)                          
 styler                 1.4.1    2021-03-30 [1] CRAN (R 4.0.2)                          
 SummarizedExperiment * 1.20.0   2020-10-27 [1] Bioconductor                            
 suncalc                0.5.0    2019-04-03 [1] CRAN (R 4.0.2)                          
 testthat             * 3.0.2    2021-02-14 [1] CRAN (R 4.0.2)                          
 tibble                 3.1.1    2021-04-18 [1] CRAN (R 4.0.2)                          
 tidyselect             1.1.1    2021-04-30 [1] CRAN (R 4.0.2)                          
 usethis              * 2.0.1    2021-02-10 [1] CRAN (R 4.0.2)                          
 utf8                   1.2.1    2021-03-12 [1] CRAN (R 4.0.2)                          
 uwot                   0.1.10   2020-12-15 [1] CRAN (R 4.0.2)                          
 vctrs                  0.3.8    2021-04-29 [1] CRAN (R 4.0.2)                          
 vipor                  0.4.5    2017-03-22 [1] CRAN (R 4.0.2)                          
 viridis                0.6.0    2021-04-15 [1] CRAN (R 4.0.2)                          
 viridisLite            0.4.0    2021-04-13 [1] CRAN (R 4.0.2)                          
 withr                  2.4.2    2021-04-18 [1] CRAN (R 4.0.2)                          
 xfun                   0.22     2021-03-11 [1] CRAN (R 4.0.2)                          
 XML                    3.99-0.6 2021-03-16 [1] CRAN (R 4.0.2)                          
 xml2                   1.3.2    2020-04-23 [1] CRAN (R 4.0.2)                          
 XVector                0.30.0   2020-10-28 [1] Bioconductor                            
 yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.0.2)                          
 zlibbioc               1.36.0   2020-10-28 [1] Bioconductor                            

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
main* > 

I tried installing this version of DropletUtils 71b161e but I can't due some compilation errors related to scuttle, which makes sense since in BioC 3.12 you changed many packages. I also tried a chimera of BioC 3.11 and upgrading DropletUtils to BioC 3.12 versions (with remotes::install_github()) around the time of the suspected root commit and related packages, but well, I couldn't get it to work tonight.

In any case, do you think that there's a way to get the same emptyDrops() results with current versions of DropletUtils as those we were getting back in BioC 3.11? My sense is that the answer is no since the internals of the function changed. Aka, it's not like a parameter changed and we can just change it from say FALSE to TRUE, etc.

All of this arose due to how in BioC 3.11, the clusters of CD4 and CD8 T-cells looked nicely separated in the t-SNE and don't with current versions as shown below.

Hmm. We can't figure out why these two plots are different with the same code, though different R versions.

BioC 3.11? @PeteHaitch https://t.co/8DJM6pagdj

BioC 3.13https://t.co/t8KiPTyU9D@Bioconductor #rstats @yalbi_ibm @AnaBetty2304 pic.twitter.com/P65u7dnVCo

— 🇲🇽 Leonardo Collado-Torres (@lcolladotor) August 10, 2021
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

And well, we wanted to be able that question when we teach that part of OSCA on Friday morning.

Best,
Leo

@lcolladotor
Copy link
Author

From my embedded tweet that doesn't show up properly on GitHub

Peter Hickey's slides, likely made with BioC 3.11

image

BioC 3.13

image

I haven't tried the whole thing up to annotating the cell clusters with BioC 3.12, but the shape of the t-SNE is the same as in BioC 3.13.

@PeteHaitch
Copy link

PeteHaitch commented Aug 11, 2021

FWIW I'm pretty certain figures in my slides are just copy+pasted from the version of OSCA at the time I made the slides.

@lcolladotor
Copy link
Author

Thanks for the info Pete!


Here's a verification that the t-SNE shape issue is due to DropletUtils::emptyDrops() since saving the e.out <- emptyDrops() output in BioC 3.11 and using it with 3.13 returns the same shape of the t-SNE.

The annotated clusters and all that are different, likely due to other changes.

e.out_BioC3.11.rds.zip

# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

# Detección de _droplets_ con células
set.seed(100)
# e.out <- emptyDrops(counts(sce.pbmc))
e.out <- readRDS("e.out_BioC3.11.rds")

table(e.out$FDR < 0.001, useNA = "ifany")





library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86,
    keys = rowData(sce.pbmc)$ID,
    column = "SEQNAME", keytype = "GENEID"
)

sce.pbmc <- sce.pbmc[, which(e.out$FDR <= 0.001)]

# Control de calidad
stats <- perCellQCMetrics(sce.pbmc,
    subsets = list(Mito = which(location == "MT"))
)
high.mito <- isOutlier(stats$subsets_Mito_percent,
    type = "higher"
)
sce.pbmc <- sce.pbmc[, !high.mito]

# Normalización de los datos
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster = clusters)
sce.pbmc <- logNormCounts(sce.pbmc)

## Identificación de genes altamente variables
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop = 0.1)

## Reducción de dimensiones
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc,
    subset.row = top.pbmc,
    technical = dec.pbmc
)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, dimred = "PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, dimred = "PCA")

plotTSNE(sce.pbmc)



# clustering
g <- buildSNNGraph(sce.pbmc, k = 10, use.dimred = "PCA")
clust <- igraph::cluster_walktrap(g)$membership
sce.pbmc$cluster <- factor(clust)

library(celldex)
ref <- celldex::BlueprintEncodeData()

library(SingleR)
pred <- SingleR(
    test = sce.pbmc, ref = ref,
    labels = ref$label.main
)

sce.pbmc$labels <- pred$labels
plotTSNE(sce.pbmc, colour_by = "labels", text_by = "labels")

Screen Shot 2021-08-10 at 10 58 17 PM

Left: BioC 3.13 with e.out from BioC 3.11
Middle: Pete's slides, likely with BioC 3.11
Right: BioC 3.13

@lcolladotor
Copy link
Author

Actually, the colors was due to

plotTSNE(sce.pbmc, colour_by = "labels", text_by = "labels")

vs

plotTSNE(sce.pbmc, colour_by = "cluster", text_by = "labels")

Screen Shot 2021-08-10 at 11 10 30 PM

Left: BioC 3.13 with e.out from BioC 3.11
Right: Pete's slides

Show below is the difference between using BioC 3.13 with the old output from DropletUtils::emptyDrops(), which matches the BioC 3.11 output, and the new BioC 3.13 output. In terms of explaining methods, I don't think that the BioC 3.13 output is bad in any way. So we could potentially close this issue. Though in terms of reproducing DropletUtils::emptyDrops() output from BioC 3.11 with newer versions, well, that's a different story. My guess is that the answer is that it simply can't, like I said earlier.

Screen Shot 2021-08-10 at 11 20 16 PM

Left: BioC 3.13 with e.out from BioC 3.111
Middle: Pete's slides with BioC 3.11
Right: BioC 3.13 only

@Yalbibalderas
Copy link

Yalbibalderas commented Aug 11, 2021 via email

lcolladotor added a commit to ComunidadBioInfo/cdsb2021_scRNAseq that referenced this issue Aug 11, 2021
@LTLA
Copy link
Collaborator

LTLA commented Aug 11, 2021

Well, if your scope is "somewhere between April and Oct 2020", then the more impactful change is that of #48, which adjusts the spline fit and presumably shifts the retain threshold a bit. You could attempt reverting this change by setting barcode.args=list(exclude.from=0), see the argument description in ?barcodeRanks.

@lcolladotor
Copy link
Author

I see, thanks Aaron. I wasn't aware of that change, though well, with BioC 3.13 I get the same results with and without barcode.args=list(exclude.from=0). That is:

> e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
   989   4300 731991 
> library(scran)
Loading required package: SingleCellExperiment
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package:MatrixGenericsThe following objects are masked frompackage:matrixStats:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package:BiocGenericsThe following objects are masked frompackage:parallel:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked frompackage:stats:

    IQR, mad, sd, var, xtabs

The following objects are masked frompackage:base:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package:S4VectorsThe following objects are masked frompackage:base:

    expand.grid, I, unname

Loading required package: IRanges

Attaching package:IRangesThe following object is masked frompackage:grDevices:

    windows

Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package:BiobaseThe following object is masked frompackage:MatrixGenerics:

    rowMedians

The following objects are masked frompackage:matrixStats:

    anyMissing, rowMedians

Loading required package: scuttle
> library(DropletUtils)
> ?emptyDrops
> # Usemos datos de pbmc4k
> library(BiocFileCache)
Loading required package: dbplyr
> bfc <- BiocFileCache()
> raw.path <- bfcrpath(bfc, file.path(
+     "http://cf.10xgenomics.com/samples",
+     "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
+ ))
> untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
> 
> library(DropletUtils)
> library(Matrix)

Attaching package:MatrixThe following object is masked frompackage:S4Vectors:

    expand

> fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
> sce.pbmc <- read10xCounts(fname, col.names = TRUE)
> # Anotación de los genes
> library(scater)
Loading required package: ggplot2
> rownames(sce.pbmc) <- uniquifyFeatureNames(
+     rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
+ )
> library(EnsDb.Hsapiens.v86)
Loading required package: ensembldb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: AnnotationFilter

Attaching package:AnnotationFilterThe following object is masked frompackage:testthat:

    not


Attaching package: 'ensembldb'

The following object is masked from 'package:stats':

    filter

> location <- mapIds(EnsDb.Hsapiens.v86,
+     keys = rowData(sce.pbmc)$ID,
+     column = "SEQNAME", keytype = "GENEID"
+ )
Warning message:
Unable to map 144 of 33694 requested IDs. 
> # Detección de _droplets_ con células
> set.seed(100)
> e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
   989   4300 731991 
> options(width = 120)
> sessioninfo::session_info()
- Session info -------------------------------------------------------------------------------------------------------
 setting  value                                    
 version  R version 4.1.0 alpha (2021-04-20 r80202)
 os       Windows 10 x64                           
 system   x86_64, mingw32                          
 ui       RStudio                                  
 language (EN)                                     
 collate  English_United States.1252               
 ctype    English_United States.1252               
 tz       America/New_York                         
 date     2021-08-11                               

- Packages -----------------------------------------------------------------------------------------------------------
 ! package                * version    date       lib source                             
   AnnotationDbi          * 1.54.1     2021-06-08 [1] Bioconductor                       
   AnnotationFilter       * 1.16.0     2021-05-19 [1] Bioconductor                       
   AnnotationHub            3.0.1      2021-06-20 [1] Bioconductor                       
   assertthat               0.2.1      2019-03-21 [1] CRAN (R 4.1.0)                     
   backports                1.2.1      2020-12-09 [1] CRAN (R 4.1.0)                     
   beachmat                 2.8.0      2021-05-19 [1] Bioconductor                       
   beeswarm                 0.4.0      2021-06-01 [1] CRAN (R 4.1.0)                     
   Biobase                * 2.52.0     2021-05-19 [1] Bioconductor                       
   BiocFileCache          * 2.0.0      2021-05-19 [1] Bioconductor                       
   BiocGenerics           * 0.38.0     2021-05-19 [1] Bioconductor                       
   BiocIO                   1.2.0      2021-05-19 [1] Bioconductor                       
   BiocManager              1.30.16    2021-06-15 [1] CRAN (R 4.1.0)                     
   BiocNeighbors            1.10.0     2021-05-19 [1] Bioconductor                       
   BiocParallel             1.26.1     2021-07-04 [1] Bioconductor                       
   BiocSingular             1.8.1      2021-06-08 [1] Bioconductor                       
   biocthis                 1.2.0      2021-05-19 [1] Bioconductor                       
   BiocVersion              3.13.1     2021-03-08 [1] Bioconductor                       
   biomaRt                  2.48.2     2021-07-01 [1] Bioconductor                       
   Biostrings               2.60.2     2021-08-05 [1] Bioconductor                       
   bit                      4.0.4      2020-08-04 [1] CRAN (R 4.1.0)                     
   bit64                    4.0.5      2020-08-30 [1] CRAN (R 4.1.0)                     
   bitops                   1.0-7      2021-04-24 [1] CRAN (R 4.1.0)                     
   blob                     1.2.2      2021-07-23 [1] CRAN (R 4.1.0)                     
   bluster                  1.2.1      2021-05-27 [1] Bioconductor                       
   bookdown                 0.22       2021-04-22 [1] CRAN (R 4.1.0)                     
   bslib                    0.2.5.1    2021-05-18 [1] CRAN (R 4.1.0)                     
   cachem                   1.0.5      2021-05-15 [1] CRAN (R 4.1.0)                     
   Cairo                    1.5-12.2   2020-07-07 [1] CRAN (R 4.1.0)                     
   callr                    3.7.0      2021-04-20 [1] CRAN (R 4.1.0)                     
   celldex                  1.2.0      2021-05-20 [1] Bioconductor                       
   circlize                 0.4.13     2021-06-09 [1] CRAN (R 4.1.0)                     
   cli                      3.0.1      2021-07-17 [1] CRAN (R 4.1.0)                     
   clue                     0.3-59     2021-04-16 [1] CRAN (R 4.1.0)                     
   cluster                  2.1.2      2021-04-17 [1] CRAN (R 4.1.0)                     
   codetools                0.2-18     2020-11-04 [1] CRAN (R 4.1.0)                     
   colorspace               2.0-2      2021-06-24 [1] CRAN (R 4.1.0)                     
   colourpicker             1.1.0      2020-09-14 [1] CRAN (R 4.1.0)                     
   ComplexHeatmap           2.8.0      2021-05-19 [1] Bioconductor                       
   cowplot                  1.1.1      2020-12-30 [1] CRAN (R 4.1.0)                     
   crayon                   1.4.1      2021-02-08 [1] CRAN (R 4.1.0)                     
   curl                     4.3.2      2021-06-23 [1] CRAN (R 4.1.0)                     
   data.table               1.14.0     2021-02-21 [1] CRAN (R 4.1.0)                     
   DBI                      1.1.1      2021-01-15 [1] CRAN (R 4.1.0)                     
   dbplyr                 * 2.1.1      2021-04-06 [1] CRAN (R 4.1.0)                     
   DelayedArray             0.18.0     2021-05-19 [1] Bioconductor                       
   DelayedMatrixStats       1.14.2     2021-08-08 [1] Bioconductor                       
   desc                     1.3.0      2021-03-05 [1] CRAN (R 4.1.0)                     
   devtools               * 2.4.2      2021-06-07 [1] CRAN (R 4.1.0)                     
   digest                   0.6.27     2020-10-24 [1] CRAN (R 4.1.0)                     
   doParallel               1.0.16     2020-10-16 [1] CRAN (R 4.1.0)                     
   dplyr                    1.0.7      2021-06-18 [1] CRAN (R 4.1.0)                     
   dqrng                    0.3.0      2021-05-01 [1] CRAN (R 4.1.0)                     
   DropletUtils           * 1.12.2     2021-07-22 [1] Bioconductor                       
   DT                       0.18       2021-04-14 [1] CRAN (R 4.1.0)                     
   edgeR                    3.34.0     2021-05-19 [1] Bioconductor                       
   ellipsis                 0.3.2      2021-04-29 [1] CRAN (R 4.1.0)                     
   EnsDb.Hsapiens.v86     * 2.99.0     2021-08-08 [1] Bioconductor                       
   ensembldb              * 2.16.4     2021-08-05 [1] Bioconductor                       
   evaluate                 0.14       2019-05-28 [1] CRAN (R 4.1.0)                     
   ExperimentHub            2.0.0      2021-05-19 [1] Bioconductor                       
   fansi                    0.5.0      2021-05-25 [1] CRAN (R 4.1.0)                     
   fastmap                  1.1.0      2021-01-25 [1] CRAN (R 4.1.0)                     
   filelock                 1.0.2      2018-10-05 [1] CRAN (R 4.1.0)                     
   foreach                  1.5.1      2020-10-15 [1] CRAN (R 4.1.0)                     
   fs                       1.5.0      2020-07-31 [1] CRAN (R 4.1.0)                     
   generics                 0.1.0      2020-10-31 [1] CRAN (R 4.1.0)                     
   GenomeInfoDb           * 1.28.1     2021-07-01 [1] Bioconductor                       
   GenomeInfoDbData         1.2.6      2021-08-07 [1] Bioconductor                       
   GenomicAlignments        1.28.0     2021-05-19 [1] Bioconductor                       
   GenomicFeatures        * 1.44.0     2021-05-19 [1] Bioconductor                       
   GenomicRanges          * 1.44.0     2021-05-19 [1] Bioconductor                       
   GetoptLong               1.0.5      2020-12-15 [1] CRAN (R 4.1.0)                     
   ggbeeswarm               0.6.0      2017-08-07 [1] CRAN (R 4.1.0)                     
   ggplot2                * 3.3.5      2021-06-25 [1] CRAN (R 4.1.0)                     
   ggrepel                  0.9.1      2021-01-15 [1] CRAN (R 4.1.0)                     
   GlobalOptions            0.1.2      2020-06-10 [1] CRAN (R 4.1.0)                     
   glue                     1.4.2      2020-08-27 [1] CRAN (R 4.1.0)                     
   gridExtra                2.3        2017-09-09 [1] CRAN (R 4.1.0)                     
   gtable                   0.3.0      2019-03-25 [1] CRAN (R 4.1.0)                     
   HDF5Array                1.20.0     2021-05-19 [1] Bioconductor                       
   here                     1.0.1      2020-12-13 [1] CRAN (R 4.1.0)                     
   hms                      1.1.0      2021-05-17 [1] CRAN (R 4.1.0)                     
   htmltools                0.5.1.1    2021-01-22 [1] CRAN (R 4.1.0)                     
   htmlwidgets              1.5.3      2020-12-10 [1] CRAN (R 4.1.0)                     
   httpuv                   1.6.1      2021-05-07 [1] CRAN (R 4.1.0)                     
   httr                     1.4.2      2020-07-20 [1] CRAN (R 4.1.0)                     
   igraph                   1.2.6      2020-10-06 [1] CRAN (R 4.1.0)                     
   interactiveDisplayBase   1.30.0     2021-05-19 [1] Bioconductor                       
   IRanges                * 2.26.0     2021-05-19 [1] Bioconductor                       
   irlba                    2.3.3      2019-02-05 [1] CRAN (R 4.1.0)                     
   iSEE                     2.4.0      2021-05-19 [1] Bioconductor                       
   iterators                1.0.13     2020-10-15 [1] CRAN (R 4.1.0)                     
   jquerylib                0.1.4      2021-04-26 [1] CRAN (R 4.1.0)                     
   jsonlite                 1.7.2      2020-12-09 [1] CRAN (R 4.1.0)                     
   KEGGREST                 1.32.0     2021-05-19 [1] Bioconductor                       
   knitr                    1.33       2021-04-24 [1] CRAN (R 4.1.0)                     
   later                    1.2.0      2021-04-23 [1] CRAN (R 4.1.0)                     
   lattice                  0.20-44    2021-05-02 [1] CRAN (R 4.1.0)                     
   lazyeval                 0.2.2      2019-03-15 [1] CRAN (R 4.1.0)                     
   lifecycle                1.0.0      2021-02-15 [1] CRAN (R 4.1.0)                     
   limma                    3.48.2     2021-08-08 [1] Bioconductor                       
   locfit                   1.5-9.4    2020-03-25 [1] CRAN (R 4.1.0)                     
   lubridate                1.7.10     2021-02-26 [1] CRAN (R 4.1.0)                     
   magrittr                 2.0.1      2020-11-17 [1] CRAN (R 4.1.0)                     
   Matrix                 * 1.3-4      2021-06-01 [1] CRAN (R 4.1.0)                     
   MatrixGenerics         * 1.4.2      2021-08-08 [1] Bioconductor                       
   matrixStats            * 0.60.0     2021-07-26 [1] CRAN (R 4.1.0)                     
   memoise                  2.0.0      2021-01-26 [1] CRAN (R 4.1.0)                     
   metapod                  1.0.0      2021-05-19 [1] Bioconductor                       
   mgcv                     1.8-36     2021-06-01 [1] CRAN (R 4.1.0)                     
   mime                     0.11       2021-06-23 [1] CRAN (R 4.1.0)                     
   miniUI                   0.1.1.1    2018-05-18 [1] CRAN (R 4.1.0)                     
   munsell                  0.5.0      2018-06-12 [1] CRAN (R 4.1.0)                     
   nlme                     3.1-152    2021-02-04 [1] CRAN (R 4.1.0)                     
   PCAtools                 2.4.0      2021-05-19 [1] Bioconductor                       
   pillar                   1.6.2      2021-07-29 [1] CRAN (R 4.1.0)                     
   pkgbuild                 1.2.0      2020-12-15 [1] CRAN (R 4.1.0)                     
   pkgconfig                2.0.3      2019-09-22 [1] CRAN (R 4.1.0)                     
   pkgload                  1.2.1      2021-04-06 [1] CRAN (R 4.1.0)                     
   plyr                     1.8.6      2020-03-03 [1] CRAN (R 4.1.0)                     
   png                      0.1-7      2013-12-03 [1] CRAN (R 4.1.0)                     
   prettyunits              1.1.1      2020-01-24 [1] CRAN (R 4.1.0)                     
   processx                 3.5.2      2021-04-30 [1] CRAN (R 4.1.0)                     
   progress                 1.2.2      2019-05-16 [1] CRAN (R 4.1.0)                     
   promises                 1.2.0.1    2021-02-11 [1] CRAN (R 4.1.0)                     
   ProtGenerics             1.24.0     2021-05-19 [1] Bioconductor                       
   ps                       1.6.0      2021-02-28 [1] CRAN (R 4.1.0)                     
   purrr                    0.3.4      2020-04-17 [1] CRAN (R 4.1.0)                     
   R.methodsS3              1.8.1      2020-08-26 [1] CRAN (R 4.1.0)                     
   R.oo                     1.24.0     2020-08-26 [1] CRAN (R 4.1.0)                     
   R.utils                  2.10.1     2020-08-26 [1] CRAN (R 4.1.0)                     
   R6                       2.5.0      2020-10-28 [1] CRAN (R 4.1.0)                     
   rappdirs                 0.3.3      2021-01-31 [1] CRAN (R 4.1.0)                     
   RColorBrewer             1.1-2      2014-12-07 [1] CRAN (R 4.1.0)                     
   Rcpp                     1.0.7      2021-07-07 [1] CRAN (R 4.1.0)                     
   RCurl                    1.98-1.3   2021-03-16 [1] CRAN (R 4.1.0)                     
   remotes                  2.4.0      2021-06-02 [1] CRAN (R 4.1.0)                     
   reshape2                 1.4.4      2020-04-09 [1] CRAN (R 4.1.0)                     
   restfulr                 0.0.13     2017-08-06 [1] CRAN (R 4.1.0)                     
   rhdf5                    2.36.0     2021-05-19 [1] Bioconductor                       
 D rhdf5filters             1.4.0      2021-05-19 [1] Bioconductor                       
   Rhdf5lib                 1.14.2     2021-07-06 [1] Bioconductor                       
   rintrojs                 0.3.0      2021-06-06 [1] CRAN (R 4.1.0)                     
   rjson                    0.2.20     2018-06-08 [1] CRAN (R 4.1.0)                     
   rlang                    0.4.11     2021-04-30 [1] CRAN (R 4.1.0)                     
   rmarkdown                2.10       2021-08-06 [1] CRAN (R 4.1.0)                     
   rprojroot                2.0.2      2020-11-15 [1] CRAN (R 4.1.0)                     
   Rsamtools                2.8.0      2021-05-19 [1] Bioconductor                       
   RSQLite                  2.2.7      2021-04-22 [1] CRAN (R 4.1.0)                     
   rsthemes                 0.2.1.9000 2021-04-22 [1] Github (gadenbuie/rsthemes@19299e5)
   rstudioapi               0.13       2020-11-12 [1] CRAN (R 4.1.0)                     
   rsvd                     1.0.5      2021-04-16 [1] CRAN (R 4.1.0)                     
   rtracklayer              1.52.0     2021-05-19 [1] Bioconductor                       
   S4Vectors              * 0.30.0     2021-05-19 [1] Bioconductor                       
   sass                     0.4.0      2021-05-12 [1] CRAN (R 4.1.0)                     
   ScaledMatrix             1.0.0      2021-05-19 [1] Bioconductor                       
   scales                   1.1.1      2020-05-11 [1] CRAN (R 4.1.0)                     
   scater                 * 1.20.1     2021-06-15 [1] Bioconductor                       
   scran                  * 1.20.1     2021-05-24 [1] Bioconductor                       
   scuttle                * 1.2.1      2021-08-05 [1] Bioconductor                       
   sessioninfo              1.1.1      2018-11-05 [1] CRAN (R 4.1.0)                     
   shape                    1.4.6      2021-05-19 [1] CRAN (R 4.1.0)                     
   shiny                    1.6.0      2021-01-25 [1] CRAN (R 4.1.0)                     
   shinyAce                 0.4.1      2019-09-24 [1] CRAN (R 4.1.0)                     
   shinydashboard           0.7.1      2018-10-17 [1] CRAN (R 4.1.0)                     
   shinyjs                  2.0.0      2020-09-09 [1] CRAN (R 4.1.0)                     
   shinyWidgets             0.6.0      2021-03-15 [1] CRAN (R 4.1.0)                     
   SingleCellExperiment   * 1.14.1     2021-05-21 [1] Bioconductor                       
   sparseMatrixStats        1.4.2      2021-08-08 [1] Bioconductor                       
   statmod                  1.4.36     2021-05-10 [1] CRAN (R 4.1.0)                     
   stringi                  1.7.3      2021-07-16 [1] CRAN (R 4.1.0)                     
   stringr                  1.4.0      2019-02-10 [1] CRAN (R 4.1.0)                     
   styler                   1.5.1      2021-07-13 [1] CRAN (R 4.1.0)                     
   SummarizedExperiment   * 1.22.0     2021-05-19 [1] Bioconductor                       
   suncalc                  0.5.0      2019-04-03 [1] CRAN (R 4.1.0)                     
   testthat               * 3.0.4      2021-07-01 [1] CRAN (R 4.1.0)                     
   tibble                   3.1.3      2021-07-23 [1] CRAN (R 4.1.0)                     
   tidyselect               1.1.1      2021-04-30 [1] CRAN (R 4.1.0)                     
   usethis                * 2.0.1      2021-02-10 [1] CRAN (R 4.1.0)                     
   utf8                     1.2.2      2021-07-24 [1] CRAN (R 4.1.0)                     
   vctrs                    0.3.8      2021-04-29 [1] CRAN (R 4.1.0)                     
   vipor                    0.4.5      2017-03-22 [1] CRAN (R 4.1.0)                     
   viridis                  0.6.1      2021-05-11 [1] CRAN (R 4.1.0)                     
   viridisLite              0.4.0      2021-04-13 [1] CRAN (R 4.1.0)                     
   withr                    2.4.2      2021-04-18 [1] CRAN (R 4.1.0)                     
   xfun                     0.25       2021-08-06 [1] CRAN (R 4.1.0)                     
   XML                      3.99-0.6   2021-03-16 [1] CRAN (R 4.1.0)                     
   xml2                     1.3.2      2020-04-23 [1] CRAN (R 4.1.0)                     
   xtable                   1.8-4      2019-04-21 [1] CRAN (R 4.1.0)                     
   XVector                  0.32.0     2021-05-19 [1] Bioconductor                       
   yaml                     2.2.1      2020-02-01 [1] CRAN (R 4.1.0)                     
   zlibbioc                 1.38.0     2021-05-19 [1] Bioconductor                       

[1] C:/R/R-4.1.0alpha/library

 D -- DLL MD5 mismatch, broken installation.

@lcolladotor
Copy link
Author

lcolladotor commented Aug 11, 2021

Under a first inspection, it doesn't seem to be that the following commit is the problem 71b161e#diff-cf7a6dd4088835e532ce821c61ceb633ccc99ee3ffe8a812ab3c0a7f46dcaa8b either based on the test below and from looking at https://github.com/LTLA/beachmat/blob/d3e4bafbc840df2d106197d7163c3f3aa6fb1fb1/R/whichNonZero.R#L42-L44.

set.seed(100)
p.n0 <- runif(100)
j <- sample(1:10, 100, TRUE)
mat <- matrix(ncol = 10)
dim(mat)
#> [1]  1 10
max(j)
#> [1] 10

by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
obs.P
#>  [1]  3.998302  4.518524  2.379189  5.237262  6.077561  4.126877  4.634267
#>  [8]  4.671363  4.275538 12.068117

j_new <- factor(j, levels=seq_len(ncol(mat)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
obs.P_new
#>  [1]  3.998302  4.518524  2.379189  5.237262  6.077561  4.126877  4.634267
#>  [8]  4.671363  4.275538 12.068117

testthat::expect_equal(obs.P, obs.P_new)
testthat::expect_equivalent(obs.P, obs.P_new)

Created on 2021-08-11 by the reprex package (v2.0.1)

However, in that case, the full matrix mat has an equal number of columns as the max value of j and all columns of mat have j values.

More complicated case

However, let's say that the full matrix has 15 columns with columns 6 to 10 being all zeros.

## 100 random p-values for 100 random j columns
## in a sparse matrix
set.seed(100)
p.n0 <- runif(100)
## Let's say that we have empty colums 6 through 10
j <- sample(c(1:5, 11:15), 100, TRUE)
mat <- matrix(ncol = max(j))
dim(mat)
#> [1]  1 15

by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
obs.P
#>  [1]  3.998302  4.518524  2.379189  5.237262  6.077561  0.000000  0.000000
#>  [8]  0.000000  0.000000  0.000000  4.126877  4.634267  4.671363  4.275538
#> [15] 12.068117

j_new <- factor(j, levels=seq_len(ncol(mat)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
obs.P_new
#>  [1]  3.998302  4.518524  2.379189  5.237262  6.077561        NA        NA
#>  [8]        NA        NA        NA  4.126877  4.634267  4.671363  4.275538
#> [15] 12.068117

testthat::expect_equal(obs.P, obs.P_new)
#> Error: `obs.P` not equal to `obs.P_new`.
#> 5/15 mismatches (average diff: NaN)
#> [6]  0 - NA == NA
#> [7]  0 - NA == NA
#> [8]  0 - NA == NA
#> [9]  0 - NA == NA
#> [10] 0 - NA == NA
testthat::expect_equivalent(obs.P, obs.P_new)
#> Error: `obs.P` not equivalent to `obs.P_new`.
#> 5/15 mismatches (average diff: NaN)
#> [6]  0 - NA == NA
#> [7]  0 - NA == NA
#> [8]  0 - NA == NA
#> [9]  0 - NA == NA
#> [10] 0 - NA == NA

Created on 2021-08-11 by the reprex package (v2.0.1)

In that case, we end up having NAs in the obs.P_new object. The current version of the code would be https://github.com/MarioniLab/DropletUtils/blob/master/R/emptyDrops.R#L311-L327.

With the data used for this issue, I see that well, we do have lots of non-zero columns as expected.

# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

# Anotación de los genes
library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

library(beachmat)

mat <- counts(sce.pbmc)
dim(mat)
nonzero <- whichNonZero(mat)
length(unique(nonzero$j))
max(nonzero$j)

where max(j) corresponds to ncol(mat).

> library(beachmat)
> 
> mat <- counts(sce.pbmc)
> dim(mat)
[1]  33694 737280
> nonzero <- whichNonZero(mat)
> length(unique(nonzero$j))
[1] 272442
> max(nonzero$j)
[1] 737280

@lcolladotor
Copy link
Author

pressed the wrong button while drafting my message....

@lcolladotor
Copy link
Author

Ok, this does seem to be the issue with the actual data as shown in the reprex below with BioC 3.13.

I imagine that it doesn't matter for most droplets since well, they already had NA p-values in the past.

# Usemos datos de pbmc4k
library(BiocFileCache)
#> Loading required package: dbplyr
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#>     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#>     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#>     union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> 
#> Attaching package: 'IRanges'
#> The following object is masked from 'package:grDevices':
#> 
#>     windows
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
library(Matrix)
#> 
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:S4Vectors':
#> 
#>     expand
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

# Anotación de los genes
library(scater)
#> Loading required package: scuttle
#> Loading required package: ggplot2
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

library(beachmat)

mat <- counts(sce.pbmc)
dim(mat)
#> [1]  33694 737280
nonzero <- whichNonZero(mat)
length(unique(nonzero$j))
#> [1] 272442
max(nonzero$j)
#> [1] 737280


m <- DropletUtils:::.rounded_to_integer(mat, 0)
nonzero <- whichNonZero(m)

i <- nonzero$i
j <- nonzero$j
x <- nonzero$x


set.seed(100)
p.n0 <- runif(length(j))


by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
summary(obs.P)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#>    0.0000    0.0000    0.0000    5.7792    0.7371 2622.1900

j_new <- factor(j, levels=seq_len(ncol(m)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
summary(obs.P_new)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#>     0.0     0.6     1.3    15.6     9.1  2622.2  464838

testthat::expect_equal(obs.P, obs.P_new)
#> Error: `obs.P` not equal to `obs.P_new`.
#> 464838/737280 mismatches (average diff: NaN)
#> [2]  0 - NA == NA
#> [4]  0 - NA == NA
#> [7]  0 - NA == NA
#> [8]  0 - NA == NA
#> [9]  0 - NA == NA
#> [11] 0 - NA == NA
#> [16] 0 - NA == NA
#> [17] 0 - NA == NA
#> [18] 0 - NA == NA
#> ...
testthat::expect_equivalent(obs.P, obs.P_new)
#> Error: `obs.P` not equivalent to `obs.P_new`.
#> 464838/737280 mismatches (average diff: NaN)
#> [2]  0 - NA == NA
#> [4]  0 - NA == NA
#> [7]  0 - NA == NA
#> [8]  0 - NA == NA
#> [9]  0 - NA == NA
#> [11] 0 - NA == NA
#> [16] 0 - NA == NA
#> [17] 0 - NA == NA
#> [18] 0 - NA == NA
#> ...

Created on 2021-08-11 by the reprex package (v2.0.1)

@lcolladotor
Copy link
Author

Adding obs.P[is.na(obs.P)] <- 0 or obs.P[!seq_len(ncol(block)) %in% unique(j)] <- 0 resolves the issue (with object names from https://github.com/MarioniLab/DropletUtils/blob/master/R/emptyDrops.R#L311-L327) in the reprex.

## 100 random p-values for 100 random j columns
## in a sparse matrix
set.seed(100)
p.n0 <- runif(100)

## Let's say that we have empty colums 6 through 10
j <- sample(c(1:5, 11:15), 100, TRUE)
mat <- matrix(ncol = max(j))
dim(mat)
#> [1]  1 15

by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x


j_new <- factor(j, levels=seq_len(ncol(mat)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)


## This works
# obs.P_new[is.na(obs.P_new)] <- 0
## But well, maybe there are NAs in the original p.n0 already

## Here's a more robust fix to simply add 0
## to the columns with no data
obs.P_new[!seq_len(ncol(mat)) %in% unique(j)] <- 0

data.frame(obs.P, obs.P_new)
#>        obs.P obs.P_new
#> 1   3.998302  3.998302
#> 2   4.518524  4.518524
#> 3   2.379189  2.379189
#> 4   5.237262  5.237262
#> 5   6.077561  6.077561
#> 6   0.000000  0.000000
#> 7   0.000000  0.000000
#> 8   0.000000  0.000000
#> 9   0.000000  0.000000
#> 10  0.000000  0.000000
#> 11  4.126877  4.126877
#> 12  4.634267  4.634267
#> 13  4.671363  4.671363
#> 14  4.275538  4.275538
#> 15 12.068117 12.068117

testthat::expect_equal(obs.P, obs.P_new)
testthat::expect_equivalent(obs.P, obs.P_new)

Created on 2021-08-11 by the reprex package (v2.0.1)

This is true with the full data too.

# Usemos datos de pbmc4k
library(BiocFileCache)
#> Loading required package: dbplyr
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#>     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#>     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#>     union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> 
#> Attaching package: 'IRanges'
#> The following object is masked from 'package:grDevices':
#> 
#>     windows
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
library(Matrix)
#> 
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:S4Vectors':
#> 
#>     expand
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

# Anotación de los genes
library(scater)
#> Loading required package: scuttle
#> Loading required package: ggplot2
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

library(beachmat)

mat <- counts(sce.pbmc)
dim(mat)
#> [1]  33694 737280
nonzero <- whichNonZero(mat)
length(unique(nonzero$j))
#> [1] 272442
max(nonzero$j)
#> [1] 737280


m <- DropletUtils:::.rounded_to_integer(mat, 0)
nonzero <- whichNonZero(m)

i <- nonzero$i
j <- nonzero$j
x <- nonzero$x


set.seed(100)
p.n0 <- runif(length(j))


by.col <- aggregate(p.n0, list(Col=j), sum)
obs.P <- numeric(ncol(mat))
obs.P[by.col$Col] <- by.col$x
summary(obs.P)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#>    0.0000    0.0000    0.0000    5.7792    0.7371 2622.1900

j_new <- factor(j, levels=seq_len(ncol(m)))
obs.P_new <- tapply(p.n0, INDEX=j_new, FUN=sum)
obs.P_new <- as.numeric(obs.P_new)
obs.P_new[!seq_len(ncol(m)) %in% unique(j)] <- 0
summary(obs.P_new)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#>    0.0000    0.0000    0.0000    5.7792    0.7371 2622.1900

testthat::expect_equal(obs.P, obs.P_new)
testthat::expect_equivalent(obs.P, obs.P_new)

Created on 2021-08-11 by the reprex package (v2.0.1)

Next, I'll try on a fresh clone of DropletUtils with the above fix and see if it works.

lcolladotor added a commit to lcolladotor/DropletUtils that referenced this issue Aug 11, 2021
@lcolladotor
Copy link
Author

Bah, I still can't get the BioC 3.11 results even with these changes :/

> set.seed(100)
> e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA>
   989   4300 731991

It didn't work with lcolladotor@790d099

> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA>
   989   4300 731991
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > options(width = 120)
fix_emptyDrops_bioc3.11_vs_3.12 > sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.1.0 (2021-05-18)
 os       macOS Big Sur 10.16
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2021-08-11Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version    date       lib source
 assertthat             0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
 beachmat               2.9.0      2021-05-19 [1] Bioconductor
 beeswarm               0.4.0      2021-06-01 [1] CRAN (R 4.1.0)
 Biobase              * 2.53.0     2021-05-19 [1] Bioconductor
 BiocFileCache        * 2.1.1      2021-06-23 [1] Bioconductor
 BiocGenerics         * 0.39.1     2021-06-08 [1] Bioconductor
 BiocNeighbors          1.11.0     2021-05-19 [1] Bioconductor
 BiocParallel           1.27.2     2021-07-12 [1] Bioconductor
 BiocSingular           1.9.1      2021-06-08 [1] Bioconductor
 bit                    4.0.4      2020-08-04 [1] CRAN (R 4.1.0)
 bit64                  4.0.5      2020-08-30 [1] CRAN (R 4.1.0)
 bitops                 1.0-7      2021-04-24 [1] CRAN (R 4.1.0)
 blob                   1.2.2      2021-07-23 [1] CRAN (R 4.1.0)
 cachem                 1.0.5      2021-05-15 [1] CRAN (R 4.1.0)
 callr                  3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
 cli                    3.0.1      2021-07-17 [1] CRAN (R 4.1.0)
 colorout               1.2-2      2021-05-24 [1] Github (jalvesaq/colorout@79931fd)
 colorspace             2.0-2      2021-06-24 [1] CRAN (R 4.1.0)
 crayon                 1.4.1      2021-02-08 [1] CRAN (R 4.1.0)
 curl                   4.3.2      2021-06-23 [1] CRAN (R 4.1.0)
 DBI                    1.1.1      2021-01-15 [1] CRAN (R 4.1.0)
 dbplyr               * 2.1.1      2021-04-06 [1] CRAN (R 4.1.0)
 DelayedArray           0.19.1     2021-06-25 [1] Bioconductor
 DelayedMatrixStats     1.15.2     2021-08-05 [1] Bioconductor
 desc                   1.3.0      2021-03-05 [1] CRAN (R 4.1.0)
 devtools             * 2.4.2      2021-06-07 [1] CRAN (R 4.1.0)
 dplyr                  1.0.7      2021-06-18 [1] CRAN (R 4.1.0)
 dqrng                  0.3.0      2021-05-01 [1] CRAN (R 4.1.0)
 DropletUtils         * 1.13.3     2021-08-11 [1] Github (lcolladotor/DropletUtils@790d099)
 edgeR                  3.35.0     2021-05-19 [1] Bioconductor
 ellipsis               0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
 fansi                  0.5.0      2021-05-25 [1] CRAN (R 4.1.0)
 fastmap                1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
 filelock               1.0.2      2018-10-05 [1] CRAN (R 4.1.0)
 fs                     1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
 generics               0.1.0      2020-10-31 [1] CRAN (R 4.1.0)
 GenomeInfoDb         * 1.29.3     2021-07-01 [1] Bioconductor
 GenomeInfoDbData       1.2.6      2021-05-24 [1] Bioconductor
 GenomicRanges        * 1.45.0     2021-05-19 [1] Bioconductor
 ggbeeswarm             0.6.0      2017-08-07 [1] CRAN (R 4.1.0)
 ggplot2              * 3.3.5      2021-06-25 [1] CRAN (R 4.1.0)
 ggrepel                0.9.1      2021-01-15 [1] CRAN (R 4.1.0)
 glue                   1.4.2      2020-08-27 [1] CRAN (R 4.1.0)
 gridExtra              2.3        2017-09-09 [1] CRAN (R 4.1.0)
 gtable                 0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
 HDF5Array              1.21.0     2021-05-19 [1] Bioconductor
 httr                   1.4.2      2020-07-20 [1] CRAN (R 4.1.0)
 IRanges              * 2.27.0     2021-05-19 [1] Bioconductor
 irlba                  2.3.3      2019-02-05 [1] CRAN (R 4.1.0)
 lattice                0.20-44    2021-05-02 [1] CRAN (R 4.1.0)
 lifecycle              1.0.0      2021-02-15 [1] CRAN (R 4.1.0)
 limma                  3.49.4     2021-08-08 [1] Bioconductor
 locfit                 1.5-9.4    2020-03-25 [1] CRAN (R 4.1.0)
 magrittr               2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
 Matrix               * 1.3-4      2021-06-01 [1] CRAN (R 4.1.0)
 MatrixGenerics       * 1.5.3      2021-08-05 [1] Bioconductor
 matrixStats          * 0.60.0     2021-07-26 [1] CRAN (R 4.1.0)
 memoise                2.0.0      2021-01-26 [1] CRAN (R 4.1.0)
 munsell                0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
 pillar                 1.6.2      2021-07-29 [1] CRAN (R 4.1.0)
 pkgbuild               1.2.0      2020-12-15 [1] CRAN (R 4.1.0)
 pkgconfig              2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
 pkgload                1.2.1      2021-04-06 [1] CRAN (R 4.1.0)
 prettyunits            1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
 processx               3.5.2      2021-04-30 [1] CRAN (R 4.1.0)
 prompt                 1.0.1      2021-08-03 [1] Github (gaborcsardi/prompt@fc2ac94)
 ps                     1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
 purrr                  0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
 R.methodsS3            1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
 R.oo                   1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
 R.utils                2.10.1     2020-08-26 [1] CRAN (R 4.1.0)
 R6                     2.5.0      2020-10-28 [1] CRAN (R 4.1.0)
 rappdirs               0.3.3      2021-01-31 [1] CRAN (R 4.1.0)
 Rcpp                   1.0.7      2021-07-07 [1] CRAN (R 4.1.0)
 RCurl                  1.98-1.3   2021-03-16 [1] CRAN (R 4.1.0)
 remotes                2.4.0      2021-06-02 [1] CRAN (R 4.1.0)
 rhdf5                  2.37.0     2021-05-19 [1] Bioconductor
 rhdf5filters           1.5.0      2021-05-19 [1] Bioconductor
 Rhdf5lib               1.15.2     2021-07-01 [1] Bioconductor
 rlang                  0.4.11     2021-04-30 [1] CRAN (R 4.1.0)
 rprojroot              2.0.2      2020-11-15 [1] CRAN (R 4.1.0)
 RSQLite                2.2.7      2021-04-22 [1] CRAN (R 4.1.0)
 rsthemes               0.2.1.9000 2021-05-24 [1] Github (gadenbuie/rsthemes@19299e5)
 rstudioapi             0.13       2020-11-12 [1] CRAN (R 4.1.0)
 rsvd                   1.0.5      2021-04-16 [1] CRAN (R 4.1.0)
 S4Vectors            * 0.31.0     2021-05-19 [1] Bioconductor
 ScaledMatrix           1.1.0      2021-05-19 [1] Bioconductor
 scales                 1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
 scater               * 1.21.3     2021-08-01 [1] Bioconductor
 scuttle              * 1.3.1      2021-08-05 [1] Bioconductor
 sessioninfo            1.1.1      2018-11-05 [1] CRAN (R 4.1.0)
 SingleCellExperiment * 1.15.1     2021-05-21 [1] Bioconductor
 sparseMatrixStats      1.5.2      2021-08-05 [1] Bioconductor
 SummarizedExperiment * 1.23.1     2021-06-24 [1] Bioconductor
 testthat             * 3.0.4      2021-07-01 [1] CRAN (R 4.1.0)
 tibble                 3.1.3      2021-07-23 [1] CRAN (R 4.1.0)
 tidyselect             1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
 usethis              * 2.0.1      2021-02-10 [1] CRAN (R 4.1.0)
 utf8                   1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
 vctrs                  0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
 vipor                  0.4.5      2017-03-22 [1] CRAN (R 4.1.0)
 viridis                0.6.1      2021-05-11 [1] CRAN (R 4.1.0)
 viridisLite            0.4.0      2021-04-13 [1] CRAN (R 4.1.0)
 withr                  2.4.2      2021-04-18 [1] CRAN (R 4.1.0)
 XVector                0.33.0     2021-05-19 [1] Bioconductor
 zlibbioc               1.39.0     2021-05-19 [1] Bioconductor

[1] /Library/Frameworks/R.framework/Versions/4.1devel/Resources/library

nor lcolladotor@2b0bb4b, both with BioC 3.14 (bioc-devel).

> library(BiocFileCache)
Loading required package: dbplyr
fix_emptyDrops_bioc3.11_vs_3.12 > bfc <- BiocFileCache()
fix_emptyDrops_bioc3.11_vs_3.12 > raw.path <- bfcrpath(bfc, file.path(
+     "http://cf.10xgenomics.com/samples",
+     "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
+ ))
fix_emptyDrops_bioc3.11_vs_3.12 > untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > library(DropletUtils)
Loading required package: SingleCellExperiment
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package:MatrixGenericsThe following objects are masked frompackage:matrixStats:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package:BiocGenericsThe following objects are masked frompackage:stats:

    IQR, mad, sd, var, xtabs

The following objects are masked frompackage:base:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package:S4VectorsThe following objects are masked frompackage:base:

    expand.grid, I, unname

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package:BiobaseThe following object is masked frompackage:MatrixGenerics:

    rowMedians

The following objects are masked frompackage:matrixStats:

    anyMissing, rowMedians

fix_emptyDrops_bioc3.11_vs_3.12 > library(Matrix)

Attaching package:MatrixThe following object is masked frompackage:S4Vectors:

    expand

fix_emptyDrops_bioc3.11_vs_3.12 > fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
fix_emptyDrops_bioc3.11_vs_3.12 > sce.pbmc <- read10xCounts(fname, col.names = TRUE)
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > library(scater)
Loading required package: scuttle
Loading required package: ggplot2
rownames(sce.pbmc) <- uniquifyFeatureNames(
fix_emptyDrops_bioc3.11_vs_3.12 > rownames(sce.pbmc) <- uniquifyFeatureNames(
+     rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
+ )
fix_emptyDrops_bioc3.11_vs_3.12 >
fix_emptyDrops_bioc3.11_vs_3.12 > # Detección de _droplets_ con células
fix_emptyDrops_bioc3.11_vs_3.12 > set.seed(100)
fix_emptyDrops_bioc3.11_vs_3.12 > #e.out <- readRDS("e.out_BioC3.11.rds")
fix_emptyDrops_bioc3.11_vs_3.12 > e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
fix_emptyDrops_bioc3.11_vs_3.12 > table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA>
   989   4300 731991
fix_emptyDrops_bioc3.11_vs_3.12 > options(width = 120)
fix_emptyDrops_bioc3.11_vs_3.12 > sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.1.0 (2021-05-18)
 os       macOS Big Sur 10.16
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2021-08-11Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version    date       lib source
 assertthat             0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
 beachmat               2.9.0      2021-05-19 [1] Bioconductor
 beeswarm               0.4.0      2021-06-01 [1] CRAN (R 4.1.0)
 Biobase              * 2.53.0     2021-05-19 [1] Bioconductor
 BiocFileCache        * 2.1.1      2021-06-23 [1] Bioconductor
 BiocGenerics         * 0.39.1     2021-06-08 [1] Bioconductor
 BiocNeighbors          1.11.0     2021-05-19 [1] Bioconductor
 BiocParallel           1.27.2     2021-07-12 [1] Bioconductor
 BiocSingular           1.9.1      2021-06-08 [1] Bioconductor
 bit                    4.0.4      2020-08-04 [1] CRAN (R 4.1.0)
 bit64                  4.0.5      2020-08-30 [1] CRAN (R 4.1.0)
 bitops                 1.0-7      2021-04-24 [1] CRAN (R 4.1.0)
 blob                   1.2.2      2021-07-23 [1] CRAN (R 4.1.0)
 cachem                 1.0.5      2021-05-15 [1] CRAN (R 4.1.0)
 callr                  3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
 cli                    3.0.1      2021-07-17 [1] CRAN (R 4.1.0)
 colorout               1.2-2      2021-05-24 [1] Github (jalvesaq/colorout@79931fd)
 colorspace             2.0-2      2021-06-24 [1] CRAN (R 4.1.0)
 crayon                 1.4.1      2021-02-08 [1] CRAN (R 4.1.0)
 curl                   4.3.2      2021-06-23 [1] CRAN (R 4.1.0)
 DBI                    1.1.1      2021-01-15 [1] CRAN (R 4.1.0)
 dbplyr               * 2.1.1      2021-04-06 [1] CRAN (R 4.1.0)
 DelayedArray           0.19.1     2021-06-25 [1] Bioconductor
 DelayedMatrixStats     1.15.2     2021-08-05 [1] Bioconductor
 desc                   1.3.0      2021-03-05 [1] CRAN (R 4.1.0)
 devtools             * 2.4.2      2021-06-07 [1] CRAN (R 4.1.0)
 dplyr                  1.0.7      2021-06-18 [1] CRAN (R 4.1.0)
 dqrng                  0.3.0      2021-05-01 [1] CRAN (R 4.1.0)
 DropletUtils         * 1.13.3     2021-08-11 [1] Github (lcolladotor/DropletUtils@2b0bb4b)
 edgeR                  3.35.0     2021-05-19 [1] Bioconductor
 ellipsis               0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
 fansi                  0.5.0      2021-05-25 [1] CRAN (R 4.1.0)
 fastmap                1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
 filelock               1.0.2      2018-10-05 [1] CRAN (R 4.1.0)
 fs                     1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
 generics               0.1.0      2020-10-31 [1] CRAN (R 4.1.0)
 GenomeInfoDb         * 1.29.3     2021-07-01 [1] Bioconductor
 GenomeInfoDbData       1.2.6      2021-05-24 [1] Bioconductor
 GenomicRanges        * 1.45.0     2021-05-19 [1] Bioconductor
 ggbeeswarm             0.6.0      2017-08-07 [1] CRAN (R 4.1.0)
 ggplot2              * 3.3.5      2021-06-25 [1] CRAN (R 4.1.0)
 ggrepel                0.9.1      2021-01-15 [1] CRAN (R 4.1.0)
 glue                   1.4.2      2020-08-27 [1] CRAN (R 4.1.0)
 gridExtra              2.3        2017-09-09 [1] CRAN (R 4.1.0)
 gtable                 0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
 HDF5Array              1.21.0     2021-05-19 [1] Bioconductor
 httr                   1.4.2      2020-07-20 [1] CRAN (R 4.1.0)
 IRanges              * 2.27.0     2021-05-19 [1] Bioconductor
 irlba                  2.3.3      2019-02-05 [1] CRAN (R 4.1.0)
 lattice                0.20-44    2021-05-02 [1] CRAN (R 4.1.0)
 lifecycle              1.0.0      2021-02-15 [1] CRAN (R 4.1.0)
 limma                  3.49.4     2021-08-08 [1] Bioconductor
 locfit                 1.5-9.4    2020-03-25 [1] CRAN (R 4.1.0)
 magrittr               2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
 Matrix               * 1.3-4      2021-06-01 [1] CRAN (R 4.1.0)
 MatrixGenerics       * 1.5.3      2021-08-05 [1] Bioconductor
 matrixStats          * 0.60.0     2021-07-26 [1] CRAN (R 4.1.0)
 memoise                2.0.0      2021-01-26 [1] CRAN (R 4.1.0)
 munsell                0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
 pillar                 1.6.2      2021-07-29 [1] CRAN (R 4.1.0)
 pkgbuild               1.2.0      2020-12-15 [1] CRAN (R 4.1.0)
 pkgconfig              2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
 pkgload                1.2.1      2021-04-06 [1] CRAN (R 4.1.0)
 prettyunits            1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
 processx               3.5.2      2021-04-30 [1] CRAN (R 4.1.0)
 prompt                 1.0.1      2021-08-03 [1] Github (gaborcsardi/prompt@fc2ac94)
 ps                     1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
 purrr                  0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
 R.methodsS3            1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
 R.oo                   1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
 R.utils                2.10.1     2020-08-26 [1] CRAN (R 4.1.0)
 R6                     2.5.0      2020-10-28 [1] CRAN (R 4.1.0)
 rappdirs               0.3.3      2021-01-31 [1] CRAN (R 4.1.0)
 Rcpp                   1.0.7      2021-07-07 [1] CRAN (R 4.1.0)
 RCurl                  1.98-1.3   2021-03-16 [1] CRAN (R 4.1.0)
 remotes                2.4.0      2021-06-02 [1] CRAN (R 4.1.0)
 rhdf5                  2.37.0     2021-05-19 [1] Bioconductor
 rhdf5filters           1.5.0      2021-05-19 [1] Bioconductor
 Rhdf5lib               1.15.2     2021-07-01 [1] Bioconductor
 rlang                  0.4.11     2021-04-30 [1] CRAN (R 4.1.0)
 rprojroot              2.0.2      2020-11-15 [1] CRAN (R 4.1.0)
 RSQLite                2.2.7      2021-04-22 [1] CRAN (R 4.1.0)
 rsthemes               0.2.1.9000 2021-05-24 [1] Github (gadenbuie/rsthemes@19299e5)
 rstudioapi             0.13       2020-11-12 [1] CRAN (R 4.1.0)
 rsvd                   1.0.5      2021-04-16 [1] CRAN (R 4.1.0)
 S4Vectors            * 0.31.0     2021-05-19 [1] Bioconductor
 ScaledMatrix           1.1.0      2021-05-19 [1] Bioconductor
 scales                 1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
 scater               * 1.21.3     2021-08-01 [1] Bioconductor
 scuttle              * 1.3.1      2021-08-05 [1] Bioconductor
 sessioninfo            1.1.1      2018-11-05 [1] CRAN (R 4.1.0)
 SingleCellExperiment * 1.15.1     2021-05-21 [1] Bioconductor
 sparseMatrixStats      1.5.2      2021-08-05 [1] Bioconductor
 SummarizedExperiment * 1.23.1     2021-06-24 [1] Bioconductor
 testthat             * 3.0.4      2021-07-01 [1] CRAN (R 4.1.0)
 tibble                 3.1.3      2021-07-23 [1] CRAN (R 4.1.0)
 tidyselect             1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
 usethis              * 2.0.1      2021-02-10 [1] CRAN (R 4.1.0)
 utf8                   1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
 vctrs                  0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
 vipor                  0.4.5      2017-03-22 [1] CRAN (R 4.1.0)
 viridis                0.6.1      2021-05-11 [1] CRAN (R 4.1.0)
 viridisLite            0.4.0      2021-04-13 [1] CRAN (R 4.1.0)
 withr                  2.4.2      2021-04-18 [1] CRAN (R 4.1.0)
 XVector                0.33.0     2021-05-19 [1] Bioconductor
 zlibbioc               1.39.0     2021-05-19 [1] Bioconductor

[1] /Library/Frameworks/R.framework/Versions/4.1devel/Resources/library

@LTLA
Copy link
Collaborator

LTLA commented Aug 12, 2021

I would suggest looking at 584320d. Reverting that change and running:

> set.seed(100)
> e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
> table(e.out$FDR < 0.001, useNA = "ifany")

 FALSE   TRUE   <NA> 
  1056   4233 731991 

This is mentioned briefly in the NEWS.

@lcolladotor
Copy link
Author

Oh interesting, thanks Aaron! I'll do that tomorrow morning. Thanks!

lcolladotor added a commit to lcolladotor/DropletUtils that referenced this issue Aug 17, 2021
@lcolladotor
Copy link
Author

Hi Aaron,

Reverting the bugfix reproduces prior results

Using lcolladotor@eb8eb45 which reverts 584320d on BioC 3.14 I do indeed get the same results I get with BioC 3.13 using the emptyDrops() output from BioC 3.11. Thanks!

e.out_BioC3.14_lcolladotor_eb8eb45.rds.zip

# Usemos datos de pbmc4k
library(BiocFileCache)
bfc <- BiocFileCache()
raw.path <- bfcrpath(bfc, file.path(
    "http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"
))
untar(raw.path, exdir = file.path(tempdir(), "pbmc4k"))

library(DropletUtils)
library(Matrix)
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
sce.pbmc <- read10xCounts(fname, col.names = TRUE)

library(scater)
rownames(sce.pbmc) <- uniquifyFeatureNames(
    rowData(sce.pbmc)$ID, rowData(sce.pbmc)$Symbol
)

# Detección de _droplets_ con células
set.seed(100)
e.out <- emptyDrops(counts(sce.pbmc), barcode.args=list(exclude.from=0))
#e.out <- readRDS("~/Desktop/e.out_BioC3.11.rds")
table(e.out$FDR < 0.001, useNA = "ifany")
# FALSE   TRUE   <NA>
#  1056   4233 731991

saveRDS(e.out, file = "~/Desktop/e.out_BioC3.14_lcolladotor_eb8eb45.rds")


library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86,
    keys = rowData(sce.pbmc)$ID,
    column = "SEQNAME", keytype = "GENEID"
)

sce.pbmc <- sce.pbmc[, which(e.out$FDR <= 0.001)]

# Control de calidad
stats <- perCellQCMetrics(sce.pbmc,
    subsets = list(Mito = which(location == "MT"))
)
high.mito <- isOutlier(stats$subsets_Mito_percent,
    type = "higher"
)
sce.pbmc <- sce.pbmc[, !high.mito]

# Normalización de los datos
library(scran)
set.seed(1000)
clusters <- quickCluster(sce.pbmc)
sce.pbmc <- computeSumFactors(sce.pbmc, cluster = clusters)
sce.pbmc <- logNormCounts(sce.pbmc)

## Identificación de genes altamente variables
set.seed(1001)
dec.pbmc <- modelGeneVarByPoisson(sce.pbmc)
top.pbmc <- getTopHVGs(dec.pbmc, prop = 0.1)

## Reducción de dimensiones
set.seed(10000)
sce.pbmc <- denoisePCA(sce.pbmc,
    subset.row = top.pbmc,
    technical = dec.pbmc
)
set.seed(100000)
sce.pbmc <- runTSNE(sce.pbmc, dimred = "PCA")
set.seed(1000000)
sce.pbmc <- runUMAP(sce.pbmc, dimred = "PCA")

plotTSNE(sce.pbmc)



# clustering
g <- buildSNNGraph(sce.pbmc, k = 10, use.dimred = "PCA")
clust <- igraph::cluster_walktrap(g)$membership
sce.pbmc$cluster <- factor(clust)

library(celldex)
ref <- celldex::BlueprintEncodeData()

library(SingleR)
pred <- SingleR(
    test = sce.pbmc, ref = ref,
    labels = ref$label.main
)

sce.pbmc$labels <- pred$labels
plotTSNE(sce.pbmc, colour_by = "labels", text_by = "labels")


plotTSNE(sce.pbmc, colour_by = "cluster", text_by = "labels")

Screen Shot 2021-08-17 at 12 27 41 PM

I did notice a few small differences vs Pete's slides, like those few red cells to the right of the monocytes on his version and the shape of the B-cells cluster, but well, that's likely to other changes.

Wrapping up

So indeed, 584320d was the main source of the discrepancy. Since it's a bug fix, like you noted above and at 24e8073, we can close this issue on the reproducibility of BioC 3.11 results.

obs.P calculation

However, regarding the obs.P calculation, while it didn't affect this issue, due to #67 (comment) and #67 (comment), should I send a (clean) PR for either lcolladotor@790d099 or lcolladotor@2b0bb4b (whichever version you prefer). Or is the current obs.P behavior what you expect?

Best,
Leo

@LTLA
Copy link
Collaborator

LTLA commented Aug 23, 2021

What's wrong with the obs.P?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants