Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

emptyDropsCellRanger index out of bounds style error #109

Open
jamesnemesh opened this issue Jun 3, 2024 · 1 comment
Open

emptyDropsCellRanger index out of bounds style error #109

jamesnemesh opened this issue Jun 3, 2024 · 1 comment

Comments

@jamesnemesh
Copy link

jamesnemesh commented Jun 3, 2024

Hi!

I'm getting a crash when running emptyDropsCellRanger.

The call:

DropletUtils::emptyDropsCellRanger(dge)

Where dge is a matrix of cells by genes. I'll note this data set is compatible with your standard emptyDrops(dge) function.

The stack trace:
image

The problem with the data is that the number of cell barcodes in the data set is too low, and the program runs into issues during the call of ambient.FUN.

In particular, the default bounds for the start and end of the ambient cell barcodes:
ind.min=45000
ind.max=90000

When this line is run, the function gets into trouble:

ambient[o[min(ind.min, length(totals)):min(ind.max, length(totals))]] <- TRUE

length(totals)=24101

So ambient only has a single positive entry, as the min and max resolve to the same index.

table (ambient)
ambient
FALSE  TRUE 
24100     1 

This later causes the calculated ambient profile to be pretty strange, and it contains NaN values, which are rarely a good thing:

ambient.prob[13080:13090,]
       CDC34        CDC37      CDC37L1   CDC37L1-DT        CDC40        CDC42    CDC42-AS1    CDC42-IT1 
3.130576e-05 3.130576e-05 3.130576e-05 3.130576e-05 3.130576e-05 3.130576e-05 3.130576e-05 3.130576e-05 
    CDC42BPA     CDC42BPB     CDC42BPG 
         NaN 3.130576e-05 3.130576e-05 

Eventually, the code tries

any(still.zero)

And since still.zero contains NAs (inherited from ambient.prob), the result is NA, which produces the reported error at the top of the stack.

I don't expect the program to magically find an answer to this, but it might be nice to check the bounds and throw an exception that the selected indexes are too large for the provided data set.

I'm going to add my own data sanity checks before I run emptyDropsCellRanger as well.

@jonathangriffiths
Copy link
Collaborator

Thanks for reporting this edge-case. A warning might be the most appropriate answer. I'll see if I can find the time to put it in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants