-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
read10xCounts(..., type = "HDF5") doesn't produce TENxMatrix? #15
Comments
The two ops are a Both could be skipped, I guess. |
Do you see this as an issue for DropletUtils, scater, or DelayedArray? |
It is a DelayedArray and TENxMatrix issue for me. There are multiple levels, though. Starting at the base; it seems like it would be sensible for A <- DelayedArray(matrix(1,10,10))
showtree(cbind(A))
## 10x10 double: DelayedMatrix object
## └─ 10x10 double: Abind (along=2) <- don't want this guy.
## └─ 10x10 double: [seed] matrix object The error observed in scater arises from beachmat, which attempts to call contentIsPristine(cbind(A)) # TRUE. Well, fair enough.
contentIsPristine(cbind(A, A)) # Still TRUE, but it's got a content-modifying delayed op! The first The second orthogonal issue is that The fact that |
Better rope in @hpages. |
Hi guys, I agree that unary
These are not pristine DelayedArray objects:
Note that
If you want a more stringent test that also checks that the array values in DelayedArray object Anyway, @LTLA I have a feeling that you have a use case for this test. I could rename the current |
Also I'm open to suggestions for a better name for Or should we have a function that does the opposite e.g. Thanks! |
Having thought about it, I'm happy for On the dimnames setting; this is not a problem for beachmat, which is smart enough to see through the delayed op and avoid block processing. I'm more worried about R-level code - imagine a developer is careful enough to write a DA backend that specializes operations like matrix multiplication for efficiency, but fails to write a specialized It is for this reason that all BiocSingular matrix classes overwrite |
Turns out I was already doing the check with |
If we do something like this:
then there is no need for DeferredMatrix or any DA backend to overwrite The seed represents a read-only dataset (everything in the dataset is read-only, including the dimnames) so modifying the dimnames at the seed level is a violation of that. The consequence typically being that the seed is no longer in sync with the dataset that it represents so things like I'll modify |
Is that a policy for all DA backends, or just for file-backed backends? In some of the BiocSingular backends, I have specialized some operations on the To give a concrete example, I do a lot of transpositions in BiocSingular. To avoid everything getting wrapped in |
Yeah the "never touch the seed" principle in DelayedArray is a central one. No matter what operations are performed on You might be fine for now with BiocSingular, hard to know, but it's a risky game. I will emphasize the "never touch the seed" principle in |
Hmmmm. Is there any way to divert For example, transposition could just dispatch to |
I hope so. In the meantime, I've added the "Implementing optimized backend-specific methods" subsection to |
Hi Aaron,
I used
read10xCounts(..., type = "HDF5")
to load a HDF5 file from CellRanger v3.I was surprised to get an error when I then ran
scater::calculateQCMetrics()
on the result.Ultimately, I think the issue is that
read10xCounts(..., type = "HDF5")
doesn't return the counts matrix as a 'pristine' TENxMatrix.I encountered this using
v1.4.1
of DropletUtils and can't test on devel right now, but I didn't see any recent changes around this behaviour.This example hopefully illustrates the issue.
What do you think?
Thanks!
Created on 2019-06-15 by the reprex package (v0.3.0)
Session info
The text was updated successfully, but these errors were encountered: