.create() not recognizing sparse matrices #145

bschilder opened this issue Dec 16, 2020 · 5 comments

I'm trying to create a new loom file after making edits to an existing one (the adult mouse brain dataset, L5_All.loom).

However, when I go to use .create() for the new loom file, it doesn't seem to recognize the input matrix, no matter what format I put the matrix in.


import os
import loompy
from scipy import sparse

# looms is a dictionary of loom connections
x = "Zeisel2018"
ds = looms[x]

loom_path = os.path.join("raw_data/scRNAseq", x, x + "_formatted.loom")

loompy.create(filename=loom_path,
              layers=sparse.coo_matrix(ds[:, :]),  # <- Input matrix
              row_attrs=ds.ra,
              col_attrs=ds.ca)

Some other formats I've tried (confirming each time that the matrix is indeed of the intended type):

from scipy import sparse

layers = ds.layers[''].sparse()
layers = ds[:,:].sparse()
layers = sparse.coo_matrix(ds[:,:])
layers = sparse.csr_matrix(ds[:,:])
layers = sparse.csc_matrix(ds[:,:])

Error message

The same message appears for all matrix formats listed above.

ValueError                                Traceback (most recent call last)
<ipython-input-31-a03cf1a1eb1e> in <module>
     19                   layers = sparse.coo_matrix(looms[x][:,:]),
     20                   row_attrs = looms[x].ra,
---> 21                   col_attrs = looms[x].ca)
     23 print("Complete.")

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/ in create(filename, layers, row_attrs, col_attrs, file_attrs)
   1071                 if os.path.exists(filename):
   1072                         os.remove(filename)
-> 1073                 raise ve

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/ in create(filename, layers, row_attrs, col_attrs, file_attrs)
   1064                         for key, vals in row_attrs.items():
-> 1065                                 ds.ra[key] = vals
   1067                         for key, vals in col_attrs.items():

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/ in __setitem__(self, name, val)
    127                 Set the value of a named attribute
    128 		"""
--> 129                 return self.__setattr__(name, val)
    131         def __setattr__(self, name: str, val: np.ndarray) -> None:

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/ in __setattr__(self, name, val)
    147                 else:
    148                         if self.ds is not None:
--> 149                                 values = loompy.normalize_attr_values(val, compare_loom_spec_version(self.ds._file, "3.0.0") >= 0)
    150                                 a = ["/row_attrs/", "/col_attrs/"][self.axis]
    151                                 if self.ds.shape[self.axis] != 0 and values.shape[0] != self.ds.shape[self.axis]:

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/ in normalize_attr_values(a, use_object_strings)
     67                 a = np.array([a])
     68                 scalar = True
---> 69         arr = normalize_attr_array(a)
     70         if np.issubdtype(arr.dtype, np.integer) or np.issubdtype(arr.dtype, np.floating):
     71                 pass  # We allow all these types

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/ in normalize_attr_array(a)
     45                 return normalize_attr_array(a.todense())
     46         else:
---> 47                 raise ValueError("Argument must be a list, tuple, numpy matrix, numpy ndarray or sparse matrix.")

ValueError: Argument must be a list, tuple, numpy matrix, numpy ndarray or sparse matrix.
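
Reading the traceback above from the bottom up, the ValueError is raised while create() runs `ds.ra[key] = vals`, so the rejected value is one of the row attributes rather than the `layers` matrix. Below is a minimal sketch for spotting such attributes before calling create(). The helper name `find_unsupported_attrs` is hypothetical, and its checks only approximate the types named in the error message (plus the scalar case the traceback shows being wrapped); it is not loompy's own check.

```python
from typing import Any, List, Mapping


def find_unsupported_attrs(attrs: Mapping[str, Any]) -> List[str]:
    """Return names of attributes whose values look unsuitable for loompy.

    Hypothetical helper. Accepted here: scalars, lists, tuples, and anything
    exposing a `.shape` attribute (assumed to cover numpy arrays/matrices and
    scipy sparse matrices without importing either library).
    """
    bad = []
    for name, value in attrs.items():
        is_scalar = isinstance(value, (str, bytes, int, float, bool))
        is_sequence = isinstance(value, (list, tuple))
        is_array_like = hasattr(value, "shape")
        if not (is_scalar or is_sequence or is_array_like):
            bad.append(name)
    return bad


# Toy attributes standing in for ds.ra; the nested dict is the kind of value
# this check would flag.
example_attrs = {
    "Gene": ["gene_a", "gene_b"],
    "Accession": ("acc_1", "acc_2"),
    "Nested": {"a": 1},
}
print(find_unsupported_attrs(example_attrs))
```

With an open connection, something like `find_unsupported_attrs(dict(ds.ra.items()))` might be a starting point, since the traceback shows create() itself calling `.items()` on the row attributes.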

Brian Schilder
@ Imperial College London

I'm not sure if sparse matrices are supported as arguments to create(), but if you're loading the dense matrix anyway (using ds[:,:]), then why not just pass that as the argument?

Hi Sten,

Thanks for the quick reply!

I was hoping to convert the entire matrix to a sparse matrix to substantially reduce its size, both on disk and when it's loaded into memory (without having to convert it each time). Though I guess this isn't possible with the loom format at the moment?

Though ultimately, I'm converting the loom files to ExprDataSet objects so that they can be used in the Cell_BLAST pipeline (for deep autoencoder batch correction).

Also of interest to @NathanSkene

FYI, internally it would be stored chunked and compressed anyway, which may or may not take more space than a sparse format.
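
To make that trade-off concrete, here is a rough, self-contained sketch comparing a compressed dense buffer with an uncompressed coordinate (row, column, value) listing of a mostly-zero matrix. Everything in it is an assumption for illustration: the matrix is synthetic, `zlib` merely stands in for whatever chunk compression the file actually uses, and the resulting numbers say nothing about real loom files.

```python
import random
import struct
import zlib

random.seed(0)
n_rows, n_cols, density = 200, 200, 0.05

# Synthetic count matrix: roughly 5% of entries are non-zero.
dense = [
    [random.randint(1, 20) if random.random() < density else 0
     for _ in range(n_cols)]
    for _ in range(n_rows)
]

# Dense layout: every entry stored as a 4-byte float, then compressed.
flat = [float(v) for row in dense for v in row]
dense_bytes = struct.pack(f"{len(flat)}f", *flat)
dense_compressed = zlib.compress(dense_bytes, 2)

# Sparse (COO-like) layout: one (row, col, value) triple per non-zero entry.
triples = [(i, j, float(v))
           for i, row in enumerate(dense)
           for j, v in enumerate(row) if v != 0]
sparse_bytes = b"".join(struct.pack("iif", i, j, v) for i, j, v in triples)

print("dense, raw:        ", len(dense_bytes))
print("dense, compressed: ", len(dense_compressed))
print("sparse, raw:       ", len(sparse_bytes))
```

Which of the last two numbers comes out smaller depends on the data and the compression settings, which is the point made above.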

I see, got it, thanks!

bschilder commented Dec 18, 2020

So I happened to be converting the loom files I'm using to several different formats (to take advantage of different tools) and noticed that they differ quite a bit in terms of size. I'm sure there are plenty of other differences besides sparsity, as well as advantages/disadvantages to each, but I thought this was interesting and worth posting here!

Data formats


  • All metadata was kept, but I flattened it in the case of LaManno2020 so it could be converted to ExprDataSet (which requires an unnested 2D array, or pandas DataFrame).
  • I did remove all but the main layer (raw counts) from LaManno2020, so that also helped reduce its size.
  • Some additional formats for Aerts2018 include the raw uncompressed data matrix from GEO, as well as a Seurat object for R.
  • I also merged all three _sparse.h5 files using Cell_BLAST's .merge_datasets() function.
    The merged _sparse.h5 file is 2.5G.
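
The flattening step mentioned in the first bullet could look roughly like the sketch below. The helper `flatten_metadata`, the `.`-joined key scheme, and the toy input are all hypothetical; the actual structure of the LaManno2020 metadata is not shown in this thread.

```python
from typing import Any, Dict


def flatten_metadata(nested: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}

    def _walk(prefix: str, value: Any) -> None:
        if isinstance(value, dict):
            # Recurse into sub-dicts, extending the key prefix.
            for key, child in value.items():
                _walk(f"{prefix}{sep}{key}" if prefix else str(key), child)
        else:
            # Leaf value: store it under the accumulated key.
            flat[prefix] = value

    _walk("", nested)
    return flat


# Toy nested metadata (made-up values) with one nested group.
toy = {"Age": ["e9", "e10"], "Cluster": {"Name": ["a", "b"], "Id": [0, 1]}}
print(flatten_metadata(toy))
```

The result has one entry per column, so it could then be handed to something like `pandas.DataFrame(...)`, assuming all columns have the same length.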

File sizes

-bash-4.2$ tree -h raw_data/scRNAseq/
├── [4.0K]  Aerts2018
│   ├── [142M]  Aerts2018.h5
│   ├── [553M]  Aerts2018.h5ad
│   ├── [129M]  Aerts2018_sparse.h5
│   ├── [677M]  Aerts_Fly_AdultBrain_Filtered_57.loom
│   ├── [4.0K]  GSE107451_DGRP-551_w1118_WholeBrain_57k_0d_1d_3d_6d_9d_15d_30d_50d_10X_DGEM_MEX.mtx
│   │   ├── [3.0M]  annotation.tsv
│   │   ├── [1.7M]  barcodes.tsv
│   │   ├── [334K]  genes.tsv
│   │   └── [898M]  matrix.mtx
│   ├── [4.4M]  GSE107451_DGRP-551_w1118_WholeBrain_57k_Metadata.tsv.gz
│   └── [416M]  SeuratObj.Aerts_Fly_AdultBrain_Filtered_57k.rds
├── [4.0K]  LaManno2020
│   ├── [ 28G]  dev_all.loom
│   ├── [1.6G]  LaManno2020.h5
│   ├── [6.2G]  LaManno2020.h5ad
│   └── [1.8G]  LaManno2020_sparse.h5
└── [4.0K]  Zeisel2018
    ├── [ 18G]  l5_all.loom
    ├── [445M]  Zeisel2018.h5
    ├── [2.0G]  Zeisel2018.h5ad
    └── [509M]  Zeisel2018_sparse.h5
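
For a rough sense of scale, the ratio between each original loom file and its converted .h5 file can be computed from the listing above. The sketch below hard-codes the sizes as printed by `tree -h` (which are rounded), and the helper `to_mib` is a hypothetical name.

```python
# Multipliers to MiB; `tree -h` uses powers of 1024 unless --si is given.
UNITS = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def to_mib(size: str) -> float:
    """Convert a `tree -h` style size such as '18G' or '445M' to MiB."""
    return float(size[:-1]) * UNITS[size[-1]]


# (loom size, .h5 size) copied from the listing above.
sizes = {
    "Aerts2018": ("677M", "142M"),
    "LaManno2020": ("28G", "1.6G"),
    "Zeisel2018": ("18G", "445M"),
}

for name, (loom, h5) in sizes.items():
    ratio = to_mib(loom) / to_mib(h5)
    print(f"{name}: loom is about {ratio:.1f}x the size of the .h5 file")
```

For LaManno2020 the ratio also reflects the layers that were dropped during conversion, so it is not a like-for-like comparison of formats.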


  • LaManno2020 and Zeisel2018 (mouse) looms are from
  • Aerts2018 (fly) loom is from SCope.


