.create() not recognizing sparse matrices #145

Open
bschilder opened this issue Dec 16, 2020 · 5 comments


@bschilder

Hello,

I'm trying to create a new loom file after making edits to an existing one (the adult mouse brain dataset from Mousebrain.org, L5_All.loom).

However, when I go to use .create() for the new loom file, it doesn't seem to recognize the input matrix, no matter what format I put the matrix in.

Input

import os

import loompy
from scipy import sparse

# looms is a dictionary of loom connections
x = "Zeisel2018"
ds = looms[x]

loom_path = os.path.join("raw_data/scRNAseq", x, x + "_formatted.loom")

loompy.create(filename=loom_path,
              layers=sparse.coo_matrix(ds[:, :]),  # <- Input matrix
              row_attrs=ds.ra,
              col_attrs=ds.ca)

Some other formats I've tried (I confirmed that each matrix was indeed of the intended type):

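from scipy import sparse

# Each of these was tried in place of the layers argument above
layers = ds.layers[''].sparse()   # LoomLayer.sparse() returns a scipy.sparse.coo_matrix
layers = ds.sparse()              # LoomConnection.sparse(); ds[:, :] is a plain ndarray with no .sparse()
layers = sparse.coo_matrix(ds[:, :])
layers = sparse.csr_matrix(ds[:, :])
layers = sparse.csc_matrix(ds[:, :])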

Error message

The same message appears for all of the matrix formats listed above.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-a03cf1a1eb1e> in <module>
     19                   layers = sparse.coo_matrix(looms[x][:,:]),
     20                   row_attrs = looms[x].ra,
---> 21                   col_attrs = looms[x].ca)
     22 
     23 print("Complete.")

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/loompy.py in create(filename, layers, row_attrs, col_attrs, file_attrs)
   1071                 if os.path.exists(filename):
   1072                         os.remove(filename)
-> 1073                 raise ve
   1074 
   1075 

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/loompy.py in create(filename, layers, row_attrs, col_attrs, file_attrs)
   1063 
   1064                         for key, vals in row_attrs.items():
-> 1065                                 ds.ra[key] = vals
   1066 
   1067                         for key, vals in col_attrs.items():

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/attribute_manager.py in __setitem__(self, name, val)
    127                 Set the value of a named attribute
    128 		"""
--> 129                 return self.__setattr__(name, val)
    130 
    131         def __setattr__(self, name: str, val: np.ndarray) -> None:

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/attribute_manager.py in __setattr__(self, name, val)
    147                 else:
    148                         if self.ds is not None:
--> 149                                 values = loompy.normalize_attr_values(val, compare_loom_spec_version(self.ds._file, "3.0.0") >= 0)
    150                                 a = ["/row_attrs/", "/col_attrs/"][self.axis]
    151                                 if self.ds.shape[self.axis] != 0 and values.shape[0] != self.ds.shape[self.axis]:

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/normalize.py in normalize_attr_values(a, use_object_strings)
     67                 a = np.array([a])
     68                 scalar = True
---> 69         arr = normalize_attr_array(a)
     70         if np.issubdtype(arr.dtype, np.integer) or np.issubdtype(arr.dtype, np.floating):
     71                 pass  # We allow all these types

~/anaconda3/envs/Cell_BLAST/lib/python3.6/site-packages/loompy/normalize.py in normalize_attr_array(a)
     45                 return normalize_attr_array(a.todense())
     46         else:
---> 47                 raise ValueError("Argument must be a list, tuple, numpy matrix, numpy ndarray or sparse matrix.")
     48 
     49 

ValueError: Argument must be a list, tuple, numpy matrix, numpy ndarray or sparse matrix.
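The traceback above ends inside the row-attribute loop of create() (ds.ra[key] = vals), which suggests the value being rejected is one of the row attributes from ds.ra rather than the matrix passed as layers. One thing that may be worth trying is to hand create() nothing but plain numpy objects. The sketch below is a hypothetical helper (prepare_create_args is not part of loompy); it assumes only numpy, densifies anything with a .toarray() method (as scipy.sparse matrices have), copies every attribute through np.asarray, and checks attribute lengths against the matrix shape.

```python
import numpy as np


def prepare_create_args(matrix, row_attrs, col_attrs):
    """Return (layers, row_attrs, col_attrs) as plain numpy objects.

    Hypothetical helper: `matrix` may be a numpy array or anything with a
    .toarray() method (e.g. a scipy.sparse matrix); the attribute mappings
    may be any object with an .items() method (e.g. ds.ra / ds.ca).
    """
    # scipy.sparse matrices provide .toarray(); everything else goes through np.asarray
    dense = matrix.toarray() if hasattr(matrix, "toarray") else np.asarray(matrix)
    if dense.ndim != 2:
        raise ValueError(f"expected a 2D matrix, got shape {dense.shape}")

    def plain(attrs, expected_len, axis_name):
        out = {}
        for name, values in attrs.items():
            # np.asarray turns lists, tuples and ndarray subclasses into a base ndarray
            arr = np.asarray(values)
            if arr.shape[0] != expected_len:
                raise ValueError(
                    f"{axis_name} attribute {name!r} has length {arr.shape[0]}, "
                    f"expected {expected_len}"
                )
            out[name] = arr
        return out

    return (
        dense,
        plain(row_attrs, dense.shape[0], "row"),
        plain(col_attrs, dense.shape[1], "column"),
    )
```

With loompy installed, the call would then be something like loompy.create(loom_path, *prepare_create_args(ds[:, :], ds.ra, ds.ca)), relying on the create(filename, layers, row_attrs, col_attrs) argument order shown in the traceback. If the ValueError persists, printing type(vals) for each attribute in ds.ra may show which one loompy is rejecting.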

Conda env

# packages in environment at /rds/general/user/bms20/home/anaconda3/envs/Cell_BLAST:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_r-mutex                  1.0.0               anacondar_1  
absl-py                   0.11.0             pyhd3eb1b0_1  
alabaster                 0.7.12                   py36_0  
anndata                   0.7.5            py36h5fab9bb_0    conda-forge
argon2-cffi               20.1.0           py36h7b6447c_1  
astor                     0.8.1                    py36_0  
async_generator           1.10             py36h28b3542_0  
attrs                     20.3.0             pyhd3eb1b0_0  
babel                     2.9.0              pyhd3eb1b0_0  
backcall                  0.2.0                      py_0  
binutils_impl_linux-64    2.31.1               h6176602_1  
binutils_linux-64         2.31.1               h6176602_9  
blas                      1.0                         mkl  
bleach                    1.5.0                    py36_0    conda-forge
blosc                     1.20.1               hd408876_0  
brotli                    1.0.9                    pypi_0    pypi
brotlipy                  0.7.0           py36h27cfd23_1003  
bwidget                   1.9.11                        1  
bzip2                     1.0.8                h7b6447c_0  
c-ares                    1.17.1               h27cfd23_0  
ca-certificates           2020.12.8            h06a4308_0  
cairo                     1.14.12              h8948797_3  
cell-blast                0.3.8                    pypi_0    pypi
certifi                   2020.12.5        py36h06a4308_0  
cffi                      1.14.4           py36h261ae71_0  
chardet                   3.0.4           py36h06a4308_1003  
click                     7.1.2                    pypi_0    pypi
colorama                  0.4.4              pyhd3eb1b0_0  
cryptography              3.3.1            py36h3c74f83_0  
curl                      7.71.1               he644dc0_8    conda-forge
cycler                    0.10.0                   pypi_0    pypi
dbus                      1.13.18              hb2f20db_0  
decorator                 4.4.2                      py_0  
defusedxml                0.6.0                      py_0  
docutils                  0.16                     py36_1  
entrypoints               0.3                      py36_0  
expat                     2.2.10               he6710b0_2  
fastobo                   0.9.3                    pypi_0    pypi
flask                     1.1.2                    pypi_0    pypi
flask-compress            1.8.0                    pypi_0    pypi
fontconfig                2.13.0               h9420a91_0  
freetype                  2.10.4               h5ab3b9f_0  
fribidi                   1.0.10               h7b6447c_0  
gast                      0.4.0                      py_0  
gcc_impl_linux-64         7.3.0                habb00fd_1  
gcc_linux-64              7.3.0                h553295d_9  
get_version               2.1                        py_1    conda-forge
gevent                    20.9.0                   pypi_0    pypi
gfortran_impl_linux-64    7.3.0                hdf63c60_1  
gfortran_linux-64         7.3.0                h553295d_9  
glib                      2.66.1               h92f7085_0  
graphite2                 1.3.14               h23475e2_0  
greenlet                  0.4.17                   pypi_0    pypi
grpcio                    1.31.0           py36hf8bcb03_0  
gsl                       2.4                  h14c3975_4  
gst-plugins-base          1.14.0               hbbd80ab_1  
gstreamer                 1.14.0               hb31296c_0  
gxx_impl_linux-64         7.3.0                hdf63c60_1  
gxx_linux-64              7.3.0                h553295d_9  
h5py                      2.10.0           py36h7918eee_0  
harfbuzz                  2.4.0                hca77d97_1  
hdf5                      1.10.4               hb1b8bf9_0  
html5lib                  0.9999999                py36_0    conda-forge
icu                       58.2                 he6710b0_3  
idna                      2.10                       py_0  
imagesize                 1.2.0                      py_0  
importlib-metadata        2.0.0                      py_1  
importlib_metadata        2.0.0                         1  
iniconfig                 1.1.1                    pypi_0    pypi
intel-openmp              2020.2                      254  
ipykernel                 5.3.4            py36h5ca1d4c_0  
ipython                   7.16.1           py36h5ca1d4c_0  
ipython_genutils          0.2.0              pyhd3eb1b0_1  
ipywidgets                7.5.1                      py_1  
itsdangerous              1.1.0                    pypi_0    pypi
jedi                      0.17.0                   py36_0  
jinja2                    2.11.2                     py_0  
joblib                    1.0.0              pyhd3eb1b0_0  
jpeg                      9b                   h024ee3a_2  
jsonschema                3.2.0                      py_2  
jupyter                   1.0.0                    py36_7  
jupyter_client            6.1.7                      py_0  
jupyter_console           6.2.0                      py_0  
jupyter_core              4.7.0            py36h06a4308_0  
jupyterlab_pygments       0.1.2                      py_0  
keras                     2.1.5                    py36_0  
kiwisolver                1.3.1                    pypi_0    pypi
krb5                      1.17.1               h173b8e3_0  
lcms2                     2.11                 h396b838_0  
ld_impl_linux-64          2.33.1               h53a641e_7  
legacy-api-wrap           1.2                        py_0    conda-forge
libcurl                   7.71.1               hcdd3856_8    conda-forge
libedit                   3.1.20191231         h14c3975_1  
libev                     4.33                 h7b6447c_0  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libllvm10                 10.0.1               hbcb73fb_5  
libnghttp2                1.41.0               hf8bcb03_2  
libpng                    1.6.37               hbc83047_0  
libprotobuf               3.13.0.1             hd408876_0  
libsodium                 1.0.18               h7b6447c_0  
libssh2                   1.9.0                h1ba5d50_1  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.1.0                h2733197_1  
libuuid                   1.0.3                h1bed415_2  
libxcb                    1.14                 h7b6447c_0  
libxml2                   2.9.10               hb55368b_3  
llvmlite                  0.35.0                   pypi_0    pypi
loom-viewer               0.32.4                   pypi_0    pypi
loompy                    3.0.6                    pypi_0    pypi
lz4-c                     1.9.2                heb0550a_3  
lzo                       2.10                 h7b6447c_2  
make                      4.2.1                h1bed415_1  
markdown                  3.3.3            py36h06a4308_0  
markupsafe                1.1.1            py36h7b6447c_0  
matplotlib                3.3.3                    pypi_0    pypi
matplotlib-base           3.3.2            py36h817c723_0  
mistune                   0.8.4            py36h7b6447c_0  
mkl                       2020.2                      256  
mkl-service               2.3.0            py36he8ac12f_0  
mkl_fft                   1.2.0            py36h23d657b_0  
mkl_random                1.1.1            py36h0573a6f_0  
mock                      4.0.3              pyhd3eb1b0_0  
mypy-extensions           0.4.3                    pypi_0    pypi
natsort                   7.1.0              pyhd3eb1b0_0  
nb_conda_kernels          2.3.1            py36h06a4308_0  
nbclient                  0.5.1                      py_0  
nbconvert                 6.0.7                    py36_0  
nbformat                  5.0.8                      py_0  
ncurses                   6.2                  he6710b0_1  
nest-asyncio              1.4.3              pyhd3eb1b0_0  
networkx                  2.5                        py_0  
notebook                  6.1.5            py36h06a4308_0  
numba                     0.52.0                   pypi_0    pypi
numexpr                   2.7.1            py36h63df603_0  
numpy                     1.19.2           py36h54aff64_0  
numpy-base                1.19.2           py36hfa32c7d_0  
numpy-groupies            0.9.13                   pypi_0    pypi
olefile                   0.46                     py36_0  
openssl                   1.1.1i               h27cfd23_0  
packaging                 20.8               pyhd3eb1b0_0  
pandas                    1.1.5            py36ha9443f7_0  
pandoc                    2.11                 hb0f4dca_0  
pandocfilters             1.4.3            py36h06a4308_1  
pango                     1.45.3               hd140c19_0  
parso                     0.8.1              pyhd3eb1b0_0  
patsy                     0.5.1                    pypi_0    pypi
pcre                      8.44                 he6710b0_0  
pexpect                   4.8.0              pyhd3eb1b0_3  
pickleshare               0.7.5           pyhd3eb1b0_1003  
pillow                    8.0.1            py36he98fc37_0  
pip                       20.3.1           py36h06a4308_0  
pixman                    0.40.0               h7b6447c_0  
plotly                    4.14.1             pyhd3eb1b0_0  
pluggy                    0.13.1                   pypi_0    pypi
prometheus_client         0.9.0              pyhd3eb1b0_0  
prompt-toolkit            3.0.8                      py_0  
prompt_toolkit            3.0.8                         0  
pronto                    2.3.1                    pypi_0    pypi
protobuf                  3.13.0.1         py36he6710b0_1  
ptyprocess                0.6.0              pyhd3eb1b0_2  
py                        1.10.0                   pypi_0    pypi
pycparser                 2.20                       py_2  
pygments                  2.7.3              pyhd3eb1b0_0  
pyopenssl                 20.0.1             pyhd3eb1b0_1  
pyparsing                 2.4.7                      py_0  
pyqt                      5.9.2            py36h05f1152_2  
pyrsistent                0.17.3           py36h7b6447c_0  
pysocks                   1.7.1            py36h06a4308_0  
pytables                  3.6.1            py36h71ec239_0  
pytest                    6.2.1                    pypi_0    pypi
python                    3.6.12               hcff3b4d_2  
python-dateutil           2.8.1                      py_0  
python-igraph             0.8.3                    pypi_0    pypi
python_abi                3.6                     1_cp36m    conda-forge
pytz                      2020.4             pyhd3eb1b0_0  
pyyaml                    5.3.1            py36h7b6447c_1  
pyzmq                     20.0.0           py36h2531618_1  
qt                        5.9.7                h5867ecd_1  
qtconsole                 4.7.7                      py_0  
qtpy                      1.9.0                      py_0  
r                         3.6.0                     r36_0  
r-base                    3.6.1                haffb61f_2  
r-boot                    1.3_20            r36h6115d3f_0  
r-class                   7.3_15            r36h96ca727_0  
r-cluster                 2.0.8             r36ha65eedd_0  
r-codetools               0.2_16            r36h6115d3f_0  
r-foreign                 0.8_71            r36h96ca727_0  
r-kernsmooth              2.23_15           r36ha65eedd_4  
r-lattice                 0.20_38           r36h96ca727_0  
r-mass                    7.3_51.3          r36h96ca727_0  
r-matrix                  1.2_17            r36h96ca727_0  
r-mgcv                    1.8_28            r36h96ca727_0  
r-nlme                    3.1_139           r36ha65eedd_0  
r-nnet                    7.3_12            r36h96ca727_0  
r-recommended             3.6.0                     r36_0  
r-rpart                   4.1_15            r36h96ca727_0  
r-spatial                 7.3_11            r36h96ca727_4  
r-survival                2.44_1.1          r36h96ca727_0  
readline                  8.0                  h7b6447c_0  
requests                  2.25.0             pyhd3eb1b0_0  
retrying                  1.3.3                    py36_2  
rpy2                      3.3.6                    pypi_0    pypi
scanpy                    1.6.0                      py_0    bioconda
scikit-learn              0.23.2           py36h0573a6f_0  
scipy                     1.5.2            py36h0b6359f_0  
seaborn                   0.11.0                     py_0  
send2trash                1.5.0              pyhd3eb1b0_1  
setuptools                51.0.0           py36h06a4308_2  
setuptools-scm            5.0.1              pyhd3eb1b0_0  
setuptools_scm            5.0.1                hd3eb1b0_0  
sinfo                     0.3.1                      py_0    conda-forge
sip                       4.19.8           py36hf484d3e_0  
six                       1.15.0           py36h06a4308_0  
snowballstemmer           2.0.0                      py_0  
sphinx                    3.2.1                      py_0  
sphinxcontrib-applehelp   1.0.2                      py_0  
sphinxcontrib-devhelp     1.0.2                      py_0  
sphinxcontrib-htmlhelp    1.0.3                      py_0  
sphinxcontrib-jsmath      1.0.1                      py_0  
sphinxcontrib-qthelp      1.0.3                      py_0  
sphinxcontrib-serializinghtml 1.1.4                      py_0  
sqlite                    3.33.0               h62c20be_0  
statsmodels               0.12.1           py36h27cfd23_0  
stdlib-list               0.7.0                      py_2    conda-forge
tbb                       2020.3               hfd86e86_0  
tensorboard               1.8.0            py36hf484d3e_0  
tensorflow                1.8.0                         0  
tensorflow-base           1.8.0            py36hee38f2d_0  
termcolor                 1.1.0                    py36_1  
terminado                 0.9.1                    py36_0  
testpath                  0.4.4                      py_0  
texttable                 1.6.3                    pypi_0    pypi
threadpoolctl             2.1.0              pyh5ca1d4c_0  
tk                        8.6.10               hbc83047_0  
tktable                   2.10                 h14c3975_0  
toml                      0.10.2                   pypi_0    pypi
tornado                   6.1              py36h27cfd23_0  
tqdm                      4.54.1             pyhd3eb1b0_0  
traitlets                 4.3.3                    py36_0  
typing                    3.7.4.3                  pypi_0    pypi
tzlocal                   2.1                      pypi_0    pypi
umap-learn                0.4.6            py36h9f0ad1d_0    conda-forge
urllib3                   1.25.11                    py_0  
wcwidth                   0.2.5                      py_0  
webencodings              0.5.1                    py36_1  
werkzeug                  1.0.1                      py_0  
wheel                     0.36.2             pyhd3eb1b0_0  
widgetsnbextension        3.5.1                    py36_0  
xz                        5.2.5                h7b6447c_0  
yaml                      0.2.5                h7b6447c_0  
zeromq                    4.3.3                he6710b0_3  
zipp                      3.4.0              pyhd3eb1b0_0  
zlib                      1.2.11               h7b6447c_3  
zope-event                4.5.0                    pypi_0    pypi
zope-interface            5.2.0                    pypi_0    pypi
zstd                      1.4.5                h9ceee32_0  

Thanks,
Brian Schilder
@ Imperial College London

@slinnarsson
Contributor

Hi

I'm not sure if sparse matrices are supported as arguments to create(), but if you're loading the dense matrix anyway (using ds[:,:]), why not just pass that as the argument?

@bschilder
Author

Hi Sten,

Thanks for the quick reply!

I was hoping to convert the entire matrix to a sparse matrix to substantially reduce its size, both on disk and when it's loaded into memory (without having to convert it each time). I guess this isn't possible with the loom format at the moment?
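Whether a sparse copy actually saves memory depends on the fraction of non-zero entries. A back-of-envelope sketch, assuming scipy's CSR layout (a data array and an indices array with one entry per non-zero, plus an indptr array of length n_rows + 1) and 4-byte values and indices; the function names are made up for illustration:

```python
def dense_bytes(n_rows, n_cols, itemsize=4):
    """Bytes for a dense matrix (itemsize=4 for float32/int32 values)."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, data_itemsize=4, index_itemsize=4):
    """Approximate bytes for a CSR matrix.

    data and indices hold one entry per non-zero; indptr holds n_rows + 1 offsets.
    """
    return nnz * (data_itemsize + index_itemsize) + (n_rows + 1) * index_itemsize


def break_even_density(data_itemsize=4, index_itemsize=4, dense_itemsize=4):
    """Fraction of non-zeros above which CSR stops saving memory (ignoring indptr)."""
    return dense_itemsize / (data_itemsize + index_itemsize)
```

Under these assumptions CSR only beats the dense layout while fewer than half of the entries are non-zero, and a matrix in which 10% of entries are non-zero (a hypothetical figure) would need roughly a fifth of the dense memory.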

Ultimately, though, I'm converting the loom files to ExprDataSet objects so that they can be used in the Cell_BLAST pipeline (for deep autoencoder batch correction).

Also of interest to @NathanSkene

@slinnarsson
Contributor

FYI, internally the matrix would be stored chunked and compressed anyway, which may take more or less space than a sparse format.
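To get a feel for why chunked, compressed dense storage can end up comparable to a sparse format, here is a toy, standard-library-only comparison. It fills a hypothetical random count matrix at a chosen density and compares the dense bytes compressed with zlib (DEFLATE, the same algorithm HDF5's gzip filter uses) against the raw and compressed CSR arrays. It does not model HDF5 chunking or loompy's actual compression settings, so the numbers are only illustrative.

```python
import array
import random
import zlib


def storage_sizes(n_rows, n_cols, density, seed=0):
    """Byte sizes of one toy matrix stored dense (raw/zlib) and as CSR (raw/zlib).

    Hypothetical setup: each entry is non-zero with probability `density`,
    and non-zero values are random integers in [1, 10].
    """
    rng = random.Random(seed)
    dense = array.array("i")        # row-major dense values
    data = array.array("i")         # CSR: non-zero values
    indices = array.array("i")      # CSR: column index of each non-zero
    indptr = array.array("i", [0])  # CSR: start offset of each row in data/indices
    for _ in range(n_rows):
        for col in range(n_cols):
            value = rng.randint(1, 10) if rng.random() < density else 0
            dense.append(value)
            if value:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))

    raw_dense = dense.tobytes()
    raw_csr = data.tobytes() + indices.tobytes() + indptr.tobytes()
    return {
        "dense_raw": len(raw_dense),
        "dense_zlib": len(zlib.compress(raw_dense, 6)),
        "csr_raw": len(raw_csr),
        "csr_zlib": len(zlib.compress(raw_csr, 6)),
    }
```

Calling storage_sizes with different densities shows how the gap between dense_zlib and csr_raw changes as the matrix gets denser.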

@bschilder
Author

I see, got it, thanks!

@bschilder
Author

bschilder commented Dec 18, 2020

So I happened to be converting the loom files I'm using to several different formats (to take advantage of different tools) and noticed that they differ quite a bit in size. I'm sure there are plenty of other differences besides sparsity, as well as advantages and disadvantages to each, but I thought this was interesting and worth posting here!

Data formats

Notes

  • All metadata was kept, but I flattened it in the case of LaManno2020 so it could be converted to ExprDataSet (which requires an unnested 2D array or a pandas DataFrame).
  • I did remove all but the main layer (raw counts) from LaManno2020, which also helped reduce its size.
  • Some additional formats for Aerts2018 include the raw uncompressed data matrix from GEO, as well as a Seurat object for R.
  • I also merged all three _sparse.h5 files using Cell_BLAST's .merge_datasets() function. The merged _sparse.h5 file is 2.5G.
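The exact flattening used for LaManno2020 isn't shown here. One possible approach, sketched with numpy under the assumption that the nested metadata are multi-dimensional attributes (for example an embedding of shape n_cells × k), is to split each into one 1D column per trailing index; flatten_attrs is a hypothetical helper and the _0, _1 naming is arbitrary:

```python
import numpy as np


def flatten_attrs(attrs):
    """Split multi-dimensional attributes into 1D columns.

    Hypothetical helper: an attribute of shape (n, k) becomes k entries named
    "<name>_0" ... "<name>_<k-1>"; 1D attributes are copied unchanged.
    """
    flat = {}
    for name, values in attrs.items():
        arr = np.asarray(values)
        if arr.ndim <= 1:
            flat[name] = arr
            continue
        # collapse all trailing dimensions into one so each column is 1D
        columns = arr.reshape(arr.shape[0], -1)
        for j in range(columns.shape[1]):
            flat[f"{name}_{j}"] = columns[:, j]
    return flat
```

Since create() calls .items() on the attribute managers (as the traceback shows), pandas.DataFrame(flatten_attrs(ds.ca)) would then give an unnested table, assuming pandas is installed. Name clashes with existing attributes are not handled.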

File sizes

-bash-4.2$ tree -h raw_data/scRNAseq/
raw_data/scRNAseq/
├── [4.0K]  Aerts2018
│   ├── [142M]  Aerts2018.h5
│   ├── [553M]  Aerts2018.h5ad
│   ├── [129M]  Aerts2018_sparse.h5
│   ├── [677M]  Aerts_Fly_AdultBrain_Filtered_57.loom
│   ├── [4.0K]  GSE107451_DGRP-551_w1118_WholeBrain_57k_0d_1d_3d_6d_9d_15d_30d_50d_10X_DGEM_MEX.mtx
│   │   ├── [3.0M]  annotation.tsv
│   │   ├── [1.7M]  barcodes.tsv
│   │   ├── [334K]  genes.tsv
│   │   └── [898M]  matrix.mtx
│   ├── [4.4M]  GSE107451_DGRP-551_w1118_WholeBrain_57k_Metadata.tsv.gz
│   └── [416M]  SeuratObj.Aerts_Fly_AdultBrain_Filtered_57k.rds
├── [4.0K]  LaManno2020
│   ├── [ 28G]  dev_all.loom
│   ├── [1.6G]  LaManno2020.h5
│   ├── [6.2G]  LaManno2020.h5ad
│   └── [1.8G]  LaManno2020_sparse.h5
└── [4.0K]  Zeisel2018
    ├── [ 18G]  l5_all.loom
    ├── [445M]  Zeisel2018.h5
    ├── [2.0G]  Zeisel2018.h5ad
    └── [509M]  Zeisel2018_sparse.h5
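To compare the formats numerically, the sizes in the listing can be turned into ratios. A small sketch: tree -h prints sizes rounded and in powers of 1024, so the ratios are approximate; parse_tree_size and ratios_to are illustrative names, and the dictionary just copies the Zeisel2018 entries from the listing.

```python
# `tree -h` reports sizes in powers of 1024 (K, M, G, T)
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_tree_size(text):
    """Convert a `tree -h` size such as "[ 18G]", "509M" or "4.0K" to bytes."""
    s = text.strip().strip("[]").strip()
    unit = s[-1].upper() if s[-1].isalpha() else ""
    number = s[:-1] if unit else s
    return float(number) * UNITS[unit]


def ratios_to(reference, sizes):
    """Size of every entry divided by the size of `reference`."""
    ref = parse_tree_size(sizes[reference])
    return {name: parse_tree_size(size) / ref for name, size in sizes.items()}


# Sizes copied from the Zeisel2018 part of the listing
zeisel2018 = {
    "l5_all.loom": "[ 18G]",
    "Zeisel2018.h5": "[445M]",
    "Zeisel2018.h5ad": "[2.0G]",
    "Zeisel2018_sparse.h5": "[509M]",
}
```

For Zeisel2018, ratios_to("Zeisel2018_sparse.h5", zeisel2018) puts l5_all.loom at about 36 times the size of Zeisel2018_sparse.h5.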

Sources

  • LaManno2020 and Zeisel2018 (mouse) looms are from Mousebrain.org.
  • Aerts2018 (fly) loom is from SCope.

@NathanSkene
