Merge pull request #111 from linnarsson-lab/loompy3.0
Loompy3.0
slinnarsson authored Sep 23, 2019
2 parents 05862f6 + bef521c commit 710c945
Showing 23 changed files with 1,180 additions and 535 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -13,7 +13,7 @@ before_install:
 - sudo ln -s /run/shm /dev/shm

 install:
-- conda install --yes python="3.6" numpy scipy cython h5py typing pandas
+- conda install --yes python="3.6" numpy scipy cython h5py typing numba
 - pip install sphinx_bootstrap_theme
 # Install the dev version (1.7) of sphinx that has solved a problem with f-strings
 - git clone https://github.com/sphinx-doc/sphinx.git
22 changes: 3 additions & 19 deletions README.md
@@ -1,26 +1,10 @@


-# loompy 2
+# loompy v3.0

-⭐ Loompy v2.0 was released Dec. 24, 2017! ([what's new](https://github.com/linnarsson-lab/loompy/releases/tag/v2.0)?)
+⭐ Loompy v3.0 was released Sep. 24, 2019!

-`.loom` is an efficient file format for very large omics datasets,
-consisting of a main matrix, optional additional layers, a variable number of row and column
-annotations. Loom also supports sparse graphs. We use loom files to store single-cell gene expression
-data: the main matrix contains the actual expression values (one
-column per cell, one row per gene); row and column annotations
-contain metadata for genes and cells, such as `Name`, `Chromosome`,
-`Position` (for genes), and `Strain`, `Sex`, `Age` (for cells).
-
-![Illustration of Loom format structure](/doc/Loom-images.png)
-
-Loom files (`.loom`) are created in the [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) file format, which
-supports an internal collection of numerical multidimensional datasets.
-HDF5 is supported by many computer languages, including Java, MATLAB,
-Mathematica, Python, R, and Julia. `.loom` files are accessible from
-any language that supports HDF5.
-
-To get started, head over to [the documentation](http://linnarssonlab.org/loompy/)!
+To get started, head over to [the documentation](http://loompy.org)!

 Loom, loompy, and the [loom-viewer](https://github.com/linnarsson-lab/loom-viewer) are being developed by members of the [Linnarsson Lab](http://linnarssonlab.org).

32 changes: 22 additions & 10 deletions doc/format/index.rst
@@ -6,7 +6,7 @@ Loom file format specs
 Versions
 --------

-This specification defines the Loom file format version ``2.0.1``.
+This specification defines the Loom file format version ``3.0.0``.


 .. _formatinfo:
@@ -91,15 +91,27 @@ Main matrix and layers
 Global attributes
 ^^^^^^^^^^^^^^^^^

-- There can OPTIONALLY be at least one `HDF5
-attribute <https://www.hdfgroup.org/HDF5/Tutor/crtatt.html>`__ on the
-root ``/`` group, which can be any valid scalar or multidimensional datatype and should be
-interpreted as attributes of the whole Loom file.
-- There can OPTIONALLY be an `HDF5
-attribute <https://www.hdfgroup.org/HDF5/Tutor/crtatt.html>`__ on the
-root ``/`` group named ``LOOM_SPEC_VERSION``, a string value giving the
-loom file spec version that was followed in creating the file. See top of this
-document for the current version of the spec.
+- There MUST be an HDF5 group ``/attrs`` containing global attributes.
+- There MUST be a HDF5 dataset ``/attrs/LOOM_SPEC_VERSION`` with the value ``v3.0.0``.
+
+Global attributes apply semantically to the whole file, not any specific part of it.
+Such attributes are stored in the HDF5 group ``/attrs`` and can be any valid scalar
+or multidimensional datatype.
+
+As of Loom file format v3.0.0, only one global attribute is mandatory: the ``LOOM_SPEC_VERSION``
+attribute, which is a string value giving the loom file spec version that was followed in creating
+the file. See top of this document for the current version of the spec.
+
+Note: previous versions of the loom file format stored global attributes as `HDF5 attributes <https://www.hdfgroup.org/HDF5/Tutor/crtatt.html>`__
+on the root ``/`` group. However, such attributes are size-limited, which caused problems for some
+applications. For backwards compatibility, readers compatible with Loom v3.0.0 and above MUST first look
+for global attributes under the HDF5 group ``/attrs`` (if it exists). If a requested attribute does not exist
+as a dataset under that group, the reader MUST then examine the HDF5 attributes on the root ``/`` group.
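
The reader lookup order described in the note above can be sketched in plain Python. This is a minimal illustration, not loompy's implementation: the function name `read_global_attr` is hypothetical, and plain dictionaries stand in for the `/attrs` group and for the HDF5 attributes on the root `/` group. How these map onto a real HDF5 library's objects is left open here.

```python
from typing import Any, Mapping, Optional


def read_global_attr(
    attrs_group: Optional[Mapping[str, Any]],
    root_attrs: Mapping[str, Any],
    name: str,
) -> Any:
    """Look up a global attribute using the v3.0.0 reader rule.

    attrs_group: stand-in for the /attrs group, or None if the file has none.
    root_attrs:  stand-in for the HDF5 attributes on the root / group.
    (Both parameters are hypothetical; they are not loompy's API.)
    """
    # Step 1: prefer a dataset stored under /attrs, if that group exists.
    if attrs_group is not None and name in attrs_group:
        return attrs_group[name]
    # Step 2: fall back to the legacy location on the root group.
    if name in root_attrs:
        return root_attrs[name]
    raise KeyError(f"Global attribute '{name}' not found")


# A v3-style file: the value under /attrs wins even if a legacy copy exists.
v3_value = read_global_attr({"title": "new"}, {"title": "old"}, "title")

# A legacy file with no /attrs group: the root attribute is used.
legacy_value = read_global_attr(None, {"title": "old"}, "title")
```

Passing `None` for the group models the "(if it exists)" clause: a legacy file simply skips the first step.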

+When writing a global attribute, the writer MUST write only to the ``/attrs`` group if ``LOOM_SPEC_VERSION`` is
+``3.0.0`` or higher. The writer MUST write to both the ``/attrs`` group and the HDF5 attributes on the root ``/``
+group if ``LOOM_SPEC_VERSION`` is lower than ``3.0.0`` or if it does not exist. This is to preserve a consistent
+format for legacy files.
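
The writer rule can be sketched in the same spirit. The helpers `parse_version` and `global_attr_targets` are hypothetical names introduced for illustration, not part of loompy, and the sketch assumes version strings of the plain `major.minor.patch` form.

```python
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '2.0.1' into (2, 0, 1)."""
    return tuple(int(part) for part in version.split("."))


def global_attr_targets(spec_version: Optional[str]) -> List[str]:
    """Return the locations a writer must update for a global attribute.

    spec_version is the file's LOOM_SPEC_VERSION, or None if the file has none.
    (Hypothetical helper; the returned labels are descriptive, not HDF5 paths.)
    """
    if spec_version is not None and parse_version(spec_version) >= parse_version("3.0.0"):
        # Files at v3.0.0 or higher: write only to the /attrs group.
        return ["/attrs"]
    # Legacy file (older or missing version): write to both locations.
    return ["/attrs", "root HDF5 attributes"]


targets_v3 = global_attr_targets("3.0.0")
targets_legacy = global_attr_targets("2.0.1")
targets_unversioned = global_attr_targets(None)
```

Comparing integer tuples rather than raw strings avoids lexicographic surprises: as strings, `"10.0.0"` sorts before `"3.0.0"`.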


Row and column attributes
8 changes: 5 additions & 3 deletions loompy/__init__.py
@@ -1,13 +1,15 @@
-from .utils import *
+from .utils import get_loom_spec_version, compare_loom_spec_version, timestamp, deprecated
 from .normalize import normalize_attr_values, materialize_attr_values
 from .attribute_manager import AttributeManager
-from .file_attribute_manager import FileAttributeManager
+from .global_attribute_manager import GlobalAttributeManager
 from .graph_manager import GraphManager
 from .layer_manager import LayerManager
 from .loom_view import LoomView
 from .loom_layer import MemoryLoomLayer, LoomLayer
 from .to_html import to_html
 from .view_manager import ViewManager
-from .loompy import connect, create, create_append, combine, create_from_cellranger, LoomConnection, new, combine_faster
+from .loompy import connect, create, create_append, combine, create_from_cellranger, LoomConnection, new, combine_faster, create_from_matrix_market
 from .loom_validator import LoomValidator
 from ._version import __version__, loom_spec_version
+from .bus_file import create_from_fastq
+from .cell_calling import call_cells
4 changes: 2 additions & 2 deletions loompy/_version.py
@@ -1,2 +1,2 @@
-__version__ = '2.0.18'
-loom_spec_version = '2.0.1'
+__version__ = '3.0.0'
+loom_spec_version = '3.0.0'
10 changes: 8 additions & 2 deletions loompy/attribute_manager.py
@@ -1,7 +1,9 @@
 from typing import *
 import numpy as np
+import h5py
 import loompy
 from loompy import timestamp
+from .utils import compare_loom_spec_version


 class AttributeManager:
@@ -144,13 +146,17 @@ def __setattr__(self, name: str, val: np.ndarray) -> None:
             raise KeyError("Attribute name cannot contain slash (/)")
         else:
             if self.ds is not None:
-                values = loompy.normalize_attr_values(val)
+                values = loompy.normalize_attr_values(val, compare_loom_spec_version(self.ds._file, "3.0.0") >= 0)
                 a = ["/row_attrs/", "/col_attrs/"][self.axis]
                 if self.ds.shape[self.axis] != 0 and values.shape[0] != self.ds.shape[self.axis]:
                     raise ValueError(f"Attribute '{name}' must have exactly {self.ds.shape[self.axis]} values but {len(values)} were given")
                 if self.ds._file[a].__contains__(name):
                     del self.ds._file[a + name]
-                self.ds._file[a + name] = values # TODO: for 2D arrays, use block compression along columns/rows
+
+                if values.dtype == np.object_:
+                    self.ds._file.create_dataset(a + name, data=values, dtype=h5py.string_dtype(encoding="utf-8"))
+                else:
+                    self.ds._file[a + name] = values
                 self.ds._file[a + name].attrs["last_modified"] = timestamp()
                 self.ds._file[a].attrs["last_modified"] = timestamp()
                 self.ds._file.attrs["last_modified"] = timestamp()