Updated docs
slinnarsson committed Aug 27, 2018
1 parent ffc23e9 commit ecb341e
Showing 2 changed files with 60 additions and 40 deletions.
64 changes: 40 additions & 24 deletions doc/apiwalkthrough/index.rst
@@ -39,6 +39,19 @@ For example, the following creates a loom file with a 100x100 main matrix, one r
col_attrs = { "SomeColAttr": np.arange(100) }
loompy.create(filename, matrix, row_attrs, col_attrs)
Note that ``loompy.create()`` does not return anything. To work with the newly created file, you must ``loompy.connect()`` to it.

You can also create an empty file using ``loompy.new()``, which returns a connection to the newly created file. The file can then be populated with data.
This is especially useful when you're building a dataset incrementally, e.g. by pooling subsets of other datasets:

.. code:: python

    with loompy.new("outfile.loom") as dsout:
        for sample in samples:
            with loompy.connect(sample) as dsin:
                logging.info(f"Appending {sample}.")
                dsout.add_columns(dsin.layers, col_attrs=dsin.col_attrs, row_attrs=dsin.row_attrs)

You can also create a file by combining existing loom files. The files will be concatenated along the column
axis, and therefore must have the same number of rows. If the rows are potentially not in the same order,
you can supply a ``key`` argument; the row attribute corresponding to the key will be used to sort the files.
@@ -155,8 +168,13 @@ You can load the main matrix or any layer as sparse:
.. code:: python
ds.layers["exons"].sparse() # Returns a scipy.sparse.coo_matrix()
ds.layers["unspliced"].sparse(rows, cols) # Returns only the indicated rows and columns (ndarrays of integers)
ds.layers["unspliced"].sparse(rows, cols) # Returns only the indicated rows and columns (ndarrays of integers or bools)
You can assign layers from sparse matrices:

.. code:: python
ds.layers["exons"] = my_sparse_matrix
Global attributes
@@ -259,7 +277,14 @@ delete any part of the matrix.
ds.add_columns(submatrix, col_attrs)
You need to provide a submatrix corresponding to the new columns, as well as
a dictionary of column attributes with values for all the new columns.
a dictionary of column attributes with values for all the new columns.

Note that if you are adding columns to an empty file, you must also provide row attributes:

.. code:: python
ds.add_columns(submatrix, col_attrs, row_attrs={"Gene": genes})
You can also add the contents of another .loom file:

@@ -279,27 +304,6 @@ with conflicting values, you can automatically convert such attributes into colu
by passing ``convert_attrs=True`` to the method.


There is also a convenient function to create or append to a loom file:

.. code:: python
loompy.create_append(filename, layers, row_attrs, col_attrs)
This will create the file if it doesn't exist, and append to it if it does. This function
is commonly used when combining several loom files while performing a selection on the columns:

.. code:: python

    for f in input_files:
        with loompy.connect(f) as ds:
            cells = ...  # placeholder: select the desired columns in ds
            for (_, _, view) in ds.scan(items=cells):
                loompy.create_append(output_file, view.layers, view.ra, view.ca)

The code above loops over a number of input files, then scans across each file to select
a desired subset of the columns (cells) and writes them to the output file. Since it uses
``scan()``, it will never load entire datasets in RAM and will work no matter how big the
input datasets are.

.. _loomlayers:

@@ -334,6 +338,12 @@ expressions are equivalent to the following:
a = ds["spliced"][:, 10] # Assign the 10th column of layer "spliced" to the variable a
del ds["spliced"] # Delete the "spliced" layer
Sometimes you may need to create an empty layer (all zeros), to be filled later. Empty layers
are created by assigning a type to a layer name. For example:

.. code:: python
ds["empty_floats"] = "float32"
ds["empty_ints"] = "int64"
.. _loomoperations:
@@ -401,7 +411,13 @@ should return a single float or integer value. Internally, ``map()`` uses ``scan
loop across the file.

Note that you must always provide a list of functions, even if it has only one element, and
that the result is a list of vectors, one per function that was supplied.
that the result is a list of vectors, one per function that was supplied. Hence the correct
way to map a single function across the matrix is:

.. code:: python
(means,) = ds.map([np.mean], axis=1)
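
The single-element unpacking above can be tried without a loom file. The sketch below uses a hypothetical stand-in, ``map_columns``, which is not part of loompy; it only mimics the contract described here (a list of functions goes in, a list of result vectors comes out) and assumes that ``axis=1`` yields one value per column.

```python
from statistics import mean


def map_columns(matrix, functions):
    """Hypothetical stand-in for ds.map(..., axis=1), not a loompy function.

    Applies each function to every column and returns one result
    vector per function, in the order the functions were given.
    """
    columns = list(zip(*matrix))  # transpose rows into columns
    return [[f(col) for col in columns] for f in functions]


# Toy 2x3 matrix; the values are made up for illustration.
matrix = [[1, 2, 3],
          [3, 4, 5]]

# One function in, a one-element list out; the trailing comma unpacks it.
(means,) = map_columns(matrix, [mean])

# Two functions in, two vectors out.
(totals, maxima) = map_columns(matrix, [sum, max])
```

Forgetting the trailing comma (``means = ...``) would bind ``means`` to the enclosing list rather than to the vector itself.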
Permutation
^^^^^^^^^^^
36 changes: 20 additions & 16 deletions doc/cookbook/index.rst
@@ -44,25 +44,27 @@ of lists:
loompy.create(filename, data, df_row_metadata.to_dict("list"), df_col_metadata.to_dict("list"))
Combining data using scan()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Combining data using scan() and new()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We often want to scan through a number of input files (for example, raw
data files from multiple experiments), select a subset of the columns (e.g. cells passing QC)
and write them to a new file. This can be accomplished using the ``scan()`` method.
and write them to a new file. This can be accomplished by creating an empty file using ``loompy.new()`` and
then filling it up using the ``scan()`` method.

For example, let's select cells that have more than 500 detected UMIs in each of several files:

.. code:: python

    for f in input_files:
        with loompy.connect(f) as ds:
            totals = ds.map([np.sum], axis=1)[0]
            cells = np.where(totals > 500)[0] # Select the cells that passed QC (totals > 500)
            for (ix, selection, view) in ds.scan(items=cells, axis=1):
                loompy.create_append(out_file, view.layers, view.ra, view.ca)
    with loompy.new(out_file) as dsout:  # Create a new, empty, loom file
        for f in input_files:
            with loompy.connect(f) as ds:
                totals = ds.map([np.sum], axis=1)[0]
                cells = np.where(totals > 500)[0] # Select the cells that passed QC (totals > 500)
                for (ix, selection, view) in ds.scan(items=cells, axis=1):
                    dsout.add_columns(view.layers, col_attrs=view.ca, row_attrs=view.ra)

Note that by using ``create_append`` we will first be creating the new file, then appending columns to it.
Note that by using ``new()`` we will first be creating the new file, then appending columns to it.
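
The select-then-append pattern can be sketched in plain Python, without loompy or NumPy. The helpers ``select_columns`` and ``scan_batches`` are hypothetical stand-ins for the QC step and for ``scan()``; they are not loompy functions, and the counts are made up.

```python
def select_columns(matrix, threshold):
    """Return indices of columns whose total exceeds the threshold."""
    totals = [sum(col) for col in zip(*matrix)]
    return [i for i, total in enumerate(totals) if total > threshold]


def scan_batches(indices, batch_size):
    """Yield the selected indices in consecutive batches (stand-in for scan())."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


# Toy count matrix: 2 genes (rows) x 4 cells (columns).
counts = [[300, 100, 400, 50],
          [300, 100, 200, 60]]

# Column totals are 600, 200, 600, 110, so cells 0 and 2 pass QC.
cells = select_columns(counts, threshold=500)

# The output starts empty, like a file created with new(),
# and grows by one batch of columns at a time.
output = [[] for _ in counts]
for batch in scan_batches(cells, batch_size=1):
    for row_out, row_in in zip(output, counts):
        row_out.extend(row_in[i] for i in batch)
```

Because only one batch is handled at a time, the full input never has to be held alongside the output, which is the property the ``scan()`` loop relies on.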

But what if the input files do not have their rows in the same order? ``scan()`` accepts a ``key`` argument
to designate a primary key; each view is then returned sorted on the primary key on the *other axis*.
@@ -74,12 +76,14 @@ are sorted on the accession identifier along rows:

.. code:: python

    for f in input_files:
        with loompy.connect(f) as ds:
            totals = ds.map([np.sum], axis=1)[0]
            cells = np.where(totals > 500)[0] # Select the cells that passed QC (totals > 500)
            for (ix, selection, view) in ds.scan(items=cells, axis=1, key="Accession"):
                loompy.create_append(out_file, view.layers, view.ra, view.ca)
    with loompy.new(out_file) as dsout:  # Create a new, empty, loom file
        for f in input_files:
            with loompy.connect(f) as ds:
                totals = ds.map([np.sum], axis=1)[0]
                cells = np.where(totals > 500)[0] # Select the cells that passed QC (totals > 500)
                for (ix, selection, view) in ds.scan(items=cells, axis=1, key="Accession"):
                    dsout.add_columns(view.layers, col_attrs=view.ca, row_attrs=view.ra)

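
To see why sorting on a primary key makes inputs with differently ordered rows compatible, here is a small sketch in plain Python. ``sort_rows_by_key`` is a hypothetical helper, not a loompy function, and the gene identifiers and values are made up; it only illustrates reordering rows by a key before placing columns side by side.

```python
def sort_rows_by_key(keys, rows):
    """Hypothetical helper: return keys and rows reordered by sorted key."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    return [keys[i] for i in order], [rows[i] for i in order]


# Two inputs holding the same genes (rows) in different orders.
keys_a, rows_a = ["G2", "G1", "G3"], [[20, 21], [10, 11], [30, 31]]
keys_b, rows_b = ["G3", "G2", "G1"], [[32], [22], [12]]

sorted_keys_a, sorted_a = sort_rows_by_key(keys_a, rows_a)
sorted_keys_b, sorted_b = sort_rows_by_key(keys_b, rows_b)

# After sorting, row i refers to the same gene in both inputs,
# so the columns of the second input can be appended to the first.
assert sorted_keys_a == sorted_keys_b
combined = [a + b for a, b in zip(sorted_a, sorted_b)]
```

This only works when every input contains the same set of keys; the sketch does not handle missing or duplicated keys.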
Fitting an incremental PCA
^^^^^^^^^^^^^^^^^^^^^^^^^^
