Combined datasets #5

ivirshup · 2022-04-27T17:03:33Z

I’m wondering if there has been much thought on how metadata for combined datasets is handled. Here I’m thinking about multiple datasets that measure the same variables and have been combined.

Typically, this becomes a single concatenated object with a “batch” or “dataset” annotation. However, it could be represented as a collection of objects.

Can, or should, there be a convention for maintaining experiment-level metadata when multiple experiments are combined? This is trivial for the “collection of experiments” object, but more complicated for the concatenated object.

For a more concrete example, what happens to the dataset id, and to external data identified by the dataset id, when we concatenate? Another example is the “files” from a muon.atac-generated AnnData: scverse/mudata#20
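To make the contrast concrete, here is a minimal plain-Python sketch of the two representations. Everything in it is hypothetical and for illustration only: the `Dataset` class, the `concatenate` helper, the `dataset_id` and `files` keys, and the file names are stand-ins, not AnnData or muon APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for one experiment: observations plus experiment-level metadata."""
    obs_names: list
    metadata: dict = field(default_factory=dict)


def concatenate(datasets):
    """Naive concatenation: keep a per-observation "dataset" label,
    but carry over no experiment-level metadata."""
    obs_names, labels = [], []
    for key, ds in datasets.items():
        obs_names.extend(ds.obs_names)
        labels.extend([key] * len(ds.obs_names))
    return {"obs_names": obs_names, "dataset": labels, "metadata": {}}


# "Collection of experiments": each object keeps its own metadata.
collection = {
    "exp1": Dataset(["cell_a", "cell_b"], {"dataset_id": "exp1", "files": ["exp1_fragments.tsv.gz"]}),
    "exp2": Dataset(["cell_c"], {"dataset_id": "exp2", "files": ["exp2_fragments.tsv.gz"]}),
}

# Concatenated object: the labels survive, the per-experiment metadata does not.
combined = concatenate(collection)
```

In the collection, `collection["exp2"].metadata` is still available. In `combined`, only the per-observation label remains, so anything keyed by the dataset id has nowhere obvious to live.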

squidpy's solution for concatenated objects

A similar issue came up in squidpy, which we addressed by essentially requiring a “library_id” annotation for the observations. Image data is stored under .uns/spatial/{library_id}/ to avoid conflicts when merging. For example:

# These do not conflict
uns/spatial/library1/images/hires: "image1.png"
uns/spatial/library2/images/hires: "image2.png"

# These do
uns/spatial/images/hires: "image1.png"
uns/spatial/images/hires: "image2.png"
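A rough way to see why the namespaced layout survives a merge: the sketch below merges nested dicts and keeps a leaf only when every input that defines it agrees. It is a simplified illustration written for this issue, not anndata's merge implementation; `merge_unique` is a hypothetical helper.

```python
def merge_unique(dicts):
    """Merge nested dicts, keeping a leaf only if all inputs that define it agree.

    Illustration only; this is not anndata's implementation.
    """
    merged = {}
    keys = {k for d in dicts for k in d}
    for key in sorted(keys):
        values = [d[key] for d in dicts if key in d]
        if all(isinstance(v, dict) for v in values):
            # Recurse into nested mappings; drop them if nothing survives.
            sub = merge_unique(values)
            if sub:
                merged[key] = sub
        elif all(v == values[0] for v in values):
            merged[key] = values[0]
        # Otherwise the values conflict and the key is dropped.
    return merged


# Keyed by library_id: the paths never collide, so both images are kept.
namespaced = merge_unique([
    {"spatial": {"library1": {"images": {"hires": "image1.png"}}}},
    {"spatial": {"library2": {"images": {"hires": "image2.png"}}}},
])

# Same path, different values: the conflicting entry is lost.
flat = merge_unique([
    {"spatial": {"images": {"hires": "image1.png"}}},
    {"spatial": {"images": {"hires": "image2.png"}}},
])
```

Under this rule `namespaced` retains both images under their own library keys, while `flat` ends up empty.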

Relevant docs:

Tutorial on setting up an AnnData to work with squidpy, for a more in-depth description
Description of the AnnData merging algorithm

Collection of objects

A collection of objects sidesteps this issue by allowing each constituent object to hold its own metadata. However, my impression is that far more tools expect a single concatenated object. There is also less tooling for collections of objects, though this has been changing (e.g. anndata.AnnCollection, snapatac2.AnnDataSet).

Question

Should there be conventions for maintaining metadata with concatenated objects? Should we insist on collections of objects if we want to maintain metadata?

Relating to #3, what would the obs_subset of a concatenated object be?
