Sharing compressed data

Hello, this is such a cool project!

Would it be possible to share compressed AnnData objects on the website? For the full dataset, saving with `write_h5ad(path, compression="gzip")` reduces the file size from about 15 GB to about 5 GB. Saving takes a bit longer with compression, but reading is still pretty fast. I also noticed that `adata.obs["donor"]` contains a mix of string and float values, so it would be appreciated if the column were cast before saving with `adata.obs["donor"] = adata.obs["donor"].astype(str)`.
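To make the request concrete, here is a minimal sketch of the two steps. The helper names `normalize_donor` and `save_compressed` and the output file name are hypothetical, chosen only for illustration. The small `obs` table stands in for the real `adata.obs`, which I am assuming behaves like a regular pandas DataFrame.

```python
import pandas as pd


def normalize_donor(obs: pd.DataFrame, column: str = "donor") -> pd.DataFrame:
    """Return a copy of `obs` with the given column cast to strings."""
    fixed = obs.copy()
    # Same cast as suggested above: every value becomes its string form,
    # so a float such as 2.0 is stored as the text "2.0".
    fixed[column] = fixed[column].astype(str)
    return fixed


def save_compressed(adata, path: str) -> None:
    """Write an AnnData object to `path` with gzip compression.

    `adata` is assumed to be an anndata.AnnData instance; only its
    `obs` attribute and `write_h5ad` method are used here.
    """
    adata.obs = normalize_donor(adata.obs)
    adata.write_h5ad(path, compression="gzip")


# Toy stand-in for adata.obs with mixed string and float donor labels.
obs = pd.DataFrame({"donor": ["D1", 2.0, "D3"]})
fixed = normalize_donor(obs)
```

With a loaded dataset this would be called as `save_compressed(adata, "full_dataset.h5ad")`, where the file name is a placeholder.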

We are working on faster implementations of scvi-tools using JAX. In this [notebook](https://colab.research.google.com/drive/16i9MxwYjnWJ7c9wudFOzCP1Yep0Tldua?usp=sharing) we can process 150k cells in under 5 minutes on Colab. I was hoping to create a new tutorial with your dataset to show that we can process 900k cells in under 1 hour (integration + visualization, all for free!).

Sharing compressed data #48
