
Sharing compressed data #48

@adamgayoso


Hello, this is such a cool project!

I was wondering if the compressed AnnData objects could be shared on the website. For example, saving the full dataset with `write_h5ad(path, compression="gzip")` reduces the file size from ~15 GB to ~5 GB. Saving with compression takes a bit longer, but reading is still pretty fast. I also noticed that `adata.obs["donor"]` contains a mix of string and float values, so it would be great if that column were also converted before saving, e.g. with `adata.obs["donor"] = adata.obs["donor"].astype(str)`.
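A minimal sketch of the two steps suggested above, assuming `adata` is an `anndata.AnnData` object and that `donor` is the only column with mixed types. The helper names `clean_donor_column` and `save_compressed` are hypothetical, for illustration only; the cleaning step uses plain pandas so it can be checked without loading the full dataset.

```python
import numpy as np
import pandas as pd


def clean_donor_column(obs: pd.DataFrame, column: str = "donor") -> pd.DataFrame:
    """Return a copy of `obs` with `column` cast to str.

    Hypothetical helper: a column mixing str and float values cannot be
    stored as a single string type, so every value is converted to str.
    """
    obs = obs.copy()
    obs[column] = obs[column].astype(str)
    return obs


def save_compressed(adata, path: str, column: str = "donor") -> None:
    """Hypothetical helper: fix the mixed-type column, then write gzip-compressed .h5ad.

    `adata` is assumed to be an anndata.AnnData object; the two calls mirror
    the ones suggested in the issue text.
    """
    adata.obs[column] = adata.obs[column].astype(str)
    adata.write_h5ad(path, compression="gzip")


if __name__ == "__main__":
    # Toy obs table with a mixed str/float "donor" column.
    obs = pd.DataFrame({"donor": ["D1", 2.0, "D3"]}, index=np.array(["c1", "c2", "c3"]))
    print(clean_donor_column(obs)["donor"].tolist())
```

Copying inside `clean_donor_column` leaves the caller's table untouched. `save_compressed` instead modifies `adata.obs` in place, matching the one-liner from the issue.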

We are working on faster implementations of scvi-tools using JAX. In this notebook, we can process 150k cells in under 5 minutes on Colab. I was hoping to create a new tutorial with your dataset to show that we can process 900k cells in under 1 hour (integration + visualization, all for free!).
