Hello, this is such a cool project!
I was wondering if the compressed AnnData objects could be shared on the website. For example, saving the full dataset with `write_h5ad(path, compression="gzip")` reduces the file size from ~15 GB to ~5 GB. Saving with compression takes a bit longer, but reading is still pretty fast. I also noticed that `adata.obs["donor"]` mixes string and float types, so converting it before saving with `adata.obs["donor"] = adata.obs["donor"].astype(str)` would also be appreciated.
We are working on faster implementations of scvi-tools using JAX. In this notebook, we can process 150k cells in under 5 minutes on Colab. I was hoping to create a new tutorial with your dataset to show that we can process 900k cells in under 1 hour (integration + visualization, all for free!).