Report
Hey, I love the pipeline and what you are doing by merging multiple machine learning models to get a popularity vote. I've been using the package without a problem for about a year in my single-cell pipeline, but I recently rebuilt my conda env and now everything is broken, and I'm having a hard time figuring out exactly why. This brings up both a big request and an enhancement request for the documentation, to make the pipeline easier for others to use in the future.
Here is the code I typically run using the default of all models.
#%% Automated cell type analysis with popv
import popv
# popV needs adata.X raw information no normalization
# use the QC_on_adata_normal() function to remove low quality cells as a pre-processing step
# You will start with the adata_tmp object for this then
# Select a pre-trained model
huggingface_repo = "popV/tabula_sapiens_All_Cells"
# The query batch key is what will be used by bbknn for batch correction
query_batch_key = "run_accession"
#%% Perform annotation using a premade model
import numba
hmo = popv.hub.HubModel.pull_from_huggingface_hub(huggingface_repo, cache_dir="tmp/tabula_sapiens")
#%%
adata_tmp_an = hmo.annotate_data(
    adata_tmp,
    query_batch_key=query_batch_key,
    prediction_mode="inference", # "fast" does not integrate reference and query.
    gene_symbols="feature_name", # "Uncomment if using gene symbols."
)
# Cast every obs column to str before writing the annotated object
for col in adata_tmp_an.obs.columns:
    adata_tmp_an.obs[col] = adata_tmp_an.obs[col].astype(str)
adata_tmp_an.write("adata_popv_an.h5ad")
But this hit an error when running the OnClass model, which seems to be related to pandas v3.0.0 making it difficult to write AnnData objects with Arrow types. I've been banging my head against a wall trying to sort out the dependency issues. If I pin pandas=2.2.3 and anndata=0.12.10, the popV step gets further but still fails. I also tried dropping the OnClass model by manually listing all the other models, but the IPython notebook has very limited documentation on how to do this.
So I need 2 things:
- Please try to run the pipeline with a fresh install and the most current versions of scanpy, anndata, and pandas. If you can get it to work, please tell me your package versions so I can copy them. If it fails, let me know and I can help troubleshoot a fix.
- Add documentation to the tutorial page showing users how to manually select one or all of the models to be used by the .annotate_data() function. Please also update the API docs to list more clearly the names of the models that can be passed to .annotate_data(). Right now, based on the Python files in the source, it looks like they need to be called _onclass, _celltypist, _harmony, etc., but that isn't clear, and those names threw errors when I tried setting them in my code.
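To make the version comparison in the first request easier, here is a minimal sketch that prints the installed version of each relevant package using only the standard library. The package list is my own guess at which dependencies matter here, not something popV defines, so adjust it as needed.

```python
from importlib import metadata

# Assumption: these are the distributions most likely to matter for this issue.
PACKAGES = ["popv", "scanpy", "anndata", "pandas", "numpy", "scvi-tools", "celltypist"]


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this in a working environment and in my broken one should show exactly which pins differ.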
Version information
Here is the conda env YAML file I use to create the environment for running popV.
name: sc_pre
channels:
  - conda-forge
  - bioconda
  - plotly
dependencies:
  # Core Python
  - python=3.11
  - cython
  - numpy
  - pandas=2.2.3
  - scipy
  - scikit-learn
  # Scanpy ecosystem
  - scanpy
  - anndata=0.12.10
  - leidenalg
  - louvain
  - python-igraph
  - bbknn
  - umap-learn
  - pynndescent
  - fa2
  # Cell annotation
  - celltypist
  - cellxgene-census
  - scvi-tools
  # CNV analysis
  - gffutils
  # Visualization
  - matplotlib-base
  - seaborn
  - plotly-orca
  - hvplot
  - adjusttext
  # Data handling
  - pybiomart
  - goatools
  - geoparse
  # SRA tools (for SRAscraper compatibility)
  - awscli
  - parallel-fastq-dump
  - pysradb
  - python-wget
  - sra-tools>=3.0.0
  # Fix for the OpenSSL Version Mismatch
  - aws-c-cal
  - awscrt
  - openssl
  # Utilities
  - pyyaml
  - jupyter_core
  - jupyterlab
  # MultiQC for reports
  - multiqc
  # GSEA
  - bioconda::gseapy
  # GUI tools (optional, can be removed for headless)
  - pyqt
  - qt
  - firefox
  - pygraphviz
  # pip dependencies
  - pip
  - pip:
    - scrublet # For doublet detection
    - popv # For cell annotation consensus
    - cytotrace2-py # For stemness scoring (install from github if needed)
    - infercnvpy # For CNV analysis