Issue with running pipeline since previous update 5 months ago.

### Report

Hey, I love the pipeline and what you are doing by merging multiple machine learning models to get a popularity vote. I've been using the package without a problem for about a year now in my single-cell pipeline, but I recently rebuilt my conda env and now everything is broken. I'm having a heck of a time trying to figure out exactly why. This brings up both a big request and an enhancement request for the documentation, to make the pipeline easier for others to use in the future.

Here is the code I typically run using the default of all models.
```python
#%% Automated cell type analysis with popv
import popv
# popV needs raw (unnormalized) counts in adata.X
# Use the QC_on_adata_normal() function to remove low-quality cells as a pre-processing step
# The resulting adata_tmp object is the input for the steps below

# Select a pre-trained model
huggingface_repo = "popV/tabula_sapiens_All_Cells"
# The query batch key is what will be used by bbknn for batch correction
query_batch_key = "run_accession"
#%% Perform annotation using a premade model
import numba
hmo = popv.hub.HubModel.pull_from_huggingface_hub(huggingface_repo, cache_dir="tmp/tabula_sapiens")

#%%
adata_tmp_an = hmo.annotate_data(
    adata_tmp,
    query_batch_key=query_batch_key,
    prediction_mode="inference",  # "fast" does not integrate reference and query.
    gene_symbols="feature_name",  # Set only when using gene symbols.
)

for col in adata_tmp_an.obs.columns:
    adata_tmp_an.obs[col] = adata_tmp_an.obs[col].astype(str)

adata_tmp_an.write("adata_popv_an.h5ad")
```

But this hit an error when running the OnClass model, which seems to be related to pandas v3.0.0 making it difficult to write anndata objects with Arrow types. I've been banging my head against a wall trying to sort out the dependency issues. If I set pandas=2.2.3 and anndata=0.12.10, the popv step gets further but still fails. I also tried dropping the OnClass model by manually listing all the other models, but there is very limited documentation on how to do this in the IPython notebook.

So I need 2 things:
1. Please try to run the pipeline with a fresh install and the most current versions of scanpy, anndata, and pandas. If you can get it to work, please tell me your package versions so I can copy them. If it fails, let me know and I can help troubleshoot a fix.

2. Add documentation to the tutorial page showing users how to manually select one or all of the models to be used by the .annotate_data() function. Please also update the API docs to list more clearly the names of the models that can be passed to .annotate_data(). Right now, based on the Python files in the source, it looks like those need to be called _onclass, _celltypist, _harmony, etc., but that isn't clear, and it threw errors when I tried setting those in my code.
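
Until the docs cover this, one way to find out what .annotate_data() accepts and which model names are exported is plain introspection. This is only a sketch: the two helpers below (`public_names` and `parameter_names`) are hypothetical names of my own, not part of popV, and they are demonstrated on the standard library so they run without popV installed.

```python
import inspect
import json
import types


def public_names(module: types.ModuleType) -> list[str]:
    """Return the names a module exports: its __all__ if defined,
    otherwise every attribute that does not start with an underscore."""
    exported = getattr(module, "__all__", None)
    if exported is not None:
        return sorted(exported)
    return sorted(name for name in dir(module) if not name.startswith("_"))


def parameter_names(func) -> list[str]:
    """Return the parameter names a callable accepts, in declaration order."""
    return list(inspect.signature(func).parameters)


# Demonstrated on the standard library's json module.
print(public_names(json))
print(parameter_names(json.dumps))

# With popV installed, the same calls could be pointed at the package.
# ASSUMPTION: popv.algorithms is the module holding the model classes;
# I have not verified this against the current release.
#   import popv
#   print(public_names(popv.algorithms))
#   print(parameter_names(hmo.annotate_data))
```

The output of the last two (commented) calls would show the exact model names and keyword arguments, which is the information I could not find in the tutorial.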

### Version information

Here is the conda env YAML file I use to create the virtual environment for running popV.

name: sc_pre
channels:
  - conda-forge
  - bioconda
  - plotly
dependencies:
  # Core Python
  - python=3.11
  - cython
  - numpy
  - pandas=2.2.3
  - scipy
  - scikit-learn

  # Scanpy ecosystem
  - scanpy
  - anndata=0.12.10
  - leidenalg
  - louvain
  - python-igraph
  - bbknn
  - umap-learn
  - pynndescent
  - fa2

  # Cell annotation
  - celltypist
  - cellxgene-census
  - scvi-tools

  # CNV analysis
  - gffutils

  # Visualization
  - matplotlib-base
  - seaborn
  - plotly-orca
  - hvplot
  - adjusttext

  # Data handling
  - pybiomart
  - goatools
  - geoparse

  # SRA tools (for SRAscraper compatibility)
  - awscli
  - parallel-fastq-dump
  - pysradb
  - python-wget
  - sra-tools>=3.0.0

  # Fix for the OpenSSL Version Mismatch
  - aws-c-cal
  - awscrt
  - openssl

  # Utilities
  - pyyaml
  - jupyter_core
  - jupyterlab

  # MultiQC for reports
  - multiqc

  # GSEA
  - bioconda::gseapy

  # GUI tools (optional, can be removed for headless)
  - pyqt
  - qt
  - firefox
  - pygraphviz

  # pip dependencies
  - pip
  - pip:
    - scrublet           # For doublet detection
    - popv               # For cell annotation consensus
    - cytotrace2-py      # For stemness scoring (install from github if needed)
    - infercnvpy         # For CNV analysis
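
Most packages in the file above are unpinned, so the versions that actually get resolved can differ between rebuilds. A minimal sketch for reporting them, using only the standard library; the package list is my own selection and `installed_versions` is a hypothetical helper name:

```python
from importlib import metadata

# Distributions most relevant to this report (a selection, not exhaustive).
PACKAGES = ("popv", "scanpy", "anndata", "pandas", "scvi-tools", "celltypist")


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in both a working and a broken environment should make it easy to compare the two sets of versions side by side.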

