Pipeline stops at Computing balanced KNN. TypeError: expected dtype object, got 'numpy.dtype[int64]' #1

Open
NadaAbdelgawad opened this issue Feb 13, 2025 · 0 comments

Hi,

Thanks for the amazing tool. I am trying to use it to analyse my data, but I am running into an issue at the neighbour-computation step.

I ran the pipeline as instructed, and it worked up to the "Computing balanced KNN" step.

The output and error:
#########################################################################################################

Activating Chromograph environment...
Environment activated! and the source code is in /omics/groups/OE0519/internal_temp/nada/micromamba/envs/chromo
the pipeline started
11:56:30 INFO Starting main workflow
11:56:30 INFO Performing the following steps: ['bin_analysis', 'peak_calling', 'peak_analysis', 'prom', 'RNA', 'Impute_RNA', 'motifs', 'bigwigs', 'split'] for build /omics/odcf/analysis/OE0519_projects/human_embryonic_chp_development/group_share/Analysis/ATAC/Chromograph_pipeline/Creating_builds
11:56:30 INFO Starting analysis for subset All
11:56:58 INFO Not including the following column attributes {'Targetnumcells', 'Cellconc', 'TSNE', 'PCA'}
11:56:58 INFO Combined bin file already exists, using this for analysis
11:56:59 INFO Bin_Analysis initialised, saving plots to /omics/odcf/analysis/OE0519_projects/human_embryonic_chp_development/group_share/Analysis/ATAC/Chromograph_pipeline/Creating_builds/All/exported
11:56:59 INFO Running Chromograph Bin-analysis on 30539 cells with 154420 bins of size 20kb_bins
11:57:00 INFO Creating temp file for faster LSI
60it [01:40, 1.68s/it]
11:58:41 INFO Performing TF-IDF on (28662, 30539)
28672it [00:17, 1676.98it/s]
30720it [01:18, 392.48it/s]
12:00:16 INFO Finished fitting TF-IDF
12:00:16 INFO Fitting PCA
12:00:16 INFO Fitting 28662.0 features from 30539 cells to 40 components
30720it [02:27, 208.15it/s]
12:02:44 INFO Transforming 28662.0 features from 30539 cells to 40 components
30720it [00:15, 2009.25it/s]
12:03:00 INFO Found 29 significant components
12:03:00 INFO Components after discarding depth correlates: 28
12:03:00 INFO Finished PCA transformation
12:03:00 INFO Computing balanced KNN (k = 25) space using the 'euclidean' metric
Traceback (most recent call last):
  File "/omics/groups/OE0519/internal_temp/nada/micromamba/envs/chromograph/chromograph/pipeline/main_workflow.py", line 116, in <module>
    bin_analysis.fit(ds)
  File "/omics/groups/OE0519/internal_temp/nada/micromamba/envs/chromograph/chromograph/pipeline/Bin_analysis.py", line 153, in fit
    bnn.fit(decomp)
  File "/omics/groups/OE0519/internal_temp/nada/micromamba/envs/cytograph-dev/cytograph/manifold/balanced_knn.py", line 163, in fit
    self.nn = NNDescent(data=self.fitdata, metric=metric_f, n_jobs=-1)
  File "/omics/groups/OE0519/internal_temp/nada/micromamba/envs/chromo/lib/python3.7/site-packages/pynndescent/pynndescent_.py", line 838, in __init__
    leaf_array = rptree_leaf_array(self._rp_forest)
  File "/omics/groups/OE0519/internal_temp/nada/micromamba/envs/chromo/lib/python3.7/site-packages/pynndescent/rp_trees.py", line 990, in rptree_leaf_array
    return np.vstack(rptree_leaf_array_parallel(rp_forest))
  File "/omics/groups/OE0519/internal_temp/nada/micromamba/envs/chromo/lib/python3.7/site-packages/pynndescent/rp_trees.py", line 982, in rptree_leaf_array_parallel
    joblib.delayed(get_leaves_from_tree)(rp_tree) for rp_tree in rp_forest
  File "/omics/groups/OE0519/internal_temp/nada/micromamba/envs/chromo/lib/python3.7/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "/omics/groups/OE0519/internal_temp/nada/micromamba/envs/chromo/lib/python3.7/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
TypeError: expected dtype object, got 'numpy.dtype[int64]'

#########################################################################################################

I checked the types of the data used in balanced_knn.py from cytograph:
Type of fitdata: <class 'numpy.ndarray'>, dtype: float64
Type of ds.ca.LSI_b: <class 'numpy.ndarray'>, dtype: float64
Type of decomp: <class 'numpy.ndarray'>, dtype: float32
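
The kind of check described here can be sketched as a small helper. This is only an illustration: the name `describe` is hypothetical and is not part of cytograph or chromograph.

```python
def describe(name, obj):
    """Format the Python type and, if present, the dtype of an object."""
    # getattr falls back to None for objects that carry no dtype attribute
    dtype = getattr(obj, "dtype", None)
    return f"Type of {name}: {type(obj)}, dtype: {dtype}"
```

For example, `print(describe("decomp", decomp))` could be placed just before the `bnn.fit(decomp)` call that appears in the traceback.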

What can I do to fix the error?

Here is my environment info:
Python 3.7.8

argcomplete 3.1.2
argh 0.27.2
asciitree 0.3.3
backcall 0.2.0
bx-python 0.10.0
chromograph 0.0.1
click 8.1.8
cooler 0.9.3
cycler 0.11.0
Cython 3.0.11
cytograph 2.0.1
cytoolz 0.12.3
debugpy 1.7.0
decorator 5.1.1
dill 0.3.7
entrypoints 0.4
exceptiongroup 1.2.2
fisher 0.1.14
fonttools 4.38.0
future 1.0.0
gffutils 0.13
h5py 3.8.0
harmony-pytorch 0.1.7
hdbscan 0.8.28
HiCMatrix 17.2
igraph 0.10.8
importlib-metadata 6.7.0
iniconfig 2.0.0
intervaltree 3.1.0
ipykernel 6.16.2
ipython 7.34.0
jedi 0.19.2
joblib 1.3.2
jupyter_client 7.4.9
jupyter_core 4.12.0
kiwisolver 1.4.5
kneed 0.8.5
leidenalg 0.10.2
llvmlite 0.32.1
loompy 3.0.8
MACS2 2.2.7.1
matplotlib 3.5.1
matplotlib-inline 0.1.6
multiprocess 0.70.15
nest-asyncio 1.6.0
networkx 2.6.3
numba 0.49.1
numexpr 2.8.6
numpy 1.21.6
numpy_groupies 0.9.6
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
openTSNE 1.0.1
packaging 24.0
pandas 1.1.5
parso 0.8.4
patsy 1.0.1
pexpect 4.9.0
pickleshare 0.7.5
Pillow 9.5.0
pip 24.0
pluggy 1.2.0
prompt_toolkit 3.0.48
psutil 6.1.1
ptyprocess 0.7.0
pybedtools 0.8.1
pyBigWig 0.3.22
pyfaidx 0.8.1.3
pyGenomeTracks 3.7
Pygments 2.17.2
pynndescent 0.4.8
pyparsing 3.1.4
pysam 0.23.0
pytest 7.4.4
python-dateutil 2.9.0
python-louvain 0.16
pytz 2024.2
PyYAML 6.0.1
pyzmq 26.2.1
scikit-learn 1.0.2
scikit-network 0.28.2
scipy 1.7.3
setuptools 59.8.0
simplejson 3.19.3
six 1.16.0
sortedcontainers 2.4.0
statsmodels 0.13.5
tables 3.7.0
texttable 1.7.0
threadpoolctl 3.1.0
tomli 2.0.1
toolz 0.12.1
torch 1.13.1
tornado 6.2
tqdm 4.67.1
traitlets 5.9.0
typing_extensions 4.7.1
umap-learn 0.4.6
unidip 0.1.1
wcwidth 0.2.13
wheel 0.42.0
zipp 3.15.0
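
For reference, here is a minimal sketch of how the versions of the packages involved in the failing step can be read programmatically, in case the error comes from a version mismatch among them (that is an assumption, not something confirmed here). The helper name `report_versions` is hypothetical. numba and llvmlite are included because pynndescent depends on numba, and the `importlib_metadata` fallback is there because `importlib.metadata` is only in the standard library from Python 3.8.

```python
# importlib.metadata is stdlib from Python 3.8; on 3.7 fall back to the
# importlib_metadata backport (present in the environment listed above).
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    from importlib_metadata import PackageNotFoundError, version


def report_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    packages = ["numpy", "numba", "llvmlite", "pynndescent", "joblib"]
    for name, ver in report_versions(packages).items():
        print(f"{name:12s} {ver}")
```

Running this inside the activated environment should print the same versions that the package manager lists for those packages.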

Thanks in advance! I also hope that more detailed instructions for beginner users can be added in the future.

Nada
