UnicodeDecodeError when read in loom #484
Comments
Hi! I'm not completely sure, though at first glance it seems like an issue in Ideally something like:
    import anndata
    from sinfo import sinfo
    sinfo(dependencies=True)
Would you be able to share this file, or another one which gives you the same issue? |
Thanks! I had problems running sinfo
You can test the loom file at https://drive.google.com/file/d/1qYAhnunhtCdFbQxU_2zIXim25BloXzFY/view?usp=sharing |
Hmm, didn't know this had issues. Maybe just your environment info then? (Also, is this a different environment? Your previous example was with python 3.6, but this one is 3.7.)
Thanks for linking that! It looks like I don't have permissions to access it at the moment, would you mind changing those? |
Yes, this one is on an HPC cluster and the previous one was on my local computer, but both gave me the same error. Thanks for looking into it. |
It looks like this is an issue in loompy (linnarsson-lab/loompy#141). Somehow the column was written with unicode values, but told hdf5 the values are ascii ( If you're using If you're using
    [x.decode() for x in f["col_attrs"]["annotation"]]
I'd say you should open an issue with whatever tool wrote this file, since it looks like the bug originated there. I'm not sure what solutions there are past manually reading the values out of the file. If you copy the file, but encode strings as unicode, loompy throws a value error (at least with
The error
    import anndata as ad
    import loompy
    import h5py
    from functools import partial

    def copy_elem(f, key, value):
        # Recreate groups as-is so the hierarchy is preserved.
        if isinstance(value, h5py.Group):
            f.create_group(key)
        # Byte-string datasets are re-written as UTF-8 decoded strings.
        elif isinstance(value, h5py.Dataset) and value.dtype.char == "S":
            f[key] = value.asstr(encoding="utf-8")[:]
        # Everything else is copied unchanged.
        else:
            f.create_dataset(key, data=value)

    with h5py.File("./test.loom", "r") as orig, h5py.File("./result.loom", "w") as result:
        orig.visititems(partial(copy_elem, result))

    result = ad.read_loom("./result.loom")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-8d3722a2a313> in <module>
----> 1 result = ad.read_loom("./result.loom")
~/github/anndata/anndata/_io/read.py in read_loom(filename, sparse, cleanup, X_name, obs_names, obsm_names, var_names, varm_names, dtype, **kwargs)
192 from loompy import connect
193
--> 194 with connect(filename, "r", **kwargs) as lc:
195 if X_name not in lc.layers.keys():
196 X_name = ""
/usr/local/lib/python3.8/site-packages/loompy/loompy.py in connect(filename, mode, validate, spec_version)
1387 Note: if validation is requested, an exception is raised if validation fails.
1388 """
-> 1389 return LoomConnection(filename, mode, validate=validate)
/usr/local/lib/python3.8/site-packages/loompy/loompy.py in __init__(self, filename, mode, validate)
80 lv = loompy.LoomValidator()
81 if not lv.validate(filename):
---> 82 raise ValueError("\n".join(lv.errors) + f"\n{filename} does not appead to be a valid Loom file according to Loom spec version '{lv.version}'")
83
84 self._file = h5py.File(filename, mode)
ValueError: Row attribute 'Gene' dtype object is not allowed
Column attribute 'CellID' dtype object is not allowed
Column attribute 'ClusterName' dtype object is not allowed
Column attribute 'RNA_snn_res_1_5' dtype object is not allowed
Column attribute 'annotation' dtype object is not allowed
Column attribute 'bms_subj_id' dtype object is not allowed
Column attribute 'bor_by_irrc_may_2018' dtype object is not allowed
Column attribute 'cd3_neg_cell_number' dtype object is not allowed
Column attribute 'cd3_plus_cell_number' dtype object is not allowed
Column attribute 'cd3_status' dtype object is not allowed
Column attribute 'cohort' dtype object is not allowed
Column attribute 'cohort2' dtype object is not allowed
Column attribute 'group' dtype object is not allowed
Column attribute 'index' dtype object is not allowed
Column attribute 'orig_ident' dtype object is not allowed
Column attribute 'pbmc_sample_id' dtype object is not allowed
Column attribute 'pool_id' dtype object is not allowed
Column attribute 'seurat_clusters' dtype object is not allowed
Column attribute 'singleR_cluster' dtype object is not allowed
Column attribute 'singleR_cluster_main' dtype object is not allowed
Column attribute 'subject_id' dtype object is not allowed
Column attribute 'tigl_id' dtype object is not allowed
Column attribute 'treatment_cycle' dtype object is not allowed
Column attribute 'type' dtype object is not allowed
For help, see http://linnarssonlab.org/loompy/format/
./result.loom does not appead to be a valid Loom file according to Loom spec version '0.0.0' |
Thank you! Finding the offending characters (naive) helped a lot. I will fix that on my side. |
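For anyone landing here later, a naive scan along those lines might look like the sketch below. The helper name `find_non_ascii` is made up for illustration and is not part of anndata, loompy, or h5py. It assumes you already have the raw byte strings in hand, for example the values of `f["col_attrs"]["annotation"]` read with h5py, and it only reports which entries are not valid ASCII.

```python
def find_non_ascii(values):
    """Return (index, raw_bytes, utf8_text) for each value that is not pure ASCII.

    Hypothetical helper for illustration; `values` is any iterable of bytes.
    """
    offenders = []
    for i, raw in enumerate(values):
        try:
            raw.decode("ascii")
        except UnicodeDecodeError:
            # Keep a UTF-8 rendering for inspection; bytes that are not
            # valid UTF-8 either are shown as the replacement character.
            offenders.append((i, raw, raw.decode("utf-8", errors="replace")))
    return offenders


# Made-up sample data standing in for one attribute column.
sample = [b"T cell", "naïve B cell".encode("utf-8"), b"NK cell"]
for index, raw, text in find_non_ascii(sample):
    print(index, raw, text)
```

Once the offending entries are known, they can be fixed in the source object before converting to loom again, which avoids patching the file by hand.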
Hi,
When I read in the loom file
It is a different subset of the Seurat object I converted to loom. I used the same code. The previous loom file can be read in without problems.
I googled and found https://stackoverflow.com/questions/10406135/unicodedecodeerror-ascii-codec-cant-decode-byte-0xd1-in-position-2-ordinal
How can I fix this?
Thanks!
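For context, the kind of error described in that Stack Overflow question can be reproduced without any loom file at all. The snippet below is only a minimal sketch with a made-up example value; it shows UTF-8 encoded text failing to decode as ASCII and then decoding cleanly with the right codec.

```python
text = "naïve"              # made-up value containing a non-ASCII character
raw = text.encode("utf-8")  # the bytes as they would be stored on disk

try:
    raw.decode("ascii")     # treating the bytes as ASCII fails
except UnicodeDecodeError as err:
    message = str(err)
    print(message)

decoded = raw.decode("utf-8")  # decoding as UTF-8 recovers the original text
print(decoded)
```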