TypeError: ufunc 'isnan' not supported for the input types #356

Open
TomNicholas opened this issue Dec 17, 2024 · 3 comments
Labels
bug Something isn't working readers

Comments

@TomNicholas
Member

Trying to open a local copy of one of the files that make up the CWorthy OAE atlas (xref #132; stored in the cloud here) with @sharkinsspatial's HDF reader raises an error:

In [4]: vds = open_virtual_dataset('../experimentation/virtualizarr/oae/alk-forcing.000-1999-01.pop.h.347.nc', backend=HDFVirtualBackend)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 vds = open_virtual_dataset('../experimentation/virtualizarr/oae/alk-forcing.000-1999-01.pop.h.347.nc', backend=HDFVirtualBackend)

File ~/Documents/Work/Code/virtualizarr/virtualizarr/backend.py:217, in open_virtual_dataset(filepath, filetype, group, drop_variables, loadable_variables, decode_times, cftime_variables, indexes, virtual_array_class, virtual_backend_kwargs, reader_options, backend)
    214 if backend_cls is None:
    215     raise NotImplementedError(f"Unsupported file type: {filetype.name}")
--> 217 vds = backend_cls.open_virtual_dataset(
    218     filepath,
    219     group=group,
    220     drop_variables=drop_variables,
    221     loadable_variables=loadable_variables,
    222     decode_times=decode_times,
    223     indexes=indexes,
    224     virtual_backend_kwargs=virtual_backend_kwargs,
    225     reader_options=reader_options,
    226 )
    228 return vds

File ~/Documents/Work/Code/virtualizarr/virtualizarr/readers/hdf/hdf.py:64, in HDFVirtualBackend.open_virtual_dataset(filepath, group, drop_variables, loadable_variables, decode_times, indexes, virtual_backend_kwargs, reader_options)
     55 drop_variables, loadable_variables = check_for_collisions(
     56     drop_variables,
     57     loadable_variables,
     58 )
     60 filepath = validate_and_normalize_path_to_uri(
     61     filepath, fs_root=Path.cwd().as_uri()
     62 )
---> 64 virtual_vars = HDFVirtualBackend._virtual_vars_from_hdf(
     65     path=filepath,
     66     group=group,
     67     drop_variables=drop_variables + loadable_variables,
     68     reader_options=reader_options,
     69 )
     71 loadable_vars, indexes = open_loadable_vars_and_indexes(
     72     filepath,
     73     loadable_variables=loadable_variables,
   (...)
     78     decode_times=decode_times,
     79 )
     81 attrs = HDFVirtualBackend._get_group_attrs(
     82     path=filepath, reader_options=reader_options, group=group
     83 )

File ~/Documents/Work/Code/virtualizarr/virtualizarr/readers/hdf/hdf.py:354, in HDFVirtualBackend._virtual_vars_from_hdf(path, group, drop_variables, reader_options)
    352 if key not in drop_variables:
    353     if isinstance(g[key], Dataset):
--> 354         variable = HDFVirtualBackend._dataset_to_variable(path, g[key])
    355         if variable is not None:
    356             variables[key] = variable

File ~/Documents/Work/Code/virtualizarr/virtualizarr/readers/hdf/hdf.py:284, in HDFVirtualBackend._dataset_to_variable(path, dataset)
    282 if isinstance(fill_value, np.ndarray):
    283     fill_value = fill_value[0]
--> 284 if np.isnan(fill_value):
    285     fill_value = float("nan")
    286 if isinstance(fill_value, np.generic):

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

The error message is not very helpful. NumPy could at least have told me which type it received, and the HDF reader could have added context about which variable it was trying to parse when it failed (so that I could choose to load it instead).
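The traceback does not reveal the fill value's dtype. `np.isnan` only raises this `TypeError` when its input cannot be safely cast to a floating-point type, such as a byte string or a compound (structured) type; integers are accepted and simply return `False`. The following is a minimal sketch of a guard that checks the dtype kind before calling `np.isnan`. It assumes only that the offending fill value was non-numeric, and `is_nan_fill_value` is a hypothetical helper, not part of VirtualiZarr:

```python
import numpy as np


def is_nan_fill_value(fill_value) -> bool:
    """Return True only if `fill_value` is a floating-point or complex NaN.

    np.isnan raises TypeError for inputs it cannot safely cast to a float
    type (e.g. bytes, str, or structured/compound dtypes), so inspect the
    dtype kind first instead of calling np.isnan unconditionally.
    """
    arr = np.asarray(fill_value)
    if arr.dtype.kind not in "fc":  # 'f' = floating point, 'c' = complex
        return False
    return bool(np.isnan(arr).all())


if __name__ == "__main__":
    # A float NaN, an integer sentinel, and a byte string: only the first
    # is a NaN, and none of them raises.
    for value in (np.float32(np.nan), np.int32(-999), b"x"):
        print(repr(value), np.asarray(value).dtype, is_nan_fill_value(value))
```

Printing the dtype alongside the value, as in the usage loop, is the kind of context that would have made the original error easier to diagnose.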

@TomNicholas TomNicholas added bug Something isn't working readers labels Dec 17, 2024
@sharkinsspatial
Collaborator

@TomNicholas This will be fixed in an upcoming PR. Based on our discussions with @rabernat, I realized I have been misinterpreting the semantic relationship between the HDF definition of fillvalue and the Zarr definition of fill_value (which, as @rabernat explained, has changed over time and has been further complicated by the v3 changes; see pydata/xarray#5475 for an excellent discussion of the topic). To repeat @rabernat's advice:

  • The HDF fillvalue is reserved as the value returned for uninitialized chunks, e.g. in partially written chunks or
    when a dataset has been created but not yet populated with actual data.

  • In the VirtualiZarr context, we should use the CF convention _FillValue attribute (if present) to populate the fill_value in our .zarray metadata.
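A minimal sketch of the second point, assuming the variable's attributes are available as a mapping (h5py's `Dataset.attrs` behaves like one). `cf_fill_value` is a hypothetical helper, not the code from the upcoming PR; it deliberately ignores the HDF5 dataset-level fill value (`Dataset.fillvalue` in h5py):

```python
from typing import Any, Mapping, Optional

import numpy as np


def cf_fill_value(attrs: Mapping[str, Any]) -> Optional[Any]:
    """Return the CF `_FillValue` attribute as a Python scalar, or None.

    The HDF5 dataset-level fillvalue is intentionally not used as a
    fallback: it only describes what uninitialized chunks read back as,
    not which values in the data should be treated as missing.
    """
    if "_FillValue" not in attrs:
        return None
    # netCDF4 files typically store _FillValue as a one-element array.
    arr = np.asarray(attrs["_FillValue"])
    if arr.size != 1:
        raise ValueError(f"expected a single _FillValue, got shape {arr.shape}")
    return arr.item()


# Hypothetical usage with h5py (file and variable names are placeholders):
# with h5py.File("example.nc", "r") as f:
#     fill_value = cf_fill_value(f["some_variable"].attrs)
```

Returning `None` when `_FillValue` is absent leaves the choice of a default Zarr fill_value to the caller rather than guessing one.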

So this block will be removed entirely: https://github.com/zarr-developers/VirtualiZarr/blob/main/virtualizarr/readers/hdf/hdf.py#L282-L287. I already made these changes during the hack day and will try to submit a PR tomorrow.

@TomNicholas
Member Author

Wonderful, thank you @sharkinsspatial! Does that mean there is a pre-existing issue for this?

the HDF reader could have added context about which variable it was trying to parse when it failed (so that I could choose to load it instead).

Separately, this might still be useful, but it could be tracked in another issue.

@sharkinsspatial
Collaborator

@TomNicholas This is a good point; we hit the same problem when dealing with the other null fillvalue case. In a separate PR, I'll try to wrap exceptions with additional per-variable information so that users can diagnose failures more easily.
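A minimal sketch of that per-variable wrapping, assuming variables are parsed one at a time by some callable. `VariableParseError` and `parse_variables` are hypothetical names, not VirtualiZarr API; `loadable_variables` and `drop_variables` are the real `open_virtual_dataset` arguments visible in the traceback above:

```python
from typing import Any, Callable, Dict, Iterable


class VariableParseError(Exception):
    """Hypothetical error type that records which variable failed."""

    def __init__(self, name: str, cause: Exception) -> None:
        self.name = name
        super().__init__(
            f"Failed to create a virtual variable for {name!r} "
            f"({type(cause).__name__}: {cause}). Consider passing it via "
            f"loadable_variables or drop_variables instead."
        )


def parse_variables(
    names: Iterable[str], parse_one: Callable[[str], Any]
) -> Dict[str, Any]:
    """Apply `parse_one` to each variable name, adding context on failure."""
    variables: Dict[str, Any] = {}
    for name in names:
        try:
            variables[name] = parse_one(name)
        except Exception as err:
            # Chain with `from err` so the original traceback is preserved
            # as __cause__ while the message names the failing variable.
            raise VariableParseError(name, err) from err
    return variables
```

Catching `Exception` broadly is acceptable here because nothing is swallowed: the error is always re-raised, just with the variable name attached.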

2 participants