Zarr Python v3 and Zarr v3 #355
Here is a summary of the current situation with zarr-python (my best understanding; I think it is ~90% correct, but subject to uncertainty and to changes throughout the last/next few weeks). Short version:
Longer version: 1.1 API changes in 1.2: With
Variable and fixed length strings (Unicode or not) are not supported and produce warnings like the following, depending on the type of string:
There is this open issue/proposal to support fixed-length strings: https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html we'd have support for fixed-width (Unicode) strings. Looking at the Zarr v3 data types page: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#data-types:
Then the data types page: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#data-types says "Under construction". The current behavior in Regardless of what the final type(s) for strings turn out to be (#294), I think that we are affected by this issue with strings anyway ("not yet ported to v3"). Example of what happens with fixed-width strings and
>>> import numpy as np
>>> import xarray as xr
>>> xds = xr.Dataset(data_vars={"ant": np.array(["a1", "b1", "c2"]).astype("S2")})
>>> xds.to_zarr("foo_ant.zarr", mode="w", zarr_format=3)
>>> xds_reloaded = xr.open_zarr("foo_ant.zarr")
# .../venv_xradio_python312/lib/python3.12/site-packages/zarr/codecs/vlen_utf8.py:99: UserWarning: The codec
# `vlen-bytes` is currently not part in the Zarr format 3 specification. It may not be supported by other zarr
# implementations and may change in the future.
# return cls(**configuration_parsed)
>>> xds_reloaded
Out[23]:
<xarray.Dataset> Size: 24B
Dimensions: (ant: 3)
Coordinates:
* ant (ant) object 24B b'a1' b'b1' b'c2'
Data variables:
*empty*
Example with UTF-8 encoded fixed-width strings written with
>>> ant_names = [ant.encode("utf-8") for ant in ["a1", "b3", "c2"]]
>>> xds = xr.Dataset(data_vars={"ant": np.array(ant_names)})
>>> xds.to_zarr("foo_ant_utf8.zarr", mode="w", zarr_format=3)
# .../venv_xradio_python312/lib/python3.12/site-packages/zarr/codecs/vlen_utf8.py:99: UserWarning: The codec
# `vlen-bytes` is currently not part in the Zarr format 3 specification. It may not be supported by other zarr
# implementations and may change in the future.
# return cls(**configuration_parsed)
#
# .../venv_xradio_python312/lib/python3.12/site-packages/zarr/core/array.py:3985: UserWarning: The dtype `|S2`
# is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations
# and may change in the future.
# meta = AsyncArray._create_metadata_v3(
#
# .../venv_xradio_python312/lib/python3.12/site-packages/zarr/api/asynchronous.py:197: UserWarning:
# Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by
# other zarr implementations and may change in the future.
>>> xds_reloaded = xr.open_zarr("foo_ant_utf8.zarr")
# .../venv_xradio_python312/lib/python3.12/site-packages/zarr/codecs/vlen_utf8.py:99: UserWarning: The codec
# `vlen-bytes` is currently not part in the Zarr format 3 specification. It may not be supported by other zarr
# implementations and may change in the future.
# return cls(**configuration_parsed)
>>> xds_reloaded
Out[12]:
<xarray.Dataset> Size: 24B
Dimensions: (ant: 3)
Coordinates:
* ant (ant) object 24B b'a1' b'b3' b'c2'
Data variables:
*empty*
There is a nice quick summary of how similar strings are supported in And finally, for reference, more details about the changes in the
zarr_format=3, example
From https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#array-metadata, In addition to this, in the compressor(s) value one needs to pass codecs from the new zarr.codecs list, instead of numcodecs. There are some examples in the xarray tests: https://github.com/pydata/xarray/blob/03c1014d6e7e413598603b73eac538137583db8d/xarray/tests/test_backends.py#L2703-L2726 This doesn't yet seem to be stable or well documented. The xarray doc examples produce exceptions similar to those we initially get in xradio when trying to use
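A minimal sketch of the `compressors` encoding mentioned above, modeled on the linked xarray test. It assumes zarr-python >= 3 and an xarray version that accepts `zarr_format=3`; the helper name `v3_compressor_encoding` and the Blosc settings are illustrative choices, not something taken from xradio.

```python
def v3_compressor_encoding(var_names):
    """Build a per-variable xarray encoding dict using zarr.codecs (not numcodecs).

    Hypothetical helper: the name and the compression settings are assumptions.
    """
    # Imported lazily so the module still loads where zarr-python 3 is absent.
    from zarr.codecs import BloscCodec

    # With Zarr format 3 the encoding key is "compressors" (a tuple of codecs).
    compressors = (BloscCodec(cname="zstd", clevel=3, shuffle="shuffle"),)
    return {name: {"compressors": compressors} for name in var_names}
```

It would then be passed along the lines of `xds.to_zarr("foo.zarr", mode="w", zarr_format=3, encoding=v3_compressor_encoding(xds.data_vars))`, where `"foo.zarr"` is a placeholder path.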
Also, some xarray tests need to silence zarr warnings (mostly about types, I think): https://github.com/pydata/xarray/pull/9920/files
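A minimal sketch of how such warnings could be silenced locally, using only the standard `warnings` module. The function `write_with_v3_warning` is a hypothetical stand-in for a `to_zarr`/`open_zarr` call, and the message pattern is taken from the warning text quoted earlier in this thread; this is not necessarily how the xarray tests do it.

```python
import warnings

# Pattern taken from the warning text quoted in this thread.
ZARR_V3_SPEC_WARNING = r".*not part in the Zarr format 3 specification.*"


def write_with_v3_warning():
    """Hypothetical stand-in for a call such as xds.to_zarr(..., zarr_format=3)."""
    warnings.warn(
        "The codec `vlen-bytes` is currently not part in the Zarr format 3 "
        "specification. It may not be supported by other zarr implementations "
        "and may change in the future.",
        UserWarning,
    )
    return "written"


def call_quietly(func):
    """Run func() while ignoring only the 'not part of the v3 spec' UserWarnings."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", message=ZARR_V3_SPEC_WARNING, category=UserWarning
        )
        return func()
```

Scoping the filter to a `catch_warnings()` block keeps any other `UserWarning` visible, which seems preferable to a global filter while the string support is still in flux.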
Based on the comment above, unless we find something different/new, what I'd in principle include in this branch is:
After discussing this in the XRADIO meeting, we have decided to hold off on upgrading to zarr-python 3 until DataTree offers full support:
Currently, XRADIO makes use of Python Zarr v2 and Zarr format v2. Python Zarr v3 was recently released (https://zarr.dev/blog/zarr-python-3-release/) and offers full support for Zarr format v3, but it requires Python >= 3.11. This issue aims to look into what changes would need to be made to support Python Zarr v3 and Zarr format v3. Note that zarr_version=3 is required for all the to_zarr calls.
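A minimal sketch of how the version constraints above could be checked at runtime, using only the standard library. The helper names and the policy of choosing format 3 whenever zarr-python 3 is usable are assumptions for illustration; zarr-python 3 can also write format 2.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '3.0.8'."""
    return int(version_string.split(".")[0])


def supported_zarr_format(zarr_version_string, python_version=sys.version_info[:2]):
    """Pick the Zarr format (2 or 3) to write, given a zarr-python version.

    Hypothetical policy: use format 3 only with zarr-python >= 3,
    which in turn requires Python >= 3.11.
    """
    if major_version(zarr_version_string) >= 3 and tuple(python_version) >= (3, 11):
        return 3
    return 2


def installed_zarr_format():
    """Inspect the current environment; returns None if zarr-python is missing."""
    try:
        return supported_zarr_format(version("zarr"))
    except PackageNotFoundError:
        return None
```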