It's possible to write Zarr format 2 stores using zarr 3 that can't be read by zarr v2 #2773

Open
benritchie opened this issue Jan 27, 2025 · 3 comments
Labels
bug Potential issues with the zarr-python library

Comments

@benritchie

benritchie commented Jan 27, 2025

Zarr version

v3.0.1

Numcodecs version

15.0

Python Version

3.12.8

Operating System

macOS

Installation

conda

Description

Hi,

I'm loving Zarr v3 :) Thanks.

I'm trying to use zarr 3 to create Zarr format 2 stores via xarray.

I'm using a numcodecs compressor (PCodec) and left the code in "zarr 3 style", i.e. I imported numcodecs.zarr3.PCodec.

With this setup, zarr lets me write the array in Zarr format 2, but when I try to read it with zarr v2, I get an error:

ImportError: zarr 3.0.0 or later is required to use the numcodecs zarr integration

I think zarr 3 should have some safeguard that only allows zarr 2 compatible compressors when writing a Zarr format 2 array.

If I import numcodecs.PCodec instead (while still using zarr 3, but writing with zarr_format=2), everything works fine.

Steps to reproduce

Zarr creation code (run using zarr 3.0.1):

import zarr
import xarray
import numcodecs.zarr3 as numcodecsz3

# Open the zarr (any zarr should work here; just set "test_band" below to a band from the zarr you are using)
ds = xarray.open_zarr('./hls_20200101_20201231_99cc_50q', consolidated=True)

# We want to write the DataSet with different encodings than the source, therefore
# clear all encoding config first.  
ds = ds.drop_encoding()

encoding = {}
encoding['test_band'] = {"serializer": numcodecsz3.PCodec(level=5)}

ds.to_zarr('./test2', mode="w", encoding = encoding, zarr_format=2)

Read code (run using the latest version of zarr 2):

import zarr
import xarray
import numcodecs.zarr3 as numcodecsz3

# Open file
z2 = zarr.open('test2', mode='r')

# Read the variable written above ("test_band" in the creation code)
print(z2['test_band'][:])
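
To see which codec was actually recorded for the written array, the Zarr v2 array metadata can be inspected directly without importing zarr or numcodecs. The following is a minimal sketch, assuming the store is a plain local directory and the array lives at test2/test_band; the helper name read_v2_codec_ids is hypothetical and not part of any library. It relies only on the Zarr v2 layout, where each array directory holds a .zarray JSON file with "compressor" and "filters" entries.

```python
import json
from pathlib import Path


def read_v2_codec_ids(array_dir):
    """Return (compressor_id, filter_ids) recorded in a Zarr v2 array's .zarray file.

    Hypothetical helper for diagnosis only; it reads the JSON metadata as-is.
    """
    meta = json.loads((Path(array_dir) / ".zarray").read_text())
    compressor = meta.get("compressor")  # null when the array is uncompressed
    filters = meta.get("filters") or []  # null when no filters are configured
    compressor_id = compressor["id"] if compressor else None
    return compressor_id, [f["id"] for f in filters]


if __name__ == "__main__":
    # Assumed path: the "test_band" array inside the store written above.
    array_dir = Path("test2") / "test_band"
    if (array_dir / ".zarray").exists():
        print(read_v2_codec_ids(array_dir))
    else:
        print(f"No .zarray found under {array_dir}")
```

Comparing the reported ids between a store written with numcodecs.zarr3.PCodec and one written with numcodecs.PCodec may help narrow down where the two diverge.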

Additional output

No response

@Princekumarofficial

Princekumarofficial commented Feb 2, 2025

I guess the PCodec compressor uses functionality exclusive to Zarr v3, making it incompatible with the v2 format. Did you try Blosc?

@benritchie
Author

Sorry, I've clarified the description above.
Summarising:
PCodec works fine with zarr 2.
The issue is that if you are using zarr 3 with e.g. numcodecs.zarr3.PCodec and then set zarr_format=2, you produce a zarr that v2 can't read. I think the same applies to any of the other zarr3 wrappers.
If you use numcodecs.PCodec in zarr 3 with the format set to 2, everything works fine.

So the suggestion is just that zarr 3 should check for these things when writing format=2, and either translate them so that they work for zarr 2, or raise an exception.
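
The "raise an exception" half of that suggestion could look roughly like the sketch below. This is not zarr-python's API: check_v2_codecs and is_zarr3_wrapper are hypothetical names, and the check assumes that the zarr3 wrapper classes can be recognised by their defining module being numcodecs.zarr3. The demo uses a stand-in class so the sketch runs without numcodecs installed.

```python
ZARR3_WRAPPER_MODULE = "numcodecs.zarr3"  # assumption: wrappers are defined in this module


def is_zarr3_wrapper(codec):
    """Return True if the codec's class was defined in the zarr3 wrapper module."""
    module = type(codec).__module__ or ""
    return module == ZARR3_WRAPPER_MODULE or module.startswith(ZARR3_WRAPPER_MODULE + ".")


def check_v2_codecs(codecs, zarr_format):
    """Raise TypeError if a zarr3 wrapper codec is used while writing Zarr format 2.

    Hypothetical guard; codecs for other formats are not checked.
    """
    if zarr_format != 2:
        return
    offenders = [type(c).__name__ for c in codecs if is_zarr3_wrapper(c)]
    if offenders:
        raise TypeError(
            f"zarr_format=2 cannot use {ZARR3_WRAPPER_MODULE} wrappers: "
            + ", ".join(offenders)
        )


if __name__ == "__main__":
    # Stand-in for a wrapper class, used only for this demonstration.
    class FakePCodec:
        pass

    FakePCodec.__module__ = ZARR3_WRAPPER_MODULE

    try:
        check_v2_codecs([FakePCodec()], zarr_format=2)
    except TypeError as err:
        print(f"rejected: {err}")
```

The "translate" alternative would need a mapping from each wrapper back to its plain numcodecs codec, which this sketch does not attempt.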

@d-v-b
Contributor

d-v-b commented Feb 5, 2025

sorry about this confusion. I don't think the way we handle the v2 / v3 codec distinction is very good -- I'm pretty sure users should not have to think about "v2" or "v3" codecs, when the basic behavior of the codec is the same. we definitely need a better approach here.

we do have an issue tracking this: #2654
