Closed
Description
What happened?
On a case-insensitive filesystem (e.g. macOS), icechunk lets you create an array whose name differs from an existing array's name only by case (e.g. 'a' next to 'A'), and this results in an overwrite. In contrast, doing the same with zarr-python raises an exception.
The key difference in the output below is that zarr raises an error when I try to create 'a'. A second difference is that zarr then lets me access the existing data through either 'A' or 'a', while icechunk no longer exposes 'A' at all.
```
=== Pure Zarr LocalStore ===
Created A with [1, 2]
EXCEPTION: An array exists in store LocalStore('file:///var/folders/tc/fkgp35zn7z913f9cmsxcl6pc0000gn/T/tmpx958thu4') at path 'a'.
Keys in store: ['A']
'A' in root: True
'a' in root: True
root['A'][:] = [1 2]
root['a'][:] = [1 2]

=== Icechunk local_filesystem_storage ===
Created A with [1, 2]
Created a with [3, 4]
Keys in store: ['a']
'A' in root: False
'a' in root: True
root['a'][:] = [3 4]
```
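Whether 'A' and 'a' collide depends on the filesystem underneath the store directory. The following stdlib-only sketch probes a directory for case-insensitivity by creating a lowercase file and looking it up again with its case flipped. `is_case_insensitive` is a hypothetical helper written for illustration, not part of zarr or icechunk.

```python
import os
import tempfile


def is_case_insensitive(directory: str) -> bool:
    """Return True if names differing only in case resolve to the same entry in `directory`."""
    # Create a uniquely named probe file; mkstemp's random lowercase suffix avoids clashing with real files.
    fd, path = tempfile.mkstemp(prefix="case_probe_", dir=directory)
    os.close(fd)
    try:
        head, tail = os.path.split(path)
        # Look the probe up again with every letter's case flipped ("case_probe_x" -> "CASE_PROBE_X").
        swapped = os.path.join(head, tail.swapcase())
        return os.path.exists(swapped)
    finally:
        # Always remove the probe, whatever the result.
        os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        print(f"{tmpdir} is case-insensitive: {is_case_insensitive(tmpdir)}")
```

On a default (case-insensitive) APFS volume on macOS this would be expected to report True, and on a typical Linux ext4 directory False.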
What did you expect to happen?
Icechunk raises an exception, or transparently maps 'A' and 'a' to different storage locations for me.
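A rough sketch of the two expected behaviors, written in plain Python and independent of icechunk's actual internals. `CaseCollisionError`, `check_no_case_collision`, `encode_case` and `decode_case` are hypothetical names; the escaping scheme (a literal `^` becomes `^^`, an ASCII capital becomes `^` plus its lowercase form) is an assumption chosen for illustration and only handles ASCII letters.

```python
from collections.abc import Iterable


class CaseCollisionError(ValueError):
    """Hypothetical error: a new name differs from an existing sibling only by case."""


def check_no_case_collision(existing: Iterable[str], new_name: str) -> None:
    """Option 1: refuse to create `new_name` if a sibling differs from it only by case."""
    folded = new_name.casefold()
    for name in existing:
        # An identical name is an ordinary overwrite; only case-only differences are rejected.
        if name != new_name and name.casefold() == folded:
            raise CaseCollisionError(
                f"{new_name!r} collides with existing {name!r} on a case-insensitive filesystem"
            )


def encode_case(name: str) -> str:
    """Option 2: escape ASCII capitals so 'A' and 'a' map to distinct on-disk names.

    Assumed scheme: a literal '^' becomes '^^', and each ASCII capital becomes '^'
    followed by its lowercase form. Non-ASCII letters are left untouched.
    """
    out = []
    for ch in name:
        if ch == "^":
            out.append("^^")
        elif ch.isascii() and ch.isupper():
            out.append("^" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def decode_case(key: str) -> str:
    """Invert encode_case."""
    out = []
    i = 0
    while i < len(key):
        ch = key[i]
        if ch == "^":
            if i + 1 >= len(key):
                raise ValueError(f"dangling escape in {key!r}")
            nxt = key[i + 1]
            # '^^' is a literal caret; '^x' is the capital letter X.
            out.append("^" if nxt == "^" else nxt.upper())
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # The two encoded names stay distinct even when compared case-insensitively.
    print(encode_case("A"), encode_case("a"))
    try:
        check_no_case_collision(["A"], "a")
    except CaseCollisionError as err:
        print(f"EXCEPTION: {err}")
```

Raising mirrors what zarr-python's `LocalStore` effectively does on macOS; the encoding keeps both arrays but changes how names appear on disk, which would only matter for storage backends that derive file names from array paths.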
Minimal Complete Verifiable Example
```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "icechunk<2",
#   "zarr",
# ]
#
# [[tool.uv.index]]
# name = "scientific-python-nightly-wheels"
# url = "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple/"
#
# [tool.uv.sources]
# icechunk = { index = "scientific-python-nightly-wheels" }
# zarr = { index = "scientific-python-nightly-wheels" }
#
# [tool.uv]
# prerelease = "allow"
# ///
#
# This script automatically imports the development branch of icechunk to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!
import tempfile

import zarr


def show_results(root):
    """Show what keys exist and their values."""
    print(f"Keys in store: {list(root.keys())}")
    print(f"'A' in root: {'A' in root}")
    print(f"'a' in root: {'a' in root}")
    if "A" in root:
        print(f"root['A'][:] = {root['A'][:]}")
    if "a" in root:
        print(f"root['a'][:] = {root['a'][:]}")


def test_zarr_local_store_case_sensitivity():
    """Test pure zarr-python with LocalStore on case-insensitive filesystem."""
    tmpdir = tempfile.mkdtemp()
    store = zarr.storage.LocalStore(tmpdir)
    root = zarr.open_group(store, mode="w")

    # Create array at path 'A'
    arr_A = root.create_array("A", shape=(2,), dtype="i4")
    arr_A[:] = [1, 2]
    print("Created A with [1, 2]")

    # Create array at path 'a' - same path on case-insensitive FS
    try:
        arr_a = root.create_array("a", shape=(2,), dtype="i4")
        arr_a[:] = [3, 4]
        print("Created a with [3, 4]")
    except zarr.errors.ContainsArrayError as e:
        print(f"EXCEPTION: {e}")

    root = zarr.open_group(store, mode="r")
    show_results(root)


def test_icechunk_case_sensitivity():
    """Test icechunk with local_filesystem_storage on case-insensitive filesystem."""
    import icechunk

    icechunk.set_logs_filter("icechunk::storage::object_store=error")
    tmpdir = tempfile.mkdtemp()
    storage = icechunk.local_filesystem_storage(tmpdir)
    repo = icechunk.Repository.create(storage)

    # Create array at path 'A'
    session = repo.writable_session("main")
    root = zarr.open_group(session.store, mode="w")
    arr_A = root.create_array("A", shape=(2,), dtype="i4")
    arr_A[:] = [1, 2]
    session.commit("created A")
    print("Created A with [1, 2]")

    # Create array at path 'a' - same path on case-insensitive FS
    session = repo.writable_session("main")
    root = zarr.open_group(session.store, mode="w")
    arr_a = root.create_array("a", shape=(2,), dtype="i4")
    arr_a[:] = [3, 4]
    session.commit("created a")
    print("Created a with [3, 4]")

    session = repo.readonly_session(branch="main")
    root = zarr.open_group(session.store, mode="r")
    show_results(root)


if __name__ == "__main__":
    print("=== Pure Zarr LocalStore ===")
    test_zarr_local_store_case_sensitivity()
    print()
    print("=== Icechunk local_filesystem_storage ===")
    test_icechunk_case_sensitivity()
```
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in icechunk.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of icechunk and its dependencies.
Relevant log output
Anything else we need to know?
No response
Environment
nightly