Open
Description
What happened?
If you upgrade a repo with delete_unused_v1_files=False, icechunk v1 can still write to it, but those writes are visible only to v1.
As a result, v1 and v2 can diverge for the same repo.
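The divergence can be pictured with a toy model. This is only a sketch, and it rests on an assumption the report does not confirm: that each format version resolves the branch tip from its own pointer, and that the upgrade copies the v1 pointer without retiring it. `ToyRepo`, `upgrade`, `v1_commit`, `v2_commit`, and `simulate` are hypothetical names and are not part of icechunk.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyRepo:
    """Hypothetical stand-in for a repo; NOT icechunk's on-disk layout."""

    v1_tip: Optional[str] = "init"  # pointer the toy v1 client reads and writes
    v2_tip: Optional[str] = None    # pointer the toy v2 client reads and writes


def upgrade(repo: ToyRepo, delete_unused_v1_files: bool) -> None:
    """Copy the v1 tip to the v2 pointer; optionally retire the v1 pointer."""
    repo.v2_tip = repo.v1_tip
    if delete_unused_v1_files:
        # Assumption: deleting the v1 files leaves v1 with nothing to write to.
        repo.v1_tip = None


def v1_commit(repo: ToyRepo, snapshot: str) -> None:
    """Toy v1 client: only ever touches the v1 pointer."""
    if repo.v1_tip is None:
        raise RuntimeError("v1 pointer was removed by the upgrade")
    repo.v1_tip = snapshot


def v2_commit(repo: ToyRepo, snapshot: str) -> None:
    """Toy v2 client: only ever touches the v2 pointer."""
    repo.v2_tip = snapshot


def simulate(delete_unused_v1_files: bool) -> Tuple[Optional[str], Optional[str]]:
    """Upgrade, let both clients commit, and return (v1 tip, v2 tip)."""
    repo = ToyRepo()
    upgrade(repo, delete_unused_v1_files)
    v1_commit(repo, "v1 write")
    v2_commit(repo, "v2 write")
    return repo.v1_tip, repo.v2_tip
```

In this model, `simulate(delete_unused_v1_files=False)` ends with two different tips, one per client, while `simulate(delete_unused_v1_files=True)` raises on the v1 commit. Whether the real upgrade behaves this way with the v1 files deleted is not stated in this report.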
What did you expect to happen?
Either v1 fails to read the upgraded repo, or we just don't offer the delete_unused_v1_files=False option.
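The first option, v1 refusing to open an upgraded repo, could look roughly like the sketch below. Everything in it is an assumption made for illustration: the marker file name `UPGRADED_TO_V2`, the functions `mark_upgraded` and `open_as_v1`, and the exception `RepoUpgradedError` are invented here and do not exist in icechunk. It only shows the shape of a "fail loudly instead of diverging" check.

```python
import tempfile
from pathlib import Path

# Hypothetical marker name; not a file that icechunk writes.
MARKER = "UPGRADED_TO_V2"


class RepoUpgradedError(RuntimeError):
    """Raised when an old-format client opens a repo that was upgraded."""


def mark_upgraded(root: Path) -> None:
    """Toy upgrade step: leave a marker that old-format clients can detect."""
    (root / MARKER).write_text("2")


def open_as_v1(root: Path) -> Path:
    """Toy v1 open: refuse to proceed if the upgrade marker is present."""
    if (root / MARKER).exists():
        raise RepoUpgradedError(
            f"{root} was upgraded to spec version 2; open it with a v2 client"
        )
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        root = Path(tmpdir)
        open_as_v1(root)      # succeeds before the upgrade
        mark_upgraded(root)
        try:
            open_as_v1(root)  # fails after the upgrade
        except RepoUpgradedError as err:
            print(f"refused: {err}")
```

A check like this would have to live in the v1 client, so it could only protect v1 releases that ship it. Removing the option to keep the v1 files is the alternative that needs no v1 change.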
Minimal Complete Verifiable Example
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "icechunk>=2.0.0.dev0",
# "icechunk_v1",
# "zarr>=3",
# ]
#
# [tool.uv]
# extra-index-url = [
# "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple",
# "http://127.0.0.1:8123/simple/",
# ]
# index-strategy = "unsafe-best-match"
# ///
"""Demo: v1 can still write after upgrade when delete_unused_v1_files=False.
Requires the wheel-rename proxy server running:
uvx --from "wheel-rename[server] @ git+https://github.com/earth-mover/rename-wheel" \
wheel-rename serve \
-u https://pypi.anaconda.org/scientific-python-nightly-wheels/simple \
-r "icechunk=icechunk_v1:<2" \
--port 8123
"""
import tempfile

import icechunk
import icechunk_v1
import zarr


def write_array(repo, value, msg):
    session = repo.writable_session("main")
    root = zarr.open_group(session.store, mode="a")
    arr = root.require_array("test_array", shape=(2, 2), dtype="int32")
    arr[:] = value
    return session.commit(msg)


def read_array(repo):
    session = repo.readonly_session("main")
    root = zarr.open_group(session.store, mode="r")
    return session.snapshot_id, root["test_array"][:].tolist()


with tempfile.TemporaryDirectory() as tmpdir:
    # Create v1-format repo, upgrade to v2 keeping v1 files
    storage = icechunk.local_filesystem_storage(tmpdir)
    repo = icechunk.Repository.create(storage, spec_version=1)
    session = repo.writable_session("main")
    zarr.group(session.store)
    session.commit("init")
    icechunk.upgrade_icechunk_repository(
        repo, dry_run=False, delete_unused_v1_files=False
    )

    # Both write after upgrade
    repo_v1 = icechunk_v1.Repository.open(icechunk_v1.local_filesystem_storage(tmpdir))
    repo_v2 = icechunk.Repository.open(icechunk.local_filesystem_storage(tmpdir))
    v1_commit = write_array(repo_v1, 42, "v1 write")
    v2_commit = write_array(repo_v2, 99, "v2 write")

    # What does each see?
    repo_v1 = icechunk_v1.Repository.open(icechunk_v1.local_filesystem_storage(tmpdir))
    repo_v2 = icechunk.Repository.open(icechunk.local_filesystem_storage(tmpdir))
    v1_tip, v1_value = read_array(repo_v1)
    v2_tip, v2_value = read_array(repo_v2)
    print(f"v1 wrote: {v1_commit}, v2 wrote: {v2_commit}")
    print(f"v1 sees: {v1_tip}, value={v1_value}")
    print(f"v2 sees: {v2_tip}, value={v2_value}")
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in icechunk.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of icechunk and its dependencies.
Relevant log output
v1 wrote: ZJH2B8MS09MFW2Q4PAR0, v2 wrote: VKN0R15RJESXR65YJJ60
v1 sees: ZJH2B8MS09MFW2Q4PAR0, value=[[42, 42], [42, 42]]
v2 sees: VKN0R15RJESXR65YJJ60, value=[[99, 99], [99, 99]]
Anything else we need to know?
No response
Environment
Details