Skip to content

Conversation

@dcherian
Copy link
Contributor

No description provided.

}) {
let ours = self.config().manifest().splitting().clone();
let theirs = self.config().manifest().splitting().clone();
let theirs = other_config.manifest().splitting().clone();
Copy link
Contributor Author

@dcherian dcherian Dec 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spotted this minor error while reading the code

self.snapshot_id = snap_id;

// we have rebased on top of `snap_id`, now we update splits to match the config in that snap.
let mut new_splits: HashMap<NodeId, ManifestSplits> = HashMap::new();
Copy link
Contributor Author

@dcherian dcherian Dec 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered somehow using self.merge here but the splits update logic is different. This current logic does what I need it to do.

HOWEVER, ChangeSet has set_chunks: BTreeMap<NodeId, HashMap<ManifestExtents, SplitManifest>>, and those ManifestExtents are out of date. (I updated a debug_assert! to an assert! below to enforce this invariant at runtime). So I need to recreate this ChangeSet.

To me, that should be the responsibility of the ConflictSolver. But to do that, I need to call get_split_sizes exactly as below to get updated splits yet I don't have access to NodePath exactly as in here.

  • Should PathFinder be a public utility? That seems wrong, but this NodeId vs NodePath issue seems irreversibly exposed to outside users.
  • we could pass splits from previous_repo: Session to the solver; but we are careful to only set them for arrays with chunk changes I believe, so it's not necessarily complete enough to cover changes in current_repo: Session
  • we could make the Changeset consistent here; but that seems yucky to me (the user has provided us a "fixed" changeset, we should not modify it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed my mind; this whole thing seems too hard to punt to a user so now rebase remakes the set_chunks HashMap with the same chunk refs and (possibly new) splits.

@dcherian
Copy link
Contributor Author

dcherian commented Dec 14, 2025

The test failure is interesting. The test makes 3 commits on session1; makes the same 3 changes on session2; and then attempts to commit with UseTheirs.

At the last rebase, the changeset is effectively empty; but technically not-empty I guess because we've drained the inner ChunkCoords: Option<ChunkPayload> hashmap.

[icechunk/src/session.rs:1461:21] &patched_changeset = Edit(
    EditChanges {
        new_groups: {},
        new_arrays: {},
        updated_arrays: {},
        updated_groups: {},
        set_chunks: {
            8NMJ9AJ2WT6WJ: {
                ManifestExtents(
                    [
                        0..5,
                    ],
                ): {},
            },
        },
        deleted_chunks_outside_bounds: {},
        deleted_groups: {},
        deleted_arrays: {},
    },
)

But my new "replay chunk refs" has nothing to replay given that input ^ so the new rebased changeset is truly empty, triggering the "no empty commit allowed" error.

[icechunk/src/session.rs:1487:21] &patched_changeset = Edit(
    EditChanges {
        new_groups: {},
        new_arrays: {},
        updated_arrays: {},
        updated_groups: {},
        set_chunks: {},
        deleted_chunks_outside_bounds: {},
        deleted_groups: {},
        deleted_arrays: {},
    },
)

As solution:

  1. We could allow "empty" commits on rebase with a warning perhaps?
  2. Update the "is_empty" on Edits to check whether the appropriate iterator over set_chunks yields Some.
  3. I also wonder if logic like (2) is needed on the arrays_with_chunk_changes iterator

};
}
// now we update the changeset to be consistent with these new splits
patched_changeset.replay_set_chunk_refs(&new_splits)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is expensive, can we do it only for the arrays that had a change to their splits?

.manifest()
.splitting()
.get_split_sizes(&node_path, &shape, &dimension_names);
new_splits.insert(node_id.clone(), new_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if both changes updated the splits?

@dcherian dcherian closed this Dec 22, 2025
@dcherian dcherian deleted the fix-manifest-splits-rebase branch December 22, 2025 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants