
Conversation

@paraseba
Collaborator

No description provided.

@paraseba paraseba requested a review from dcherian December 19, 2025 20:39
// parent chunks by extent, dropping chunk [10] because it's outside [0..5)
// 8. Result: chunk [10] is lost
//
// The fix requires updating Session.splits during rebase to match the new
Contributor


Suggested change
// The fix requires updating Session.splits during rebase to match the new
// With IC1, doing this correctly requires updating Session.splits during rebase to match the new
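To make the dropped-chunk scenario concrete, here is a minimal, self-contained sketch. The `group_chunks` helper and plain integer ranges are hypothetical stand-ins, not Icechunk's actual `Session.splits` / `ManifestExtents` types: chunks are bucketed by split extent, and any chunk outside every extent silently vanishes unless the splits are updated first.

```rust
// Hypothetical sketch (not Icechunk's real API): bucket chunk indices into
// manifest splits by extent. With stale splits from before the rebase, the
// chunk at index 10 falls outside every extent and is silently dropped;
// updating the splits first keeps it.
fn group_chunks(splits: &[std::ops::Range<u32>], chunks: &[u32]) -> Vec<Vec<u32>> {
    splits
        .iter()
        .map(|ext| chunks.iter().copied().filter(|c| ext.contains(c)).collect())
        .collect()
}

fn main() {
    let chunks = [0u32, 3, 10];
    // Stale splits from before the rebase: chunk 10 is lost.
    assert_eq!(group_chunks(&[0..5], &chunks), vec![vec![0, 3]]);
    // Splits updated during rebase to cover the resized array: every chunk survives.
    assert_eq!(group_chunks(&[0..5, 5..15], &chunks), vec![vec![0, 3], vec![10]]);
    println!("ok");
}
```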


for (node_path, node_id) in flush_data.change_set.new_arrays() {
for (node_path, node_id, array_data) in flush_data.change_set.new_arrays() {
// FIXME: do we need to cache this?
Contributor


I don't think so, since we iterate over each node once and query get_split_sizes once.

let splits =
self.splits.get(&node.id).expect("splits should exist for this node.");
let init: HashMap<ManifestExtents, Vec<ChunkInfo>> = Default::default();
// FIXME: this duplicates chunk storage for the array
Contributor

@dcherian dcherian Dec 22, 2025


Yeah, IIUC this loads all on-disk manifests into memory instead of simply rewriting the ones that need to be rewritten. If so, that should not be needed. We just need to load the existing manifest for a split in line 2030.

Collaborator Author


Oops, you're right, this is wrong in that way.
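A rough sketch of the per-split loading being suggested here. The `rewrite_modified_splits` and `fetch_manifest` names are hypothetical, standing in for the real (async, object-store-backed) manifest lookup: only the existing manifest of a split that actually has modified chunks is read, then merged with the new chunk refs.

```rust
use std::collections::HashMap;

// Hypothetical sketch: rather than loading every on-disk manifest into
// memory, fetch only the existing manifest for each split that has
// modified chunks, and merge the new chunk refs into it.
fn rewrite_modified_splits(
    modified: &HashMap<u32, Vec<String>>,        // split id -> new chunk refs
    fetch_manifest: impl Fn(u32) -> Vec<String>, // loads one split's manifest
) -> HashMap<u32, Vec<String>> {
    modified
        .iter()
        .map(|(split, new_chunks)| {
            // Only this split's existing manifest is read from storage.
            let mut merged = fetch_manifest(*split);
            merged.extend(new_chunks.iter().cloned());
            (*split, merged)
        })
        .collect()
}

fn main() {
    let mut modified = HashMap::new();
    modified.insert(1u32, vec!["c-new".to_string()]);
    let out = rewrite_modified_splits(&modified, |split| {
        // Pretend split 1's on-disk manifest already holds one chunk ref.
        assert_eq!(split, 1);
        vec!["c-old".to_string()]
    });
    assert_eq!(out[&1], vec!["c-old".to_string(), "c-new".to_string()]);
    println!("ok");
}
```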


let modified_splits = self
//let modified_splits = classified_chunks.keys().collect::<HashSet<_>>();
// FIXME: this is another pass through all chunks
Contributor

@dcherian dcherian Dec 22, 2025


But this is a pass over all chunks in the changeset, so that's fine. IIRC our goal was to have flush scale with the size of the Changeset, not the size of the array.

self.change_set
.new_array_chunk_iterator(node_id, node_path, extent.clone())
.new_array_chunk_iterator(node_id, node_path)
// FIXME: do we need to optimize this so we don't need multiple passes over all chunks calling
Contributor


We should just do the groupby into hashmaps once outside, and pass the appropriate iterator in?
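A minimal sketch of that suggestion. The `group_by_split` helper is hypothetical, not the real changeset API: one linear pass buckets each chunk into the split whose extent contains it, and each split's writer would then consume only its own bucket instead of re-scanning all chunks.

```rust
use std::collections::HashMap;

// Hypothetical sketch: group the changeset's chunks by split extent in a
// single pass, so each split can be handed an iterator over only its own
// group rather than re-filtering all chunks per split.
fn group_by_split(
    chunks: &[(u32, &str)],          // (chunk index, payload)
    splits: &[std::ops::Range<u32>], // split extents
) -> HashMap<usize, Vec<String>> {
    let mut groups: HashMap<usize, Vec<String>> = HashMap::new();
    for (idx, payload) in chunks {
        // One linear scan over the changeset; each chunk lands in at most
        // one split's bucket.
        if let Some(pos) = splits.iter().position(|ext| ext.contains(idx)) {
            groups.entry(pos).or_default().push(payload.to_string());
        }
    }
    groups
}

fn main() {
    let chunks = [(0u32, "a"), (3, "b"), (7, "c")];
    let groups = group_by_split(&chunks, &[0..5, 5..10]);
    assert_eq!(groups[&0], vec!["a".to_string(), "b".to_string()]);
    assert_eq!(groups[&1], vec!["c".to_string()]);
    println!("ok");
}
```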
