
Conversation

@paraseba
Collaborator

No description provided.

@paraseba paraseba requested a review from dcherian December 19, 2025 20:39
// parent chunks by extent, dropping chunk [10] because it's outside [0..5)
// 8. Result: chunk [10] is lost
//
// The fix requires updating Session.splits during rebase to match the new
Contributor


Suggested change
// The fix requires updating Session.splits during rebase to match the new
// With IC1, doing this correctly requires updating Session.splits during rebase to match the new
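To make the dropped-chunk scenario concrete, here is a minimal, self-contained sketch. The `group_chunks` helper and plain integer ranges are hypothetical stand-ins, not Icechunk's actual `Session.splits` / `ManifestExtents` types: chunks are bucketed by split extent, and any chunk outside every extent silently vanishes unless the splits are updated first.

```rust
// Hypothetical sketch (not Icechunk's real API): bucket chunk indices into
// manifest splits by extent. With stale splits from before the rebase, the
// chunk at index 10 falls outside every extent and is silently dropped;
// updating the splits first keeps it.
fn group_chunks(splits: &[std::ops::Range<u32>], chunks: &[u32]) -> Vec<Vec<u32>> {
    splits
        .iter()
        .map(|ext| chunks.iter().copied().filter(|c| ext.contains(c)).collect())
        .collect()
}

fn main() {
    let chunks = [0u32, 3, 10];
    // Stale splits from before the rebase: chunk 10 is lost.
    assert_eq!(group_chunks(&[0..5], &chunks), vec![vec![0, 3]]);
    // Splits updated during rebase to cover the resized array: every chunk survives.
    assert_eq!(group_chunks(&[0..5, 5..15], &chunks), vec![vec![0, 3], vec![10]]);
    println!("ok");
}
```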


for (node_path, node_id) in flush_data.change_set.new_arrays() {
for (node_path, node_id, array_data) in flush_data.change_set.new_arrays() {
// FIXME: do we need to cache this?
Contributor


I don't think so, since we iterate over each node once and query get_split_sizes once.

let splits =
self.splits.get(&node.id).expect("splits should exist for this node.");
let init: HashMap<ManifestExtents, Vec<ChunkInfo>> = Default::default();
// FIXME: this duplicates chunk storage for the array
Contributor

@dcherian dcherian Dec 22, 2025


Yeah, IIUC this loads all on-disk manifests into memory instead of simply rewriting the ones that need to be rewritten. If so, that should not be needed. We just need to load the existing manifest for a split in line 2030.

Collaborator Author


Oops, you're right, this is wrong in that way.
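A rough sketch of the per-split loading being suggested here. The `rewrite_modified_splits` and `fetch_manifest` names are hypothetical, standing in for the real (async, object-store-backed) manifest lookup: only the existing manifest of a split that actually has modified chunks is read, then merged with the new chunk refs.

```rust
use std::collections::HashMap;

// Hypothetical sketch: rather than loading every on-disk manifest into
// memory, fetch only the existing manifest for each split that has
// modified chunks, and merge the new chunk refs into it.
fn rewrite_modified_splits(
    modified: &HashMap<u32, Vec<String>>,        // split id -> new chunk refs
    fetch_manifest: impl Fn(u32) -> Vec<String>, // loads one split's manifest
) -> HashMap<u32, Vec<String>> {
    modified
        .iter()
        .map(|(split, new_chunks)| {
            // Only this split's existing manifest is read from storage.
            let mut merged = fetch_manifest(*split);
            merged.extend(new_chunks.iter().cloned());
            (*split, merged)
        })
        .collect()
}

fn main() {
    let mut modified = HashMap::new();
    modified.insert(1u32, vec!["c-new".to_string()]);
    let out = rewrite_modified_splits(&modified, |split| {
        // Pretend split 1's on-disk manifest already holds one chunk ref.
        assert_eq!(split, 1);
        vec!["c-old".to_string()]
    });
    assert_eq!(out[&1], vec!["c-old".to_string(), "c-new".to_string()]);
    println!("ok");
}
```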


let modified_splits = self
//let modified_splits = classified_chunks.keys().collect::<HashSet<_>>();
// FIXME: this is another pass through all chunks
Contributor

@dcherian dcherian Dec 22, 2025


But this is a pass over all chunks in the changeset, so that's fine. IIRC our goal was to have flush scale with the size of the Changeset, not the size of the array.

self.change_set
.new_array_chunk_iterator(node_id, node_path, extent.clone())
.new_array_chunk_iterator(node_id, node_path)
// FIXME: do we need to optimize this so we don't need multiple passes over all chunks calling
Contributor


We should just do the groupby into hashmaps once outside, and pass the appropriate iterator in?
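A minimal sketch of that suggestion. The `group_by_split` helper is hypothetical, not the real changeset API: one linear pass buckets each chunk into the split whose extent contains it, and each split's writer would then consume only its own bucket instead of re-scanning all chunks.

```rust
use std::collections::HashMap;

// Hypothetical sketch: group the changeset's chunks by split extent in a
// single pass, so each split can be handed an iterator over only its own
// group rather than re-filtering all chunks per split.
fn group_by_split(
    chunks: &[(u32, &str)],          // (chunk index, payload)
    splits: &[std::ops::Range<u32>], // split extents
) -> HashMap<usize, Vec<String>> {
    let mut groups: HashMap<usize, Vec<String>> = HashMap::new();
    for (idx, payload) in chunks {
        // One linear scan over the changeset; each chunk lands in at most
        // one split's bucket.
        if let Some(pos) = splits.iter().position(|ext| ext.contains(idx)) {
            groups.entry(pos).or_default().push(payload.to_string());
        }
    }
    groups
}

fn main() {
    let chunks = [(0u32, "a"), (3, "b"), (7, "c")];
    let groups = group_by_split(&chunks, &[0..5, 5..10]);
    assert_eq!(groups[&0], vec!["a".to_string(), "b".to_string()]);
    assert_eq!(groups[&1], vec!["c".to_string()]);
    println!("ok");
}
```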
