Alternative way to do manifest split without caching the splits #1498
base: main
Conversation
```rust
// parent chunks by extent, dropping chunk [10] because it's outside [0..5)
// 8. Result: chunk [10] is lost
//
// The fix requires updating Session.splits during rebase to match the new
```
Suggested change:

```diff
- // The fix requires updating Session.splits during rebase to match the new
+ // With IC1, doing this correctly requires updating Session.splits during rebase to match the new
```
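To make the failure mode concrete, here is a minimal, self-contained sketch of the scenario the comment describes. The types and values are hypothetical stand-ins (plain integer ranges instead of `ManifestExtents`): re-bucketing parent chunks against stale split extents silently drops the chunk at coordinate 10.

```rust
// Hypothetical sketch of the rebase bug: Session.splits still holds the old
// split extents ([0..5)), so filtering parent chunks by those extents drops
// chunk [10], which lies outside every stale extent.
use std::ops::Range;

fn main() {
    // Stale split extents kept in Session.splits (hypothetical values).
    let stale_splits: Vec<Range<u64>> = vec![0..5];

    // The parent snapshot contains a chunk at coordinate 10.
    let parent_chunks: Vec<u64> = vec![1, 3, 10];

    // Re-bucketing by the stale extents keeps only chunks inside [0..5).
    let kept: Vec<u64> = parent_chunks
        .iter()
        .copied()
        .filter(|c| stale_splits.iter().any(|s| s.contains(c)))
        .collect();

    assert_eq!(kept, vec![1, 3]); // chunk 10 is lost
    println!("kept: {kept:?}");
}
```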
```diff
- for (node_path, node_id) in flush_data.change_set.new_arrays() {
+ for (node_path, node_id, array_data) in flush_data.change_set.new_arrays() {
  // FIXME: do we need to cache this?
```
I don't think so, since we iterate over each node once and query get_split_sizes only once.
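A minimal sketch of that point, with hypothetical names (`get_split_sizes` and the flush loop here are stand-ins, not the real API): each new array node triggers exactly one split-sizes lookup, so a cache keyed by node would never be hit a second time.

```rust
// Hypothetical sketch: one get_split_sizes call per new array node, so
// memoizing the result per node buys nothing.
use std::collections::HashMap;

type NodeId = u64;
type SplitSizes = Vec<usize>;

// Stand-in for the real config lookup.
fn get_split_sizes(_node: NodeId) -> SplitSizes {
    vec![5, 5]
}

fn flush(new_arrays: &[NodeId]) -> HashMap<NodeId, SplitSizes> {
    let mut splits = HashMap::new();
    for &node_id in new_arrays {
        // One lookup per node; a cache keyed by node_id would see no reuse.
        splits.insert(node_id, get_split_sizes(node_id));
    }
    splits
}

fn main() {
    let splits = flush(&[1, 2, 3]);
    assert_eq!(splits.len(), 3);
}
```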
```rust
let splits =
    self.splits.get(&node.id).expect("splits should exist for this node.");
let init: HashMap<ManifestExtents, Vec<ChunkInfo>> = Default::default();
// FIXME: this duplicates chunk storage for the array
```
Yeah, IIUC this loads all on-disk manifests into memory instead of simply rewriting the ones that need to be rewritten. If so, that should not be needed; we just need to load the existing manifest for a split in line 2030.
Oops, you're right, this is wrong in that way.
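For illustration, a sketch of the cheaper approach being suggested, under the assumption that manifests can be fetched per split; all names here (`load_manifest`, `SplitKey`, etc.) are hypothetical stand-ins. Only the manifests of splits the changeset actually touches get loaded and rewritten; untouched splits stay on disk.

```rust
// Hypothetical sketch: rewrite only the splits that the changeset modified,
// loading one existing manifest per touched split instead of all of them.
use std::collections::{HashMap, HashSet};

type SplitKey = u64; // stand-in for ManifestExtents
type Manifest = Vec<u64>; // stand-in for a chunk manifest

// Stand-in for an object-store fetch of one split's manifest.
fn load_manifest(split: SplitKey, store: &HashMap<SplitKey, Manifest>) -> Option<Manifest> {
    store.get(&split).cloned()
}

fn rewrite_modified_splits(
    modified: &HashSet<SplitKey>,
    store: &HashMap<SplitKey, Manifest>,
    new_chunks: &HashMap<SplitKey, Vec<u64>>,
) -> HashMap<SplitKey, Manifest> {
    let mut rewritten = HashMap::new();
    for &split in modified {
        // Only the touched split's manifest is loaded; untouched splits
        // keep their existing manifest files on disk.
        let mut manifest = load_manifest(split, store).unwrap_or_default();
        if let Some(chunks) = new_chunks.get(&split) {
            manifest.extend_from_slice(chunks);
        }
        rewritten.insert(split, manifest);
    }
    rewritten
}

fn main() {
    let store: HashMap<SplitKey, Manifest> = [(0, vec![1, 2]), (1, vec![7])].into();
    let new_chunks: HashMap<SplitKey, Vec<u64>> = [(0, vec![3])].into();
    let modified: HashSet<SplitKey> = [0].into();
    let out = rewrite_modified_splits(&modified, &store, &new_chunks);
    assert_eq!(out[&0], vec![1, 2, 3]); // split 1 is never loaded or rewritten
}
```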
```rust
let modified_splits = self
//let modified_splits = classified_chunks.keys().collect::<HashSet<_>>();
// FIXME: this is another pass through all chunks
```
But this is a pass over all the chunks in the changeset, so that's fine. IIRC, our goal was to have flush scale with the size of the changeset, not the size of the array.
```diff
  self.change_set
-     .new_array_chunk_iterator(node_id, node_path, extent.clone())
+     .new_array_chunk_iterator(node_id, node_path)
  // FIXME: do we need to optimize this so we don't need multiple passes over all chunks calling
```
We should just do the group-by into hashmaps once, outside the loop, and pass the appropriate iterator in?
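A sketch of that single group-by, with hypothetical types (`split_of` standing in for the real extent lookup): one pass buckets every changeset chunk by its split, after which each split consumes only its own chunks and no further full scans are needed.

```rust
// Hypothetical sketch: bucket all changeset chunks by split once, then hand
// each split its own chunk list instead of re-scanning every chunk per split.
use std::collections::HashMap;

type SplitKey = u64; // stand-in for ManifestExtents
type ChunkCoord = u64;

// Stand-in mapping from a chunk coordinate to its split.
fn split_of(chunk: ChunkCoord, split_size: u64) -> SplitKey {
    chunk / split_size
}

fn main() {
    let changeset_chunks: Vec<ChunkCoord> = vec![1, 3, 6, 7, 12];
    let split_size = 5;

    // One pass over the changeset's chunks.
    let mut by_split: HashMap<SplitKey, Vec<ChunkCoord>> = HashMap::new();
    for chunk in changeset_chunks {
        by_split.entry(split_of(chunk, split_size)).or_default().push(chunk);
    }

    // Each split now sees only its own chunks, with no further full passes.
    for (split, chunks) in &by_split {
        println!("split {split}: {chunks:?}");
    }
}
```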