-
Notifications
You must be signed in to change notification settings - Fork 61
Extend storage stats calculation to include virtual and inline chunks #1483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…chunks (double-counting virtual chunks that don't have checksums)
… length) tuple if checksum not available
icechunk/src/ops/stats.rs
Outdated
| checksum.clone(), | ||
| virtual_ref.length, | ||
| &mut virtual_bytes, | ||
| )?; | ||
| } else { | ||
| // No checksum: deduplicate by (location, offset, length) | ||
| let virtual_identifier = ( | ||
| virtual_ref.location.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I think it's possible to remove these clones (by passing a reference and then only actually cloning if needed), but it comes at the cost of an additional if. Maybe worth it though to minimize memory usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just left a todo for this. It might be worthwhile to run it on a really big virtual repo before merging just to see what the memory usage is like.
|
I added some tests, including of virtual chunk deduplication. |
| }) | ||
| } | ||
|
|
||
| pub(crate) fn total_chunks_storage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't bother trying to do a deprecation cycle in the rust crate.
| DeprecationWarning, | ||
| ) | ||
|
|
||
| stats = await self._repository.chunk_storage_stats( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to call the _async version here?
| } | ||
|
|
||
| pub(crate) fn total_chunks_storage( | ||
| pub(crate) fn chunk_storage_stats( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are missing to mirror this change in the .pyi file.
…r/icechunk into virtual-storage-stats
…#1483) * define rust struct * expose struct in python * add python docstrings * extend repo_chunks_storage calculation to include virtual and native chunks (double-counting virtual chunks that don't have checksums) * fully deduplicate virtual chunks when counting by using (url, offset, length) tuple if checksum not available * factor out helper function * compile icechunk-python * rewire existing python method to use new method * add async method * update outdated comment * add python deprecation warnings * restrict to pub(crate) * correct bug in calculation of non-virtual bytes * only use (location, offset, length) as an identifier fo de-duplicating virtual chunks * add todo comment about cloning * use the inner pattern to wrap the rust-only struct inside the python-exposed one * correct name of python class * expose __add__ method in python * cargo formatting for python crate too * rust tests * correct python return type * actually correct python return type * call async method in python * update .pyi file with new methods * python linting * explicitly specify stacklevel for warning * expect warnings in tests
* Extend storage stats calculation to include virtual and inline chunks (#1483) * define rust struct * expose struct in python * add python docstrings * extend repo_chunks_storage calculation to include virtual and native chunks (double-counting virtual chunks that don't have checksums) * fully deduplicate virtual chunks when counting by using (url, offset, length) tuple if checksum not available * factor out helper function * compile icechunk-python * rewire existing python method to use new method * add async method * update outdated comment * add python deprecation warnings * restrict to pub(crate) * correct bug in calculation of non-virtual bytes * only use (location, offset, length) as an identifier fo de-duplicating virtual chunks * add todo comment about cloning * use the inner pattern to wrap the rust-only struct inside the python-exposed one * correct name of python class * expose __add__ method in python * cargo formatting for python crate too * rust tests * correct python return type * actually correct python return type * call async method in python * update .pyi file with new methods * python linting * explicitly specify stacklevel for warning * expect warnings in tests * resolve merge conflicts properly * more merge fixes * more merge fixes * update paths for bundled test repo
Closes #1478.
Compiles (at least the core rust crate does), but I haven't tested it yet.
EDIT: Ready