-
Notifications
You must be signed in to change notification settings - Fork 61
Extend storage stats calculation to include virtual and inline chunks #1483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 22 commits
Commits
Show all changes
34 commits
Select commit
Hold shift + click to select a range
824fec5
define rust struct
TomNicholas e63ab97
expose struct in python
TomNicholas cc78013
add python docstrings
TomNicholas 772d4e5
extend repo_chunks_storage calculation to include virtual and native …
TomNicholas 72cbd45
fully deduplicate virtual chunks when counting by using (url, offset,…
TomNicholas 7cc2da6
factor out helper function
TomNicholas f8fadcf
Merge branch 'main' into virtual-storage-stats
TomNicholas 1669a5b
compile icechunk-python
TomNicholas 7264f32
rewire existing python method to use new method
TomNicholas 710f0e0
add async method
TomNicholas ce7cdd5
update outdated comment
TomNicholas 6e55996
add python deprecation warnings
TomNicholas 39d382f
restrict to pub(crate)
TomNicholas 3b910e2
correct bug in calculation of non-virtual bytes
TomNicholas 68589c6
only use (location, offset, length) as an identifier fo de-duplicatin…
TomNicholas 6fa763f
add todo comment about cloning
TomNicholas 03fbac5
use the inner pattern to wrap the rust-only struct inside the python-…
TomNicholas 262567c
correct name of python class
TomNicholas 1c71151
expose __add__ method in python
TomNicholas 76226be
cargo formatting for python crate too
TomNicholas 57cbc69
rust tests
TomNicholas c532eee
correct python return type
TomNicholas b71c195
actually correct python return type
TomNicholas 3d3ece4
call async method in python
TomNicholas f7b0578
update .pyi file with new methods
TomNicholas 641f80a
Merge branch 'main' into virtual-storage-stats
TomNicholas 1049a69
Merge branch 'main' into virtual-storage-stats
TomNicholas d53554c
Merge branch 'virtual-storage-stats' of https://github.com/earth-move…
TomNicholas 138b025
python linting
TomNicholas 14b8ad3
explicitly specify stacklevel for warning
TomNicholas e3bc18c
expect warnings in tests
TomNicholas 67849f6
Merge branch 'main' into virtual-storage-stats
TomNicholas c9e7284
Merge branch 'main' into virtual-storage-stats
TomNicholas 794b560
Merge branch 'main' into virtual-storage-stats
TomNicholas File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -45,6 +45,7 @@ use crate::{ | |
| errors::PyIcechunkStoreError, | ||
| impl_pickle, | ||
| session::PySession, | ||
| stats::PyChunkStorageStats, | ||
| streams::PyAsyncGenerator, | ||
| }; | ||
|
|
||
|
|
@@ -2215,59 +2216,62 @@ impl PyRepository { | |
| }) | ||
| } | ||
|
|
||
| pub(crate) fn total_chunks_storage( | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I didn't bother trying to do a deprecation cycle in the rust crate. |
||
| pub(crate) fn chunk_storage_stats( | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you are missing to mirror this change in the |
||
| &self, | ||
| py: Python<'_>, | ||
| max_snapshots_in_memory: NonZeroU16, | ||
| max_compressed_manifest_mem_bytes: NonZeroUsize, | ||
| max_concurrent_manifest_fetches: NonZeroU16, | ||
| ) -> PyResult<u64> { | ||
| ) -> PyResult<PyChunkStorageStats> { | ||
| // This function calls block_on, so we need to allow other thread python to make progress | ||
| py.detach(move || { | ||
| let result = | ||
| let stats = | ||
| pyo3_async_runtimes::tokio::get_runtime().block_on(async move { | ||
| let asset_manager = { | ||
| let lock = self.0.read().await; | ||
| Arc::clone(lock.asset_manager()) | ||
| }; | ||
| let result = repo_chunks_storage( | ||
| let stats = repo_chunks_storage( | ||
| asset_manager, | ||
| max_snapshots_in_memory, | ||
| max_compressed_manifest_mem_bytes, | ||
| max_concurrent_manifest_fetches, | ||
| ) | ||
| .await | ||
| .map_err(PyIcechunkStoreError::RepositoryError)?; | ||
| Ok::<_, PyIcechunkStoreError>(result) | ||
| Ok::<_, PyIcechunkStoreError>(stats) | ||
| })?; | ||
|
|
||
| Ok(result) | ||
| Ok(stats.into()) | ||
| }) | ||
| } | ||
|
|
||
| fn total_chunks_storage_async<'py>( | ||
| pub(crate) fn chunk_storage_stats_async<'py>( | ||
| &'py self, | ||
| py: Python<'py>, | ||
| max_snapshots_in_memory: NonZeroU16, | ||
| max_compressed_manifest_mem_bytes: NonZeroUsize, | ||
| max_concurrent_manifest_fetches: NonZeroU16, | ||
| ) -> PyResult<Bound<'py, PyAny>> { | ||
| let repository = self.0.clone(); | ||
| pyo3_async_runtimes::tokio::future_into_py::<_, u64>(py, async move { | ||
| let asset_manager = { | ||
| let lock = repository.read().await; | ||
| Arc::clone(lock.asset_manager()) | ||
| }; | ||
| let result = repo_chunks_storage( | ||
| asset_manager, | ||
| max_snapshots_in_memory, | ||
| max_compressed_manifest_mem_bytes, | ||
| max_concurrent_manifest_fetches, | ||
| ) | ||
| .await | ||
| .map_err(PyIcechunkStoreError::RepositoryError)?; | ||
| Ok(result) | ||
| }) | ||
| pyo3_async_runtimes::tokio::future_into_py::<_, PyChunkStorageStats>( | ||
| py, | ||
| async move { | ||
| let asset_manager = { | ||
| let lock = repository.read().await; | ||
| Arc::clone(lock.asset_manager()) | ||
| }; | ||
| let stats = repo_chunks_storage( | ||
| asset_manager, | ||
| max_snapshots_in_memory, | ||
| max_compressed_manifest_mem_bytes, | ||
| max_concurrent_manifest_fetches, | ||
| ) | ||
| .await | ||
| .map_err(PyIcechunkStoreError::RepositoryError)?; | ||
| Ok(stats.into()) | ||
| }, | ||
| ) | ||
| } | ||
|
|
||
| #[pyo3(signature = (snapshot_id, *, pretty = true))] | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| use pyo3::{pyclass, pymethods}; | ||
|
|
||
| use icechunk::ops::stats::ChunkStorageStats; | ||
|
|
||
| /// Statistics about chunk storage across different chunk types. | ||
| #[pyclass(name = "ChunkStorageStats")] | ||
| #[derive(Clone, Debug)] | ||
| pub(crate) struct PyChunkStorageStats { | ||
| inner: ChunkStorageStats, | ||
| } | ||
|
|
||
| impl From<ChunkStorageStats> for PyChunkStorageStats { | ||
| fn from(stats: ChunkStorageStats) -> Self { | ||
| Self { inner: stats } | ||
| } | ||
| } | ||
|
|
||
| #[pymethods] | ||
| impl PyChunkStorageStats { | ||
| /// Total bytes stored in native chunks (stored in icechunk's chunk storage) | ||
| #[getter] | ||
| pub(crate) fn native_bytes(&self) -> u64 { | ||
| self.inner.native_bytes | ||
| } | ||
|
|
||
| /// Total bytes stored in virtual chunks (references to external data) | ||
| #[getter] | ||
| pub(crate) fn virtual_bytes(&self) -> u64 { | ||
| self.inner.virtual_bytes | ||
| } | ||
|
|
||
| /// Total bytes stored in inline chunks (stored directly in manifests) | ||
| #[getter] | ||
| pub(crate) fn inlined_bytes(&self) -> u64 { | ||
| self.inner.inlined_bytes | ||
| } | ||
|
|
||
| /// Total bytes excluding virtual chunks. | ||
| /// | ||
| /// This represents the approximate size of all objects stored in the | ||
| /// icechunk repository itself (native chunks plus inline chunks). | ||
| /// Virtual chunks are not included since they reference external data. | ||
| /// | ||
| /// Returns: | ||
| /// int: The sum of native_bytes and inlined_bytes | ||
| pub(crate) fn non_virtual_bytes(&self) -> u64 { | ||
| self.inner.non_virtual_bytes() | ||
| } | ||
|
|
||
| /// Total bytes across all chunk types. | ||
| /// | ||
| /// Returns the sum of native_bytes, virtual_bytes, and inlined_bytes. | ||
| /// This represents the total size of all data referenced by the repository, | ||
| /// including both data stored in icechunk and external virtual references. | ||
| /// | ||
| /// Returns: | ||
| /// int: The sum of all chunk storage bytes | ||
| pub(crate) fn total_bytes(&self) -> u64 { | ||
| self.inner.total_bytes() | ||
| } | ||
|
|
||
| pub(crate) fn __repr__(&self) -> String { | ||
| format!( | ||
| "ChunkStorageStats(native_bytes={}, virtual_bytes={}, inlined_bytes={})", | ||
| self.inner.native_bytes, self.inner.virtual_bytes, self.inner.inlined_bytes | ||
| ) | ||
| } | ||
|
|
||
| pub(crate) fn __add__(&self, other: &PyChunkStorageStats) -> PyChunkStorageStats { | ||
| PyChunkStorageStats { inner: self.inner + other.inner } | ||
| } | ||
| } |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to call the
_asyncversion here?