@dcherian dcherian commented Feb 21, 2025

  • Does the config get serialized properly?
  • Add ManifestSplitCondition.AnyArray
  • Python tests for Or, And
  • Add docs
  • Real-world benchmark; test with ERA5
  • Add an ndim-based condition (3D vs 4D), if someone asks for it

Minimal docs here: https://icechunk--767.org.readthedocs.build/en/767/icechunk-python/performance/

I rewrote the ERA5 manifests to put 1 year per manifest (~9000 chunks); this gives us a 3x speedup.
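As a rough illustration of what "one year per manifest" means along a single axis, here is a minimal sketch that cuts a run of chunk indices into consecutive ranges of at most a given size. The helper name `split_axis` and its signature are hypothetical and are not taken from this PR or from the Icechunk API.

```rust
use std::ops::Range;

/// Split `num_chunks` chunk indices along one axis into consecutive
/// half-open ranges holding at most `split_size` chunks each.
/// Hypothetical helper for illustration only; not Icechunk's API.
fn split_axis(num_chunks: u32, split_size: u32) -> Vec<Range<u32>> {
    assert!(split_size > 0, "split_size must be positive");
    let mut ranges = Vec::new();
    let mut start = 0u32;
    while start < num_chunks {
        // The last range may be shorter than `split_size`.
        let end = start.saturating_add(split_size).min(num_chunks);
        ranges.push(start..end);
        start = end;
    }
    ranges
}
```

For example, `split_axis(10, 4)` yields the ranges `0..4`, `4..8`, and `8..10`; each range would then back one manifest along that axis.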


pub struct ManifestShards(Vec<ManifestExtents>);

impl ManifestShards {
    pub fn default(ndim: usize) -> Self {
Contributor Author

I don't like this, but it is certainly tied to ndim.

Collaborator

Maybe ManifestSplits is an enum to avoid this?

enum ManifestSplits {
   Single,
   Multiple(Vec<ManifestExtents>)
}

What I don't like is the empty vector. I wonder if Rust has a NonEmptyVec type; otherwise, a trick people use is:

...
   Multiple { first: ManifestExtents, rest: Vec<ManifestExtents> }
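A minimal, self-contained sketch of the "first plus rest" trick suggested above. `ManifestExtents` is a stand-in type here, and `from_extents` and `num_manifests` are hypothetical helpers added for illustration; none of this is the code that was merged.

```rust
use std::ops::Range;

/// Stand-in for the real extents type; assumed here to be a list of
/// per-dimension chunk-index ranges.
#[derive(Debug, Clone, PartialEq)]
struct ManifestExtents(Vec<Range<u32>>);

#[derive(Debug, Clone, PartialEq)]
enum ManifestSplits {
    /// One manifest covers the whole array, so no extents are stored.
    Single,
    /// At least one extent is guaranteed by construction.
    Multiple { first: ManifestExtents, rest: Vec<ManifestExtents> },
}

impl ManifestSplits {
    /// Returns `None` for empty input, so `Multiple` can never hold
    /// zero extents. (Hypothetical constructor.)
    fn from_extents(mut extents: Vec<ManifestExtents>) -> Option<Self> {
        if extents.is_empty() {
            return None;
        }
        let first = extents.remove(0);
        Some(ManifestSplits::Multiple { first, rest: extents })
    }

    /// Number of manifests this split describes. (Hypothetical helper.)
    fn num_manifests(&self) -> usize {
        match self {
            ManifestSplits::Single => 1,
            ManifestSplits::Multiple { rest, .. } => 1 + rest.len(),
        }
    }
}
```

The point of the design is that the "empty vector" state cannot be represented at all, so no caller has to decide what it means.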

Contributor Author

OK, I don't need the default any more. It was an artifact that appeared because I implemented the core logic before wiring up the config. Now the default gets set when parsing the config, using the array metadata.

}

pub fn contains(&self, coord: &[u32]) -> bool {
    self.iter().zip(coord.iter()).all(|(range, that)| range.contains(that))
Collaborator

We need to start checking, on writes, that indexes have the proper size for the metadata.
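One reading of this concern: `zip` stops at the shorter of its two inputs, so a coordinate with too few or too many dimensions is silently accepted by the `contains` shown above. The sketch below contrasts that behaviour with an explicit length check. The `Extents` and `DimMismatch` types and both method names are hypothetical stand-ins, not the project's code.

```rust
use std::ops::Range;

/// Stand-in for the extents type: one chunk-index range per dimension.
struct Extents(Vec<Range<u32>>);

/// Hypothetical error for a coordinate with the wrong number of dimensions.
#[derive(Debug, PartialEq)]
struct DimMismatch {
    expected: usize,
    got: usize,
}

impl Extents {
    /// Same logic as the reviewed snippet: `zip` truncates to the shorter
    /// input, so extra or missing dimensions go unnoticed.
    fn contains_unchecked(&self, coord: &[u32]) -> bool {
        self.0.iter().zip(coord.iter()).all(|(range, that)| range.contains(that))
    }

    /// Rejects coordinates whose length differs from the number of
    /// dimensions before testing membership.
    fn contains_checked(&self, coord: &[u32]) -> Result<bool, DimMismatch> {
        if coord.len() != self.0.len() {
            return Err(DimMismatch { expected: self.0.len(), got: coord.len() });
        }
        Ok(self.contains_unchecked(coord))
    }
}
```

With two-dimensional extents `[0..4, 0..2]`, `contains_unchecked(&[1, 1, 99])` returns `true` because the third index is never inspected, while `contains_checked` returns an error for the same input.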

@dcherian dcherian force-pushed the split-manifests branch 2 times, most recently from e7d9221 to 09476a4 Compare March 6, 2025 23:02
@dcherian dcherian force-pushed the split-manifests branch from bc67218 to 9954cda Compare May 8, 2025 03:14
@dcherian dcherian marked this pull request as ready for review May 8, 2025 03:30
@dcherian dcherian requested a review from paraseba May 8, 2025 03:30
@dcherian dcherian force-pushed the split-manifests branch from fe64bda to 0891269 Compare May 8, 2025 04:26
@dcherian dcherian force-pushed the split-manifests branch from 0891269 to b5812d5 Compare May 8, 2025 04:31
Ok(())
}

// #[tokio::test]
Collaborator

ping

@dcherian dcherian enabled auto-merge (squash) May 13, 2025 15:16
@dcherian dcherian merged commit 9b78cf6 into main May 13, 2025
8 checks passed
@dcherian dcherian deleted the split-manifests branch May 13, 2025 15:25
4 participants