Skip to content

Conversation

@paraseba
Copy link
Collaborator

@paraseba paraseba commented Apr 3, 2025

Python's SnapshotInfo now also includes the Manifests info.

@paraseba paraseba requested review from dcherian and mpiannucci April 3, 2025 17:38
@paraseba paraseba force-pushed the push-xzkplxxuwkrr branch from 88eee6a to c2b8a4a Compare April 3, 2025 17:46
Python's `SnapshotInfo` now also includes the Manifests info.
@paraseba paraseba force-pushed the push-xzkplxxuwkrr branch from c2b8a4a to 2a04921 Compare April 3, 2025 17:48
written_at: val.flushed_at,
message: val.message,
metadata: val.metadata.into(),
manifests: val.manifests.into_iter().map(|v| v.into()).collect(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this potentially a really big list? that will now be listed our with ancestry?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should have SnapshotInfo.list_manifests() instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but, ancestry is very slow anyway, since it fetches each snapshot from object store. Gathering the few manifests cannot add much

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think the list of manifests could get very long with split? Don't forget this is only done when you try to print a snapshot, not something you would do a lot, or would expect to be fast.

Copy link
Collaborator Author

@paraseba paraseba Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scratch, my latest about "print". What I should have said is: these objects get populated only by ancestry, which is very slow, and they are mostly used and discarded, I don't imagine gathering a few hundred extra objects will become a problem

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I understand now. My goal with this was trying to give us a bit more insight to debug, using the lookup_snapshot method. I like the idea of having a dedicated ancestry representation, that could be text in the console, html in a notebook, etc

Copy link
Collaborator Author

@paraseba paraseba Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An updating weather forecast repo might have O(100) manifests in each snapshot, one per array, for example.

But we are not fetching those manifests, only surfacing the data that is already in the snapshot, maybe 30 bytes per manifest or so.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this definitely has an impact on the repr for snapshots. Do you want me to skip manifests there? we can do that

Copy link
Contributor

@dcherian dcherian Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm only concerned about the repr, because we use it for ancestry right now.

Yes, let's just comment it out from the repr for now.

This will help for my update to test_can_read_old.py

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in f9baf05

@paraseba paraseba requested a review from dcherian April 3, 2025 20:23
@dcherian dcherian enabled auto-merge (squash) April 3, 2025 20:30
@dcherian dcherian merged commit c094590 into main Apr 3, 2025
7 of 8 checks passed
@dcherian dcherian deleted the push-xzkplxxuwkrr branch April 3, 2025 20:36
dcherian added a commit that referenced this pull request Apr 3, 2025
* main:
  Better `Debug` instances and __repr__ methods. (#891)
  Add chunk container repr, fix test dataset (#893)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants