Design doc for Zarr versioning/publishing support via Zarr Manifest Files #1892
With version support added above, a description should follow here of what happens when publishing dandisets that contain zarrs.
We need to review what should now happen to zarr records or assets so that we capture version (checksum) information for a zarr whenever it becomes part of a released dandiset. For blobs this is easy, since blobs are immutable. A zarr, however, can have multiple versions, so we need to make sure that the published asset carries a zarr versionId that does not change when the zarr is later modified in the draft version.
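A minimal sketch of the pinning described above, to make the requirement concrete. The class names, field names, and checksum strings are hypothetical and are not taken from the design doc or the archive code; the only point illustrated is that a published asset stores the checksum as a frozen value rather than following the mutable zarr.

```python
from dataclasses import dataclass


@dataclass
class Zarr:
    """Hypothetical mutable zarr record as seen from the draft version."""
    zarr_id: str
    checksum: str  # checksum of the current contents; changes on every edit


@dataclass(frozen=True)
class PublishedZarrAsset:
    """Hypothetical published asset; frozen, so the pinned version cannot change."""
    path: str
    zarr_id: str
    zarr_version: str  # checksum captured at publish time


def publish_asset(path: str, zarr: Zarr) -> PublishedZarrAsset:
    """Copy the zarr's checksum as it is at the moment of publishing."""
    return PublishedZarrAsset(path=path, zarr_id=zarr.zarr_id, zarr_version=zarr.checksum)


zarr = Zarr(zarr_id="zarr-1", checksum="checksum-v1")
published = publish_asset("sub-01/image.zarr", zarr)

# A later modification in the draft version does not affect the published asset.
zarr.checksum = "checksum-v2"
```

Under this assumption, the published asset keeps pointing at `checksum-v1` while the draft moves on to `checksum-v2`.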
Note to ourselves: requests to modify a Zarr starting from a previous, non-draft version might be "hard to impossible": modifications made in parallel would pretty much create a "race condition" between the different versions, or else be very inefficient, since they would require large "diff" uploads. It pretty much boils down to having the zarr in its mutable form assigned to just a single path in a single dandiset (as now), since it must not be changed from multiple dandisets/locations. It could still reside in multiple dandisets, though, and even be published in that original version!
Alternatives (just thinking out loud):
- `finalize` saving a patched manifest without doing a full sweep of the bucket.
  - cons: a more complex implementation (or maybe not); a zarr operation must be completed "in full"; the version of the zarr on S3 "as is" might not be a legitimate zarr to use directly if it was ever modified for multiple versions; needs to be thought through better.
  - pros: support for modifying any version of a zarr; a (much) more efficient `finalize`, since it would just modify the prior manifest with the changes without doing a full sweep of the prefix.
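A rough sketch of the "patched manifest" alternative, for discussion only. It assumes a manifest can be modeled as a mapping from entry path to checksum, which is a simplification and not the manifest format from the design doc; `patch_manifest` and its parameters are hypothetical names.

```python
from typing import Dict, Iterable

# Hypothetical manifest shape: entry path within the zarr -> checksum of that entry.
Manifest = Dict[str, str]


def patch_manifest(prior: Manifest, upserts: Manifest, deletes: Iterable[str] = ()) -> Manifest:
    """Build a new manifest from a prior one plus the changes of a single operation.

    The prior manifest is left untouched, so every earlier version stays available.
    The cost is proportional to the number of changes, not to the size of the zarr.
    """
    patched = dict(prior)
    patched.update(upserts)
    for path in deletes:
        patched.pop(path, None)
    return patched


v1: Manifest = {".zarray": "c0", "0/0": "a0", "0/1": "b0"}
v2 = patch_manifest(v1, upserts={"0/1": "b1", "0/2": "d0"}, deletes=["0/0"])
```

This only works if the recorded changes are complete, which is the "operation must be completed in full" concern listed under cons: a missed upload or delete would leave the patched manifest out of sync with what is actually stored.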
Another "mind strike" related to the above: we need an association of a zarr with a dandiset for editing, to ensure ownership/rights to modify. This differs somewhat from blobs, which we do not allow to be modified, so people just upload new ones. Overall it feels like we need some way to distinguish a "canonical asset of a zarr" (which can still be modified) ... more thinking needed.
@yarikoptic
What are you talking about? A "non draft version" is a published version, and published versions and their contents can't be modified.
Each Zarr is already associated with a Dandiset. You can see a Zarr's Dandiset by requesting `/zarr/{zarr_id}/`.
The idea/hope was that we could "break" the need for association with a particular dandiset. The same zarr could then be present in multiple dandisets, so its "draft" versions could diverge between two dandisets, and changes to the zarr in one dandiset could "race" with changes in another.
Indeed, the problem should not manifest itself if we keep a zarr associated with just a single dandiset. But that is "suboptimal", since it would rule out (cheaply) creating dandisets with assets (content) from another dandiset, and we have already had such use cases, at least for a "read-only" "mix-in" of zarrs from other dandisets. So we should see whether we could support that through these proposed changes.
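One possible shape for the "read-only mix-in" idea, sketched under assumptions: a zarr keeps a single owning dandiset that may modify it, while any other dandiset only holds a reference pinned to a specific version. All names here (`ZarrRecord`, `ZarrReference`, `modify_zarr`, `mix_in`) and the dandiset identifiers are hypothetical and do not correspond to existing archive code.

```python
from dataclasses import dataclass


@dataclass
class ZarrRecord:
    """Hypothetical zarr with exactly one dandiset that is allowed to modify it."""
    zarr_id: str
    owner_dandiset: str
    checksum: str


@dataclass(frozen=True)
class ZarrReference:
    """Hypothetical read-only reference held by a non-owning dandiset."""
    dandiset: str
    zarr_id: str
    zarr_version: str  # pinned checksum; does not follow later edits


class NotOwnerError(Exception):
    """Raised when a dandiset other than the owner tries to modify a zarr."""


def modify_zarr(zarr: ZarrRecord, dandiset: str, new_checksum: str) -> None:
    """Allow modification only from the owning dandiset, so edits cannot race."""
    if dandiset != zarr.owner_dandiset:
        raise NotOwnerError(f"{dandiset} does not own zarr {zarr.zarr_id}")
    zarr.checksum = new_checksum


def mix_in(zarr: ZarrRecord, dandiset: str) -> ZarrReference:
    """Let another dandiset include the zarr at its current version, read-only."""
    return ZarrReference(dandiset=dandiset, zarr_id=zarr.zarr_id, zarr_version=zarr.checksum)


zarr = ZarrRecord(zarr_id="zarr-1", owner_dandiset="000001", checksum="checksum-v1")
ref = mix_in(zarr, dandiset="000002")
modify_zarr(zarr, dandiset="000001", new_checksum="checksum-v2")
```

With a single writer there is nothing to race against, and the second dandiset still gets the content cheaply; whether a pinned reference is enough for the existing use cases is an open question here.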