Commit 1e15c75

More to Zarr versioning design: need some "ZarrVersion" or "Upload" id; can reuse metadata
1 parent 573e72b commit 1e15c75

1 file changed

+38
-9
lines changed

doc/design/zarr-manifests.md

+38-9
Original file line number | Diff line number | Diff line change
@@ -198,15 +198,43 @@ following fields:
198198
199199
* Zarr version IDs equal the Zarr checksum
200200
201-
* Asset properties gain `zarr_version: str | null` field (absent or null if Zarr is not yet ingested or asset is not a Zarr)
202-
- Not settable by client
203-
- Mint new asset when version changes?
204-
205-
* Add `zarr_version` field to …/assets/path/ results
206-
201+
- `Zarr` model has `.checksum`
202+
- (?) Not settable by client
203+
- (?) Once changes to a Zarr asset are initiated, `Zarr.checksum` is reset to None and stays None until the Zarr is finalized
204+
- (?) A Zarr should be denied new changes while `Zarr.checksum` is None, i.e. until it is finalized
205+
- Make `/finalize` return the new Zarr checksum:
206+
- might take a while, so we might want to return some upload ID to be able to re-request the checksum for a specific upload
207+
- at this point we have not yet minted a new asset!
208+
- **Alternative**: do establish ZarrVersion
209+
- `many-to-many` between `zarr_id` and `zarr_version`.
210+
- `/finalize` would return new `zarr_version_id`
211+
- **Alternatives**:
212+
- PUT/PATCH/POST calls in API expecting `zarr_id` should be changed to provide `zarr_version_id` instead
213+
- We just add `/zarr/{zarr_id}/{zarr_version_id}/` call which would return `checksum` for that version.
214+
215+
* Side discussion: new Zarr version/checksum compute is relatively expensive.
216+
It could be "cheap" if we rely on prior manifest + changes (new files with checksums) or DELETEs. But it would require 'fsck' style re-check
217+
and possibly "fixing" the version. Fragile since there would be no state to describe some prior state of Zarr to "checksum" it.
218+
219+
* To avoid changing the DB model and proliferating Zarr-specific DB model fields, rely on `metadata.digest.dandi:dandi-zarr-checksum` for the Zarr checksum.
220+
- Add `zarr_checksum` to the `Zarr` model, but it must be just a convenience duplicate of the checksum in the metadata. Some API responses would then need to be adjusted to return this dedicated `zarr_checksum` in addition to the value in `metadata`
221+
- We mint new asset when metadata changes, so new asset is produced when metadata record with a new version of Zarr (new checksum) is provided
222+
- we verify that the checksum is consistent with the `checksum` of the `zarr_id` provided
223+
- NOTE: this means we would not be able to re-use versioned zarr from released version!
224+
225+
* …/assets/ results gain `zarr_checksum`
226+
- they only optionally contain `metadata`; hence we want to have `zarr_checksum` in the response
227+
- Q: What is "Version" int returned now for each asset?
228+
likely internal DB Version.id - unclear why it is in API response in such a form.
229+
* …/assets/paths/ -- no change since they point to `asset_id`
230+
231+
* …/assets/{asset_id}/download/ -- point to versioned version based on checksum in metadata
232+
* `webdav.{archive_domain}/zarrs/{dir1}/{dir2}/{zarr_id}/{checksum}/` URLs
233+
([...redirect /download/ for zarrs to webdav](https://github.com/dandi/dandi-archive/issues/1993))
207234
* Zarr `contentUrl`s:
208235
- Make API download URLs for Zarrs redirect to dandidav
209-
- Replace S3 URLs with webdav.{archive_domain}/zarr/ URLs?
236+
- Replace S3 URLs with `webdav.{archive_domain}/zarrs/{dir1}/{dir2}/{zarr_id}/{checksum}/` URLs
237+
([...redirect /download/ for zarrs to webdav](https://github.com/dandi/dandi-archive/issues/1993)) ?
210238
- Document needed changes to dandidav?
211239
- The bucket for the Archive instance will now be given on the command line (only required if a custom/non-default API URL is given)
212240
- The bucket's region will have to be looked up & stored before starting the webserver
@@ -220,9 +248,10 @@ following fields:
220248
- The Zarr entry objects returned in `…/files/` responses (with & without `versions/{version_id}/`) will need to gain a `VersionId` field containing the S3 object version ID
221249
- Nothing under /zarr/versions/ is writable over the API
222250
223-
* Publishing Zarrs: Just ensure that the `zarr_version` in Zarr assets is frozen and that no entries/S3 object versions from the referenced version are ever deleted ?
251+
* Publishing Dandisets with Zarrs: Just ensure that no entries/S3 object versions from the referenced version are ever deleted (see GC section below)
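
The checksum lifecycle proposed in the `Zarr` model bullets above (checksum reset to None when changes start, further changes denied until finalization, `/finalize` returning the new checksum) might be sketched roughly as follows. This is only an illustration under assumptions: the `Zarr` dataclass, `ZarrBusyError`, and the method names `begin_changes` and `finalize` are hypothetical stand-ins, not the archive's actual model or API, and the checksum strings are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


class ZarrBusyError(Exception):
    """Hypothetical error: a change was attempted before finalization."""


@dataclass
class Zarr:
    # Hypothetical stand-in for the archive's Zarr model.
    zarr_id: str
    checksum: Optional[str]

    def begin_changes(self) -> None:
        # Deny new changes while the checksum is None (not yet finalized).
        # Note: taken literally, this also blocks a never-ingested Zarr;
        # the design leaves that case open, so it is not handled here.
        if self.checksum is None:
            raise ZarrBusyError(f"Zarr {self.zarr_id} is not finalized")
        # Starting changes invalidates the current checksum.
        self.checksum = None

    def finalize(self, new_checksum: str) -> str:
        # In the design the checksum would be computed server-side;
        # here it is passed in to keep the sketch self-contained.
        self.checksum = new_checksum
        return new_checksum


zarr = Zarr("zarr-1", checksum="placeholder-checksum-1")
zarr.begin_changes()                      # checksum is now None
try:
    zarr.begin_changes()                  # denied until finalized
except ZarrBusyError as exc:
    print("denied:", exc)
print(zarr.finalize("placeholder-checksum-2"))
```

Passing the new checksum into `finalize` sidesteps the "might take a while" concern noted in the design; an upload ID or `zarr_version_id` return value would replace it in either of the listed alternatives.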
252+
224253
225-
#### Garbage collection
254+
#### Garbage collection (GC)
226255
227256
* GC of Manifests: manifests older than X days (e.g. 30) can be deleted if not referenced by any Zarr asset (draft or published).
228257
* GC of Manifests should trigger analysis/deletion of S3 objects based on their content:
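
The manifest eligibility rule stated in the first GC bullet (older than X days and not referenced by any draft or published Zarr asset) could be expressed as a small predicate. This is a minimal sketch under assumptions: the function name, its parameters, and the 30-day default are illustrative only, and how "referenced" is determined is left to the caller.

```python
from datetime import datetime, timedelta, timezone


def manifest_is_collectable(
    created: datetime,
    referenced: bool,
    now: datetime,
    max_age_days: int = 30,  # the "X days (e.g. 30)" from the design
) -> bool:
    """Return True if a manifest may be garbage-collected.

    `referenced` should be True if any Zarr asset (draft or published)
    still points at this manifest.
    """
    is_old_enough = (now - created) > timedelta(days=max_age_days)
    return is_old_enough and not referenced


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
old = datetime(2025, 1, 1, tzinfo=timezone.utc)      # 59 days before `now`
recent = datetime(2025, 2, 20, tzinfo=timezone.utc)  # 9 days before `now`

print(manifest_is_collectable(old, referenced=False, now=now))     # True
print(manifest_is_collectable(old, referenced=True, now=now))      # False
print(manifest_is_collectable(recent, referenced=False, now=now))  # False
```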
