doc/design/zarr-manifests.md (+38 -9)
@@ -198,15 +198,43 @@ following fields:

 * Zarr version IDs equal the Zarr checksum

-* Asset properties gain `zarr_version: str | null` field (absent or null if Zarr is not yet ingested or asset is not a Zarr)
-- Not settable by client
-- Mint new asset when version changes?
-
-* Add `zarr_version` field to …/assets/path/ results
-
+- `Zarr` model has `.checksum`
+- (?) Not settable by client
+- (?) Upon changes to zarr asset initiated, `Zarr.checksum` reset to None, which stays such until Zarr is finalized
+- (?) Zarr should be denied new changes if `Zarr.checksum` is already None, and until it is finalized
+- Make `/finalize` to return new Zarr checksum:
+- might take awhile, so we might want to return some upload ID to be able to re-request checksum for specific upload
+- at this point we have not minted yet a new asset!
+- **Alternative**: do establish ZarrVersion
+- `many-to-many` between `zarr_id` and `zarr_version`.
+- `/finalize` would return new `zarr_version_id`
+- **Alternatives**:
+- PUT/PATCH/POST calls in API expecting `zarr_id` should be changed to provide `zarr_version_id` instead
+- We just add `/zarr/{zarr_id}/{zarr_version_id}/` call which would return `checksum` for that version.
+
+* Side discussion: new Zarr version/checksum compute is relatively expensive.
+It could be "cheap" if we rely on prior manifest + changes (new files with checksums) or DELETEs. But it would require 'fsck' style re-check
+and possibly "fixing" the version. Fragile since there would be no state to describe some prior state of Zarr to "checksum" it.
+
+* To not change DB model, to not breed zarr specific DB model fields, rely on `metadata.digest.dandi:dandi-zarr-checksum` for Zarr checksum.
+- Add `zarr_checksum` to `Zarr` model, but it must be just a convenience duplicate of the checksum in the metadata. But then some return of the API would need to be adjusted to return this dedicated `zarr_checksum` in addition to value in `metadata`
+- We mint new asset when metadata changes, so new asset is produced when metadata record with a new version of Zarr (new checksum) is provided
+- we verify that checksum is consistent with the `checksum` of zarr_id provided
+- NOTE: this means we would not be able to re-use versioned zarr from released version!
+
+* …/assets/ results gain `zarr_checksum`
+- they can only optionally contain `metadata` hence, we want to have `zarr_checksum` in the response
+- Q: What is "Version" int returned now for each asset?
+likely internal DB Version.id - unclear why it is in API response in such a form.
+* …/assets/paths/ -- no change since point to `asset_id`
+
+* …/assets/{asset_id}/download/ -- point to versioned version based on checksum in metadata
+([...redirect /download/ for zarrs to webdav](https://github.com/dandi/dandi-archive/issues/1993))
 * Zarr `contentUrl`s:
 - Make API download URLs for Zarrs redirect to dandidav
-- Replace S3 URLs with webdav.{archive_domain}/zarr/ URLs?
+- Replace S3 URLs with `webdav.{archive_domain}/zarrs/{dir1}/{dir2}/{zarr_id}/{checksum}/` URLs
+([...redirect /download/ for zarrs to webdav](https://github.com/dandi/dandi-archive/issues/1993)) ?
 - Document needed changes to dandidav?
 - The bucket for the Archive instance will now be given on the command line (only required if a custom/non-default API URL is given)
 - The bucket's region will have to be looked up & stored before starting the webserver
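
The `(?)` items in the hunk above propose a lifecycle for `Zarr.checksum`: it is reset to None when a change starts, further changes are denied while it is None, and `/finalize` returns the new checksum. A minimal Python sketch of that lifecycle follows. Only the `checksum` attribute and the finalize step come from the notes; the class layout, the `begin_change` method, the `ZarrBusyError` exception, and the `compute_checksum` callback are hypothetical names used for illustration. How a Zarr that has never been finalized is handled is not covered by the notes and is left out here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ZarrBusyError(Exception):
    """Hypothetical error: a change was requested before the previous one was finalized."""


@dataclass
class Zarr:
    # Stand-in for the `Zarr` model; only `checksum` is named in the design notes.
    zarr_id: str
    checksum: Optional[str]

    def begin_change(self) -> None:
        # Deny new changes while the checksum is None (a change is still pending).
        if self.checksum is None:
            raise ZarrBusyError(f"Zarr {self.zarr_id} must be finalized first")
        # Starting a change invalidates the current checksum.
        self.checksum = None

    def finalize(self, compute_checksum: Callable[[], str]) -> str:
        # `compute_checksum` is an assumed callback standing in for the
        # (relatively expensive) checksum computation; the result is
        # stored and returned, as proposed for `/finalize`.
        self.checksum = compute_checksum()
        return self.checksum
```

Under these assumptions, a second `begin_change()` before `finalize()` raises, which is the "denied new changes" behavior the notes ask about.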
@@ -220,9 +248,10 @@ following fields:
 - The Zarr entry objects returned in `…/files/` responses (with & without `versions/{version_id}/`) will need to gain a `VersionId` field containing the S3 object version ID
 - Nothing under /zarr/versions/ is writable over the API

-* Publishing Zarrs: Just ensure that the `zarr_version` in Zarr assets is frozen and that no entries/S3 object versions from the referenced version are ever deleted ?
+* Publishing Dandisets with Zarrs: Just ensure that no entries/S3 object versions from the referenced version are ever deleted (see GC section below)
+

-#### Garbage collection
+#### Garbage collection (GC)

 * GC of Manifests: manifests older than X days (e.g. 30) can be deleted if not referenced by any Zarr asset (draft or published).
 * GC of Manifests should trigger analysis/deletion of S3 objects based on their content:
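
The first GC rule in the hunk above (a manifest may be deleted if it is older than X days and no Zarr asset, draft or published, references it) can be sketched as a simple filter. This is an illustration only: the `Manifest` record, its fields, the `referenced` set of `(zarr_id, checksum)` pairs, and the function name are all assumptions, not part of the design document or of any existing code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Manifest:
    # Hypothetical record describing one stored manifest.
    zarr_id: str
    checksum: str
    created: datetime


def collectable_manifests(
    manifests: Iterable[Manifest],
    referenced: Set[Tuple[str, str]],
    now: datetime,
    max_age_days: int = 30,
) -> List[Manifest]:
    """Return manifests older than `max_age_days` that no asset references.

    `referenced` is assumed to hold the (zarr_id, checksum) pairs used by
    any Zarr asset, draft or published.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        m
        for m in manifests
        if m.created < cutoff and (m.zarr_id, m.checksum) not in referenced
    ]
```

Passing `now` explicitly keeps the check deterministic and easy to test; a caller would typically supply `datetime.now(timezone.utc)`.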