Commit f1b4a62

1 parent: f4bfc3e

1 file changed (+52, -7 lines)

doc/design/zarr-manifests.md

```diff
@@ -12,18 +12,30 @@ where noted, the manifest file format defined herein matches the format used by
 the proof of concept.
 
 
-Archive Behavior
-----------------
+Creating & Storing Manifest Files
+---------------------------------
 
 Whenever Dandi Archive calculates the checksum for a Zarr in the Archive, it
 shall additionally produce a *manifest file* listing various information about
 the Zarr and its entries in the format described in the next section. This
 manifest file shall be stored in the Archive's S3 bucket at the path
-`zarr-manifest/{zarr_id}/{checksum}.json`, where `{zarr_id}` is replaced by the
-ID of the Zarr and `{checksum}` is replaced by the Dandi Zarr checksum of the
-Zarr at that point in time. The manifest file shall be world-readable, unless
-the Zarr is embargoed or belongs to an embargoed Dandiset, in which case
-appropriate steps shall be taken to limit read access to the file.
+`zarr-manifest/{dir1}/{dir2}/{zarr_id}/{checksum}.json`, where:
+
+- `{dir1}` is replaced by the first three characters of the Zarr ID
+- `{dir2}` is replaced by the next three characters of the Zarr ID
+- `{zarr_id}` is replaced by the ID of the Zarr
+- `{checksum}` is replaced by the Dandi Zarr checksum of the Zarr at that point
+  in time
+
+This directory structure (a) will allow `dandidav` to switch the data source
+for its `/zarr/` hierarchy from the proof of concept to the S3 bucket with
+minimal code changes and (b) will keep the number of entries in each directory
+under `zarr-manifest/` in the bucket small, thereby avoiding excessive resource
+usage by `dandidav`.
+
+The manifest file shall be world-readable, unless the Zarr is embargoed or
+belongs to an embargoed Dandiset, in which case appropriate steps shall be
+taken to limit read access to the file.
 
 Manifest files shall also be generated for all Zarrs already in the Archive
 when this feature is first implemented.
```
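
The manifest path layout in the first hunk can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the Archive's implementation: the function name `manifest_key` is hypothetical, and the Zarr ID and checksum in the example are made-up placeholders.

```python
def manifest_key(zarr_id: str, checksum: str) -> str:
    """Build the S3 key of a Zarr manifest file (hypothetical helper).

    Layout: zarr-manifest/{dir1}/{dir2}/{zarr_id}/{checksum}.json, where
    dir1 is the first three characters of the Zarr ID and dir2 the next three.
    """
    if len(zarr_id) < 6:
        # The layout needs at least six characters to form dir1 and dir2.
        raise ValueError(f"Zarr ID too short for manifest layout: {zarr_id!r}")
    dir1 = zarr_id[:3]
    dir2 = zarr_id[3:6]
    return f"zarr-manifest/{dir1}/{dir2}/{zarr_id}/{checksum}.json"
```

For example, `manifest_key("0395d0a3-767a-4397-a3a4-fb4a8b7c9a3a", "abc-1--2")` yields `zarr-manifest/039/5d0/0395d0a3-767a-4397-a3a4-fb4a8b7c9a3a/abc-1--2.json`. Assuming Zarr IDs are lowercase hexadecimal UUID strings (an assumption, not stated in the design), `{dir1}` and `{dir2}` can each take at most 16³ = 4,096 distinct values, which is what bounds the number of entries per directory level under `zarr-manifest/`.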

```diff
@@ -152,3 +164,36 @@ following fields:
 > - A `zarrChecksumMismatch` field inside the `statistics` object, used to
 > store the checksum that the API reports for a Zarr when it disagrees with
 > the checksum calculated by the manifest-generation code
+
+
+Archive API Changes
+-------------------
+
+***WIP***
+
+* Zarr version IDs equal the Zarr checksum
+
+* Asset properties gain a `zarr_version: str | null` field (absent or null if the Zarr is not yet ingested or the asset is not a Zarr)
+    - Not settable by client
+    - Mint new asset when version changes?
+
+* Add a `zarr_version` field to `…/assets/path/` results
+
+* Zarr `contentUrl`s:
+    - Make API download URLs for Zarrs redirect to `dandidav`
+    - Replace S3 URLs with `webdav.{archive_domain}/zarr/` URLs?
+    - Document needed changes to `dandidav`?
+        - The bucket for the Archive instance will now be given on the command line (only required if a custom/non-default API URL is given)
+        - The bucket's region will have to be looked up & stored before starting the webserver
+        - Zarrs under `/dandisets/` will no longer determine their S3 location via `contentUrl`; instead, they will combine the Archive's bucket & region with the Zarr ID in the asset properties (templated into `zarr/{zarr_id}/`)
+
+* Getting specific Zarr versions & their files from API endpoints
+    - `GET /zarr/versions/` (paginated)
+    - `GET /zarr/versions/{version_id}/` ?
+    - `GET /zarr/versions/{version_id}/files/[?prefix=...]` (paginated)
+    - The Zarr entry objects returned in `…/files/` responses (with & without `versions/{version_id}/`) will need to gain a `VersionId` field containing the S3 object version ID
+    - Nothing under `/zarr/versions/` is writable over the API
+
+* Publishing Zarrs: Just ensure that the `zarr_version` in Zarr assets is frozen and that no entries/S3 object versions from the referenced version are ever deleted?
+
+* Does garbage collection of old Zarr versions need to be discussed?
```
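
The `dandidav` note about deriving a Zarr's S3 location from the Archive's bucket and region plus the Zarr ID, rather than from `contentUrl`, might look like the Python sketch below. The names `S3Location`, `zarr_location`, and `region_from_location_constraint` are hypothetical, and the URL follows the standard virtual-hosted-style S3 form. The region helper reflects the fact that S3's GetBucketLocation operation reports a null `LocationConstraint` for buckets in `us-east-1`; how the lookup is actually performed is left open by the design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class S3Location:
    """Bucket, region, and key prefix under which a Zarr's entries are stored."""

    bucket: str
    region: str
    prefix: str

    def base_url(self) -> str:
        # Virtual-hosted-style URL: https://{bucket}.s3.{region}.amazonaws.com/{key}
        return f"https://{self.bucket}.s3.{self.region}.amazonaws.com/{self.prefix}"


def region_from_location_constraint(constraint: Optional[str]) -> str:
    """Normalize a GetBucketLocation LocationConstraint value to a region name."""
    # S3 reports a null/empty constraint for buckets in us-east-1.
    return constraint or "us-east-1"


def zarr_location(bucket: str, region: str, zarr_id: str) -> S3Location:
    """Combine the Archive's bucket and region with a Zarr ID from asset properties."""
    # Hypothetical helper: the key prefix is templated as "zarr/{zarr_id}/", per the notes.
    return S3Location(bucket=bucket, region=region, prefix=f"zarr/{zarr_id}/")
```

Since the region would be looked up once before the webserver starts, resolving a Zarr under `/dandisets/` then reduces to string templating, e.g. `zarr_location("example-bucket", "us-east-2", zarr_id).base_url()` with a placeholder bucket name.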

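A client for the proposed paginated `GET /zarr/versions/{version_id}/files/[?prefix=...]` endpoint could walk the listing as sketched below. The design does not specify a pagination scheme, so this sketch assumes a DRF-style JSON body with a `results` list and a `next` URL (null on the last page); the function names, the `base_url` parameter, and the injected `fetch` callable are all hypothetical. Because version IDs equal Zarr checksums, `version_id` here would be a Dandi Zarr checksum.

```python
from typing import Any, Callable, Iterator, Optional
from urllib.parse import urlencode

# Hypothetical fetcher: given a URL, return the decoded JSON body of the response.
FetchJSON = Callable[[str], dict[str, Any]]


def version_files_url(base_url: str, version_id: str, prefix: Optional[str] = None) -> str:
    """Build the URL for the first page of a Zarr version's file listing."""
    url = f"{base_url.rstrip('/')}/zarr/versions/{version_id}/files/"
    if prefix is not None:
        url += "?" + urlencode({"prefix": prefix})
    return url


def iter_version_files(
    fetch: FetchJSON,
    base_url: str,
    version_id: str,
    prefix: Optional[str] = None,
) -> Iterator[dict[str, Any]]:
    """Yield every file entry of a Zarr version, following `next` links page by page."""
    url: Optional[str] = version_files_url(base_url, version_id, prefix)
    while url is not None:
        page = fetch(url)
        # Assumed page shape: {"results": [...], "next": <url or null>}
        yield from page["results"]
        url = page.get("next")
```

Injecting `fetch` keeps the paging logic independent of any particular HTTP library; in practice it could wrap an HTTP GET that decodes the JSON response. Each yielded entry would carry the proposed `VersionId` field, which a caller could pass along when requesting that specific S3 object version.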