Skip to content

Commit

Permalink
Merge pull request #224 from dandi/api-requests
Browse files Browse the repository at this point in the history
Add document on how the Archive API is used
  • Loading branch information
jwodder authored Feb 10, 2025
2 parents fb50c15 + 0ec52e5 commit 1ad4814
Showing 1 changed file with 212 additions and 0 deletions.
212 changes: 212 additions & 0 deletions doc/api-requests.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,212 @@
How `dandidav` Uses the DANDI Archive API
=========================================

*This document is up-to-date as of 2025 January 16.*

When `dandidav` receives a request for a path under `/dandisets/`, the requests
it makes to the DANDI Archive API for each type of `dandidav` request path are
as follows.

Note that there are two different approaches that `dandidav` uses for fetching
resources:

- When responding to a `GET` request or a `PROPFIND` request with a `Depth`
header of `1` (hereafter referred to as a "deep request"), `dandidav`
requests both the resource identified by the request path and (if that
resource is a collection) all of its immediate child resources.

- When responding to a `PROPFIND` request with a `Depth` header of `0`
(hereafter referred to as a "shallow request"), `dandidav` requests only the
resource identified by the request path, not any of its children.

Note that `dandidav` immediately rejects all `PROPFIND` requests with a `Depth`
header of `infinity`.

Dandiset Index
--------------

> **dandidav path:** `/dandisets/`
For deep requests, `dandidav` paginates over the API's `/dandisets/` endpoint.

For shallow requests, `dandidav` does not make any API requests.

Dandiset Top Level
------------------

> **dandidav path:** `/dandisets/{dandiset_id}/`
For both deep and shallow requests, `dandidav` makes an API request to `/dandisets/{dandiset_id}/`.

Releases Index
--------------

> **dandidav path:** `/dandisets/{dandiset_id}/releases/`
For deep requests, `dandidav` paginates over the API's `/dandisets/{dandiset_id}/versions/` endpoint.

For shallow requests, `dandidav` does not make any API requests.

Top Level of a Dandiset Version
-------------------------------

> **dandidav paths:**
>
> - `/dandiset/{dandiset_id}/draft/`
> - `/dandiset/{dandiset_id}/latest/`
> - `/dandiset/{dandiset_id}/releases/{version_id}/`
For both deep and shallow requests, if the request path is
`/dandiset/{dandiset_id}/latest/`, an initial request is made to
`/dandiset/{dandiset_id}/` to get the version ID of the latest published
version.

For deep requests, `dandidav` makes API requests to the following endpoints:

- `/dandisets/{dandiset_id}/versions/{version_id}/info/`
- `/dandisets/{dandiset_id}/versions/{version_id}/assets/paths/` (paginated)
- Because this endpoint only provides assets' paths, IDs, and sizes, for
each separate asset returned from this endpoint, `dandidav` makes another
request to
`/dandisets/{dandiset_id}/versions/{version_id}/assets/{asset_id}/info/`
to fetch further details about the asset (See ["Other
Notes"](#other-notes) below).
- Prior to [PR #236](https://github.com/dandi/dandidav/pull/236), a request was
also made to `/dandisets/{dandiset_id}/versions/{version_id}/` to fetch the
version metadata (so that its size could be reported), but this was changed
to use the metadata from the
`/dandisets/{dandiset_id}/versions/{version_id}/info/` request instead.

For shallow requests, `dandidav` makes an API request to
`/dandisets/{dandiset_id}/versions/{version_id}/info/`.

Metadata File
-------------

> **dandidav paths:**
>
> - `/dandiset/{dandiset_id}/draft/dandiset.yaml`
> - `/dandiset/{dandiset_id}/latest/dandiset.yaml`
> - `/dandiset/{dandiset_id}/releases/{version_id}/dandiset.yaml`
For both deep and shallow requests, if the request path is
`/dandiset/{dandiset_id}/latest/`, an initial request is made to
`/dandiset/{dandiset_id}/` to get the version ID of the latest published
version. Then, for both deep and shallow requests, `dandidav` makes an API
request to `/dandisets/{dandiset_id}/versions/{version_id}/`.

Asset Path
----------

> **dandidav paths:**
>
> - `/dandiset/{dandiset_id}/draft/{path}`
> - `/dandiset/{dandiset_id}/latest/{path}`
> - `/dandiset/{dandiset_id}/releases/{version_id}/{path}`
>
> Note that any trailing slashes at the end of `path` are ignored and are
> stripped before passing to the Archive.
For both deep and shallow requests, if the request path is
`/dandiset/{dandiset_id}/latest/`, an initial request is made to
`/dandiset/{dandiset_id}/` to get the version ID of the latest published
version.

Then, for each initial subpath of `path` that ends with either (a) a component
ending in ".zarr" or ".ngff" (case insensitive) or (b) the end of the path, an
API request is made to
`/dandisets/{dandiset_id}/versions/{version_id}/assets/?path={subpath}&metadata=true&order=path`,
which is paginated through until one of the following occurs:

- An asset whose path equals `subpath` is found. In this case:

- If the asset is a blob:

- If `subpath` is the whole of `path`, `dandidav` has found a blob
asset, and the checking of initial subpaths terminates.

- If `subpath` is not the whole of `path`, then the user requested a
path under a blob, and so no more requests are made to the Archive,
and `dandidav` returns a 404 response.

- If the asset is a Zarr, `dandidav` has found a Zarr asset, and the
checking of initial subpaths terminates. If `subpath` is not the whole
of `path`, a request is made to S3 to fetch information about the
resource at the path equal to the remainder.

- An asset whose path is under the directory `{subpath}/` is found. In this
case, the next initial subpath is checked. If there is no next initial
subpath, then `dandidav` has found an asset folder at `path`.

- An asset whose path is lexicographically after `{subpath}/` is found, or no
assets were returned by the API. In this case, no more requests are made to
the Archive, and `dandidav` returns a 404 response.

Shallow requests stop making requests to the Archive at this point. Deep
requests continue as follows:

- If an asset folder was found at path `path`, a paginated request is made to
`/dandisets/{dandiset_id}/versions/{version_id}/assets/paths/?path_prefix={path}/`.
Because this endpoint only provides assets' paths, IDs, and sizes, for each
separate asset returned from this endpoint, `dandidav` makes another request
to `/dandisets/{dandiset_id}/versions/{version_id}/assets/{asset_id}/info/`
to fetch further details about the asset (See ["Other Notes"](#other-notes)
below).

- If a blob asset was found, no more requests are made to the Archive.

- If a Zarr asset was found, further requests are made to S3 to fetch
information about the Zarr's entries.

Other Notes
-----------

- Paginated requests to the Archive use the default `page_size` value and are
requested serially.

- `dandidav` requires the following information from the Archive about each
type of resource, not always at the same time. Required information about
the relationships between resources is not listed here.

- Dandisets:
- identifier (to serve as the resource name)
- creation & modification timestamps

- Versions:
- identifier (to serve as the resource name)
- size
- creation & modification timestamps
- metadata (to serve the virtual `dandiset.yaml` resources)

- Asset folders:
- path/name

- Assets:
- path/name
- ID (for computing the metadata URL to display in the web view)
- blob ID and Zarr ID (to determine the asset type)
- size
- creation & modification timestamps
- metadata:

- `digest` — For blob assets, the `"dandi:dandi-etag"` digest is
used as the resource's ETag in `PROPFIND` responses

- `contentUrl` — As a reminder, this array contains both an Archive
download URL and a direct S3 URL. Which one is used depends on
the asset type and the situation:

- Blob assets: When rendering links in the web (HTML) view, the
Archive download URL is used, as this results in a redirect
to a signed S3 URL that sets the Content-Disposition header,
thereby ensuring that the user's browser downloads the asset
to a file of the same name. When responding to `GET`
requests for blob assets, either URL may be used, depending
on the presence of the `--prefer-s3-redirects` command-line
option.

- Zarr assets: The S3 URL is parsed by `dandidav` to locate the
Zarr's entries on S3.

- `encodingFormat` — Used as the resource's Content Type in
`PROPFIND` responses

0 comments on commit 1ad4814

Please sign in to comment.