Partial Zarr Directory Updates in Dandi and LINC #1474
Comments
I think there are two separate, but related issues here (and solving 2. depends on solving 1. first):
to boil down/implement desired convenience.
NB: upon trying different URI schemas, I found that there is a "workaround side-effect": if the path is used as a glob (might not be generally applicable/desired), then we get the leading path too:
❯ dandi download https://dandiarchive.org/dandisets/000027/versions/0.210831.2033/assets/\?glob\=sub-RAT123/sub-RAT123.nwb
PATH                      SIZE    DONE    DONE%  CHECKSUM STATUS MESSAGE
sub-RAT123/sub-RAT123.nwb 18.8 kB 18.8 kB 100%   ok       done
Summary:                  18.8 kB 18.8 kB                 1 done
                                  100.00%
❯ datalad clone https://github.com/dandisets/000027
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
[INFO ] https://github.com/dandisets/000027/config download failed: Not Found
[INFO ] access to 2 dataset siblings dandi-dandisets-dropbox, dandiapi not auto-enabled, enable with:
| datalad siblings -d "/tmp/000027" enable -s SIBLING
install(ok): /tmp/000027 (dataset)
❯ cd 000027
❯ datalad get sub-RAT123/sub-RAT123.nwb
get(ok): sub-RAT123/sub-RAT123.nwb (file) [from web...]
❯ ls -lL sub-RAT123/sub-RAT123.nwb
-r--r--r-- 1 yoh yoh 18792 Jul 18 07:51 sub-RAT123/sub-RAT123.nwb
# now edit / dandi upload
For an "ultimate" solution, we need to add some basic Zarr navigator, which relates to making it easier for a user to get the desired "full" URL to a specific Zarr component. As for updating metadata only, AFAIK it would be quite tricky to implement correctly, but editing metadata is indeed a valid use case. ATM it is "possible" only via a full Zarr download, and I believe the subsequent upload would avoid reuploading any file which was not modified (@jwodder might correct me if I am wrong).
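To illustrate the "avoid reuploading unmodified files" idea, here is a minimal Python sketch that compares local file digests against a remote listing and returns only the paths that differ. This is not how dandi-cli implements it: the `remote_digests` mapping and the function name `files_needing_upload` are hypothetical, and MD5 is only an assumed digest for the sake of the example.

```python
import hashlib
import os
from pathlib import Path


def md5_of(path: Path, bufsize: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in blocks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        while block := f.read(bufsize):
            h.update(block)
    return h.hexdigest()


def files_needing_upload(zarr_root: Path, remote_digests: dict[str, str]) -> list[str]:
    """List paths (relative to zarr_root) that are new or changed locally.

    remote_digests is a hypothetical mapping of relative path -> MD5 hex
    digest, standing in for whatever listing the archive provides.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(zarr_root):
        for name in filenames:
            p = Path(dirpath) / name
            rel = p.relative_to(zarr_root).as_posix()
            # A missing remote entry (new file) or a digest mismatch
            # (edited file) both mean the file has to be uploaded.
            if remote_digests.get(rel) != md5_of(p):
                changed.append(rel)
    return sorted(changed)
```

With such a comparison, editing only a metadata file (say `.zattrs`) in a downloaded Zarr would result in a single file being uploaded, although the full download is still needed beforehand.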
Thanks team. Moving this issue to the DANDI Client repo, as it doesn't seem like we would need changes to the web app or REST API.
Cc @dstansby @kabilar @satra @yarikoptic @waxlamp @balbasty
In the LINC project, @dstansby encountered a scenario where an update was requested for a portion of a Zarr directory.
Currently, DANDI and LINC treat a Zarr directory as a single object tree, requiring the entire directory to be downloaded even for updates that only modify specific pieces.
Downloading the entire Zarr directory can be inefficient, especially for large datasets where only a small portion needs updating.
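As a rough illustration of why a partial update could be much cheaper, the sketch below computes which chunk files overlap a modified region of an array. It assumes the Zarr v2 default layout, where each chunk is stored under a dot-separated key such as `0.1`, and it ignores sharding; the function name `touched_chunk_keys` is hypothetical and not part of any DANDI or Zarr API.

```python
from itertools import product


def touched_chunk_keys(region, chunk_shape):
    """Return Zarr v2-style chunk keys ("i.j.k") overlapping a region.

    region: one (start, stop) pair per dimension, stop exclusive.
    chunk_shape: chunk size along each dimension.
    """
    per_dim = []
    for (start, stop), size in zip(region, chunk_shape):
        if stop <= start:
            return []  # an empty region touches no chunks
        first = start // size        # index of the first chunk touched
        last = (stop - 1) // size    # index of the last chunk touched
        per_dim.append(range(first, last + 1))
    # Cartesian product of per-dimension chunk indices gives every chunk hit
    return [".".join(map(str, idx)) for idx in product(*per_dim)]
```

For example, with 64×64 chunks, `touched_chunk_keys(((0, 10), (100, 130)), (64, 64))` yields `["0.1", "0.2"]`, so only two chunk files would need to be transferred rather than the whole directory.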
This issue's purpose is to capture the need for a mechanism to allow partial updates of Zarr directories within DANDI and LINC.
Analogously, @satra suggested the initial usage of zarrita to explore elements of sharding, with perhaps the LINC project as a place to test.