[Feature] Support upload of Zarr-backend NWB files #1310

Open
CodyCBakerPhD opened this issue Jul 17, 2023 · 7 comments · May be fixed by #1312

@CodyCBakerPhD
Contributor

Possibly related to #1307, but specific to NWB files that use the Zarr backend.

I'd like to be able to upload a .nwb file written with PyNWB + HDMF-Zarr to the DANDI Archive, but the dandi upload command did not recognize the file at all, and did not even warn that it had found the file and skipped it.

An example file for testing purposes may be found here; it was forced through using the devel options, specifically --allow-any-path.
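Both files share the .nwb extension, so the backend cannot be read off the name, but it can be sniffed from disk. A minimal sketch, assuming HDMF-Zarr writes a Zarr v2 store (a directory with a root .zgroup file) and that HDF5 files carry the standard 8-byte signature at offset 0; detect_nwb_backend is a hypothetical helper, not part of dandi-cli:

```python
from pathlib import Path
from typing import Optional

# Standard 8-byte HDF5 format signature
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# Metadata files that may sit at the root of a Zarr v2 store
ZARR_V2_MARKERS = (".zgroup", ".zarray", ".zattrs")


def detect_nwb_backend(path: Path) -> Optional[str]:
    """Hypothetical helper: guess whether an NWB path is HDF5- or Zarr-backed."""
    if path.is_dir():
        # A Zarr v2 store is a directory with JSON metadata at its root
        if any((path / marker).is_file() for marker in ZARR_V2_MARKERS):
            return "zarr"
        return None
    if path.is_file():
        with path.open("rb") as f:
            # Simplification: HDF5 also allows the signature at offsets
            # 512, 1024, 2048, ...; only offset 0 is checked here
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return "hdf5"
    return None
```

Checking content rather than the extension means a .nwb directory would be recognized as Zarr regardless of what the upload tool's extension list contains.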

CodyCBakerPhD changed the title from "[Feature] Support upload of Zarr-backed NWB files" to "[Feature] Support upload of Zarr-backend NWB files" on Jul 17, 2023
@CodyCBakerPhD
Contributor Author

Some work may also be needed on how NWB assets with a Zarr backend are represented. No 'i' (info) button appears on the asset, and the API does not recognize the file as a single asset; instead, every individual item in the store is its own blob asset. I had initially expected this given the underlying structure of a Zarr store, but on Slack Roni indicated that each Zarr chunk is not supposed to be a separate AssetBlob, which is what we see below.

from dandi.dandiapi import DandiAPIClient

# Connect to the staging instance of the archive
client = DandiAPIClient(api_url="https://api-staging.dandiarchive.org/api")
dandiset = client.get_dandiset(dandiset_id="204919")

# The HDF5-backed file resolves to a single asset
dandiset.get_asset_by_path(path="test_read_nwbfile/test_hdf5.nwb")

works as expected, but

dandiset.get_asset_by_path(path="test_read_nwbfile/test_zarr.nwb")

gives

ValueError                                Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/dandi/dandiapi.py:1155, in RemoteDandiset.get_asset_by_path(self, path)
   1152 try:
   1153     # Weed out any assets that happen to have the given path as a
   1154     # proper prefix:
-> 1155     (asset,) = (
   1156         a for a in self.get_assets_with_path_prefix(path) if a.path == path
   1157     )
   1158 except ValueError:

ValueError: not enough values to unpack (expected 1, got 0)

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
Cell In[21], line 1
----> 1 dandiset.get_asset_by_path(path="test_read_nwbfile/test_zarr.nwb")

File /opt/conda/lib/python3.10/site-packages/dandi/dandiapi.py:1159, in RemoteDandiset.get_asset_by_path(self, path)
   1155     (asset,) = (
   1156         a for a in self.get_assets_with_path_prefix(path) if a.path == path
   1157     )
   1158 except ValueError:
-> 1159     raise NotFoundError(f"No asset at path {path!r}")
   1160 else:
   1161     return asset

NotFoundError: No asset at path 'test_read_nwbfile/test_zarr.nwb'

and if I do

list(dandiset.get_assets())

I see

[RemoteBlobAsset(client=<dandi.dandiapi.DandiAPIClient object at 0x7fc5162c4400>, identifier='fd8e3782-b0c7-4bd5-89fe-e2acc0263744', path='test_read_nwbfile/test_hdf5.nwb', size=197512, created=datetime.datetime(2023, 7, 17, 15, 31, 55, 641893, tzinfo=datetime.timezone.utc), modified=datetime.datetime(2023, 7, 17, 15, 58, 44, 778333, tzinfo=datetime.timezone.utc), blob='6a61bab5-0662-49e5-be46-0b9ee9a27297', dandiset_id='204919', version_id='0.230717.1558'),
 RemoteBlobAsset(client=<dandi.dandiapi.DandiAPIClient object at 0x7fc5162c4400>, identifier='a78dfc02-9cd5-402a-83c8-5006fb18d5e8', path='test_read_nwbfile/test_zarr.nwb/acquisition/ElectricalSeries/data/0.0', size=46, created=datetime.datetime(2023, 7, 17, 15, 57, 45, 173503, tzinfo=datetime.timezone.utc), modified=datetime.datetime(2023, 7, 17, 15, 58, 44, 787050, tzinfo=datetime.timezone.utc), blob='1419744b-36f6-4c28-a850-71d381fc90e5', dandiset_id='204919', version_id='0.230717.1558'),
 RemoteBlobAsset(client=<dandi.dandiapi.DandiAPIClient object at 0x7fc5162c4400>, identifier='cd9faf76-cb4e-4849-b9eb-c838958676d1', path='test_read_nwbfile/test_zarr.nwb/acquisition/ElectricalSeries/electrodes/0', size=56, created=datetime.datetime(2023, 7, 17, 15, 57, 45, 215932, tzinfo=datetime.timezone.utc), modified=datetime.datetime(2023, 7, 17, 15, 58, 44, 795464, tzinfo=datetime.timezone.utc), blob='e8131c7e-095d-4242-ab4c-1658c8c3f5c5', dandiset_id='204919', version_id='0.230717.1558'),
 RemoteBlobAsset(client=<dandi.dandiapi.DandiAPIClient object at 0x7fc5162c4400>, identifier='383ece04-8db0-4207-843a-86109259a5cd', path='test_read_nwbfile/test_zarr.nwb/acquisition/ElectricalSeries/starting_time/0', size=24, created=datetime.datetime(2023, 7, 17, 15, 57, 45, 222857, tzinfo=datetime.timezone.utc), modified=datetime.datetime(2023, 7, 17, 15, 58, 44, 909428, tzinfo=datetime.timezone.utc), blob='a1f46f4a-d8ec-4183-bd8c-8ed530e963e4', dandiset_id='204919', version_id='0.230717.1558'),
 RemoteBlobAsset(client=<dandi.dandiapi.DandiAPIClient object at 0x7fc5162c4400>, identifier='871186e8-ac63-4c5e-b914-8b9246f7326a', path='test_read_nwbfile/test_zarr.nwb/file_create_date/0', size=56, created=datetime.datetime(2023, 7, 17, 15, 57, 45, 253174, tzinfo=datetime.timezone.utc), modified=datetime.datetime(2023, 7, 17, 15, 58, 44, 806273, tzinfo=datetime.timezone.utc), blob='9d7115fb-3133-437d-9168-7058e8fd84b6', dandiset_id='204919', version_id='0.230717.1558'),

....

and so on (the entire NWB file content listed out as separate blobs)
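The traceback shows that get_asset_by_path first collects every asset under the given path prefix and then keeps only exact matches. A minimal pure-Python stand-in for that logic (not dandi-cli's code; both helper names are hypothetical, and LookupError replaces dandi's NotFoundError) shows why the lookup fails here: each chunk has test_zarr.nwb as a proper prefix, but no asset has exactly that path. The second helper regroups chunk paths under their store root, which is one way to confirm that a single store was split into many blob assets.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical: path components ending in one of these are treated as store roots
STORE_SUFFIXES: Tuple[str, ...] = (".nwb", ".zarr")


def asset_by_exact_path(asset_paths: Iterable[str], path: str) -> str:
    """Mimic the lookup in the traceback: prefix query, then exact match."""
    # A prefix query also returns everything *under* the path, e.g. Zarr chunks
    candidates = [p for p in asset_paths if p.startswith(path)]
    try:
        # Unpacking raises ValueError unless exactly one candidate equals the path
        (match,) = (p for p in candidates if p == path)
    except ValueError:
        raise LookupError(f"No asset at path {path!r}") from None
    return match


def group_by_store_root(asset_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Collapse per-chunk asset paths back into the store each one belongs to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in asset_paths:
        parts = path.split("/")
        # Only components with something below them can be store roots
        for i, part in enumerate(parts[:-1]):
            if part.endswith(STORE_SUFFIXES):
                root = "/".join(parts[: i + 1])
                groups[root].append("/".join(parts[i + 1 :]))
                break
    return dict(groups)
```

Against a live Dandiset, the input could be [a.path for a in dandiset.get_assets()]; plain files such as test_hdf5.nwb are left out of the grouping because nothing sits below them.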

@CodyCBakerPhD
Contributor Author

The context for the asset ID part is that I want to be able to stream the content using fsspec, just as with HDF5 files.

PyNWB can easily do this given the S3 location of the HDF5 asset, so I had assumed it would be just as easy if I had the asset ID of the Zarr folder (the 'test_zarr.nwb' file).

@satra
Member

satra commented Jul 17, 2023

@CodyCBakerPhD - I'm pretty sure what's happening here is that the CLI does not recognize the file as Zarr, so it uses the non-Zarr route, and the server then interprets the contents as individual blobs. A fix on the CLI side that treats it as Zarr should resolve this. Can you try adding the .zarr extension to test?
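This suggests the CLI decides between the Zarr and blob routes from the file extension. A minimal sketch of such a suffix-only check, assuming an extension list like (".zarr", ".ngff") (an assumption, not confirmed in this thread; treated_as_zarr is hypothetical), reproduces the behavior reported in the next comment: test_zarr.nwb is not treated as Zarr, while test_zarr.nwb.zarr and test_zarr.zarr are.

```python
from pathlib import PurePosixPath

# Assumption: a fixed list of extensions that mark a path as a Zarr store
ZARR_EXTENSIONS = (".zarr", ".ngff")


def treated_as_zarr(path: str) -> bool:
    """Suffix-only check; a sketch, not dandi-cli's actual implementation."""
    # Only the final suffix counts, so "test_zarr.nwb" -> ".nwb"
    return PurePosixPath(path).suffix in ZARR_EXTENSIONS


for name in ("test_zarr.nwb", "test_zarr.nwb.zarr", "test_zarr.zarr"):
    print(name, treated_as_zarr(name))
# test_zarr.nwb False
# test_zarr.nwb.zarr True
# test_zarr.zarr True
```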

@CodyCBakerPhD
Contributor Author

Well, that is interesting...

Making a copy of the file named test_zarr.nwb.zarr (I also confirmed the same behavior with test_zarr.zarr) allows dandi upload to work as expected.


However, nothing new appears in the Dandiset view (https://gui-staging.dandiarchive.org/dandiset/204919/0.230717.1558/files?location=test_read_nwbfile%2F) or in the API requests.

I also confirmed that the asset made it to the bucket by attempting a re-upload; the client responds that the file already exists and does not re-upload it.

@satra
Member

satra commented Jul 17, 2023

@CodyCBakerPhD - you have stumped me. Perhaps @AlmightyYakob has an answer to why that asset doesn't show up.

@jjnesbitt
Member

The file is present; the link you provided points to a previously published version, so it won't show any files uploaded to the draft version. You can see the file here: https://gui-staging.dandiarchive.org/dandiset/204919/draft/files?location=test_read_nwbfile
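The two GUI links in this thread differ only in the version segment: a published version identifier such as 0.230717.1558 versus draft. A small hypothetical helper that builds either form makes the distinction explicit; the URL layout is copied from those links, and files_view_url is not part of any DANDI library. On the Python side, DandiAPIClient.get_dandiset also accepts a version_id (for example "draft"); if I read the dandi-cli documentation correctly, omitting it selects the most recent published version when one exists, which would likewise hide newly uploaded draft assets from API queries.

```python
from urllib.parse import urlencode

# Staging GUI host, taken from the links in this thread
GUI_BASE = "https://gui-staging.dandiarchive.org"


def files_view_url(dandiset_id: str, version: str, location: str) -> str:
    """Build a files-view link; version is "draft" or a published id."""
    # urlencode percent-encodes "/" in the location, e.g. "dir/" -> "dir%2F"
    return f"{GUI_BASE}/dandiset/{dandiset_id}/{version}/files?" + urlencode({"location": location})


print(files_view_url("204919", "draft", "test_read_nwbfile"))
# https://gui-staging.dandiarchive.org/dandiset/204919/draft/files?location=test_read_nwbfile
```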

@CodyCBakerPhD
Contributor Author

@AlmightyYakob Aha, yes, that was it! Thank you for the sanity check.

Would this workflow perhaps 'simply work' if I just naively add ".nwb" to the list of accepted Zarr entities? I'll try that out locally and see.
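One caveat with naively adding ".nwb" to that list: HDF5-backed NWB files also end in .nwb but are regular files, so a suffix-only check would route them as Zarr too. A sketch of a stricter rule, assuming the same hypothetical extension list (is_zarr_candidate is illustrative, not dandi-cli code), treats a .nwb path as Zarr only when it is a directory:

```python
from pathlib import Path

# Assumption: the existing Zarr extension list (hypothetical values)
ZARR_EXTENSIONS = (".zarr", ".ngff")


def is_zarr_candidate(path: Path) -> bool:
    """Hypothetical rule: treat a .nwb path as Zarr only if it is a directory."""
    if path.suffix in ZARR_EXTENSIONS:
        return True
    # HDF5-backed .nwb files are regular files, so a plain ".nwb" entry in
    # ZARR_EXTENSIONS would misroute them; requiring a directory avoids that
    return path.suffix == ".nwb" and path.is_dir()
```

The directory test is cheap and keeps existing HDF5 uploads on the blob route; a content check for a root .zgroup file would be a stricter alternative.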


4 participants