
Upload is inconsistent with NWB Zarr files #1520

Open

mavaylon1 opened this issue Nov 6, 2024 · 4 comments

Comments

@mavaylon1

I am able to upload a small Zarr file (~56 MB) to DANDI. However, when uploading a larger file (~34 GB), the upload reached 92% after a span of about 36 hours and then failed with the error shown below. I assumed it was a timeout issue, so I ran it again. I was told that rerunning the same command as before (`dandi upload --existing-refresh`) should continue from the last checkpoint.

I got the same error the second time, but now at 32%.

Questions:

  1. Is that 32% of what was left over, or did the upload start over from the beginning?
  2. Why is this happening, and would decreasing the number of files in the Zarr store (by increasing the chunk size) help? It currently contains ~1.5 million files.
[Screenshot of the upload error, 2024-11-05 6:05 PM]
@mavaylon1
Author

@yarikoptic

@satra
Member

satra commented Nov 6, 2024

@mavaylon1 - how many objects in the zarr file and what's the rough size of the objects? we generally recommend that you chunk the zarr such that each chunk is in the MB range rather than KB.

as an fyi, instead of using the zarr python library, if you use the tensorstore library from google you should also be able to write sharded v3 zarrs.
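For context, the object count and chunk sizes discussed in this thread can be sanity-checked with quick arithmetic (the ~34 GB store size and ~0.02 MB chunk size are taken from the reports above; the exact figures are approximations):

```python
# Back-of-the-envelope object counts for a zarr store,
# using the approximate sizes reported in this thread.
store_bytes = 34 * 1024**3      # ~34 GB total store size
chunk_bytes = 0.02 * 1024**2    # ~0.02 MB per chunk (one object/file each)

n_objects = store_bytes / chunk_bytes
print(f"estimated objects at 0.02 MB chunks: {n_objects:,.0f}")

# Rechunking to MB-scale (e.g. 16 MB) objects cuts the count by ~800x:
target_bytes = 16 * 1024**2
print(f"estimated objects at 16 MB chunks: {store_bytes / target_bytes:,.0f}")
```

The first estimate lands on the order of 1.7 million objects, consistent with the ~1.5 million files reported; at 16 MB chunks the same data would be on the order of two thousand objects.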

@kabilar
Member

kabilar commented Nov 6, 2024

Hi @mavaylon1, can you also please provide the two log files? Thanks.

@mavaylon1
Author

@satra There are roughly 1.5 million files in the store. We did some digging: each chunk is incredibly small, at 0.02 MB. We are not sure why HDF5 used that as its default chunk size (we exported to Zarr, which preserves the chunking of the original HDF5 file).

We also agree on rechunking to a larger chunk size (~16 MB).
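One rough way to pick a chunk shape for a ~16 MB target is to divide the byte budget by the dtype's item size and take an even split across dimensions. A minimal sketch (the helper name, the 8-byte float dtype, and the cubic-chunk assumption are all illustrative, not taken from this dataset; in practice you would elongate chunks along the dominant access axis):

```python
def chunk_edge(target_bytes: int, itemsize: int, ndim: int) -> int:
    """Edge length of a roughly cubic chunk close to target_bytes.

    Hypothetical helper: assumes equal edge lengths in every dimension.
    """
    elements = target_bytes // itemsize      # elements that fit in the budget
    return int(round(elements ** (1 / ndim)))  # even split across ndim axes

# e.g. ~16 MB chunks of 8-byte float data in a 2-D array:
edge = chunk_edge(16 * 1024**2, itemsize=8, ndim=2)
print(edge, edge * edge * 8 / 1024**2)  # ~1448 x 1448 elements, ~16 MB
```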

@kabilar which two log files, btw?
