Is there a demand/use-case to target here?
FWIW - md5 was chosen since it is the algorithm AWS uses for ETag computation, so we stay "similar" to other aspects of checksumming for S3.
We can use the ETag of individual files if we know that they came from a non-multipart upload. I believe we rely on that in our zarr-access-via-manifests approach to compute the remote zarr checksum.
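To illustrate the point above: for a non-multipart S3 upload, the object's ETag is just the hex md5 digest of its content, so a locally computed md5 can be compared against the remote ETag directly. This is a hedged sketch, not this package's actual code; the function names are illustrative.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Stream a file through md5 and return the hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_etag(path, etag):
    """True if a non-multipart ETag (optionally quoted) matches the file.

    Multipart-upload ETags carry a "-<part count>" suffix and are NOT
    plain md5 digests, so they cannot be compared this way.
    """
    etag = etag.strip('"')
    if "-" in etag:
        raise ValueError("multipart ETag; not a plain md5 digest")
    return file_md5(path) == etag
```

Note the multipart guard: the md5-equals-ETag equivalence only holds for single-part uploads, which is exactly why the manifests approach has to know how the file was uploaded.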
I'm not proposing changing the default behavior; I think that should stay md5, to match S3's implementation (as that was the initial reason for choosing md5). However, this conversation in the zarr-python repo highlighted someone's need for this tool, but with a different hashing algorithm.
Since this seems like it would be a common use case, and in that thread we got a pseudo-endorsement from one of the zarr-python contributors to use this package (since this functionality doesn't currently exist in zarr itself), I think it would be worthwhile to generalize the algorithm choice in a backwards-compatible way.
This will probably have to wait until higher-priority things have been addressed in DANDI, although I might take some of my own time to poke around at this, since it interests me.
Currently `md5` is assumed to be the choice of checksum algorithm, but we should allow the user to supply their own algorithm if they so choose.