Generalize checksum algorithm #56

Open
jjnesbitt opened this issue Mar 25, 2024 · 2 comments

@jjnesbitt
Member

Currently, md5 is assumed to be the checksum algorithm, but we should allow users to supply their own algorithm if they so choose.

@jjnesbitt jjnesbitt self-assigned this Mar 25, 2024
@yarikoptic
Member

Is there demand or a use case to target here?
FWIW, md5 was chosen because it is the algorithm AWS uses to compute ETags, so we

  • stay "similar" to other aspects of checksumming for S3
  • can use the ETag of individual files if we know that they came from a non-multipart upload. I believe we rely on that in our zarr access via manifests approach to compute the remote zarr checksum.
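To illustrate the ETag point, here is a minimal sketch using only Python's standard library. The helper names `looks_like_single_part_etag` and `etag_matches` are hypothetical and are not part of this package. It assumes the object is unencrypted or uses SSE-S3, since S3 ETags are not MD5 digests for SSE-KMS or SSE-C objects, and it relies on multipart-upload ETags carrying a `-<part count>` suffix.

```python
import hashlib


def looks_like_single_part_etag(etag: str) -> bool:
    """Heuristic: a non-multipart ETag is a bare 32-character hex MD5."""
    value = etag.strip('"')  # S3 returns ETags wrapped in double quotes
    # Multipart-upload ETags look like "<hex>-<part count>".
    return "-" not in value and len(value) == 32


def etag_matches(data: bytes, etag: str) -> bool:
    """Compare a locally computed MD5 against a single-part ETag."""
    if not looks_like_single_part_etag(etag):
        return False  # cannot be compared to a plain MD5
    return hashlib.md5(data).hexdigest() == etag.strip('"')
```

With a check like this, a file's ETag can stand in for its md5 only when the upload was not multipart; otherwise the digest has to be computed from the bytes.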

@jjnesbitt
Member Author

I'm not proposing to change the default behavior. I think that should stay as md5, to match S3's implementation (as that was the initial reason for choosing md5). However, this conversation in the zarr-python repo highlighted someone's need for this tool, but with a different hashing algorithm.

Since this seems like it would be a common use case, and in that thread we got a pseudo-endorsement from one of the zarr-python contributors to use this package (since this functionality doesn't currently exist in zarr), I think it would be worthwhile to generalize the algorithm in a backwards-compatible way.
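As a rough sketch of what "backwards compatible" could mean here: an optional algorithm argument that defaults to md5, so existing callers see no change. The function name `digest_bytes` and its signature are assumptions for illustration, not the package's current API, and this only covers hashing a single blob of bytes, not the full zarr tree checksum.

```python
import hashlib


def digest_bytes(data: bytes, algorithm: str = "md5") -> str:
    """Hex digest of `data`; md5 stays the default to match S3 ETags."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    if algorithm.startswith("shake_"):
        # SHAKE digests need an explicit output length, so this sketch skips them.
        raise ValueError("variable-length digests are not handled in this sketch")
    return hashlib.new(algorithm, data).hexdigest()
```

Calling `digest_bytes(data)` keeps today's md5 behavior, while `digest_bytes(data, "sha256")` opts into a different algorithm.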

This will probably have to wait until higher-priority things have been addressed in DANDI, although I might take some of my own time to poke around at this, since it interests me.
