Support xxHash algorithm #409

Open
bhagemeier opened this issue Oct 20, 2022 · 4 comments

@bhagemeier

Hi there,

At Juelich Supercomputing Centre, we've recently been looking into convenient tools for generating and verifying hash sums of large collections of data. The volumes we typically deal with range from several TB to PB. We've found hashdeep convenient: it offers a good interface, including parallelisation options that can matter when checksumming and verifying many small files.
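To illustrate why parallelisation matters for many small files, here is a minimal sketch, not hashdeep's actual implementation. It hashes every file under a directory with a thread pool, using only Python's standard `hashlib`. The names `hash_file` and `hash_tree`, the 1 MiB chunk size, and the default of 4 workers are assumptions made for this example. xxHash is not in the standard library, so the sketch uses a stdlib algorithm in its place.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # assumed chunk size: read 1 MiB at a time to bound memory use


def hash_file(path, algorithm="sha256"):
    """Return the hex digest of one file, read in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root, algorithm="sha256", workers=4):
    """Hash every file under root and return {relative_path: hex_digest}."""
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    ]
    # Each worker thread opens and hashes one file at a time, so file I/O
    # for different files can overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(lambda p: hash_file(p, algorithm), paths)
        return {os.path.relpath(p, root): d for p, d in zip(paths, digests)}
```

A call such as `hash_tree("/path/to/data", workers=8)` would return one digest per file. How much threads actually help depends on the storage system and on how the interpreter handles hashing in threads, so treat the worker count as something to measure rather than a recommendation.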

We've also come across the xxHash algorithm, which has been specifically designed to create checksums over extremely large amounts of data.

We have found that the command-line tools provided for xxHash lack some of the functionality offered by hashdeep. We therefore propose integrating xxHash into hashdeep to better support use cases that deal with extremely large volumes of data. We also support the idea of integrating Blake3, as mentioned in #397.

In the spirit of Open Source, we are happy to do the integration ourselves, but would first like to know whether you would be willing to include the code in the main branch afterwards. Additionally, if there are good reasons to omit algorithms such as xxHash or Blake3, please let us know.

To back our request with numbers, here is a comparison of xxHash against various algorithms supported by hashdeep, measured on a 155 GB data set consisting of two files.

| Tool | Duration | Speed (approx.) |
| --- | --- | --- |
| xxHash | 36 s | 4.3 GB/s |
| hashdeep (default md5 and sha256) | 564 s | 275 MB/s |
| hashdeep (md5) | 184 s | 840 MB/s |
| hashdeep (sha1) | 294 s | 530 MB/s |
| hashdeep (tiger) | 272 s | 570 MB/s |
| hashdeep (whirlpool) | 789 s | 200 MB/s |
| hashdeep (mmap,md5,sha256) | 629 s | 250 MB/s |

As you can see, xxHash is at least 5 times faster than the fastest hashdeep configuration we measured (md5, 184 s versus 36 s).
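The speed and speedup figures follow directly from the durations in the table and the 155 GB data set size. The short sketch below reproduces that arithmetic. The function names `throughput_gb_per_s` and `speedup` are made up for this example, and the numbers are copied from the table, not re-measured.

```python
DATASET_GB = 155  # size of the two-file data set stated in the issue

# Durations in seconds, copied from the comparison table.
durations_s = {
    "xxHash": 36,
    "hashdeep (default md5 and sha256)": 564,
    "hashdeep (md5)": 184,
    "hashdeep (sha1)": 294,
    "hashdeep (tiger)": 272,
    "hashdeep (whirlpool)": 789,
    "hashdeep (mmap,md5,sha256)": 629,
}


def throughput_gb_per_s(duration_s, size_gb=DATASET_GB):
    """Average throughput in GB/s for hashing size_gb in duration_s seconds."""
    return size_gb / duration_s


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


for tool, seconds in durations_s.items():
    print(f"{tool}: {throughput_gb_per_s(seconds):.2f} GB/s")

# Compare xxHash with the fastest hashdeep run in the table (md5).
ratio = speedup(durations_s["hashdeep (md5)"], durations_s["xxHash"])
print(f"xxHash vs hashdeep (md5): {ratio:.1f}x")
```

The md5 run is the relevant baseline because it is the fastest hashdeep configuration listed. The table has no sha256-only run, so the comparison covers only the configurations that were measured.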

@keybreak

keybreak commented Aug 9, 2023

Still a very much needed feature!
@jessek any plans for it?

@bhagemeier
Author

We have someone working on it now. The performance gain is not yet as much as we would have expected. Please stay tuned for updates.

@keybreak

> The performance gain is not yet as much as we would have expected.

That's weird... Hopefully it will be optimized! 👍

@oneEyedCharlie

> We have someone working on it now. The performance gain is not yet as much as we would have expected. Please stay tuned for updates.

How is your project coming along? I am CPU-bottlenecked when using hashdeep and would love an "xxhashdeep" or similar. Even small improvements would be helpful.
