Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace MD5 with CRC32 for 2x checksum speed #57

Merged
merged 2 commits into from
Jan 9, 2025

Conversation

adiov
Copy link
Contributor

@adiov adiov commented Jan 6, 2025

While running this script on a large pool, I noticed that that the MD5 checksum calculation is taking up to 2-3 minutes on the large video project files. By changing to CRC32, I was able to cut down the checksum time by nearly 50% on most files.

For the purpose of adversarial integrity checking, CRC32 is not adequate, and neither is MD5. However, for the purpose of successful copying and corruption checking, CRC32 is adequate. Using MD5 for this script is burning double the CPU cycles for no gain.

@meo-pill
Copy link
Contributor

meo-pill commented Jan 6, 2025

For the use of awk i would advocate against because if you have a lot of small files it would have a big impact on the performance.
In my opinion it's better to use the % that is included in Bash

Ps : I speak about the removals of the extension at the end of the "checksum"

@adiov
Copy link
Contributor Author

adiov commented Jan 6, 2025

When running on 5000 small files, there is indeed a performance difference due to spinning up the awk subprocess, about 16% more time.

./zfs-inplace-rebalancing-noawk.sh --checksum true --passes 1   112.50s user 103.84s system 103% cpu 3:28.98 total
./zfs-inplace-rebalancing.sh --checksum true --passes 1 134.23s user 135.18s system 111% cpu 4:02.25 total

Since both Bash and zsh support it, it makes sense to reclaim that extra CPU time and use the % builtin.

@meo-pill meo-pill mentioned this pull request Jan 8, 2025
@markusressel
Copy link
Owner

Thx!

@markusressel markusressel merged commit 9b233fd into markusressel:master Jan 9, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants