Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate CRC32C faster #65

Closed
lizhanhui opened this issue Nov 9, 2023 · 8 comments
Closed

Calculate CRC32C faster #65

lizhanhui opened this issue Nov 9, 2023 · 8 comments

Comments

@lizhanhui
Copy link
Contributor

This crate uses CRC32C algorithm from crc crate to check and verify data integrity. Given that every byte written to disk and read from disk is verified to detect potential errors, this function is very hot.

Unfortunately, crc crate only employs table-lookup approach, leaving much performance space to optimize. One CPU core may achieve around 200MiB/s with the compiler's implicit optimization. See metrics from crc32fast. (Note, they are comparing CRC32, CRC32C differs in the polynomials only, perf metrics are close enough).

This repo offers good explanations to the optimization space ahead and this blog marches further, achiving performances all the way to some 50GiB/s per CPU core through using haredware accelerated SIMD intristics: _mm_crc32_u64 and _mm_clmulepi64_si128 in particular.

One more thing, zlib, chrome brower and Java hotspot VM uses the three-interleave-crc32x

With that said, let me know if you guys are intested in this issue and the potential improvement

@lizhanhui

This comment was marked as resolved.

@timvisee
Copy link
Member

Hi @lizhanhui. Yes, if a different CRC implementation gives better performance with the same result then such change would be very much appreciated.

I'd prefer to have the implementation in some crate though, versus building a wildly custom implementation inside our WAL codebase. Maybe crc32fast is already fast enough.

In any case, changes like this are always welcome!

@lizhanhui
Copy link
Contributor Author

lizhanhui commented Nov 10, 2023

Maybe crc32fast is already fast enough.

crc32fast is fast enough for use cases of CRC-32 only and I indeed use it in our project. Unfortunately, it does not support CRC-32C polynomial, aka, your use case.

I'd prefer to have the implementation in some crate though

All right, maybe it's a good idea to build a similar crate for CRC32C.

@timvisee
Copy link
Member

All right, maybe it's a good idea to build a similar crate for CRC32C.

Would it be worth to try and include it in crc32fast?

@lizhanhui
Copy link
Contributor Author

crc32fast targets 'Fast, SIMD-accelerated CRC32 (IEEE) checksum computation', not sure if it would expand its scope or not. I would drop a message there to check it out.

@lizhanhui
Copy link
Contributor Author

There is a 3-year-old thread discussing support of other polynomials, srijs/rust-crc32fast#9, but the maintainer has not yet responded.

@timvisee
Copy link
Member

There is a 3-year-old thread discussing support of other polynomials, srijs/rust-crc32fast#9, but the maintainer has not yet responded.

Thank you for checking. In that case it might make sense to work on a new crate.

By the way, if the actual implementation of this ends up being small then it's probably fine to implement it inside WAL. But yes, setting up a new crate will allow others to benefit from this as well.

@lizhanhui
Copy link
Contributor Author

While I am building a crate, I found this crate quite popularly adopted. Even if it still leaves much space to improve, I create a pull request for you guys to review: #66

@agourlay agourlay closed this as completed Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants