-
Notifications
You must be signed in to change notification settings - Fork 171
[cryptography/bloomfilter] generic Hasher and optimizations #2729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
commonware-mcp | cd4768f | Jan 09 2026, 06:03 PM |
|
cryptography/src/bloomfilter/mod.rs
Outdated
| /// | ||
| /// The number of bits will be rounded up to the next power of 2. | ||
| pub fn new(hashers: NonZeroU8, bits: NonZeroUsize) -> Self { | ||
| let bits = bits.get().next_power_of_two(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to just round up to the next power of two rather than crashing here. Let me know if you prefer asserting.
Deploying monorepo with
|
| Latest commit: |
cd4768f
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://85126068.monorepo-eu0.pages.dev |
| Branch Preview URL: | https://andre-bloomfilter-improvemen.monorepo-eu0.pages.dev |
0ebf814 to
5589649
Compare
|
bugbot run |
patrick-ogrady
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial comments
cryptography/src/bloomfilter/mod.rs
Outdated
| // Extract two 64-bit hash values from the SHA256 digest of the item | ||
| let digest = Sha256::hash(item); | ||
| let h1 = u64::from_be_bytes(digest[0..8].try_into().unwrap()); | ||
| let h2 = u64::from_be_bytes(digest[8..16].try_into().unwrap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the rationale for u64 instead? I opted for u128 for security.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u64 arithmetic is faster than u128. My rationale was that bloom filters do not rely on collision resistance or large hash domains. All entropy beyond log2(m) bits is discarded during indexing, so using u128 does not improve security or false-positive behavior. The cryptographic hash should already provide more than enough uniformity.
| let digest = Sha256::hash(item); | ||
| let h1 = u64::from_be_bytes(digest[0..8].try_into().unwrap()); | ||
| let mut h2 = u64::from_be_bytes(digest[8..16].try_into().unwrap()); | ||
| h2 |= 1; // make sure h2 is non-zero |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to be the unluckiest person in the universe for this to happen, but the Kirsch-Mitzenmacher optimization assumes that h2 is non-zero.
|
I updated the benchmarks, here's a comparison of the impact of the optimizations: The optimizations have a big impact for small inputs size, for larger inputs computing the hash will dominate. Either way, this is consistently faster on all benchmarks. |
dd56e19 to
700db56
Compare
|
bugbot run |
|
bugbot run |
Codecov Report❌ Patch coverage is
@@ Coverage Diff @@
## main #2729 +/- ##
==========================================
+ Coverage 92.91% 93.27% +0.36%
==========================================
Files 363 372 +9
Lines 108104 114877 +6773
==========================================
+ Hits 100449 107157 +6708
- Misses 7655 7720 +65
... and 159 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
This PR makes the
BloomFiltergeneric overHasher, defaulting to Sha256.To support practical bloom filter sizing, a new
with_rateconstructor computes the optimal number of bits and hash functions for a given expected number of items and target false positive rate. Internally, this uses Q16.16 fixed-point arithmetic to ensure deterministic results across platforms, important for determinism. This is the recommended way to construct bloom filters in most cases, as it automatically balances space efficiency against the desired accuracy.The index calculation has been optimized to use 64-bit arithmetic instead of 128-bit, and the bit count is now always rounded up to the next power of two. This allows using a bitmask instead of modulo for index calculation, which is faster.
Benchmarks were added to measure that the optimizations above work as expected. They compare Sha256 and Blake3 performance across different item sizes (32, 2048, 4096 bytes) and false positive rates (10%, 0.1%).