[cryptography] Introduce RBSR
#2665
base: main
…odule

Implement the Negentropy protocol for efficient set reconciliation between two participants. This is useful for synchronizing datasets without transmitting the entire set.

Key features:
- Item type with timestamp and 32-byte ID for sorting
- FingerprintAccumulator using addition mod 2^256 (more collision-resistant than XOR)
- Storage trait with VecStorage implementation
- Reconciler for driving the reconciliation protocol
- Full codec support (Read/Write) for all types
- Comprehensive test suite

The protocol uses a divide-and-conquer approach:
1. Exchange fingerprints of ranges
2. Recursively split non-matching ranges
3. Directly transmit IDs when ranges become small enough
4. Track have/need IDs for final synchronization

Round-trips scale logarithmically with set size: O(log_B(N) / 2)
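The divide-and-conquer steps above can be sketched as a local simulation. This is a minimal illustration and not the PR's `Reconciler`: both sets live in one process as sorted, deduplicated `u64` slices, ranges are split by value rather than by item count, and the names `fingerprint`, `in_range`, `reconcile`, `BRANCHING_FACTOR`, and `ID_LIST_THRESHOLD` are hypothetical. The fingerprint is a toy wrapping sum plus count, which is not collision-resistant.

```rust
/// Toy fingerprint: wrapping sum of the IDs plus their count.
/// A stand-in only; it is not collision-resistant.
fn fingerprint(items: &[u64]) -> (u64, usize) {
    let sum = items.iter().fold(0u64, |acc, id| acc.wrapping_add(*id));
    (sum, items.len())
}

/// Items of a sorted, deduplicated slice that fall in [lo, hi).
fn in_range(sorted: &[u64], lo: u64, hi: u64) -> &[u64] {
    let start = sorted.partition_point(|id| *id < lo);
    let end = sorted.partition_point(|id| *id < hi);
    &sorted[start..end]
}

/// Reconcile [lo, hi), collecting IDs that only one side holds.
fn reconcile(
    a: &[u64],
    b: &[u64],
    lo: u64,
    hi: u64,
    only_a: &mut Vec<u64>,
    only_b: &mut Vec<u64>,
) {
    const BRANCHING_FACTOR: u64 = 4; // hypothetical value
    const ID_LIST_THRESHOLD: usize = 4; // hypothetical value

    let (ra, rb) = (in_range(a, lo, hi), in_range(b, lo, hi));

    // Step 1: matching fingerprints mean the range needs no further work.
    if fingerprint(ra) == fingerprint(rb) {
        return;
    }

    // Step 3: small ranges are resolved by comparing the IDs directly.
    if ra.len().max(rb.len()) <= ID_LIST_THRESHOLD {
        only_a.extend(ra.iter().copied().filter(|id| rb.binary_search(id).is_err()));
        only_b.extend(rb.iter().copied().filter(|id| ra.binary_search(id).is_err()));
        return;
    }

    // Step 2: split the mismatching range into roughly BRANCHING_FACTOR parts.
    let step = ((hi - lo) / BRANCHING_FACTOR).max(1);
    let mut start = lo;
    while start < hi {
        let end = start.saturating_add(step).min(hi);
        reconcile(a, b, start, end, only_a, only_b);
        start = end;
    }
}
```

In the real protocol each level of recursion costs a message exchange, which is where the logarithmic round-trip count comes from; the simulation only shows which ranges get split and which get skipped.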
Refactor RBSR module to use the existing sha256::Digest type instead of raw [u8; 32] arrays for item IDs. This provides:
- Better type safety and integration with crate abstractions
- Clearer semantic meaning (IDs are cryptographic digests)
- Consistent API with other cryptographic types

Added Item::from_bytes() helper for convenient creation from raw arrays.
…alization

Blake3 is faster than SHA-256 while maintaining equivalent security. This is a minor performance optimization since fingerprints are computed frequently during reconciliation.
Switch from simple addition mod 2^256 to LtHash for fingerprint computation. LtHash is a lattice-based homomorphic hash providing ~200 bits of security, compared to addition mod 2^256 which can be broken in ~28 hours with sufficient resources. The count is still tracked separately and included in the final fingerprint for additional protection against attacks where an attacker can only write to one side of a network split.
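The property that makes a homomorphic hash usable as a range fingerprint is that items can be added in any order and later subtracted. The sketch below illustrates only that property. It is not LtHash and it is not secure: `ToyAccumulator`, `LANES`, and `expand` are hypothetical names, `DefaultHasher` is not a cryptographic hash, and 8 lanes is far too few.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Number of 16-bit lanes in the toy state (hypothetical, far too small).
const LANES: usize = 8;

/// Toy additive accumulator: each item is expanded into LANES 16-bit
/// values that are added lane by lane with wrapping arithmetic.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct ToyAccumulator([u16; LANES]);

impl ToyAccumulator {
    /// Expand an item into one pseudo-random value per lane.
    fn expand(item: &[u8]) -> [u16; LANES] {
        let mut lanes = [0u16; LANES];
        for (i, lane) in lanes.iter_mut().enumerate() {
            let mut hasher = DefaultHasher::new();
            (i, item).hash(&mut hasher);
            *lane = hasher.finish() as u16;
        }
        lanes
    }

    /// Add an item to the accumulated state.
    fn add(&mut self, item: &[u8]) {
        for (state, value) in self.0.iter_mut().zip(Self::expand(item)) {
            *state = state.wrapping_add(value);
        }
    }

    /// Remove a previously added item from the accumulated state.
    fn subtract(&mut self, item: &[u8]) {
        for (state, value) in self.0.iter_mut().zip(Self::expand(item)) {
            *state = state.wrapping_sub(value);
        }
    }
}
```

Because addition is commutative, two peers that hold the same items in a range arrive at the same state regardless of insertion order, and removing an item restores the state of the set without it.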
- Remove FingerprintAccumulator struct, use LtHash type alias
- Remove Fingerprint struct, use blake3::Digest type alias
- Use Blake3 digest for both item IDs and fingerprints
- Remove count from fingerprint (LtHash's ~200 bits of security makes it unnecessary)
- Use full 32-byte fingerprints (no truncation)

This removes ~140 lines while maintaining the same security properties.
Protocol fix:
- When initiator receives IdList, it now replies with its own IdList (not Skip) so responder can also compute differences
- When responder receives IdList, it replies Skip (completing exchange)

New tests:
- test_bound_codec: Verifies Bound serialization/deserialization
- test_empty_fingerprint: Verifies empty range fingerprints
- test_vec_storage_remove: Tests item removal
- test_vec_storage_contains_id: Tests ID lookup
- test_message_version_mismatch: Tests codec error handling
- test_reconciler_version_check: Tests reconciler error handling
- test_large_set_exact_differences: Verifies all differences are found

Test improvements:
- test_different_sets_reconciliation now verifies exact differences found by both sides
Deploying with
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! | commonware-mcp | 229394c | Jan 02 2026, 07:55 PM |
cryptography/src/rbsr/mod.rs
Outdated
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    /// Protocol version
    pub version: u8,
remove this
/// Compute fingerprint over items in range [start_idx, end_idx).
fn fingerprint(&self, start_idx: usize, end_idx: usize) -> Fingerprint {
    let mut acc = FingerprintAccumulator::new();
just use LtHash here
cryptography/src/rbsr/mod.rs
Outdated
/// Default branching factor for range splitting.
pub const DEFAULT_BRANCHING_FACTOR: usize = 16;

/// An item in the set, consisting of a timestamp and a 32-byte ID.
nit: "and a `Digest`"
cryptography/src/rbsr/mod.rs
Outdated
    }

    /// Create a new item from a timestamp and raw bytes.
    pub const fn from_bytes(timestamp: u64, id: [u8; ID_SIZE]) -> Self {
nit: just use .into() on new
/// A reconciliation message containing ranges.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
replace Message with Vec<Range>
cryptography/src/rbsr/mod.rs
Outdated
    InvalidMessage(&'static str),
    /// Protocol version mismatch
    #[error("unsupported protocol version: {0}")]
    UnsupportedVersion(u8),
nit: remove
//! 4. Recursively splitting ranges where fingerprints differ
//! 5. Directly transmitting items once ranges become small enough
//!
//! # Properties
remove
///
/// Fingerprints use the full 32-byte [LtHash] checksum, providing ~200 bits of
/// security against collision attacks.
pub type Fingerprint = Digest;
remove type alias
/// Uses [LtHash] for ~200 bits of security against collision attacks. LtHash is a
/// lattice-based homomorphic hash that is significantly more secure than simple
/// addition mod 2^256 (which can be broken in ~28 hours with sufficient resources).
pub type FingerprintAccumulator = LtHash;
remove type alias
This should probably be moved to some other crate (maybe storage), because it doesn't actually define new cryptographic mechanisms?
Deploying monorepo with
| | |
|---|---|
| Latest commit: | 229394c |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://3bc392bc.monorepo-eu0.pages.dev |
| Branch Preview URL: | https://claude-rbsr-set-reconciliati.monorepo-eu0.pages.dev |
Items are identified by ID only, not (timestamp, ID). Timestamps are purely for ordering/partitioning.

Added tests demonstrating:
- Same IDs with different timestamps are considered identical
- Mixed scenarios with shared IDs and unique IDs
- Renamed 'timestamp' to 'hint' throughout (clearer that it's an ordering key)
- Updated documentation to explain hint can be timestamps, block heights, etc.
- Switched hint encoding from fixed u64 to Varint for smaller messages
- Updated all tests to use new terminology
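To see why a varint shrinks hints, here is a generic LEB128-style sketch. It is an assumption that the crate's varint behaves similarly; `encode_varint` and `decode_varint` are hypothetical helpers and not the codec's API. Each byte carries 7 payload bits, and the high bit marks that more bytes follow.

```rust
/// Append `value` to `out` using 7 payload bits per byte.
/// The high bit of each byte is set when more bytes follow.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode a varint from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or does not fit in a u64.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, byte) in bytes.iter().enumerate() {
        if i == 10 {
            return None; // a u64 never needs more than 10 bytes
        }
        let chunk = (*byte & 0x7f) as u64;
        if i == 9 && chunk > 1 {
            return None; // the 10th byte may only carry the top bit
        }
        value |= chunk << (7 * i);
        if (*byte & 0x80) == 0 {
            return Some((value, i + 1));
        }
    }
    None // ran out of input before the final byte
}
```

Under this scheme a hint such as a block height of 1,000,000 takes 3 bytes instead of the 8 bytes of a fixed `u64`, while values near `u64::MAX` grow to 10 bytes.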
- Remove version field from Message (just encode Vec<Range>)
- Simplify Bound to use Option<Digest> instead of Vec<u8> prefix
- Remove version-related tests and error variant
- Use Varint for range count in Message
Revert Bound from Option<Digest> to Vec<u8> id_prefix for smaller messages. The prefix can be truncated to the minimum bytes needed for uniqueness, reducing message size from 33 bytes per bound (1 byte for Option discriminant + 32 for digest) to variable 1-33 bytes.
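One way to picture the truncated prefix: a bound only needs enough leading bytes of an ID to separate it from the ID just before it. The sketch below assumes the two adjacent items share the same hint, and that a bound is encoded as one length byte followed by the prefix; `shortest_prefix` and `encode_bound` are hypothetical helpers, not functions from the PR.

```rust
/// Size of an item ID in bytes.
const ID_SIZE: usize = 32;

/// Shortest prefix of `curr` that already differs from `prev`:
/// the shared leading bytes plus the first differing byte.
fn shortest_prefix<'a>(prev: &[u8; ID_SIZE], curr: &'a [u8; ID_SIZE]) -> &'a [u8] {
    let shared = prev
        .iter()
        .zip(curr.iter())
        .take_while(|(p, c)| p == c)
        .count();
    &curr[..(shared + 1).min(ID_SIZE)]
}

/// Hypothetical wire form: one length byte, then the prefix bytes.
fn encode_bound(prefix: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + prefix.len());
    out.push(prefix.len() as u8);
    out.extend_from_slice(prefix);
    out
}
```

With uniformly distributed digests, adjacent IDs usually differ within the first byte or two, so most bounds sit near the low end of the 1-33 byte range quoted above. The 1-byte case corresponds to an empty prefix.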
impl Write for Message {
    fn write(&self, buf: &mut impl BufMut) {
        UInt(self.ranges.len() as u64).write(buf);
nit: can just encode vec directly
/// Error type for reconciliation operations.
#[derive(Debug, thiserror::Error)]
pub enum Error {
remove
- Rename have_ids/need_ids to missing_locally/missing_remotely for clarity
- Update module docs to emphasize bi-directional nature
- Both participants discover items to send and receive
- Protocol is symmetric; only difference is who initiates
Each side independently discovers and tracks what they're missing. The remote peer runs their own reconciler and will request items they need; we don't need to track that for them.

Changes:
- Remove missing_remotely from Reconciler (renamed missing_locally to missing)
- Simplify ReconciliationSet to only track missing items and sources
- Add ReconciliationSet for n-ary peer aggregation
- Remove conformance test for Item (not fixed-size due to varint)
9051f1a to a4f0a07
UInt(self.hint).write(buf);
match &self.id {
    Some(id) => {
        (ID_SIZE as u8).write(buf);
remove this
just encode id directly
}
Self::IdList(ids) => {
    Self::MODE_ID_LIST.write(buf);
    (ids.len() as u32).write(buf);
just encode vec
}
Self::MODE_ID_LIST => {
    let count = u32::read(buf)? as usize;
    let mut ids = Vec::with_capacity(count);
just decode vec
    Ok(Self::Fingerprint(fp))
}
Self::MODE_ID_LIST => {
    let count = u32::read(buf)? as usize;
just hardcode branching factor
cryptography/src/rbsr/mod.rs
Outdated
    }

    /// Check if storage contains an item with the given ID.
    pub fn contains_id(&self, id: &Digest) -> bool {
nit: contains
/// reconciler and discovers what they're missing independently.
pub struct Reconciler<'a, S: Storage> {
    storage: &'a S,
    branching_factor: usize,
can make an associated const?
Items are identified by ID only - hints are just for ordering. When comparing ID lists, we must check if the ID exists anywhere in storage, not just within the current range bounds. Otherwise we'd incorrectly report an item as missing if it exists with a different hint.
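The difference between the two lookups can be shown with a small sketch. The types are simplified assumptions: an item is a `(hint, id)` tuple with a `u32` standing in for the digest, and `missing_in_range` and `missing_anywhere` are hypothetical helpers rather than the PR's methods.

```rust
use std::collections::BTreeSet;

/// (hint, id) pair; `u32` stands in for the PR's digest type.
type Item = (u64, u32);

/// Range-restricted check: only IDs whose hint falls in [lo, hi) count
/// as present. This is the behavior the commit describes as incorrect.
fn missing_in_range(local: &[Item], lo: u64, hi: u64, remote_ids: &[u32]) -> Vec<u32> {
    let in_range: BTreeSet<u32> = local
        .iter()
        .filter(|(hint, _)| (lo..hi).contains(hint))
        .map(|(_, id)| *id)
        .collect();
    remote_ids
        .iter()
        .copied()
        .filter(|id| !in_range.contains(id))
        .collect()
}

/// Storage-wide check: an ID counts as present whatever its local hint is.
fn missing_anywhere(local: &[Item], remote_ids: &[u32]) -> Vec<u32> {
    let known: BTreeSet<u32> = local.iter().map(|(_, id)| *id).collect();
    remote_ids
        .iter()
        .copied()
        .filter(|id| !known.contains(id))
        .collect()
}
```

If the local side stores ID 7 under hint 50 and the remote lists ID 7 in the range [0, 10), the range-restricted check reports 7 as missing even though it is already held, while the storage-wide check does not.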
When multiple adjacent ranges are resolved (matching fingerprints or completed ID exchanges), we now merge them into a single Skip range instead of sending individual Skip markers for each. This significantly reduces message size when peers have mostly overlapping sets.
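A minimal sketch of the merge, assuming each range is described by its upper bound with the lower bound implied by the previous range. `Mode`, `Range`, and `merge_skips` are simplified stand-ins for illustration and do not mirror the PR's types.

```rust
/// Simplified range payload: either nothing to do or a fingerprint to compare.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Mode {
    Skip,
    Fingerprint(u64),
}

/// A range ending at `upper_bound`; it starts where the previous range ended.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Range {
    upper_bound: u64,
    mode: Mode,
}

/// Collapse every run of consecutive Skip ranges into a single Skip that
/// ends at the upper bound of the last range in the run.
fn merge_skips(ranges: Vec<Range>) -> Vec<Range> {
    let mut merged: Vec<Range> = Vec::with_capacity(ranges.len());
    for range in ranges {
        if range.mode == Mode::Skip {
            if let Some(prev) = merged.last_mut() {
                if prev.mode == Mode::Skip {
                    // Extend the previous Skip instead of emitting a new one.
                    prev.upper_bound = range.upper_bound;
                    continue;
                }
            }
        }
        merged.push(range);
    }
    merged
}
```

Because lower bounds are implicit in this layout, extending a Skip only means overwriting its upper bound, so a long run of resolved ranges costs one bound on the wire instead of one per range.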
…og n) ID lookups

IndexedStorage uses:
- BTreeSet for O(log n) contains_id (vs O(n) linear scan)
- Prefix-sum LtHash array for O(1) range fingerprints (vs O(n) recompute)

Also adds LtHash::difference() for subtracting accumulator states. Call rebuild() after batch mutations to update indices.
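The prefix-sum idea can be sketched with a toy fingerprint. Here a wrapping `u64` sum stands in for the LtHash state and `wrapping_sub` plays the role of `difference()`; `PrefixFingerprints` is a hypothetical type and not the PR's `IndexedStorage`.

```rust
/// Prefix sums over a toy additive fingerprint.
/// `prefix[i]` is the fingerprint of the first `i` items.
struct PrefixFingerprints {
    prefix: Vec<u64>,
}

impl PrefixFingerprints {
    /// Build all prefix sums in one pass over the items.
    fn new(items: &[u64]) -> Self {
        let mut prefix = Vec::with_capacity(items.len() + 1);
        let mut acc = 0u64;
        prefix.push(acc);
        for item in items {
            acc = acc.wrapping_add(*item);
            prefix.push(acc);
        }
        Self { prefix }
    }

    /// Fingerprint of items[start..end] with a single subtraction.
    fn fingerprint(&self, start: usize, end: usize) -> u64 {
        self.prefix[end].wrapping_sub(self.prefix[start])
    }
}
```

The cost is one stored state per item, and every insertion or removal invalidates all later prefixes, which matches the need to rebuild after batch mutations.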
…nt cache

CachedStorage stores LtHash checkpoints at regular intervals (configurable) instead of at every item. This dramatically reduces memory usage:
- 1M items with interval=1000: ~1000 checkpoints (2MB) vs 1M prefix sums (2GB)
- Fingerprint queries are O(K) where K is interval, instead of O(1)
- Checkpoints can be reused across multiple peer reconciliations

Replaces the memory-intensive IndexedStorage with a practical solution for production use with large datasets and many peers.
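The memory figures above imply roughly 2 KB per stored state (2 MB across ~1000 checkpoints), so one state per item would need about 2 GB for 1M items. A minimal sketch of the checkpoint scheme follows, again with a wrapping `u64` sum standing in for the LtHash state; `Checkpointed` is a hypothetical type and not the PR's `CachedStorage`.

```rust
/// Toy checkpoint cache: one saved prefix fingerprint per `interval` items.
struct Checkpointed {
    items: Vec<u64>,
    interval: usize,
    /// `checkpoints[j]` is the fingerprint of `items[..j * interval]`.
    checkpoints: Vec<u64>,
}

impl Checkpointed {
    /// Build the checkpoints in one pass over the items.
    fn new(items: Vec<u64>, interval: usize) -> Self {
        assert!(interval > 0, "interval must be positive");
        let mut checkpoints = vec![0u64];
        let mut acc = 0u64;
        for (i, item) in items.iter().enumerate() {
            acc = acc.wrapping_add(*item);
            if (i + 1) % interval == 0 {
                checkpoints.push(acc);
            }
        }
        Self { items, interval, checkpoints }
    }

    /// Fingerprint of `items[..idx]`: start from the nearest checkpoint at
    /// or below `idx`, then add at most `interval - 1` items.
    fn prefix(&self, idx: usize) -> u64 {
        let j = idx / self.interval;
        self.items[j * self.interval..idx]
            .iter()
            .fold(self.checkpoints[j], |acc, item| acc.wrapping_add(*item))
    }

    /// Fingerprint of `items[start..end]` as a difference of two prefixes.
    fn fingerprint(&self, start: usize, end: usize) -> u64 {
        self.prefix(end).wrapping_sub(self.prefix(start))
    }
}
```

Each query touches two checkpoints and fewer than `2 * interval` items, which is the O(K) cost traded for storing only `n / K` states.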
Remove VecStorage and rename CachedStorage to MemStorage for simplicity. The MemStorage type provides:
- BTreeSet for O(log n) ID lookups via contains_id()
- Checkpoint-based fingerprint caching for O(K) range queries
- Memory usage of O(n + n/K) for items plus checkpoints

All tests updated to use MemStorage::default().
Adds test_multi_checkpoint_fingerprints which verifies that fingerprint computation works correctly when queries span multiple checkpoints. Tests various scenarios:
- Aligned vs unaligned boundaries
- Single checkpoint vs multi-checkpoint ranges
- Edge cases (empty ranges, single items)

Compares checkpoint-based computation against naive iteration to ensure the LtHash difference/combine logic is correct.
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## main #2665 +/- ##
==========================================
- Coverage 92.62% 92.60% -0.03%
==========================================
Files 357 358 +1
Lines 102956 104060 +1104
==========================================
+ Hits 95366 96367 +1001
- Misses 7590 7693 +103
... and 9 files with indirect coverage changes
An invaluable primitive for a "multi-master", immutable database (where we assume all databases receive ~all data at roughly the same time).
If the indexes of all items are known, we can use `ordinal` with some procedure over `rmap`.