[cryptography] Introduce `RBSR` #2665

patrick-ogrady · 2026-01-01T23:41:10Z

An invaluable primitive for a "multi-master", immutable database (where we assume all databases receive ~all data at roughly the same time).

If the indexes of all items are known, we can use ordinal with some procedure over rmap

…odule

Implement the Negentropy protocol for efficient set reconciliation between two participants. This is useful for synchronizing datasets without transmitting the entire set.

Key features:
- Item type with timestamp and 32-byte ID for sorting
- FingerprintAccumulator using addition mod 2^256 (more collision-resistant than XOR)
- Storage trait with VecStorage implementation
- Reconciler for driving the reconciliation protocol
- Full codec support (Read/Write) for all types
- Comprehensive test suite

The protocol uses a divide-and-conquer approach:
1. Exchange fingerprints of ranges
2. Recursively split non-matching ranges
3. Directly transmit IDs when ranges become small enough
4. Track have/need IDs for final synchronization

Round-trips scale logarithmically with set size: O(log_B(N) / 2)
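The divide-and-conquer steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Reply`, `respond`, and the `u64` IDs are hypothetical, the fingerprint is a wrapping `u64` sum standing in for addition mod 2^256, and sub-ranges are split by position rather than by explicit bounds.

```rust
/// Toy fingerprint: wrapping sum of item IDs (a scaled-down stand-in for
/// an "addition mod 2^256" accumulator; not the PR's implementation).
fn fingerprint(items: &[u64]) -> u64 {
    items.iter().fold(0u64, |acc, id| acc.wrapping_add(*id))
}

/// What one side answers for a range (hypothetical names).
#[derive(Debug, PartialEq)]
enum Reply {
    /// Fingerprints matched, so the range needs no further work.
    Skip,
    /// The range is small enough to send its IDs directly.
    IdList(Vec<u64>),
    /// The range still differs: one fingerprint per sub-range.
    Split(Vec<u64>),
}

/// Assumed branching factor B for this sketch.
const BRANCHING_FACTOR: usize = 16;

/// Decide how to answer a remote fingerprint covering `local` (sorted IDs).
fn respond(local: &[u64], remote_fp: u64) -> Reply {
    if fingerprint(local) == remote_fp {
        return Reply::Skip;
    }
    if local.len() <= BRANCHING_FACTOR {
        return Reply::IdList(local.to_vec());
    }
    // Split into at most BRANCHING_FACTOR contiguous sub-ranges.
    let chunk = local.len().div_ceil(BRANCHING_FACTOR);
    Reply::Split(local.chunks(chunk).map(fingerprint).collect())
}
```

Each round shrinks a mismatching range by roughly a factor of B, which is where the logarithmic round-trip count comes from.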

Refactor RBSR module to use the existing sha256::Digest type instead of raw [u8; 32] arrays for item IDs. This provides:
- Better type safety and integration with crate abstractions
- Clearer semantic meaning (IDs are cryptographic digests)
- Consistent API with other cryptographic types

Added Item::from_bytes() helper for convenient creation from raw arrays.

…alization

Blake3 is faster than SHA-256 while maintaining equivalent security. This is a minor performance optimization since fingerprints are computed frequently during reconciliation.

Switch from simple addition mod 2^256 to LtHash for fingerprint computation. LtHash is a lattice-based homomorphic hash providing ~200 bits of security, compared to addition mod 2^256 which can be broken in ~28 hours with sufficient resources. The count is still tracked separately and included in the final fingerprint for additional protection against attacks where an attacker can only write to one side of a network split.
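To illustrate what "homomorphic" buys the reconciler, here is a toy accumulator with the same algebraic shape: each item is expanded into a few lanes and added lane-wise, so insertion order does not matter and items can be subtracted back out. This is explicitly not LtHash and offers no security; `ToyAccumulator`, `expand`, the 4-lane width, and the use of `DefaultHasher` are all assumptions made for the sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Number of lanes in the toy state (chosen arbitrarily for illustration).
const LANES: usize = 4;

/// Toy additive accumulator: NOT LtHash, only the same add/subtract shape.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct ToyAccumulator {
    lanes: [u16; LANES],
}

/// Expand an item into one pseudo-random 16-bit value per lane.
fn expand(item: &[u8]) -> [u16; LANES] {
    let mut out = [0u16; LANES];
    for (i, lane) in out.iter_mut().enumerate() {
        let mut hasher = DefaultHasher::new();
        (i as u64).hash(&mut hasher);
        item.hash(&mut hasher);
        *lane = hasher.finish() as u16;
    }
    out
}

impl ToyAccumulator {
    /// Add an item: lane-wise wrapping addition.
    fn add(&mut self, item: &[u8]) {
        for (lane, e) in self.lanes.iter_mut().zip(expand(item)) {
            *lane = lane.wrapping_add(e);
        }
    }

    /// Remove an item: lane-wise wrapping subtraction.
    fn subtract(&mut self, item: &[u8]) {
        for (lane, e) in self.lanes.iter_mut().zip(expand(item)) {
            *lane = lane.wrapping_sub(e);
        }
    }
}
```

Because the state is a sum, two peers holding the same set reach the same state regardless of the order in which items arrived.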

- Remove FingerprintAccumulator struct, use LtHash type alias
- Remove Fingerprint struct, use blake3::Digest type alias
- Use Blake3 digest for both item IDs and fingerprints
- Remove count from fingerprint (LtHash's ~200 bits of security makes it unnecessary)
- Use full 32-byte fingerprints (no truncation)

This removes ~140 lines while maintaining the same security properties.

Protocol fix:
- When initiator receives IdList, it now replies with its own IdList (not Skip) so responder can also compute differences
- When responder receives IdList, it replies Skip (completing exchange)

New tests:
- test_bound_codec: Verifies Bound serialization/deserialization
- test_empty_fingerprint: Verifies empty range fingerprints
- test_vec_storage_remove: Tests item removal
- test_vec_storage_contains_id: Tests ID lookup
- test_message_version_mismatch: Tests codec error handling
- test_reconciler_version_check: Tests reconciler error handling
- test_large_set_exact_differences: Verifies all differences are found

Test improvements:
- test_different_sets_reconciliation now verifies exact differences found by both sides
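The protocol fix described in this commit can be pictured with a small sketch. The names `Role`, `Mode`, and `on_id_list` are hypothetical, IDs are plain `u64`s, and `local` is assumed to hold exactly the local IDs for the range under discussion.

```rust
use std::collections::BTreeSet;

/// Reply modes relevant to the ID exchange (hypothetical names).
#[derive(Debug, PartialEq)]
enum Mode {
    Skip,
    IdList(Vec<u64>),
}

/// Which side of the exchange we are.
#[derive(Clone, Copy)]
enum Role {
    Initiator,
    Responder,
}

/// Handle a received ID list for one range.
/// Returns the IDs we are missing and the mode to send back.
fn on_id_list(role: Role, local: &BTreeSet<u64>, remote_ids: &[u64]) -> (Vec<u64>, Mode) {
    // Anything the peer listed that we do not hold is missing locally.
    let missing: Vec<u64> = remote_ids
        .iter()
        .copied()
        .filter(|id| !local.contains(id))
        .collect();
    let reply = match role {
        // The initiator answers with its own IDs so the responder can diff too.
        Role::Initiator => Mode::IdList(local.iter().copied().collect()),
        // The responder has now seen both lists; the exchange is complete.
        Role::Responder => Mode::Skip,
    };
    (missing, reply)
}
```

With an initiator holding {1, 2} and a responder holding {2, 3}, the responder's list lets the initiator learn it lacks 3, and the initiator's reply lets the responder learn it lacks 1.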

cloudflare-workers-and-pages · 2026-01-01T23:41:20Z

Deploying with Cloudflare Workers

Status	Name	Latest Commit	Updated (UTC)
✅ Deployment successful! View logs	commonware-mcp	`229394c`	Jan 02 2026, 07:55 PM

patrick-ogrady · 2026-01-01T23:43:30Z

cryptography/src/rbsr/mod.rs

+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct Message {
+    /// Protocol version
+    pub version: u8,


remove this

patrick-ogrady · 2026-01-01T23:43:56Z

cryptography/src/rbsr/mod.rs

+
+    /// Compute fingerprint over items in range [start_idx, end_idx).
+    fn fingerprint(&self, start_idx: usize, end_idx: usize) -> Fingerprint {
+        let mut acc = FingerprintAccumulator::new();


just use LtHash here

patrick-ogrady · 2026-01-01T23:46:48Z

cryptography/src/rbsr/mod.rs

+/// Default branching factor for range splitting.
+pub const DEFAULT_BRANCHING_FACTOR: usize = 16;
+
+/// An item in the set, consisting of a timestamp and a 32-byte ID.


nit: "and a `Digest`"

patrick-ogrady · 2026-01-01T23:47:17Z

cryptography/src/rbsr/mod.rs

+    }
+
+    /// Create a new item from a timestamp and raw bytes.
+    pub const fn from_bytes(timestamp: u64, id: [u8; ID_SIZE]) -> Self {


nit: just use .into() on new

patrick-ogrady · 2026-01-01T23:47:43Z

cryptography/src/rbsr/mod.rs

+
+/// A reconciliation message containing ranges.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct Message {


replace Message with Vec<Range>

patrick-ogrady · 2026-01-01T23:47:54Z

cryptography/src/rbsr/mod.rs

+    InvalidMessage(&'static str),
+    /// Protocol version mismatch
+    #[error("unsupported protocol version: {0}")]
+    UnsupportedVersion(u8),


nit: remove

patrick-ogrady · 2026-01-01T23:50:55Z

cryptography/src/rbsr/mod.rs

+//! 4. Recursively splitting ranges where fingerprints differ
+//! 5. Directly transmitting items once ranges become small enough
+//!
+//! # Properties


patrick-ogrady · 2026-01-01T23:51:13Z

cryptography/src/rbsr/mod.rs

+///
+/// Fingerprints use the full 32-byte [LtHash] checksum, providing ~200 bits of
+/// security against collision attacks.
+pub type Fingerprint = Digest;


remove type alias

patrick-ogrady · 2026-01-01T23:51:19Z

cryptography/src/rbsr/mod.rs

+/// Uses [LtHash] for ~200 bits of security against collision attacks. LtHash is a
+/// lattice-based homomorphic hash that is significantly more secure than simple
+/// addition mod 2^256 (which can be broken in ~28 hours with sufficient resources).
+pub type FingerprintAccumulator = LtHash;


remove type alias

patrick-ogrady · 2026-01-02T00:14:51Z

cryptography/src/rbsr/mod.rs

This should probably be moved to some other crate (maybe storage) because it doesn't actually define new cryptographic mechanisms?

cloudflare-workers-and-pages · 2026-01-02T00:24:42Z

Deploying monorepo with Cloudflare Pages

Latest commit:	`229394c`
Status:	✅ Deploy successful!
Preview URL:	https://3bc392bc.monorepo-eu0.pages.dev
Branch Preview URL:	https://claude-rbsr-set-reconciliati.monorepo-eu0.pages.dev

Items are identified by ID only, not (timestamp, ID). Timestamps are purely for ordering/partitioning.

Added tests demonstrating:
- Same IDs with different timestamps are considered identical
- Mixed scenarios with shared IDs and unique IDs

- Renamed 'timestamp' to 'hint' throughout (clearer that it's an ordering key)
- Updated documentation to explain hint can be timestamps, block heights, etc.
- Switched hint encoding from fixed u64 to Varint for smaller messages
- Updated all tests to use new terminology
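As a rough picture of why a varint hint is smaller than a fixed `u64`, here is a LEB128-style encoder and decoder. The exact varint format used by the crate's codec is not shown in this thread, so the format and the names `encode_varint` and `decode_varint` are assumptions for illustration only.

```rust
/// Append `value` as a LEB128-style varint: 7 payload bits per byte,
/// high bit set on every byte except the last. (Assumed format.)
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode a varint from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or would overflow a `u64`.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, byte) in bytes.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 64 {
            return None;
        }
        let chunk = (byte & 0x7f) as u64;
        // The tenth byte may only contribute the single remaining bit.
        if shift == 63 && chunk > 1 {
            return None;
        }
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

Under this format a small hint such as a low block height takes one or two bytes instead of eight, while the largest `u64` values take ten.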

- Remove version field from Message (just encode Vec<Range>)
- Simplify Bound to use Option<Digest> instead of Vec<u8> prefix
- Remove version-related tests and error variant
- Use Varint for range count in Message

Revert Bound from Option<Digest> to Vec<u8> id_prefix for smaller messages. The prefix can be truncated to the minimum bytes needed for uniqueness, reducing message size from 33 bytes per bound (1 byte for Option discriminant + 32 for digest) to variable 1-33 bytes.
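One plausible truncation rule, modeled on how the Negentropy protocol shortens bounds, is sketched below; the PR's actual rule is not shown in this thread. `minimal_bound` is a hypothetical helper that picks the shortest bound separating two adjacent items in sorted order.

```rust
/// Length of a full ID in bytes.
const ID_SIZE: usize = 32;

/// Compute a bound that sorts after `prev` and at or before `next`,
/// using as few ID bytes as possible. Items are (hint, id) pairs and
/// `prev` is assumed to sort strictly before `next`.
fn minimal_bound(prev: (u64, &[u8; ID_SIZE]), next: (u64, &[u8; ID_SIZE])) -> (u64, Vec<u8>) {
    // Different hints already separate the items: no ID bytes needed.
    if prev.0 != next.0 {
        return (next.0, Vec::new());
    }
    // Same hint: keep the shared prefix plus the first differing byte.
    let shared = prev
        .1
        .iter()
        .zip(next.1.iter())
        .take_while(|(a, b)| a == b)
        .count();
    let len = (shared + 1).min(ID_SIZE);
    (next.0, next.1[..len].to_vec())
}
```

With a one-byte length prefix, this gives bounds of 1 to 33 bytes on the wire, matching the range quoted in the commit message.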

patrick-ogrady · 2026-01-02T02:26:42Z

cryptography/src/rbsr/mod.rs

+
+impl Write for Message {
+    fn write(&self, buf: &mut impl BufMut) {
+        UInt(self.ranges.len() as u64).write(buf);


nit: can just encode vec directly

patrick-ogrady · 2026-01-02T02:27:35Z

cryptography/src/rbsr/mod.rs

+
+/// Error type for reconciliation operations.
+#[derive(Debug, thiserror::Error)]
+pub enum Error {


- Rename have_ids/need_ids to missing_locally/missing_remotely for clarity
- Update module docs to emphasize bi-directional nature
- Both participants discover items to send and receive
- Protocol is symmetric; only difference is who initiates

Each side independently discovers and tracks what they're missing. The remote peer runs their own reconciler and will request items they need - we don't need to track that for them.

Changes:
- Remove missing_remotely from Reconciler (renamed missing_locally to missing)
- Simplify ReconciliationSet to only track missing items and sources
- Add ReconciliationSet for n-ary peer aggregation
- Remove conformance test for Item (not fixed-size due to varint)
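A minimal sketch of what "track missing items and sources" across several peers might look like follows. `MissingTracker` and its methods are hypothetical and are not the PR's `ReconciliationSet` API; IDs and peer identifiers are simplified to integers.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Aggregates locally missing IDs across peers, remembering which
/// peers can supply each one. (Hypothetical type for illustration.)
#[derive(Default)]
struct MissingTracker {
    /// Missing ID -> peers known to hold it.
    sources: BTreeMap<u64, BTreeSet<u32>>,
}

impl MissingTracker {
    /// Record the IDs that reconciliation with `peer` found missing locally.
    fn record(&mut self, peer: u32, missing: &[u64]) {
        for id in missing {
            self.sources.entry(*id).or_default().insert(peer);
        }
    }

    /// All missing IDs, deduplicated and sorted.
    fn missing(&self) -> Vec<u64> {
        self.sources.keys().copied().collect()
    }

    /// Peers that can supply `id` (empty if the ID is not missing).
    fn sources_for(&self, id: u64) -> Vec<u32> {
        self.sources
            .get(&id)
            .map(|peers| peers.iter().copied().collect())
            .unwrap_or_default()
    }
}
```

Keeping the sources per ID lets a caller fetch each missing item once, from any peer that reported having it.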

…ec<u8>

patrick-ogrady · 2026-01-02T18:38:50Z

cryptography/src/rbsr/mod.rs

+        UInt(self.hint).write(buf);
+        match &self.id {
+            Some(id) => {
+                (ID_SIZE as u8).write(buf);


remove this

just encode id directly

patrick-ogrady · 2026-01-02T18:39:23Z

cryptography/src/rbsr/mod.rs

+            }
+            Self::IdList(ids) => {
+                Self::MODE_ID_LIST.write(buf);
+                (ids.len() as u32).write(buf);


just encode vec

patrick-ogrady · 2026-01-02T18:39:31Z

cryptography/src/rbsr/mod.rs

+            }
+            Self::MODE_ID_LIST => {
+                let count = u32::read(buf)? as usize;
+                let mut ids = Vec::with_capacity(count);


just decode vec

patrick-ogrady · 2026-01-02T18:39:53Z

cryptography/src/rbsr/mod.rs

+                Ok(Self::Fingerprint(fp))
+            }
+            Self::MODE_ID_LIST => {
+                let count = u32::read(buf)? as usize;


just hardcode branching factor

patrick-ogrady · 2026-01-02T18:40:32Z

cryptography/src/rbsr/mod.rs

+    }
+
+    /// Check if storage contains an item with the given ID.
+    pub fn contains_id(&self, id: &Digest) -> bool {


nit: contains

patrick-ogrady · 2026-01-02T18:40:46Z

cryptography/src/rbsr/mod.rs

+/// reconciler and discovers what they're missing independently.
+pub struct Reconciler<'a, S: Storage> {
+    storage: &'a S,
+    branching_factor: usize,


can make an associated const?

Items are identified by ID only - hints are just for ordering. When comparing ID lists, we must check if the ID exists anywhere in storage, not just within the current range bounds. Otherwise we'd incorrectly report an item as missing if it exists with a different hint.
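The difference between a range-scoped lookup and a storage-wide lookup can be shown with a small sketch. `Item`, `missing_range_scoped`, and `missing_global` are hypothetical names, and IDs are simplified to `u64`.

```rust
use std::collections::BTreeSet;

/// An item is ordered by `hint` but identified by `id` alone.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Item {
    hint: u64,
    id: u64,
}

/// Problematic variant: only consults local items whose hint is in [lo, hi).
fn missing_range_scoped(local: &[Item], lo: u64, hi: u64, remote_ids: &[u64]) -> Vec<u64> {
    remote_ids
        .iter()
        .copied()
        .filter(|id| !local.iter().any(|it| (lo..hi).contains(&it.hint) && it.id == *id))
        .collect()
}

/// Corrected variant: an ID counts as present if it exists anywhere locally.
fn missing_global(local: &[Item], remote_ids: &[u64]) -> Vec<u64> {
    let known: BTreeSet<u64> = local.iter().map(|it| it.id).collect();
    remote_ids
        .iter()
        .copied()
        .filter(|id| !known.contains(id))
        .collect()
}
```

If the local side stores ID 7 under hint 100 and the peer lists ID 7 in the range [0, 50), the range-scoped check wrongly reports 7 as missing while the storage-wide check does not.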

When multiple adjacent ranges are resolved (matching fingerprints or completed ID exchanges), we now merge them into a single Skip range instead of sending individual Skip markers for each. This significantly reduces message size when peers have mostly overlapping sets.
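The merge step can be sketched as a single pass over the outgoing ranges. The `Range` and `Mode` types here are simplified stand-ins (a `u64` upper bound and `u64` fingerprints), not the PR's definitions.

```rust
/// Simplified range modes (hypothetical shapes).
#[derive(Clone, Debug, PartialEq)]
enum Mode {
    Skip,
    Fingerprint(u64),
    IdList(Vec<u64>),
}

/// A range is described by its exclusive upper bound; the lower bound is
/// implied by the previous range in the message.
#[derive(Clone, Debug, PartialEq)]
struct Range {
    upper: u64,
    mode: Mode,
}

/// Collapse runs of adjacent Skip ranges into one Skip covering the run.
fn merge_skips(ranges: Vec<Range>) -> Vec<Range> {
    let mut out: Vec<Range> = Vec::with_capacity(ranges.len());
    for range in ranges {
        if range.mode == Mode::Skip {
            if let Some(last) = out.last_mut() {
                if last.mode == Mode::Skip {
                    // Extend the previous Skip instead of emitting a new one.
                    last.upper = range.upper;
                    continue;
                }
            }
        }
        out.push(range);
    }
    out
}
```

Because each range's lower bound is implied by its predecessor, extending the upper bound of the previous Skip is enough to cover the merged run.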

…og n) ID lookups

IndexedStorage uses:
- BTreeSet for O(log n) contains_id (vs O(n) linear scan)
- Prefix-sum LtHash array for O(1) range fingerprints (vs O(n) recompute)

Also adds LtHash::difference() for subtracting accumulator states. Call rebuild() after batch mutations to update indices.
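The prefix-sum idea works for any accumulator that supports subtraction. A minimal sketch with a wrapping `u64` sum standing in for the LtHash state (`PrefixIndex` and its methods are hypothetical names):

```rust
/// Prefix sums over a sorted item list; `prefix[i]` is the accumulated
/// state of `items[..i]`. A wrapping u64 sum stands in for LtHash here.
struct PrefixIndex {
    prefix: Vec<u64>,
}

impl PrefixIndex {
    /// Rebuild the index from scratch, e.g. after a batch of mutations.
    fn rebuild(items: &[u64]) -> Self {
        let mut prefix = Vec::with_capacity(items.len() + 1);
        let mut acc = 0u64;
        prefix.push(acc);
        for item in items {
            acc = acc.wrapping_add(*item);
            prefix.push(acc);
        }
        Self { prefix }
    }

    /// Fingerprint of `items[start..end]` in O(1): state(end) - state(start).
    fn fingerprint(&self, start: usize, end: usize) -> u64 {
        self.prefix[end].wrapping_sub(self.prefix[start])
    }
}
```

The subtraction in `fingerprint` plays the role that a difference operation on accumulator states would play for the real hash.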

…nt cache

CachedStorage stores LtHash checkpoints at regular intervals (configurable) instead of at every item. This dramatically reduces memory usage:
- 1M items with interval=1000: ~1000 checkpoints (2MB) vs 1M prefix sums (2GB)
- Fingerprint queries are O(K) where K is interval, instead of O(1)
- Checkpoints can be reused across multiple peer reconciliations

Replaces the memory-intensive IndexedStorage with a practical solution for production use with large datasets and many peers.
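The checkpoint trade-off can be sketched as follows: keep an accumulated state only every `interval` items, and walk at most `interval - 1` items past the nearest checkpoint on each query. As in the other sketches, a wrapping `u64` sum stands in for the LtHash state, and `CheckpointIndex` is a hypothetical name. The figures quoted above imply roughly 2 KB per stored state, which is why 1M prefix sums come to about 2 GB while ~1000 checkpoints come to about 2 MB.

```rust
/// Sparse prefix states: `checkpoints[c]` is the accumulated state of
/// `items[..c * interval]`. A wrapping u64 sum stands in for LtHash.
struct CheckpointIndex {
    interval: usize,
    checkpoints: Vec<u64>,
}

impl CheckpointIndex {
    /// Build checkpoints over `items`, one per full `interval` items.
    fn build(items: &[u64], interval: usize) -> Self {
        assert!(interval > 0, "interval must be non-zero");
        let mut checkpoints = vec![0u64];
        let mut acc = 0u64;
        for chunk in items.chunks_exact(interval) {
            acc = chunk.iter().fold(acc, |a, x| a.wrapping_add(*x));
            checkpoints.push(acc);
        }
        Self { interval, checkpoints }
    }

    /// State of `items[..idx]`: nearest checkpoint plus at most
    /// `interval - 1` extra items.
    fn prefix(&self, items: &[u64], idx: usize) -> u64 {
        let cp = idx / self.interval;
        items[cp * self.interval..idx]
            .iter()
            .fold(self.checkpoints[cp], |a, x| a.wrapping_add(*x))
    }

    /// Fingerprint of `items[start..end]` as a difference of two prefixes.
    fn fingerprint(&self, items: &[u64], start: usize, end: usize) -> u64 {
        self.prefix(items, end).wrapping_sub(self.prefix(items, start))
    }
}
```

Comparing every range against a naive fold, including ranges that start or end between checkpoints, is a straightforward way to check the difference logic.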

Remove VecStorage and rename CachedStorage to MemStorage for simplicity. The MemStorage type provides:
- BTreeSet for O(log n) ID lookups via contains_id()
- Checkpoint-based fingerprint caching for O(K) range queries
- Memory usage of O(n + n/K) for items plus checkpoints

All tests updated to use MemStorage::default().

Adds test_multi_checkpoint_fingerprints which verifies that fingerprint computation works correctly when queries span multiple checkpoints. Tests various scenarios:
- Aligned vs unaligned boundaries
- Single checkpoint vs multi-checkpoint ranges
- Edge cases (empty ranges, single items)

Compares checkpoint-based computation against naive iteration to ensure the LtHash difference/combine logic is correct.

codecov · 2026-01-02T20:34:21Z

Codecov Report

❌ Patch coverage is 90.18789% with 94 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.60%. Comparing base (c5c573e) to head (229394c).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
cryptography/src/rbsr/mod.rs	90.13%	94 Missing ⚠️

@@            Coverage Diff             @@
##             main    #2665      +/-   ##
==========================================
- Coverage   92.62%   92.60%   -0.03%     
==========================================
  Files         357      358       +1     
  Lines      102956   104060    +1104     
==========================================
+ Hits        95366    96367    +1001     
- Misses       7590     7693     +103

Files with missing lines	Coverage Δ
cryptography/src/lib.rs	`100.00% <ø> (ø)`
cryptography/src/lthash/mod.rs	`98.18% <100.00%> (+0.05%)`	⬆️
cryptography/src/rbsr/mod.rs	`90.13% <90.13%> (ø)`

... and 9 files with indirect coverage changes


claude added 6 commits January 1, 2026 23:04

[cryptography/rbsr] Use Blake3 instead of SHA-256 for fingerprint fin…

425045e

[cryptography/rbsr] Add test for multiple items with same timestamp

60dd89b

claude added 4 commits January 2, 2026 00:27

[cryptography/rbsr] Simplify Message and Bound encoding

7886a87

claude added 3 commits January 2, 2026 02:36

[cryptography/rbsr] Return missing IDs as Vec from reconcile()

a4f0a07

commonware-llm force-pushed the claude/rbsr-set-reconciliation-jbyZk branch from 9051f1a to a4f0a07 on January 2, 2026 03:33

[cryptography/rbsr] Simplify Bound to use Option<Digest> instead of V…

84561f6

…ec<u8>

claude added 6 commits January 2, 2026 18:45
