29 changes: 29 additions & 0 deletions wire_encodings/Cargo.toml
@@ -0,0 +1,29 @@
[package]
name = "wire_encodings"
version = "0.1.0"
edition = "2024"

[dependencies]
serde = { version = "1.0", features = ["derive"] }
bincode = { version = "1.3" }
prost = "0.14.1"
rmp-serde = "1.1"
serde_cbor = "0.11"
parity-scale-codec = { version = "3.6", features = ["derive"] }
borsh = { version = "1.0", features = ["derive"] }
serde_json = "1.0"


ethereum_ssz = "0.9.0"
ethereum_ssz_derive = "0.9.0"

hex = "0.4"

[dev-dependencies]
criterion = { version = "0.6.0", features = ["html_reports"] }


[[bench]]
name = "encoding_benchmarks"
path = "benches/benchmark.rs"
harness = false
162 changes: 162 additions & 0 deletions wire_encodings/README.md
@@ -0,0 +1,162 @@
# Choosing the Right Encoding for Networking

## 1. Introduction

Nomos currently uses **bincode** for all serialization. While fast and simple, bincode lacks support for **schema evolution** and **cross-language compatibility**—two features increasingly important as the ecosystem scales.

## 2. Selection Criteria

We evaluated candidate encoding formats based on:

* **Schema Evolution** – Supports changes to data structures without breaking compatibility.
* **Security** – Handles untrusted inputs safely.
* **Performance** – Fast to serialize and deserialize.
* **Determinism** – Always produces identical output for the same input.
* **Cross-language Compatibility** – Available and maintained across multiple languages.
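
The determinism criterion is worth spelling out: formats that serialize maps in iteration order inherit whatever order the map provides. A minimal std-only Rust sketch (the length-prefixed `encode_entries` helper is hypothetical, for illustration only) shows why canonical encoders pin map ordering, e.g. to sorted keys via `BTreeMap`:

```rust
use std::collections::BTreeMap;

// Hypothetical length-prefixed encoder: each string is written as a
// little-endian u32 length followed by its raw bytes.
fn encode_entries<'a, I>(entries: I) -> Vec<u8>
where
    I: Iterator<Item = (&'a String, &'a String)>,
{
    let mut out = Vec::new();
    for (k, v) in entries {
        for s in [k, v] {
            out.extend_from_slice(&(s.len() as u32).to_le_bytes());
            out.extend_from_slice(s.as_bytes());
        }
    }
    out
}

fn main() {
    // Same entries, different insertion order.
    let mut a = BTreeMap::new();
    a.insert("x".to_string(), "1".to_string());
    a.insert("y".to_string(), "2".to_string());

    let mut b = BTreeMap::new();
    b.insert("y".to_string(), "2".to_string());
    b.insert("x".to_string(), "1".to_string());

    // BTreeMap iterates in sorted key order, so the bytes are identical.
    // With a HashMap, iteration order is unspecified and the same
    // comparison would give no guarantee.
    assert_eq!(encode_entries(a.iter()), encode_entries(b.iter()));
}
```

This is the same property the `prost` `btree_map` configuration shown below provides for generated map fields.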

## 3. Format Comparison (P2P-Relevant Formats)

| Encoding Format | Evolution | Deterministic | Security | Lang Support | Usage |
| --------------- | --------- | ------------- | -------- | ------------ | --------------------- |
| **Protobuf** | ✓ | △ | ⬤⬤◯ | ⬤⬤⬤ | Cosmos SDK, libp2p |
| **Borsh** | — | ✓ | ⬤⬤⬤ | ⬤⬤◯ | NEAR, Solana programs |
| **SCALE** | — | ✓ | ⬤⬤◯ | ⬤◯◯ | Polkadot |
| **Bincode** | — | ✓ | ⬤⬤◯ | ⬤◯◯ | Solana validators |
| **CBOR** | ✓ | △ | ⬤⬤◯ | ⬤⬤⬤ | Cardano, IPFS |
| **MsgPack** | △ | △ | ⬤◯◯ | ⬤⬤⬤ | Algorand |
| **SSZ** | — | ✓ | ⬤⬤⬤ | ⬤◯◯ | Ethereum 2.0 |

## 4. Tooling Considerations

### Serde-Generate

[https://crates.io/crates/serde-generate](https://crates.io/crates/serde-generate)

Auto-generates serializers for Rust types, but it fails on complex constructs such as `Risc0LeaderProof` due to issues with the `tracing` feature. Adding schemas manually is also non-trivial.

### Canonical Protobuf in Rust

[https://crates.io/crates/prost](https://crates.io/crates/prost)

We can use `prost` with deterministic configurations:

```rust
// In build.rs - use BTreeMap for consistent key ordering
let mut config = prost_build::Config::new();
config.btree_map(&["."]); // Apply to all map fields
config.compile_protos(&["your.proto"], &["."])?;
```

Manual canonicalization wrapper:

```rust
use prost::Message;

fn encode_with_prost<T: Message>(msg: &T) -> Vec<u8> {
// Standard prost encoding. Output is deterministic if:
// - map fields use BTreeMap via build config
// - .proto field tag numbers match declaration order
// - unknown or unset optional fields are avoided
let mut buf = Vec::new();
msg.encode(&mut buf).unwrap();
buf
}
```
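
Whatever format is chosen, one cheap guard is to encode the same value more than once and require byte-identical output. A format-agnostic, std-only sketch (the closure stands in for a real encoder and is illustrative only):

```rust
// Format-agnostic determinism spot check: encode the same value twice,
// plus a clone of it, and require byte-identical output.
fn is_deterministic<T: Clone, F: Fn(&T) -> Vec<u8>>(encode: F, value: &T) -> bool {
    let a = encode(value);
    let b = encode(value);
    let c = encode(&value.clone());
    a == b && b == c
}

fn main() {
    // Toy encoder standing in for a real format (hypothetical).
    let encode = |v: &u64| v.to_le_bytes().to_vec();
    assert!(is_deterministic(encode, &42u64));
}
```

Note that a passing check does not prove determinism (a map-backed format can pass by chance), but a failing one is conclusive.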

## 5. Why Evolution Matters

```rust
// V1
struct Block { height: u64, hash: [u8; 32] }

// V2
struct Block { height: u64, hash: [u8; 32], timestamp: u64 }
```

Rigid encodings like bincode break when structures evolve. Nomos may support thousands of clients, making coordinated upgrades costly. Formats that support optional fields and schema evolution reduce this friction.
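
To make the failure mode concrete, here is a std-only sketch that mimics bincode's positional, fixed-layout encoding by hand (it does not call the crate): a V1 message is exactly 40 bytes, and a V2 decoder expecting 48 bytes cannot distinguish an old message from a corrupt one.

```rust
// V1 encodes to exactly 40 bytes: 8 (height, little-endian) + 32 (hash).
struct BlockV1 {
    height: u64,
    hash: [u8; 32],
}

fn encode_v1(b: &BlockV1) -> Vec<u8> {
    let mut out = b.height.to_le_bytes().to_vec();
    out.extend_from_slice(&b.hash);
    out
}

// A V2 decoder written the same way expects 48 bytes (8 + 32 + 8 for
// timestamp). There is no tag or field marker to fall back on.
fn decode_v2(bytes: &[u8]) -> Result<(u64, [u8; 32], u64), &'static str> {
    if bytes.len() != 48 {
        return Err("length mismatch: cannot decode V1 bytes as V2");
    }
    let height = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
    let hash: [u8; 32] = bytes[8..40].try_into().unwrap();
    let timestamp = u64::from_le_bytes(bytes[40..48].try_into().unwrap());
    Ok((height, hash, timestamp))
}

fn main() {
    let old = encode_v1(&BlockV1 { height: 7, hash: [0u8; 32] });
    assert_eq!(old.len(), 40);
    // V1 bytes are unreadable to a V2 peer; every node must upgrade at once.
    assert!(decode_v2(&old).is_err());
}
```

Tagged formats like Protobuf avoid this by identifying each field with a number, so a V2 decoder simply reads `timestamp` as absent from V1 bytes.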

## 6. Planning for Multi-Language Support

Support for multiple implementations matters most post-adoption. Initially, most ecosystems rely on one client. However, protocol-level encoding should avoid locking into a single ecosystem from the start.

## 7. Serialization Benchmark Analysis

### Benchmark Overview

This analysis compares 8 serialization formats: Bincode, Borsh, CBOR, JSON, MessagePack, Protobuf, SCALE, and SSZ. The benchmarks measure roundtrip (encode + decode) performance per batch: 50 items for simple and binary structs, 5 for large structs. Throughput is therefore batch size divided by roundtrip time; for example, 50 / 607.66 ns ≈ 82.28 Melem/s for bincode on simple structs.

### Top Observations

* Bincode offers the best performance for small and medium-sized data
* SCALE and Borsh perform best on large structures
* Protobuf produces compact encodings with moderate throughput

### Simple Structs Serialization (Roundtrip Performance)

| Encoding Format | Roundtrip Time | Size (bytes) | Throughput (Melem/s) |
| --------------- | -------------- | ------------ | -------------------- |
| Bincode | 607.66 ns | 12 | 82.28 |
| Borsh | 775.14 ns | 12 | 64.51 |
| SSZ | 947.31 ns | 12 | 52.78 |
| SCALE | 1.06 µs | 10 | 47.31 |
| Protobuf | 1.21 µs | 9 | 41.24 |
| MessagePack | 2.12 µs | 13 | 23.63 |
| JSON | 3.74 µs | 31 | 13.36 |
| CBOR | 5.70 µs | 22 | 8.77 |

### Binary Structs Performance

| Encoding Format | Roundtrip Time | Size (bytes) | Throughput (Melem/s) |
| --------------- | -------------- | ------------ | -------------------- |
| Bincode | 1.59 µs | 13 | 31.55 |
| Borsh | 1.81 µs | 9 | 27.64 |
| SCALE | 1.83 µs | 6 | 27.25 |
| Protobuf | 2.66 µs | 7 | 18.81 |
| MessagePack | 3.19 µs | 7 | 15.66 |
| SSZ | 4.27 µs | 9 | 11.70 |
| CBOR | 4.82 µs | 12 | 10.37 |
| JSON | 4.99 µs | 20 | 10.03 |

### Large Structs Performance

| Encoding Format | Roundtrip Time | Size (bytes) | Throughput (Kelem/s) |
| --------------- | -------------- | ------------ | -------------------- |
| SCALE | 2.85 µs | 647 | 1,755 |
| Borsh | 3.05 µs | 700 | 1,637 |
| Protobuf | 4.81 µs | 643 | 1,041 |
| Bincode | 5.29 µs | 772 | 946 |
| SSZ | 8.57 µs | 589 | 583 |
| CBOR | 18.13 µs | 720 | 276 |
| MessagePack | 18.45 µs | 618 | 271 |
| JSON | 27.11 µs | 1,318 | 184 |

### Performance Scaling Characteristics

| Format | Simple Structs | Binary Structs | Large Structs | Overall Scalability |
| ----------- | -------------- | -------------- | ------------- | ------------------- |
| Bincode | ⬤⬤⬤ | ⬤⬤⬤ | ⬤⬤◯ | Strong |
| SCALE | ⬤⬤◯ | ⬤⬤⬤ | ⬤⬤⬤ | Strong |
| Borsh | ⬤⬤◯ | ⬤⬤◯ | ⬤⬤⬤ | Strong |
| Protobuf | ⬤⬤◯ | ⬤⬤◯ | ⬤⬤◯ | Good |
| SSZ | ⬤◯◯ | ⬤◯◯ | ⬤◯◯ | Good |
| MessagePack | ⬤◯◯ | ⬤◯◯ | ⬤◯◯ | Good |
| CBOR | ⬤◯◯ | ⬤⬤◯ | ⬤◯◯ | Moderate |
| JSON | ⬤◯◯ | ⬤◯◯ | ⬤◯◯ | Moderate |

## 8. Encoding Evaluation and Conclusion

Benchmark results show meaningful performance differences across formats, especially in large and nested structures. While bincode leads in raw speed for small data, SCALE and Borsh outperform others in large-structure throughput.

**Good alternatives to consider:**

* **Protobuf** – Supports schema evolution, has broad language support, and offers compact output with moderate performance.
* **Borsh and SCALE** – Compact and deterministic with good performance, but they lack schema evolution and have narrower tooling.

Each has trade-offs. The decision depends on the need for schema evolution, performance, cross-language support, and implementation simplicity.

## 9. How to Run Benchmarks

To run the benchmarks, clone the repository and execute:

```bash
cargo bench
```
75 changes: 75 additions & 0 deletions wire_encodings/benches/benchmark.rs
@@ -0,0 +1,75 @@
use criterion::{Criterion, criterion_group, criterion_main};
use std::time::Duration;

mod common;
mod formats;

use common::*;
use formats::*;

fn benchmark_simple_structs(c: &mut Criterion) {
let data: Vec<SimpleStruct> = (0..50).map(|_| generate_simple_struct()).collect();
let borsh_data: Vec<_> = data.iter().map(convert_to_borsh_simple).collect();
let scale_data: Vec<_> = data.iter().map(convert_to_scale_simple).collect();
let ssz_data: Vec<_> = data.iter().map(convert_to_ssz_simple).collect();
let proto_data: Vec<_> = data.iter().map(convert_to_proto_simple).collect();

println!("\n=== SIMPLE STRUCTS COMPARISON ===");
bench_roundtrip::<SimpleStruct, BincodeFormat>(c, &data, "simple");
bench_roundtrip::<SimpleStruct, JsonFormat>(c, &data, "simple");
bench_roundtrip::<SimpleStruct, CborFormat>(c, &data, "simple");
bench_roundtrip::<SimpleStruct, MessagePackFormat>(c, &data, "simple");
bench_roundtrip::<borsh_format::SimpleStructBorsh, BorshFormat>(c, &borsh_data, "simple");
bench_roundtrip::<scale_format::SimpleStructScale, ScaleFormat>(c, &scale_data, "simple");
bench_roundtrip::<ssz_format::SimpleStructSsz, SszFormat>(c, &ssz_data, "simple");
bench_roundtrip::<protobuf_format::SimpleStructProto, ProtobufFormat>(c, &proto_data, "simple");
}

fn benchmark_binary_structs(c: &mut Criterion) {
let data: Vec<BinaryStruct> = (0..50).map(|_| generate_binary_struct()).collect();
let borsh_data: Vec<_> = data.iter().map(convert_to_borsh_binary).collect();
let scale_data: Vec<_> = data.iter().map(convert_to_scale_binary).collect();
let ssz_data: Vec<_> = data.iter().map(convert_to_ssz_binary).collect();
let proto_data: Vec<_> = data.iter().map(convert_to_proto_binary).collect();

println!("\n=== BINARY STRUCTS COMPARISON ===");
bench_roundtrip::<BinaryStruct, BincodeFormat>(c, &data, "binary");
bench_roundtrip::<BinaryStruct, JsonFormat>(c, &data, "binary");
bench_roundtrip::<BinaryStruct, CborFormat>(c, &data, "binary");
bench_roundtrip::<BinaryStruct, MessagePackFormat>(c, &data, "binary");
bench_roundtrip::<borsh_format::BinaryStructBorsh, BorshFormat>(c, &borsh_data, "binary");
bench_roundtrip::<scale_format::BinaryStructScale, ScaleFormat>(c, &scale_data, "binary");
bench_roundtrip::<ssz_format::BinaryStructSsz, SszFormat>(c, &ssz_data, "binary");
bench_roundtrip::<protobuf_format::BinaryStructProto, ProtobufFormat>(c, &proto_data, "binary");
}

fn benchmark_large_structs(c: &mut Criterion) {
let data: Vec<LargeStruct> = (0..5).map(|_| generate_large_struct()).collect();
let borsh_data: Vec<_> = data.iter().map(convert_to_borsh_large).collect();
let scale_data: Vec<_> = data.iter().map(convert_to_scale_large).collect();
let ssz_data: Vec<_> = data.iter().map(convert_to_ssz_large).collect();
let proto_data: Vec<_> = data.iter().map(convert_to_proto_large).collect();

println!("\n=== LARGE STRUCTS COMPARISON ===");
bench_roundtrip::<LargeStruct, BincodeFormat>(c, &data, "large");
bench_roundtrip::<LargeStruct, JsonFormat>(c, &data, "large");
bench_roundtrip::<LargeStruct, CborFormat>(c, &data, "large");
bench_roundtrip::<LargeStruct, MessagePackFormat>(c, &data, "large");
bench_roundtrip::<borsh_format::LargeStructBorsh, BorshFormat>(c, &borsh_data, "large");
bench_roundtrip::<scale_format::LargeStructScale, ScaleFormat>(c, &scale_data, "large");
bench_roundtrip::<ssz_format::LargeStructSsz, SszFormat>(c, &ssz_data, "large");
bench_roundtrip::<protobuf_format::LargeStructProto, ProtobufFormat>(c, &proto_data, "large");
}

criterion_group!(
name = combined_benches;
config = Criterion::default()
.measurement_time(Duration::from_secs(10))
.sample_size(100);
targets =
benchmark_simple_structs,
benchmark_binary_structs,
benchmark_large_structs
);

criterion_main!(combined_benches);
123 changes: 123 additions & 0 deletions wire_encodings/benches/common/mod.rs
@@ -0,0 +1,123 @@
use criterion::{Criterion, Throughput};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::hint::black_box;

#[derive(Serialize, Deserialize, Debug, Clone, PartialEq)]
pub struct SimpleStruct {
pub id: u32,
pub value: u64,
}

#[derive(Serialize, Deserialize, Debug, Clone, PartialEq)]
pub struct BinaryStruct {
pub data: Vec<u8>,
}

#[derive(Serialize, Deserialize, Debug, Clone, PartialEq)]
pub struct LargeStruct {
pub items: Vec<ItemStruct>,
pub map: HashMap<String, ItemStruct>,
pub nested: SimpleData,
pub blob: Vec<u8>,
}

#[derive(Serialize, Deserialize, Debug, Clone, PartialEq)]
pub struct ItemStruct {
pub name: String,
pub values: Vec<u32>,
}

#[derive(Serialize, Deserialize, Debug, Clone, PartialEq)]
pub struct SimpleData {
pub values: HashMap<String, String>,
pub inner: InnerData,
}

#[derive(Serialize, Deserialize, Debug, Clone, PartialEq)]
pub struct InnerData {
pub count: u32,
pub flag: bool,
}

pub fn generate_simple_struct() -> SimpleStruct {
SimpleStruct {
id: 12345,
value: 9876543210,
}
}

pub fn generate_binary_struct() -> BinaryStruct {
BinaryStruct {
data: vec![1, 2, 3, 4, 5],
}
}

pub fn generate_large_struct() -> LargeStruct {
let item1 = ItemStruct {
name: "item_one".to_string(),
values: vec![10, 20, 30],
};

let item2 = ItemStruct {
name: "item_two".to_string(),
values: vec![40, 50, 60],
};

let mut map = HashMap::new();
map.insert("first".to_string(), item1.clone());
map.insert("second".to_string(), item2.clone());

LargeStruct {
items: vec![item1, item2],
map,
nested: SimpleData {
values: [
("key1".to_string(), "value1".to_string()),
("key2".to_string(), "value2".to_string()),
]
.into_iter()
.collect(),
inner: InnerData {
count: 5000,
flag: true,
},
},
blob: vec![0u8; 512],
}
}

pub trait EncodingBenchmark<T> {
fn name() -> &'static str;
fn encode(data: &T) -> Vec<u8>;
fn decode(data: &[u8]) -> T;
}

pub fn bench_roundtrip<T, F>(c: &mut Criterion, data: &[T], test_name: &str)
where
F: EncodingBenchmark<T>,
T: Clone,
{
let mut group = c.benchmark_group(format!("roundtrip_{}", test_name));
group.throughput(Throughput::Elements(data.len() as u64));

group.bench_function(F::name(), |b| {
b.iter(|| {
for item in data {
let encoded = F::encode(black_box(item));
let decoded = F::decode(black_box(&encoded));
black_box(decoded);
}
})
});

group.finish();

let sample_encoded = F::encode(&data[0]);
println!(
"{} {} - {} bytes per item",
F::name(),
test_name,
sample_encoded.len()
);
}