Add a performant way to (de-)serialize fixed-size byte arrays #127

Open · wants to merge 2 commits into main
Conversation

hrxi

@hrxi hrxi commented Jan 27, 2024

First observed in est31/serde-big-array#19, there's no performant way to serialize fixed-size byte arrays with serde/postcard.

Add a `FixedSizeByteArray` type that can be used to serialize fixed-size byte arrays faster than all other available methods. This type only works with postcard. A more general solution would need to be implemented in serde itself.

This works around serde-rs/serde#2680.

```
serialize16/own                 [3.8846 ns 3.8909 ns 3.8988 ns]
serialize16/bytes               [9.9192 ns 9.9503 ns 9.9907 ns]
serialize16/byte_array          [4.1374 ns 4.1461 ns 4.1564 ns]
serialize16/big_array           [68.245 ns 68.344 ns 68.454 ns]
serialize16/fixed_size          [9.9966 ns 10.030 ns 10.076 ns]
serialize16/variable_size       [15.696 ns 15.795 ns 15.898 ns]

serialize32/own                 [4.1744 ns 4.1857 ns 4.2014 ns]
serialize32/bytes               [9.6590 ns 9.6825 ns 9.7151 ns]
serialize32/byte_array          [4.5584 ns 4.5663 ns 4.5747 ns]
serialize32/big_array           [135.38 ns 135.78 ns 136.34 ns]
serialize32/fixed_size          [13.989 ns 14.034 ns 14.091 ns]
serialize32/variable_size       [21.223 ns 21.270 ns 21.328 ns]

deserialize16/own               [5.9018 ns 5.9143 ns 5.9272 ns]
deserialize16/bytes             [21.198 ns 21.278 ns 21.376 ns]
deserialize16/byte_array        [6.8560 ns 6.8719 ns 6.8934 ns]
deserialize16/big_array         [19.027 ns 19.098 ns 19.187 ns]
deserialize16/fixed_size        [7.9197 ns 7.9471 ns 7.9817 ns]
deserialize16/variable_size     [36.804 ns 36.871 ns 36.946 ns]

deserialize32/own               [8.8860 ns 8.8989 ns 8.9139 ns]
deserialize32/bytes             [21.241 ns 21.271 ns 21.321 ns]
deserialize32/byte_array        [11.441 ns 11.459 ns 11.481 ns]
deserialize32/big_array         [31.553 ns 31.618 ns 31.697 ns]
deserialize32/fixed_size        [17.334 ns 17.360 ns 17.386 ns]
deserialize32/variable_size     [59.705 ns 59.815 ns 59.949 ns]
```

- `own` is the new `FixedSizeByteArray`; no length prefix.
- `bytes` is `serde_bytes::Bytes`; it has a length prefix.
- `byte_array` is `serde_byte_array::ByteArray`; it has a length prefix.
- `big_array` is `serde_big_array::Array`; no length prefix.
- `fixed_size` is `[u8; _]`; no length prefix.
- `variable_size` is `[u8]`; it has a length prefix.
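The length-prefix distinction above matters because postcard encodes variable-size slices with a varint length prefix, while fixed-size arrays need none (the length is known from the type). A minimal, hand-rolled sketch of that wire-format difference (this is not postcard's actual code, just an illustration of the encoding shapes):

```rust
// Sketch (not postcard itself): a fixed-size array [u8; N] is encoded
// as the N raw bytes, while a variable-size slice &[u8] is prefixed
// with its length as a LEB128-style varint.

fn encode_varint(mut n: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (n & 0x7f) as u8;
        n >>= 7;
        if n == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // continuation bit set
    }
}

fn encode_fixed(bytes: &[u8]) -> Vec<u8> {
    // No prefix: the length is implied by the type.
    bytes.to_vec()
}

fn encode_variable(bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    encode_varint(bytes.len() as u64, &mut out);
    out.extend_from_slice(bytes);
    out
}

fn main() {
    let data = [0u8; 16];
    assert_eq!(encode_fixed(&data).len(), 16);
    assert_eq!(encode_variable(&data).len(), 17); // 1 varint byte + 16
    println!(
        "fixed: {} bytes, variable: {} bytes",
        encode_fixed(&data).len(),
        encode_variable(&data).len()
    );
}
```

The extra byte (or bytes, for longer slices) is cheap on the wire; the benchmark differences above come mainly from how serde drives the serializer, not from the prefix itself.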

netlify bot commented Jan 27, 2024

Deploy Preview for cute-starship-2d9c9b canceled.

Latest commit: b0a07fd
Latest deploy log: https://app.netlify.com/sites/cute-starship-2d9c9b/deploys/65bbc34b4cfe0d0008d490b9
@hrxi
Author

hrxi commented Jan 27, 2024

This was motivated by benchmarking the serialization of a Merkle tree structure that serializes lots of hashes represented as fixed-size byte arrays. The serialization of that tree has seen a >2x performance increase using this code.

```diff
@@ -300,8 +300,12 @@ where
 }

 #[inline]
-fn serialize_struct(self, _name: &'static str, _len: usize) -> Result<Self::SerializeStruct> {
-    Ok(self)
+fn serialize_struct(self, name: &'static str, _len: usize) -> Result<Self::SerializeStruct> {
```
Owner

This makes me a little worried about degrading performance for all other cases to optimize specifically for the byte array case.

Author

I believe (have not measured) that the performance degradation is going to be minimal. The extra code is a single pointer comparison, likely optimized out because this is in monomorphized code.

It's also the same way `serde_json::value::RawValue` is implemented. I believe serde_json's status as a high-performance serde JSON implementation is ensured by benchmarks.
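For context, the `RawValue`-style trick works by giving the wrapper type a unique marker name and having the serializer compare that name before taking the fast path. A dependency-free sketch of the check (the marker string below is hypothetical, not postcard's or serde_json's actual token):

```rust
// Sketch of the serde_json::value::RawValue-style special-casing: the
// wrapper serializes itself as a struct whose &'static str name is a
// unique marker, and the serializer recognizes that marker to take a
// fast path. The token value here is hypothetical.
const TOKEN: &str = "$postcard::private::FixedSizeByteArray";

fn is_fixed_size_byte_array(name: &'static str) -> bool {
    // Pointer identity is the cheap common case (the marker is a unique
    // static); the string equality check keeps the test correct in general.
    std::ptr::eq(name, TOKEN) || name == TOKEN
}

fn main() {
    assert!(is_fixed_size_byte_array(TOKEN));
    assert!(!is_fixed_size_byte_array("Duration"));
    println!("marker check ok");
}
```

In monomorphized code the branch is against a constant, which is why the pointer comparison is plausibly optimized out for every type other than the wrapper.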

@jamesmunns
Owner

Hi @hrxi, thanks for the PR! I'm sympathetic to this issue, but I'm not sure if I'm convinced this is a good solution for postcard.

This adds conditionals and panicking branches to every ser/de struct field path.

I think it should be possible to implement Serialize and Deserialize for FixedSizeByteArray without modifying the serialization and deserialization code of postcard itself, essentially serializing it "as if it was a tuple", like what other fixed size arrays already do today.

@hrxi
Author

hrxi commented Jan 31, 2024

I think it should be possible to implement Serialize and Deserialize for FixedSizeByteArray without modifying the serialization and deserialization code of postcard itself, essentially serializing it "as if it was a tuple"

Yes, that would be possible. It's what e.g. serde_big_array does. However, it leads to subpar performance, seen in the PR description. The motivation of this PR is precisely fixing this subpar performance. If there's another way to do that, I'd be more than happy to implement it. I wasn't able to come up with another strategy though.

This adds conditionals and panicking branches to every ser/de struct field path.

I'm not sure whether this additional conditional/panic is a problem; I'd guess "no". Would you be interested in me figuring out precisely what the impact is? If it does have an impact, would you consider adding this FixedSizeByteArray as a crate feature?

@hrxi
Author

hrxi commented Jan 31, 2024

Hi @hrxi, thanks for the PR! I'm sympathetic to this issue, but I'm not sure if I'm convinced this is a good solution for postcard.

Thanks for taking a look at the PR and being quick to answer with your feelings. :)

@jamesmunns
Owner

Would you be interested in me figuring out what the impact of this is precisely? If it does have an impact, would you consider adding this FixedSizeByteArray as a crate feature?

I am interested in knowing the difference! https://github.com/djkoloski/rust_serialization_benchmark is likely a good baseline, it should be possible to add a "patched postcard" to do a side-by-side with stock postcard.

I'm interested in poking around with what serde-big-array is doing. I wonder whether the performance delta would still be so great if you had a SPECIFIC type for bytes, rather than "any T" like serde-big-array.

This isn't a hard no on merging this, it just doesn't sit great with me due to the special-cased nature of it. If it is really the only way and doesn't have other perf impact tradeoffs, then I'm happy to consider it further.

I'll plan to poke around the "what if you do the tuple thing but specifically for bytes where you don't need the whole drop-partially-init data" stuff that serde-big-array does, unless you beat me to it.

@hrxi
Author

hrxi commented Jan 31, 2024

I'll plan to poke around the "what if you do the tuple thing but specifically for bytes where you don't need the whole drop-partially-init data" stuff that serde-big-array does, unless you beat me to it.

Note that the serialization in serde-big-array doesn't do anything weird; it's just a simple for loop, but it's still slow.

@jamesmunns
Owner

I see what you mean, but there's a fair bit of state in PartiallyInitialized to allow for dropping partially initialized data, and I wonder if that + the unsafe pointer code is inhibiting optimization.

Feel free to ignore my gut until I (or you!) come up with data to prove it right/wrong.

@hrxi
Author

hrxi commented Jan 31, 2024

I see what you mean, but there's a fair bit of state in PartiallyInitialized to allow for dropping partially initialized data, and I wonder if that + the unsafe pointer code is inhibiting optimization.

AFAICT, this is only in the Deserialize implementation. The Serialize implementation looks completely plain to me:

```rust
impl<'de, T, const N: usize> BigArray<'de, T> for [T; N] {
    fn serialize<S>(&self, serializer: S) -> result::Result<S::Ok, S::Error>
    where
        S: Serializer,
        T: Serialize,
    {
        let mut seq = serializer.serialize_tuple(self.len())?;
        for elem in &self[..] {
            seq.serialize_element(elem)?;
        }
        seq.end()
    }

    // deserialize implementation elided
}
```

I agree that the Deserialize implementation might have some overhead.
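A byte-specific fast path sidesteps that per-element machinery entirely: for `[u8; N]` a deserializer can check the remaining input length once and do a single copy. A rough, dependency-free sketch of the idea (not postcard's actual code):

```rust
// Sketch: for [u8; N] the whole deserialization can be one length check
// plus one memcpy, instead of serde's per-element tuple machinery with
// its partial-initialization bookkeeping.
fn take_fixed<const N: usize>(input: &mut &[u8]) -> Option<[u8; N]> {
    if input.len() < N {
        return None; // not enough input left
    }
    let (head, rest) = input.split_at(N);
    *input = rest; // advance the input cursor past the consumed bytes
    let mut out = [0u8; N];
    out.copy_from_slice(head); // single bulk copy
    Some(out)
}

fn main() {
    let mut buf: &[u8] = &[1, 2, 3, 4, 5];
    let arr: [u8; 3] = take_fixed(&mut buf).unwrap();
    assert_eq!(arr, [1, 2, 3]);
    assert_eq!(buf, &[4u8, 5][..]);
    println!("took {:?}, rest {:?}", arr, buf);
}
```

This is presumably close to what the PR's `own` fast path buys on the deserialize side, and it needs no drop guard at all because `u8` is `Copy`.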

@hrxi
Author

hrxi commented Jan 31, 2024

I am interested in knowing the difference! https://github.com/djkoloski/rust_serialization_benchmark is likely a good baseline, it should be possible to add a "patched postcard" to do a side-by-side with stock postcard.

Not looking good from a first benchmark:

```
log/postcard/serialize      [317.97 µs 319.60 µs 321.70 µs]
log/postcard127/serialize   [427.51 µs 429.38 µs 431.44 µs]
log/postcard/deserialize    [2.3656 ms 2.3784 ms 2.3933 ms]
log/postcard127/deserialize [2.3488 ms 2.3555 ms 2.3632 ms]

mesh/postcard/serialize      [669.90 µs 673.52 µs 678.01 µs]
mesh/postcard127/serialize   [959.39 µs 962.73 µs 966.47 µs]
mesh/postcard/deserialize    [1.6938 ms 1.7085 ms 1.7258 ms]
mesh/postcard127/deserialize [1.3297 ms 1.3331 ms 1.3367 ms]

minecraft_savedata/postcard/serialize      [383.24 µs 386.63 µs 390.71 µs]
minecraft_savedata/postcard127/serialize   [451.78 µs 453.60 µs 455.74 µs]
minecraft_savedata/postcard/deserialize    [2.0008 ms 2.0067 ms 2.0133 ms]
minecraft_savedata/postcard127/deserialize [1.9978 ms 2.0061 ms 2.0156 ms]

mk48/postcard/serialize      [1.7979 ms 1.8187 ms 1.8451 ms]
mk48/postcard127/serialize   [1.8483 ms 1.8572 ms 1.8679 ms]
mk48/postcard/deserialize    [4.2330 ms 4.2469 ms 4.2622 ms]
mk48/postcard127/deserialize [4.2577 ms 4.2743 ms 4.2925 ms]
```

@hrxi
Author

hrxi commented Jan 31, 2024

Moving the serialization hack from `serialize_struct` to `serialize_tuple_struct` makes the performance difference disappear, likely because tuple structs are rare to non-existent in Rust code. Just another hack though…

@hrxi
Author

hrxi commented Feb 1, 2024

Moving the serialization hack from `serialize_struct` to `serialize_tuple_struct` makes the performance difference disappear, likely because tuple structs are rare to non-existent in Rust code. Just another hack though…

Would you be sympathetic to having this, or to having it as an opt-in feature?

@jamesmunns
Owner

I'm not sure yet! I likely won't be able to poke around in this before the weekend.

@hrxi
Author

hrxi commented Feb 19, 2024

I'm interested in the general sentiment on this PR because depending on the result, I have to take different routes for the serialization in my project. Have you had time to look at it yet?

@jamesmunns
Owner

Hey @hrxi, I haven't had a chance to prioritize this. I have a client deadline coming up soon, and will likely not have time to spend on this in the near future.

I'm interested in hearing about your exploration in this space, but I'd generally like to keep postcard as straightforward and predictable as possible, rather than optimizing any one specific use case.

If this blocks you - I'd suggest you fork postcard and use that for your project if this optimization is critical. As the wire format of postcard is stable, it should be straightforward to use postcard or your-optimized-postcard interchangeably as long as the wire format does not change.

@hrxi
Author

hrxi commented Feb 19, 2024

Thanks for the quick answer!

If this blocks you - I'd suggest you fork postcard and use that for your project if this optimization is critical. As the wire format of postcard is stable, it should be straightforward to use postcard or your-optimized-postcard interchangeably as long as the wire format does not change.

Yup, that's definitely an option I'll consider.

hrxi added a commit to nimiq/core-rs-albatross that referenced this pull request Feb 23, 2024