Skip to content

feat(io): add reader implementation for BufRead#196

Open
kskalski wants to merge 3 commits intoanza-xyz:masterfrom
kskalski:ks/buf_read
Open

feat(io): add reader implementation for BufRead#196
kskalski wants to merge 3 commits intoanza-xyz:masterfrom
kskalski:ks/buf_read

Conversation

@kskalski
Copy link
Contributor

@kskalski kskalski commented Feb 18, 2026

Introduces BufReadAdapter<R> in wincode::io::std_read module, a low-overhead wrapper over any std::io::BufRead source (e.g. BufReader<R>) implementing wincode::io::Reader.

Mix of consuming and scoped borrowing operations supported by checking consume state on each operation and using one of:

  • Fast path — when the reader’s internal buffer already holds enough bytes, peek_array and take_scoped return a subslice directly with no allocation, copy_into_slice uses read directly into the dst buffer
  • Aux-buffer fallback — when a contiguous slice larger than one currently available from internal reader's fill_buf is requested, bytes are assembled into an aux_buf: Vec<u8>. Subsequent operations can re-serve or extend from this buffer, but switch back to using internal reader once the bytes in aux_buf are consumed.

@kskalski kskalski force-pushed the ks/buf_read branch 9 times, most recently from 75e8eec to b4dd766 Compare March 18, 2026 09:06
@kskalski kskalski marked this pull request as ready for review March 18, 2026 10:33
@kskalski kskalski requested review from cpubot March 18, 2026 10:34
@kskalski kskalski force-pushed the ks/buf_read branch 2 times, most recently from ca038d0 to cfa77c8 Compare March 20, 2026 01:29
@cpubot
Copy link
Contributor

cpubot commented Mar 22, 2026

The complexity and state management required for this implementation is a strong indication to me that we should actually deprecate peek_* and consume* APIs.

With those removed, implementation can simply become:

use {
crate::io::{ReadResult, Reader},
std::{
io::{BufReader, Read},
mem::{MaybeUninit, transmute},
},
};
pub struct ReadAdapter<R: ?Sized> {
inner: R,
}
impl<R: Read + ?Sized> Reader<'_> for ReadAdapter<R> {
#[inline]
fn copy_into_slice(&mut self, dst: &mut [MaybeUninit<u8>]) -> ReadResult<()> {
Ok(self
.inner
.read_exact(unsafe { transmute::<&mut [MaybeUninit<u8>], &mut [u8]>(dst) })?)
}
}
impl<R: Read + ?Sized> Reader<'_> for BufReader<R> {
#[inline]
fn copy_into_slice(&mut self, dst: &mut [MaybeUninit<u8>]) -> ReadResult<()> {
Ok(self.read_exact(unsafe { transmute::<&mut [MaybeUninit<u8>], &mut [u8]>(dst) })?)
}
}

There is a benchmark in this branch you can run: https://github.com/anza-xyz/wincode/blob/af1904c5d8f83c6c8fa44aa45fba09dae0ffd21f/wincode/benches/reader.rs

The simple implementation without peek_* or consume* methods is faster in every case, and in two cases by a very significant margin. And the implementation in this PR is actually slower than bincode for the SimpleStruct case.

On x86,

The simple solution, relative to the solution in this PR is:

  • ~6x faster in the SimpleStruct case
  • ~3.7x faster in the PodStruct case
  • ~10% faster in the VersionedMessage case.

The simple solution, relative to bincode is:

  • ~4.5x faster in the SimpleStruct case
  • ~38x faster in the PodStruct case (the solution in this PR was much better than bincode here, too)

This isn't to say that we couldn't further optimize the solution proposed here to be closer to the performance of bincode or the solution without peek_* or consume*. But rather that the additional complexity is almost certainly not worth it.

We only have two cases in solana-sdk that are using peek_byte, and both instances are trivial to re-write without it. All implementations in wincode have already been updated such that they never require peek_* or consume* methods. From a library perspective, it doesn't make sense for us to bend over backwards to accommodate those two solana-sdk use-cases. The simple solution adds virtually no maintenance burden, and the removal of peek_* and consume* APIs meaningfully simplify things across the board.


My feeling is that we should merge the context system and deprecate peek_* / consume* APIs in a v0.4.9 release. Now that #220 adds context as an additional trait rather than bundling into SchemaRead, there is no breaking risk.

VersionedMessage can be implemented entirely without context. As seen here

let header = {
let mut reader = unsafe { reader.as_trusted_for(2) }?;
MessageHeader {
num_required_signatures: variant,
num_unsigned_accounts: reader.take_byte()?,
num_signed_accounts: reader.take_byte()?,
}
};
let account_key = <Address as SchemaRead<C>>::get(reader.by_ref())?;
let recent_blockhash = <Hash as SchemaRead<C>>::get(reader.by_ref())?;

And we'll only need to add one context implementation for ShortU16 for this code, which I've already proof-of-concepted here

@kskalski
Copy link
Contributor Author

I certainly agree that removing far-look-ahead functions is desirable and this PR I think demonstrated the reasons well. :)

I will take a look at the benches too - thanks.

Some quick thoughts:

  • peek_byte seems as useful API to me and is trivial to implement without cost for BufRead (it isn't for Read though, so maybe a good question is if it's going to be common to use Read without buffering - e.g. de-compressors sometimes do implement Read, but not BufRead)
  • we don't really need consume even with peek_byte, take_byte seems fine to consume peeked byte
  • a big complexity addition and need for extra branching on each call in this PR is for handling as_trusted_for, since it needs to delay inner.consume - partially it is a consequence of wrapping BufRead as black-box - when owning and managing the buffer manually, it would be possible to bump the internal cursor and still return the slice reader for bytes preceding the position marker. Having a more efficient as_trusted_for would be a big win - we use it often for large and small ranges
  • if branching inherently added for as_trusted_for due to contract of BufRead is the reason for big perf hit, then I think it's going to be better to expose our own buffered reader instead of wrapping BufRead - need to take a closer look at benchmarks
  • peek_array and take_scoped are certainly APIs I would cut - the former isn't used at all and for the latter we know many ways to get rid of

@cpubot
Copy link
Contributor

cpubot commented Mar 23, 2026

  • peek_byte seems as useful API to me and is trivial to implement without cost for BufRead (it isn't for Read though, so maybe a good question is if it's going to be common to use Read without buffering - e.g. de-compressors sometimes do implement Read, but not BufRead)
  • we don't really need consume even with peek_byte, take_byte seems fine to consume peeked byte

I still think we should remove it. It's not strictly necessary functionality with the context system available.

  • a big complexity addition and need for extra branching on each call in this PR is for handling as_trusted_for, since it needs to delay inner.consume - partially it is a consequence of wrapping BufRead as black-box - when owning and managing the buffer manually, it would be possible to bump the internal cursor and still return the slice reader for bytes preceding the position marker. Having a more efficient as_trusted_for would be a big win - we use it often for large and small ranges
  • if branching inherently added for as_trusted_for due to contract of BufRead is the reason for big perf hit, then I think it's going to be better to expose our own buffered reader instead of wrapping BufRead - need to take a closer look at benchmarks

After tinkering with benchmarks, it seems to me that as_trusted_for implemented as an enum is likely not going to yield much, if any, benefit. The primary performance improvement from as_trusted_for is the elimination of branches per read, and subsequent operation coalescing that enables. A match for each method negates those gains and optimization opportunities. The simple implementation proposed above doesn't implement as_trusted_for, and was still a clear winner performance-wise.

  • peek_array and take_scoped are certainly APIs I would cut - the former isn't used at all and for the latter we know many ways to get rid of

I still think take_scoped has utility that can't be replicated otherwise. Additionally, take_scoped is not required to implement by the trait, and usage can be gated by Reader::supports_borrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants