perf: manual decode for utf-8 char deserialization using take_array #187
kskalski merged 4 commits into anza-xyz:master
For small reads like this it's totally fine to use wincode/wincode/src/io/std_io.rs Lines 103 to 106 in 85034c2 The proposed implementation will short-circuit if the buffer already contains enough bytes to fulfill the request.
Our Consider a case where one is deserializing a struct containing all integers and chars. If those were to all use The proposed no-grow implementation will still actually call wincode/wincode/src/io/std_io.rs Line 213 in 85034c2 and this was done specifically to avoid the case mentioned above. But, this is an implementation detail of the proposal, not something necessarily guaranteed by the API. By using the
The practical problem is that even when operating on buffers, satisfying
I would argue that It's fine to have Arguably the de-fragmentation cost mentioned above is not that bad if the expectation for returned slice can be bounded to
The issue with a memcpy for small values is that it can prohibit scalarization because it is an opaque "copy some bytes" intrinsic. This is similar in motivation to #64. It gives the compiler more opportunity and more visibility to I do agree with one of your points in #188, that because we employ this pattern so often:
let bytes: [u8; N] = *reader.fill_array();
unsafe { reader.consume_unchecked(N) };
There is likely room for an additional helper like your proposed I would certainly prefer this over using
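The fill-then-consume pattern quoted above could be folded into a single helper roughly as follows. This is only a sketch under stated assumptions: `SliceReader`, `UnexpectedEof`, and the exact signatures of `fill_array`, `consume_unchecked`, and `take_array` are hypothetical stand-ins and do not reproduce wincode's actual `Reader` trait.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
struct UnexpectedEof;

/// Hypothetical slice-backed reader (a stand-in, not wincode's `Reader`).
struct SliceReader<'a> {
    buf: &'a [u8],
}

impl<'a> SliceReader<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Self { buf }
    }

    /// Borrow the next `N` bytes without consuming them.
    fn fill_array<const N: usize>(&self) -> Result<&[u8; N], UnexpectedEof> {
        self.buf.first_chunk::<N>().ok_or(UnexpectedEof)
    }

    /// Advance by `n` bytes.
    ///
    /// Safety: the caller must guarantee that at least `n` bytes remain.
    unsafe fn consume_unchecked(&mut self, n: usize) {
        self.buf = unsafe { self.buf.get_unchecked(n..) };
    }

    /// Hypothetical helper: copy `N` bytes out by value, then advance.
    /// Returning a fixed-size array by value keeps the size visible to the
    /// compiler instead of hiding it behind an opaque slice copy.
    fn take_array<const N: usize>(&mut self) -> Result<[u8; N], UnexpectedEof> {
        let bytes: [u8; N] = *self.fill_array::<N>()?;
        // SAFETY: `fill_array` succeeded, so at least `N` bytes remain.
        unsafe { self.consume_unchecked(N) };
        Ok(bytes)
    }
}
```

With a helper of this shape, a fixed-width integer read becomes `u32::from_le_bytes(reader.take_array::<4>()?)`, and the `unsafe` block lives in one place instead of at every call site.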
wincode/src/schema/impls.rs
Outdated
    0xF0..=0xF4 => 4,
    _ => return Err(invalid_char_lead(b0)),
};
let mut buf = [0u8; 4];
It's still not totally clear to me why we want to eliminate usage of fill_buf / fill_exact, but if we do go this route, perhaps
let mut buf = MaybeUninit::<[u8; 4]>::uninit();
// Safety: len is at most 4, so slice is up to buf.len() and casting [u8] to uninit is safe
let uninit_slice = unsafe {
core::slice::from_raw_parts_mut(buf.as_mut_ptr().cast::<MaybeUninit<u8>>(), len)
};
reader.copy_into_slice(uninit_slice)?;
let buf = unsafe { core::slice::from_raw_parts(buf.as_ptr().cast::<u8>(), len) };
I will post a bit more context on that in a different place. In short, the issue is that a chunked buffered reader can't guarantee that contiguous slices larger than 1 byte are always available.
For this PR I was also thinking of building on top of #33 and using take_array in each branch separately, but I can also compare with the version above.
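To make the chunked-reader concern concrete, here is a minimal sketch of a reader whose data is split across non-contiguous chunks, with a char decoder that calls `take_array` separately in each branch. Everything here is an illustrative assumption: `ChunkedReader`, `DecodeError`, `cont`, and `read_char` are hypothetical names, the byte-at-a-time copy is chosen for brevity rather than speed, and the code is not the PR's actual implementation.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    UnexpectedEof,
    InvalidLead(u8),
    InvalidChar,
}

/// Hypothetical reader over non-contiguous chunks. It cannot lend out a
/// borrowed multi-byte slice when a value straddles a chunk boundary.
struct ChunkedReader<'a> {
    chunks: &'a [&'a [u8]],
    chunk: usize,
    pos: usize,
}

impl<'a> ChunkedReader<'a> {
    fn new(chunks: &'a [&'a [u8]]) -> Self {
        Self { chunks, chunk: 0, pos: 0 }
    }

    /// Read one byte, moving to the next chunk when the current one is spent.
    fn take_byte(&mut self) -> Result<u8, DecodeError> {
        loop {
            let chunk = self.chunks.get(self.chunk).ok_or(DecodeError::UnexpectedEof)?;
            if let Some(&b) = chunk.get(self.pos) {
                self.pos += 1;
                return Ok(b);
            }
            self.chunk += 1;
            self.pos = 0;
        }
    }

    /// Copy `N` bytes out by value; works across chunk boundaries.
    fn take_array<const N: usize>(&mut self) -> Result<[u8; N], DecodeError> {
        let mut out = [0u8; N];
        for slot in &mut out {
            *slot = self.take_byte()?;
        }
        Ok(out)
    }
}

/// Validate a continuation byte (10xxxxxx) and return its low 6 bits.
fn cont(b: u8) -> Result<u32, DecodeError> {
    if b & 0xC0 == 0x80 {
        Ok(u32::from(b & 0x3F))
    } else {
        Err(DecodeError::InvalidChar)
    }
}

/// Decode one UTF-8 char, reading the trailing bytes per branch.
fn read_char(r: &mut ChunkedReader<'_>) -> Result<char, DecodeError> {
    let b0 = r.take_byte()?;
    // `min` is the smallest code point allowed for this length (rejects overlongs).
    let (code_point, min) = match b0 {
        0x00..=0x7F => (u32::from(b0), 0),
        0xC2..=0xDF => {
            let [b1] = r.take_array::<1>()?;
            ((u32::from(b0 & 0x1F) << 6) | cont(b1)?, 0x80)
        }
        0xE0..=0xEF => {
            let [b1, b2] = r.take_array::<2>()?;
            ((u32::from(b0 & 0x0F) << 12) | (cont(b1)? << 6) | cont(b2)?, 0x800)
        }
        0xF0..=0xF4 => {
            let [b1, b2, b3] = r.take_array::<3>()?;
            let cp = (u32::from(b0 & 0x07) << 18)
                | (cont(b1)? << 12)
                | (cont(b2)? << 6)
                | cont(b3)?;
            (cp, 0x1_0000)
        }
        _ => return Err(DecodeError::InvalidLead(b0)),
    };
    if code_point < min {
        return Err(DecodeError::InvalidChar);
    }
    // `char::from_u32` rejects surrogates and values above U+10FFFF.
    char::from_u32(code_point).ok_or(DecodeError::InvalidChar)
}
```

In this sketch a three-byte character whose lead byte sits at the end of one chunk still decodes, because nothing asks the reader for a borrowed 2-4 byte slice.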
Sounds good. Would definitely like to hear more about the background context on this
OK, I think we got what I wanted in a win-win fashion: the current state of the PR gets rid of fill_exact and achieves better performance than master, and even slightly better than the patch proposed in #33 (see the benchmark numbers in the PR description).
Force-pushed from a0afebf to 697db74.
Force-pushed from 697db74 to 090e669.
- Avoid fill_exact during char deserialization, such that a Reader implementation isn't required to always provide a borrowed slice of the requested (2-4 byte) size.
- Use take_array of varying lengths depending on the first byte check.
- Compute code_point from bytes for each 2-4 len (based on Optimize char deserialization with manual UTF-8 decoder #33).

This removes the only call to fill_exact in production code outside of default / forwarding implementations of Reader.

Benchmark comparison
Checks decoding 10_000 chars from a random String