-
-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Description
While browsing through Bevy's code I stumbled upon this comment:
impl BufferVec<T> {
pub fn push(&mut self, value: T) -> usize {
let element_size = u64::from(T::min_size()) as usize;
let offset = self.data.len();
// TODO: Consider using unsafe code to push uninitialized, to prevent
// the zeroing. It shows up in profiles.
self.data.extend(iter::repeat_n(0, element_size));
// ...
}
// ...
}and thought I'd have a stab at it.
We begin by using reserve() instead of extend() and then use spare_capacity_mut() to get the new data (yet uninitialized). Writeris implemented for &[MaybeUninit<u8>] so the rest is mostly the same.
pub fn push(&mut self, value: T) -> usize {
let element_size = u64::from(T::min_size()) as usize;
self.data.reserve(element_size);
let spare: &mut [MaybeUninit<u8>] = self.data.spare_capacity_mut();
let mut dest = &mut spare[..element_size];
value.write_into(&mut Writer::new(&value, &mut dest, 0).unwrap());
// SAFETY:
// - new len only covers the new element, for which space was reserved
// - all uninitialized bytes have been written to
let offset = self.data.len();
unsafe {
self.data.set_len(offset + element_size);
}
offset / u64::from(T::min_size()) as usize
}From my tests this works fine (but see below) and the assembly indeed shows that the zeroing is gone. Benchmarks against the baseline are a bit all other the place on my machine (MBP M2 Max), but the trend seems positive in most cases. Though in devprofile the gains are huge, between 3-10x faster. If anyone wants to test on other devices, here is the branch. Use
cargo bench -p benches --bench render -- pushto benchmark using different element types.
So far so good, except the safety comment is a lie: not all bytes have been written to, because write_into() skips any inner padding that type T might have in its GPU representation (described by its implementation of trait ShaderType). T the Rust type is "expanded" to this representation by push() and write_into().
So this is a request for comments and ideas for follow-ups; the BufferVec API does not expose the underlying CPU data, but the unintialized bytes are exposed through a reference in write_buffer() and write_buffer_range(). Maybe write_into() can be modified to zero the padding bytes too ?
In the branch linked above, see the buffer-test example for what I used to test correctness (code for this example is with the benchmarks)
cargo run -p benches --example buffer-test