-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creating a SmallVec for every pixel is slow #172
Comments
The |
thanks for investing the time! great observations :) |
the most important question is: is smallvec slower than a standard vec? because usually, slices are used where possible already. most of the time, smallvec is only used where sliced cannot be used unfortunately |
Instead of creating a SmallVec for every pixel, you can have the caller provide a pub fn output_sample_at(&self, position: Vec2<usize>, output_buffer: &mut [u8]) {
output_buffer.copy_from_slice(self.samples_at(position))
} And the usage would look roughly like this: // pre-allocate a buffer once; a Vec is faster in this case
let mut buffer = vec![0; bytes_per_pixel];
// iterate over all pixels in an image
// note: iteration order (rows first or columns first)
// must match the image layout for performance
for x in 0..img_width {
for y in 0..img_height {
image.output_pixel_at(x, y, &mut buffer);
// the pixel was written to the buffer
process_pixel(&mut buffer); // whatever we want to do with it
}
} This API also allows for cheaper transformation of images without copying the data to/from the stack. You could allocate a large Vec for the output image, iterate over it with |
I'm a little bit hesitant with this approach. I fear it is too low level. The content of the byte slice can not be interpreted without knowing about the channel layout of the particular image. What about passing a Also, there is a function that avoids the allocation of the smallvec, returning an iterator instead. The iterator can be written into a pre allocated vector. Maybe this is something the crop function needs to use instead of Lines 504 to 511 in 49fece0
|
Yes, I believe this would resolve the performance pitfall. The function returning a SmallVec is just a convenience wrapper around this one, correct? If so, I suggest exposing this function everywhere and marking the one returning SmallVec as deprecated. Some thoughts on the
I'm not sure I follow. The proposed function will write out the exact same sequence of bytes as the one returning a SmallVec, the difference is that the allocation is controlled by the caller. The current calls look something like this (ignoring the let pixel = img.samples_at(x, y);
process_pixel(&mut pixel); While a call to the proposed interface would be: let buffer = vec![0; sample_size];
output_samples_at(x, y, &mut buffer);
process_pixel(&mut buffer); Except nobody ever looks up just one pixel, so calling The only difference is that the required length of the buffer needs to be known before calling |
I would like to implement a prototypal refactoring to show what I mean :) but maybe I cannot get it done today My plan is to adjust the crop function to re-use one allocation, by calling the iterator based function. Then Profile to find out whether the Iterator is implement in a way that the compiler can optimize as expected. I don't want to remove the convenience wrapper returning a smallvec, but we can rename it and add documentation about the performance implications. |
@Shnatsel would you mind summarizing your approach to profiling? And how you generated those interactive reports? Those are pretty nice. What software did you use? I'm interested in profiling this library more |
I have an entire article written about this, but it's currently not published. Drop me a line at the email in my Github profile and I'll share the draft with you. |
I've been looking into the TODO in the README:
TODO profile if smallvec is really an improvement!
SmallVec
is all over the place, so it's rather difficult to remove it and benchmark the before/after. I've used Rust's instrumentation-based coverage to find out how frequently aSmallVec
is constructed. I've profiled the7_crop_alpha_any_image
example. Here's the report in HTML (trimmed so that only covered lines are shown): results_gt1.html.gzI've found that an instance of this type is constructed for every pixel:
exrs/src/image/mod.rs
Line 493 in 49fece0
Profiling shows that 55% of the execution time is spent just creating
SmallVec
s for every pixel in the crop function: https://share.firefox.dev/3WQyWURThis will be even worse on images with more than 8 channels, where a heap allocation will be performed for every pixel.
It seems that the approach of creating a SmallVec for every pixel is fundamentally slow. A much more performant way to implement this would be returning a slice of the original data, without constructing an owned data structure.
The text was updated successfully, but these errors were encountered: