feat(injector): add an `extend` method to Nucleo's injector #74
Conversation
The performance difference looks promising!
I'm not super familiar with this code myself so I just have some minor/style comments.
Co-authored-by: Michael Davis <[email protected]>
Any thoughts on how to proceed with these changes? Should we wait for input from @pascalkuthe? (same thing for #75)
I'm not that familiar with this code but I think this looks good. @pascalkuthe should have a look as well. He's a bit busy at the moment with work, so it might take him a while to find some time to look at this (and #75).

Unrelated: also consider upstreaming both of these changes to https://github.com/ibraheemdev/boxcar - if I read the history correctly, this module is vendored from that crate and it could be nice to share these improvements with the users of that crate too. (That crate looks to have quite a few dependents judging by download info, so these changes could be quite impactful :)
```rust
let end_location = Location::of(start_index + count - 1);

// Allocate necessary buckets upfront
if start_location.bucket != end_location.bucket {
```
this is a pessimisation. This is only supposed to be used for avoiding contention on allocating a new shard. That is only needed for the `end_bucket` and the bucket after it. For the other buckets it's not needed, as they will all be allocated contention-free from within this function.
The correct logic would look like this:
```rust
let alloc_entry = end_location.bucket_len - (end_location.bucket_len >> 3);
if end_location.entry >= alloc_entry
    && (start_location.bucket != end_location.bucket || start_location.entry <= alloc_entry)
{
    if let Some(next_bucket) = self.buckets.get(end_location.bucket as usize + 1) {
        Vec::get_or_alloc(next_bucket, end_location.bucket_len << 1, self.columns);
    }
}
if start_location.bucket != end_location.bucket {
    let bucket_ptr = self.buckets.get_unchecked(end_location.bucket as usize);
    Vec::get_or_alloc(bucket_ptr, end_location.bucket_len, self.columns);
}
```
We probably want to turn `alloc_entry` into a function on `Location`, since it's used in multiple places now.
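For intuition, here is a tiny standalone sketch of the watermark arithmetic in the snippet above (my reading of the logic, not authoritative): `alloc_entry` is the 7/8 point of a bucket, and once the end of a batch lands at or past it, the next, doubled bucket is allocated ahead of time.

```rust
fn main() {
    // alloc_entry = bucket_len - bucket_len/8: once a write lands at or past
    // this 7/8 watermark, the next bucket (twice as large) is pre-allocated
    // so later writers don't contend on allocating it.
    for bucket_len in [8u32, 64, 1024] {
        let alloc_entry = bucket_len - (bucket_len >> 3);
        let next_len = bucket_len << 1;
        println!("bucket_len={bucket_len} alloc_entry={alloc_entry} next_len={next_len}");
    }
    assert_eq!(1024 - (1024 >> 3), 896);
}
```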
```rust
}
// Route each value to its corresponding bucket
let mut location;
for (i, v) in values.into_iter().enumerate() {
```
`ExactSizeIterator` is a safe trait, so implementations can have bugs / lie about their size. Unsafe code cannot rely on the reported length being correct. You need to track the index (with `.enumerate()`), and if we go past the claimed length you need to panic (less is fine, since that would just mean a permanent gap in the vec).
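To illustrate the hazard being pointed out, here is a standalone sketch (not Nucleo's code) of an `ExactSizeIterator` whose reported length is simply wrong, and the defensive pattern a batch writer needs:

```rust
/// Claims to yield 10 items but actually yields only 3.
struct Lying {
    yielded: usize,
}

impl Iterator for Lying {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.yielded < 3 {
            self.yielded += 1;
            Some(42)
        } else {
            None
        }
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
        (10, Some(10)) // the lie: ExactSizeIterator is safe, so this compiles fine
    }
}

impl ExactSizeIterator for Lying {}

fn main() {
    let it = Lying { yielded: 0 };
    let claimed = it.len(); // 10, straight from the lying size_hint
    let mut written = 0usize;
    for (i, _v) in it.enumerate() {
        // A writer that reserved `claimed` slots must panic here rather than
        // write out of bounds if the iterator yields more than it claimed.
        assert!(i < claimed, "iterator yielded more items than it claimed");
        written += 1;
    }
    // Yielding *fewer* items than claimed is memory-safe for the writer;
    // it just leaves a permanent gap in the reserved slots.
    assert_eq!(claimed, 10);
    assert_eq!(written, 3);
    println!("claimed={claimed} written={written}");
}
```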
```rust
    .len()
    .try_into()
    .expect("overflowed maximum capacity");
if count == 0 {
```
As said, the reported len can be wrong, so you want an assert inside this function like `assert_eq!(values.next(), None, "values reported incorrect len")`.
```diff
@@ -182,6 +182,83 @@ impl<T> Vec<T> {
         index
     }

+    /// Extends the vector by appending multiple elements at once.
+    pub fn extend<I>(&self, values: I, fill_columns: impl Fn(&T, &mut [Utf32String]))
```
This is a non-trivial unsafe function; I would like to see some unit tests (just of the boxcar in isolation).
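A hedged sketch of the kind of tests being requested, runnable without the crate: `SharedVec` below is a `Mutex`-based stand-in for the lock-free boxcar `Vec` (its name and API are invented for illustration); real tests would exercise boxcar's `push`/`extend` directly and check the same properties (empty batches are no-ops, concurrent extends lose no elements).

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for the concurrent vector under test (hypothetical, for shape only).
struct SharedVec(Mutex<Vec<u64>>);

impl SharedVec {
    fn extend(&self, batch: impl IntoIterator<Item = u64>) {
        self.0.lock().unwrap().extend(batch);
    }
}

fn main() {
    // Property 1: extending with an empty batch is a no-op.
    let v = SharedVec(Mutex::new(Vec::new()));
    v.extend(std::iter::empty());
    assert_eq!(v.0.lock().unwrap().len(), 0);

    // Property 2: concurrent extends from several threads lose no elements.
    let v = Arc::new(SharedVec(Mutex::new(Vec::new())));
    let mut handles = Vec::new();
    for t in 0..8u64 {
        let v = Arc::clone(&v);
        handles.push(thread::spawn(move || {
            // each thread appends a batch of 1000 values tagged by thread id
            v.extend((0..1000u64).map(move |i| t * 1000 + i));
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    let mut all = v.0.lock().unwrap().clone();
    all.sort_unstable();
    // every one of the 8000 distinct values must be present exactly once
    assert_eq!(all, (0..8000u64).collect::<Vec<u64>>());
    println!("ok");
}
```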
```rust
// if we are at the end of the bucket, move on to the next one
if location.entry == location.bucket_len - 1 {
    // safety: `location.bucket + 1` is always in bounds
    bucket = unsafe { self.buckets.get_unchecked((location.bucket + 1) as usize) };
```
I don't think this is true. `end_location` could be the last bucket (which would make this UB). I think this check should be at the start of the function (and simply check whether the bucket changed compared to the previous location).
Description
This pull request adds an `extend` method to the `Injector` struct.

The main motivation I have for this comes from trying to optimize loading times for https://github.com/alexpasmantier/television, which led me to take a look at `Nucleo`'s implementation of `boxcar`.

The proposed `extend` method does the following for an incoming batch of values: … (`inflight` atomic)
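The `inflight` atomic mentioned above suggests the core batching idea; here is a minimal standalone sketch of it (my illustration, not the actual boxcar code): a batch of `count` elements can claim a contiguous range of indices with a single atomic `fetch_add`, where per-element `push` pays one atomic RMW per element.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical reservation counter, standing in for boxcar's `inflight`.
struct Reservations {
    inflight: AtomicU64,
}

impl Reservations {
    /// Returns the first index of a freshly reserved `start..start + count` range.
    fn reserve(&self, count: u64) -> u64 {
        // one atomic RMW for the whole batch, instead of one per element
        self.inflight.fetch_add(count, Ordering::Relaxed)
    }
}

fn main() {
    let r = Reservations { inflight: AtomicU64::new(0) };
    let a = r.reserve(8); // this caller owns indices a..a + 8
    let b = r.reserve(4); // this caller owns indices b..b + 4
    assert_eq!((a, b), (0, 8));
    assert_eq!(r.inflight.load(Ordering::Relaxed), 12);
    println!("a={a} b={b}");
}
```

Each reserved range then belongs exclusively to one caller, which is what lets the per-element writes proceed without further synchronization.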
atomic)Benchmarks
I took the liberty of adding `Criterion` as a dev dependency in order to run a couple of benchmarks and assess whether this was a meaningful feature to add or not.

cargo bench raw output
Observations
Sequential execution
The first benchmark compares, for different sizes of input:
While `extend` does look slightly faster than `push` for most input sizes, I was pretty skeptical at that point that the difference really justified the extra complexity.

The slight edge is, I believe, mostly explained by the fact that `extend` can pre-allocate all the buckets beforehand.
can pre-allocate all the buckets beforehand.Adding values from multiple threads
The second benchmark compares, for different sizes of input:
In this case, the difference becomes quite significant across the entire range of input sizes, mostly, I believe, due to much less contention on atomics, and imho it is a nice low-hanging optimization for the library.
Curious to have some feedback on this.
Cheers