feat: add bitwise ops for BooleanBufferBuilder and for MutableBuffer #8619
Conversation
…table. But I don't want to pass a slice of bytes, as then I don't know the source, and users must make sure that they hold the same promises as Buffer/MutableBuffer.
I will try and review this one tomorrow.
Thank you @rluvaton -- I haven't made it through this PR yet but the idea of optimized bitwise operations even for offset data is very compelling. The code is also very well tested and documented in my opinion. Thank you.
My primary concern is with the complexity of this code (including the `unsafe`), though your tests and documentation make it much easier to contemplate. I did have a few comments so far. I think with some more study I could find …
Can you please share the benchmarks you are using / any WIP? I want to confirm the performance improvements before studying this code in more detail
FYI @tustvold and @crepererum and @jhorstmann if you are interested
/// (e.g. `BooleanBufferBuilder`). | ||
/// | ||
/// ## Why this trait is needed, can't we just use `MutableBuffer` directly? | ||
/// Sometimes we don't want to expose the inner `MutableBuffer` |
I don't understand this rationale. It seems to me that this code does expose the inner MutableBuffer for BooleanBufferBuilder (other code can modify the MutableBuffer); it just does so via a trait. I am not sure how that is different from just passing in the MutableBuffer directly.

I wonder why you can't just pass `&mut [u8]` (aka pass in the mutable slices directly), as none of the APIs seem to change the length of the underlying buffers 🤔
If it is absolutely required to use a MutableBuffer directly from BooleanBufferBuilder, perhaps we can make an unsafe API instead:
```rust
impl BooleanBufferBuilder {
    /// Returns a mutable reference to the buffer and length. Callers must ensure
    /// that if they change the length of the buffer, they also update `len`.
    pub unsafe fn inner(&mut self) -> (&mut MutableBuffer, &mut usize) { ... }
}
```
🤔
Where do you see it exposing mutable buffer? It only exposes the slice.

And I'm not passing bytes so that it stays similar to the Buffer ops, and to make sure users understand the data needs to be bit-packed, but I don't have strong opinions about the last part.
> Where do you see it exposing mutable buffer? It only exposes the slice.

I was thinking of this code in particular, which seems to pass a MutableBuffer reference directly out of the BooleanBufferBuilder:
```rust
impl MutableOpsBufferSupportedLhs for BooleanBufferBuilder {
    fn inner_mutable_buffer(&mut self) -> &mut MutableBuffer {
        &mut self.buffer
    }
}
```
Yes, but this is pub(crate) on purpose (documented at the trait level) so it is not exposed beyond the current crate.
```rust
    return;
}

// We are now byte aligned
```
I don't understand how you can byte align the operations if they both have an offset. For example, if you had lhs_offset=1 and rhs_offset=2, how can you byte align that operation? It seems like it would require shifting each byte / word to get alignment and then handling the remaining bits as edge cases 🤔

However, your tests seem to cover this case.
I only byte-align the mutable side, by first calling the op on the bits remaining until the next byte boundary.
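For illustration, here is a minimal sketch of that approach (not the PR's actual code; `get_bit`, `set_bit`, and `mutable_bin_op` are hypothetical helpers): apply the op bit by bit until the mutable side reaches a byte boundary, then switch to the byte/word fast path.

```rust
// Hypothetical helpers over bit-packed (LSB-first) byte slices.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    (bytes[i / 8] >> (i % 8)) & 1 != 0
}

fn set_bit(bytes: &mut [u8], i: usize, v: bool) {
    if v {
        bytes[i / 8] |= 1 << (i % 8);
    } else {
        bytes[i / 8] &= !(1 << (i % 8));
    }
}

fn mutable_bin_op(
    lhs: &mut [u8],
    mut lhs_offset: usize,
    rhs: &[u8],
    mut rhs_offset: usize,
    mut len: usize,
    op: impl Fn(bool, bool) -> bool,
) {
    // 1. Burn off leading bits until the mutable (lhs) side is byte aligned
    while lhs_offset % 8 != 0 && len > 0 {
        let v = op(get_bit(lhs, lhs_offset), get_bit(rhs, rhs_offset));
        set_bit(lhs, lhs_offset, v);
        lhs_offset += 1;
        rhs_offset += 1;
        len -= 1;
    }
    // 2. lhs is now byte aligned; the fast path reads rhs with a bit shift
    //    and combines whole bytes/words (elided here).
}
```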
```rust
    .map(|(l, r)| expected_op(*l, *r))
    .collect();

super::mutable_bitwise_bin_op_helper(
```
this is a nice test
```rust
#[test]
fn test_binary_ops_different_offsets() {
    let (left, right) = create_test_data(200);
    test_all_binary_ops(&left, &right, 3, 7, 50);
```
Can you please also test an offset that is greater than 1 byte but less than 8 bytes? Something like this, perhaps:

`test_all_binary_ops(&left, &right, 13, 27, 100);`
```rust
let is_mutable_buffer_byte_aligned = left_bit_offset == 0;

if is_mutable_buffer_byte_aligned {
```
Is it worth special-casing the case where both left_offset and right_offset are zero? In that case a simple loop that compares u64 by u64 is probably fastest (maybe even u128 🤔)
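For reference, a hedged sketch of what such a fully aligned fast path could look like (illustrative names, not the PR's API); the trailing bytes that don't fill a whole `u64` are handled separately:

```rust
/// Bitwise-AND `rhs` into `lhs`, assuming both sides start at bit offset 0.
fn bitand_aligned(lhs: &mut [u8], rhs: &[u8]) {
    assert_eq!(lhs.len(), rhs.len());
    let split = lhs.len() - lhs.len() % 8;
    let (lhs_words, lhs_tail) = lhs.split_at_mut(split);
    let (rhs_words, rhs_tail) = rhs.split_at(split);
    // Process 8 bytes at a time as u64; going through byte arrays avoids
    // any u64 alignment requirement on the underlying slices.
    for (l, r) in lhs_words.chunks_exact_mut(8).zip(rhs_words.chunks_exact(8)) {
        let v = u64::from_le_bytes((&*l).try_into().unwrap())
            & u64::from_le_bytes(r.try_into().unwrap());
        l.copy_from_slice(&v.to_le_bytes());
    }
    // Remaining tail bytes, one at a time
    for (l, r) in lhs_tail.iter_mut().zip(rhs_tail) {
        *l &= *r;
    }
}
```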
```rust
// Helper to create test data of specific length
fn create_test_data(len: usize) -> (Vec<bool>, Vec<bool>) {
    let left: Vec<bool> = (0..len).map(|i| i % 2 == 0).collect();
```
Can you please add more randomness to these patterns? I worry these repeating patterns don't cover all the cases. Perhaps something like this (I tried this locally and all the tests still pass):
```rust
// Helper to create test data of specific length
fn create_test_data(len: usize) -> (Vec<bool>, Vec<bool>) {
    let mut rng = rand::rng();
    let left: Vec<bool> = (0..len).map(|_| rng.random_bool(0.5)).collect();
    let right: Vec<bool> = (0..len).map(|_| rng.random_bool(0.5)).collect();
    (left, right)
}
```
```rust
struct U64UnalignedSlice<'a> {
    /// Pointer to the start of the u64 data
    ///
    /// We are using raw pointer as the data came from a u8 slice so we need to read and write unaligned
```
Rather than using `unsafe`, would it make sense to align the pointer to u64 instead, and handle any starting / ending bytes that are not u64-aligned specially? That might make the code simpler / faster.
Wouldn't it require a copy? Or do you mean https://doc.rust-lang.org/std/primitive.slice.html#method.align_to, which I used at first but removed, as there is no guarantee that you won't get the remainder in the prefix instead of the suffix?
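For context, a small illustration of the `align_to` behavior in question: the prefix/middle/suffix split depends on the slice's runtime address, so code cannot assume the unaligned remainder lands only in the suffix.

```rust
/// Count set bits in a bit-packed byte slice using `align_to`.
fn count_set_bits(bytes: &[u8]) -> u32 {
    // SAFETY: u64 has no invalid bit patterns, so reinterpreting
    // initialized bytes as u64 is sound.
    let (prefix, words, suffix) = unsafe { bytes.align_to::<u64>() };
    // `prefix` may be non-empty depending on where the allocation happens
    // to start; assuming all leftover bytes are in `suffix` would be a bug.
    prefix.iter().map(|b| b.count_ones()).sum::<u32>()
        + words.iter().map(|w| w.count_ones()).sum::<u32>()
        + suffix.iter().map(|b| b.count_ones()).sum::<u32>()
}
```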
```rust
impl BitAndAssign<&BooleanBuffer> for BooleanBufferBuilder {
    fn bitand_assign(&mut self, rhs: &BooleanBuffer) {
        assert_eq!(self.len, rhs.len());
```
It might be nice to document somewhere that using the bitwise operators on BooleanBuffers/Builders with different lengths will panic.
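For instance, the convention could look like this sketch on a toy type (not the PR's types), using rustdoc's `# Panics` section:

```rust
/// A toy bit-packed container, purely to illustrate the doc convention.
pub struct Bits(Vec<u8>);

impl Bits {
    /// Bitwise-ANDs `rhs` into `self` in place.
    ///
    /// # Panics
    ///
    /// Panics if `self` and `rhs` have different lengths.
    pub fn and_assign(&mut self, rhs: &Bits) {
        assert_eq!(self.0.len(), rhs.0.len());
        for (l, r) in self.0.iter_mut().zip(&rhs.0) {
            *l &= *r;
        }
    }
}
```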
You can change the code that I described.
I plan to spend more time studying this PR tomorrow morning with a fresh pair of eyes.
Which issue does this PR close?

Closes: add bitwise ops for `BooleanBufferBuilder` and `MutableBuffer` that mutate directly the buffer #8618.

Rationale for this change

Allows combining BooleanBuffers without a lot of copies, and more (see the issue).
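For a sense of the API, a minimal usage sketch, assuming the `BitAndAssign<&BooleanBuffer> for BooleanBufferBuilder` impl shown in the diff (paths and method names are from arrow-buffer as I understand them):

```rust
use arrow_buffer::builder::BooleanBufferBuilder;
use arrow_buffer::BooleanBuffer;

fn main() {
    let mut builder = BooleanBufferBuilder::new(4);
    builder.append_n(4, true);
    let mask: BooleanBuffer = vec![true, false, true, false].into_iter().collect();
    // In-place AND: no intermediate Buffer is allocated
    builder &= &mask;
    assert_eq!(builder.finish(), mask);
}
```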
What changes are included in this PR?

Created most of the `Buffer` ops that exist in `arrow-buffer/src/buffer/ops.rs` for `MutableBuffer` and `BooleanBufferBuilder`. Because we can't create `BitChunksMut` (due to the reasons described below), I had to port those to the mutable ops code.

Implementation notes
Why there is a trait for `MutableOpsBufferSupportedLhs` instead of taking `MutableBuffer`, like the `Buffer` ops take `Buffer`:

Because then we wouldn't be able to do an operation (e.g. `AND`) on a subset (e.g. from bit 10 to bit 100) of a `BooleanBufferBuilder`. `BooleanBufferBuilder` does not expose its `MutableBuffer`, and I don't want to expose it, as the user could then add values that change the underlying buffer without the `BooleanBufferBuilder` length being updated.

Why there is a trait for `BufferSupportedRhs` instead of taking `Buffer`, like the `Buffer` ops take `Buffer`:

Because we want to be able to do `MutableBuffer & Buffer` and also `MutableBuffer & MutableBuffer`.
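To make the shape concrete, a hedged sketch of the two traits as described; `MutableOpsBufferSupportedLhs`, `BufferSupportedRhs`, and `inner_mutable_buffer` appear in the PR, while `as_bytes` and the impls here are illustrative:

```rust
use arrow_buffer::{Buffer, MutableBuffer};

/// Lhs side must hand out its inner MutableBuffer (pub(crate) in the PR,
/// so it is not exposed outside the crate).
trait MutableOpsBufferSupportedLhs {
    fn inner_mutable_buffer(&mut self) -> &mut MutableBuffer;
}

/// Rhs side only needs read access, so both Buffer and MutableBuffer can
/// implement it, enabling `MutableBuffer & Buffer` and
/// `MutableBuffer & MutableBuffer`.
trait BufferSupportedRhs {
    fn as_bytes(&self) -> &[u8];
}

impl BufferSupportedRhs for Buffer {
    fn as_bytes(&self) -> &[u8] {
        self.as_slice()
    }
}

impl BufferSupportedRhs for MutableBuffer {
    fn as_bytes(&self) -> &[u8] {
        self.as_slice()
    }
}
```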
Why not create `BitChunksMut` for `MutableBuffer` and make the code like the `Buffer` ops, which are very simple:

At first I thought of implementing `BitChunksMut` for `MutableBuffer` and implementing the ops the same way they were implemented for `Buffer`, but I saw that it was impossible, as:

- the chunks are read as `u64`, and I can't get a mutable reference for that `u64`
- the bytes are read as `u64` and converted to little endian, as bit-packed buffers are stored starting with the least-significant byte first
- the remainder bits (`len % 64`) need separate handling
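A short sketch of the `&mut u64` problem: each chunk is a converted copy of 8 bytes, not a view into the buffer, so mutation has to be an explicit read-modify-write (names here are illustrative):

```rust
/// Apply `f` to each whole 8-byte chunk, interpreted as a little-endian u64.
fn for_each_chunk_mut(bytes: &mut [u8], mut f: impl FnMut(u64) -> u64) {
    for chunk in bytes.chunks_exact_mut(8) {
        // Read: copy + endian conversion -- there is no u64 in memory
        // that a &mut u64 could borrow from.
        let v = u64::from_le_bytes((&*chunk).try_into().unwrap());
        // Write the transformed copy back.
        chunk.copy_from_slice(&f(v).to_le_bytes());
    }
    // The remainder bits (`len % 64`) would need separate masked handling.
}
```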
Are these changes tested?

Yes, although I did not run them on a big-endian machine.

Are there any user-facing changes?

Yes, new functions, which are documented.
I will later change the `BooleanBufferBuilder::append_packed_range` function to use `mutable_bitwise_bin_op_helper`, as I saw that the `boolean_append_packed` benchmark improved by 57%.