
Conversation

@rluvaton (Member) commented Oct 15, 2025

Which issue does this PR close?

Rationale for this change

Allows combining BooleanBuffers without a lot of copies, and more (see the issue)

What changes are included in this PR?

Created most of the Buffer ops that exist in arrow-buffer/src/buffer/ops.rs for MutableBuffer and BooleanBufferBuilder. Because we can't create a BitChunksMut (for the reasons described below), I had to port those ops to the mutable ops code.

Implementation notes

Why is there a MutableOpsBufferSupportedLhs trait instead of taking a MutableBuffer, the way the Buffer ops take a Buffer?

Because then we wouldn't be able to run an operation (e.g. AND) on a subset (e.g. from bit 10 to bit 100) of a BooleanBufferBuilder: BooleanBufferBuilder does not expose its MutableBuffer, and I don't want to expose it, since a user could then add values that change the underlying buffer without updating the BooleanBufferBuilder length.

Why is there a BufferSupportedRhs trait instead of taking a Buffer, the way the Buffer ops take a Buffer?

Because we want to be able to do both MutableBuffer & Buffer and MutableBuffer & MutableBuffer, as sketched below.
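For illustration, a minimal sketch of why two traits make both combinations type-check. The method names here are simplified and hypothetical; the real trait definitions in the PR may differ:

// Hypothetical, simplified stand-ins for the PR's lhs/rhs traits.
use arrow_buffer::{Buffer, MutableBuffer};

pub(crate) trait MutableOpsLhs {
    fn packed_bytes_mut(&mut self) -> &mut [u8];
}

pub(crate) trait OpsRhs {
    fn packed_bytes(&self) -> &[u8];
}

impl MutableOpsLhs for MutableBuffer {
    fn packed_bytes_mut(&mut self) -> &mut [u8] {
        self.as_slice_mut()
    }
}

impl OpsRhs for Buffer {
    fn packed_bytes(&self) -> &[u8] {
        self.as_slice()
    }
}

// This second rhs impl is what allows MutableBuffer op MutableBuffer:
impl OpsRhs for MutableBuffer {
    fn packed_bytes(&self) -> &[u8] {
        self.as_slice()
    }
}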

Why not create a BitChunksMut for MutableBuffer, making the code look like the very simple Buffer ops?

At first I thought of implementing BitChunksMut for MutableBuffer and implementing the ops the same way they are implemented for Buffer, but I saw that this was impossible because (see the sketch after this list):

  1. The bit offset to start the op from might fall between two u64 words, and I can't get a reference to bits that straddle them
  2. We read each u64 and convert it to/from little endian, as bit-packed buffers are stored starting with the least-significant byte first, so a plain &mut u64 view would be wrong on big-endian machines
  3. I can't get a mutable u64 for the remainder of the bytes (len % 64 bits)
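For illustration, a minimal sketch (with hypothetical helper names, not the PR's actual code) of the unaligned little-endian word access these constraints force:

// Read/write one u64 worth of bits at an arbitrary byte position. Because the
// position need not be 8-byte aligned, we copy through a byte array and decode
// as little endian instead of casting to &u64 / &mut u64 -- which is exactly
// why a safe BitChunksMut handing out references cannot exist.
fn read_u64_le(bytes: &[u8], byte_offset: usize) -> u64 {
    let mut word = [0u8; 8];
    word.copy_from_slice(&bytes[byte_offset..byte_offset + 8]);
    u64::from_le_bytes(word)
}

fn write_u64_le(bytes: &mut [u8], byte_offset: usize, value: u64) {
    bytes[byte_offset..byte_offset + 8].copy_from_slice(&value.to_le_bytes());
}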

Are these changes tested?

Yes, although I did not run them on a big-endian machine

Are there any user-facing changes?

Yes: new functions, which are documented


I will later change the BooleanBufferBuilder::append_packed_range function to use mutable_bitwise_bin_op_helper, as I saw the boolean_append_packed benchmark improve by 57%:

boolean_append_packed   time:   [2.0079 µs 2.0139 µs 2.0202 µs]
                        change: [−57.808% −57.653% −57.494%] (p = 0.00 < 0.05)
                        Performance has improved.

…table.

but I don't want to pass a slice of bytes, as then I don't know the source, and users must make sure the bytes uphold the same promises as Buffer/MutableBuffer
@github-actions bot added the arrow (Changes to the arrow crate) label Oct 15, 2025
@alamb (Contributor) commented Oct 16, 2025

I will try and review this one tomorrow

@alamb (Contributor) left a comment

Thank you @rluvaton -- I haven't made it through this PR yet, but the idea of optimized bitwise operations even for offset data is very compelling. The code is also very well tested and documented in my opinion. Thank you.

My primary concern is the complexity of this code (including the unsafe), though your tests and documentation make it much easier to contemplate. I did have a few comments so far; I think with some more study I could find my way through it.

Can you please share the benchmarks you are using / any WIP? I want to confirm the performance improvements before studying this code in more detail

FYI @tustvold and @crepererum and @jhorstmann if you are interested

/// (e.g. `BooleanBufferBuilder`).
///
/// ## Why this trait is needed, can't we just use `MutableBuffer` directly?
/// Sometimes we don't want to expose the inner `MutableBuffer`
Contributor:

I don't understand this rationale. It seems to me that this code does expose the inner MutableBuffer for BooleanBufferBuilder (other code can modify the MutableBuffer); it just does so via a trait. I am not sure how that is different from just passing in the mutable buffer directly.

I wonder why you can't just pass &mut [u8] (i.e. pass in the mutable slices directly), as none of the APIs seem to change the length of the underlying buffers 🤔

If it is absolutely required to use a MutableBuffer directly from BooleanBufferBuilder, perhaps we can make an unsafe API instead:

impl BooleanBufferBuilder {
    /// Returns a mutable reference to the buffer and length. Callers must
    /// ensure that if they change the length of the buffer, they also update len
    pub unsafe fn inner(&mut self) -> (&mut MutableBuffer, &mut usize) { ... }
}
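For illustration, a hypothetical call site for this proposed (not yet existing) inner API; the and_assign helper below is illustrative only, not from the PR:

use arrow_buffer::{BooleanBufferBuilder, Buffer};

fn and_assign(builder: &mut BooleanBufferBuilder, rhs: &Buffer) {
    // SAFETY: we only mutate existing bytes in place and never grow or
    // shrink the buffer, so the builder's len stays valid.
    let (buffer, _len) = unsafe { builder.inner() };
    // Assumes both sides are bit-packed and cover the same number of bits
    for (dst, src) in buffer.as_slice_mut().iter_mut().zip(rhs.as_slice()) {
        *dst &= *src;
    }
}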

🤔

@rluvaton (Member Author) commented Oct 17, 2025:

Where do you see it exposing the mutable buffer? It only exposes the slice.

And I'm not passing bytes, to stay similar to the buffer ops and to make sure users understand the data needs to be bit-packed, but I don't have strong opinions about that last part.

Contributor:

Where do you see it exposing the mutable buffer? It only exposes the slice.

I was thinking of this code in particular, which seems to pass a MutableBuffer reference directly out of the BooleanBufferBuilder:

impl MutableOpsBufferSupportedLhs for BooleanBufferBuilder {
    fn inner_mutable_buffer(&mut self) -> &mut MutableBuffer {
        &mut self.buffer
    }
}

@rluvaton (Member Author) commented Oct 18, 2025:

Yes, but this is pub(crate) on purpose (documented at the trait level), so it is not exposed beyond the current crate.

return;
}

// We are now byte aligned
Contributor:

I don't understand how you can byte align the operations if they both have an offset

For example, if you had lhs_offset=1 and rhs_offset=2, how can you byte align that operation? It seems like it would require shifting each byte / word to get alignment and then handling the remaining bits as edge cases 🤔

However, your tests seem to cover this case

Member Author:

I only byte-align the mutable side, by applying the op to the bits remaining until the next byte boundary.
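For illustration, a minimal sketch of that prologue, using hypothetical bit helpers rather than the PR's actual code:

// Apply op bit by bit until the mutable (lhs) side reaches a byte boundary;
// returns how many bits were consumed. After this the lhs is byte aligned and
// the bulk of the operation can proceed byte by byte (the rhs may still be
// bit-shifted and is read with an offset).
fn align_lhs_prologue(
    lhs: &mut [u8],
    lhs_offset: usize,
    rhs: &[u8],
    rhs_offset: usize,
    len: usize,
    op: impl Fn(bool, bool) -> bool,
) -> usize {
    let n = ((8 - (lhs_offset % 8)) % 8).min(len);
    for i in 0..n {
        let l = get_bit(lhs, lhs_offset + i);
        let r = get_bit(rhs, rhs_offset + i);
        set_bit(lhs, lhs_offset + i, op(l, r));
    }
    n
}

fn get_bit(data: &[u8], i: usize) -> bool {
    data[i / 8] & (1 << (i % 8)) != 0
}

fn set_bit(data: &mut [u8], i: usize, value: bool) {
    if value {
        data[i / 8] |= 1 << (i % 8);
    } else {
        data[i / 8] &= !(1 << (i % 8));
    }
}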

.map(|(l, r)| expected_op(*l, *r))
.collect();

super::mutable_bitwise_bin_op_helper(
Contributor:

this is a nice test

#[test]
fn test_binary_ops_different_offsets() {
let (left, right) = create_test_data(200);
test_all_binary_ops(&left, &right, 3, 7, 50);
Contributor:

can you please also test an offset that is greater than 1 byte but less than 8 bytes?

Something like this perhaps?

   `test_all_binary_ops(&left, &right, 13, 27, 100);`


let is_mutable_buffer_byte_aligned = left_bit_offset == 0;

if is_mutable_buffer_byte_aligned {
Contributor:

Is it worth special-casing the case where both left_offset and right_offset are zero? In that case a simple loop that compares u64 by u64 is probably fastest (maybe even u128 🤔)
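For illustration, a sketch of such a fast path for AND when both offsets are zero; hypothetical code, not from the PR (note that for AND/OR/XOR the byte-wise result is endian-agnostic, so the little-endian round-trip below is only for consistency with the general helper):

fn and_both_aligned(lhs: &mut [u8], rhs: &[u8], len_bits: usize) {
    // Process whole u64 words first
    let full_words = len_bits / 64;
    for w in 0..full_words {
        let i = w * 8;
        let l = u64::from_le_bytes(lhs[i..i + 8].try_into().unwrap());
        let r = u64::from_le_bytes(rhs[i..i + 8].try_into().unwrap());
        lhs[i..i + 8].copy_from_slice(&(l & r).to_le_bytes());
    }
    // Tail (len_bits % 64): clear every lhs bit whose rhs bit is 0
    for bit in full_words * 64..len_bits {
        let (byte, mask) = (bit / 8, 1u8 << (bit % 8));
        if rhs[byte] & mask == 0 {
            lhs[byte] &= !mask;
        }
    }
}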


// Helper to create test data of specific length
fn create_test_data(len: usize) -> (Vec<bool>, Vec<bool>) {
let left: Vec<bool> = (0..len).map(|i| i % 2 == 0).collect();
Contributor:

Can you please add more randomness to these patterns? I worry these repeating patterns don't cover all the cases.

Perhaps something like this (I tried this locally and all the tests still pass)

    // Helper to create test data of specific length
    fn create_test_data(len: usize) -> (Vec<bool>, Vec<bool>) {
        let mut rng = rand::rng();
        let left: Vec<bool> = (0..len).map(|_| rng.random_bool(0.5)).collect();
        let right: Vec<bool> = (0..len).map(|_| rng.random_bool(0.5)).collect();
        (left, right)
    }

struct U64UnalignedSlice<'a> {
/// Pointer to the start of the u64 data
///
/// We are using a raw pointer because the data came from a u8 slice, so we need to read and write unaligned
Contributor:

Rather than using unsafe, would it make sense to align the pointer to u64 instead, and handle any starting / ending bytes that were not u64 aligned specially? That might make the code simpler / faster.

Member Author:

Wouldn't that require a copy? Or do you mean https://doc.rust-lang.org/std/primitive.slice.html#method.align_to?

Which I used at first but removed, as there is no guarantee that the remainder lands in the suffix rather than the prefix.
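For illustration: align_to splits a slice into (prefix, aligned middle, suffix), and correct code must handle all three parts, since the caller cannot rely on where the split lands. A small endian-agnostic example:

fn popcount(bytes: &[u8]) -> u32 {
    // SAFETY: u64 has no invalid bit patterns, so reinterpreting
    // initialized bytes as u64 is sound.
    let (prefix, words, suffix) = unsafe { bytes.align_to::<u64>() };
    // Both prefix and suffix can be non-empty, so both must be handled.
    prefix.iter().map(|b| b.count_ones()).sum::<u32>()
        + words.iter().map(|w| w.count_ones()).sum::<u32>()
        + suffix.iter().map(|b| b.count_ones()).sum::<u32>()
}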


impl BitAndAssign<&BooleanBuffer> for BooleanBufferBuilder {
fn bitand_assign(&mut self, rhs: &BooleanBuffer) {
assert_eq!(self.len, rhs.len());
Contributor:

It might be nice to document somewhere that using the bitwise operators on BooleanBuffers/builders with different lengths will panic.
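For illustration, a hypothetical doc example of that panic, assuming the BitAndAssign impl shown above:

use arrow_buffer::{BooleanBuffer, BooleanBufferBuilder};

fn main() {
    let mut builder = BooleanBufferBuilder::new(8);
    builder.append_n(8, true); // builder holds 8 bits

    let shorter = BooleanBuffer::new_set(4); // only 4 bits

    // builder &= &shorter; // would panic: assert_eq!(self.len, rhs.len()) fails
}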

@rluvaton (Member Author) commented:

I will later change the BooleanBufferBuilder::append_packed_range function to use mutable_bitwise_bin_op_helper, as I saw the boolean_append_packed benchmark improve by 57%:

boolean_append_packed   time:   [2.0079 µs 2.0139 µs 2.0202 µs]
                        change: [−57.808% −57.653% −57.494%] (p = 0.00 < 0.05)
                        Performance has improved.

I've now changed the code as described above.

@alamb (Contributor) commented Oct 17, 2025

I plan to spend more time studying this PR tomorrow morning with a fresh pair of eyes


Labels: arrow (Changes to the arrow crate), performance


Development

Successfully merging this pull request may close these issues.

Add bitwise ops on BooleanBufferBuilder and MutableBuffer that mutate directly the buffer
