
@vegarsti (Contributor) commented Nov 5, 2025

Rationale for this change

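While working on #18424, I noticed that array_reverse uses MutableArrayData to build the reversed array for the List and FixedSizeList types. Building it with take instead turns out to be a lot faster on the benchmark added in #18425. It cuts runtime by about 84% for List and about 61% for FixedSizeList, i.e. roughly 6x and 2.6x speedups: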

# cargo bench --bench array_reverse
   Compiling datafusion-functions-nested v50.3.0 (/Users/vegard/dev/datafusion/datafusion/functions-nested)
    Finished `bench` profile [optimized] target(s) in 42.94s
     Running benches/array_reverse.rs (target/release/deps/array_reverse-2c473eed34a53d0a)
Gnuplot not found, using plotters backend
array_reverse_list      time:   [65.812 µs 65.995 µs 66.268 µs]
                        change: [−84.353% −84.303% −84.247%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe

array_reverse_fixed_size_list
                        time:   [167.04 µs 167.64 µs 168.21 µs]
                        change: [−61.541% −61.375% −61.213%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
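To illustrate the take-based approach, here is a minimal std-only sketch. It assumes Arrow's List layout can be modelled as a plain offsets slice plus a values slice. The helper `take` is a hypothetical stand-in for Arrow's `take` kernel, and `list_reverse_indices` / `fixed_size_list_reverse_indices` are illustrative names, not the functions in this PR. Null rows are ignored for brevity. The idea: compute, for every row, that row's value positions back to front, then gather all values in one pass instead of copying row slices through MutableArrayData.

```rust
// Hypothetical sketch: Arrow's List layout modelled as plain slices.
// Row i of a List spans values[offsets[i]..offsets[i + 1]]. Null handling is omitted.

/// Gather indices that reverse each row of a variable-length List.
fn list_reverse_indices(offsets: &[usize]) -> Vec<usize> {
    let mut indices = Vec::new();
    for w in offsets.windows(2) {
        // Emit this row's value positions back to front.
        indices.extend((w[0]..w[1]).rev());
    }
    indices
}

/// Gather indices that reverse each row of a FixedSizeList with `size` elements per row.
fn fixed_size_list_reverse_indices(num_rows: usize, size: usize) -> Vec<usize> {
    (0..num_rows)
        .flat_map(|row| (row * size..(row + 1) * size).rev())
        .collect()
}

/// Stand-in for Arrow's `take` kernel: out[j] = values[indices[j]].
fn take<T: Clone>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i].clone()).collect()
}

fn main() {
    // List rows: [1, 2, 3], [], [4, 5]
    let values = vec![1, 2, 3, 4, 5];
    let offsets = vec![0, 3, 3, 5];
    let reversed = take(&values, &list_reverse_indices(&offsets));
    // Row lengths are unchanged, so the offsets only need rebasing to start at 0.
    let new_offsets: Vec<usize> = offsets.iter().map(|o| o - offsets[0]).collect();
    println!("{:?} {:?}", reversed, new_offsets); // [3, 2, 1, 5, 4] [0, 3, 3, 5]

    // FixedSizeList rows of size 3: [1, 2, 3], [4, 5, 6]
    let fixed = take(&[1, 2, 3, 4, 5, 6], &fixed_size_list_reverse_indices(2, 3));
    println!("{:?}", fixed); // [3, 2, 1, 6, 5, 4]
}
```

Because the gather indices point into the full values buffer, the same approach works when the list array is a slice whose first offset is not 0.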

Are these changes tested?

Covered by the existing sqllogictest tests, plus one new test for FixedSizeList.

@vegarsti (Contributor, Author) commented Nov 5, 2025

cc @Jefffrey 👀

Comment on lines +178 to +193
// Materialize values from underlying array with take
let indices_array: ArrayRef = if O::IS_LARGE {
    Arc::new(UInt64Array::from(
        indices
            .iter()
            .map(|i| i.as_usize() as u64)
            .collect::<Vec<_>>(),
    ))
} else {
    Arc::new(UInt32Array::from(
        indices
            .iter()
            .map(|i| i.as_usize() as u32)
            .collect::<Vec<_>>(),
    ))
};
Contributor Author

This is duplicated in the ListView implementation. I figured two copies weren't enough to justify extracting a helper function, but if we find a nicer way to do it, we can improve both.
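If the duplicated conversion were extracted, one possible shape is sketched below using only std. `Offset` is a hypothetical stand-in for arrow's `OffsetSizeTrait` that models just the two members the snippet uses (`IS_LARGE` and `as_usize`), `TakeIndices` stands in for the `UInt32Array` / `UInt64Array` choice, and `to_take_indices` is a hypothetical name. A real helper would presumably return an `ArrayRef` instead.

```rust
/// Hypothetical stand-in for arrow's `OffsetSizeTrait` (only the members the snippet uses).
trait Offset: Copy {
    const IS_LARGE: bool;
    fn as_usize(self) -> usize;
}

impl Offset for i32 {
    const IS_LARGE: bool = false;
    fn as_usize(self) -> usize {
        self as usize
    }
}

impl Offset for i64 {
    const IS_LARGE: bool = true;
    fn as_usize(self) -> usize {
        self as usize
    }
}

/// Stand-in for the two index array types the snippet builds.
#[derive(Debug, PartialEq)]
enum TakeIndices {
    U32(Vec<u32>),
    U64(Vec<u64>),
}

/// Hypothetical shared helper for the List and ListView paths:
/// i32 offsets produce u32 indices, i64 (large) offsets produce u64 indices.
fn to_take_indices<O: Offset>(indices: &[O]) -> TakeIndices {
    if O::IS_LARGE {
        TakeIndices::U64(indices.iter().map(|&i| i.as_usize() as u64).collect())
    } else {
        TakeIndices::U32(indices.iter().map(|&i| i.as_usize() as u32).collect())
    }
}

fn main() {
    println!("{:?}", to_take_indices(&[2i32, 1, 0])); // U32([2, 1, 0])
    println!("{:?}", to_take_indices(&[4i64, 3])); // U64([4, 3])
}
```

Valid i32 offsets are non-negative, so narrowing them to u32 is lossless, while large lists keep the full u64 range.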

@vegarsti vegarsti marked this pull request as ready for review November 5, 2025 18:32
@vegarsti vegarsti changed the title Make array_reverse 5x faster for List, 2.5x for FixedSizeList, by using take Make array_reverse a lot faster for List and FixedSizeList Nov 5, 2025
@vegarsti vegarsti marked this pull request as draft November 5, 2025 18:42
@vegarsti vegarsti marked this pull request as ready for review November 5, 2025 18:54
@vegarsti vegarsti changed the title Make array_reverse a lot faster for List and FixedSizeList Make array_reverse faster for List and FixedSizeList Nov 5, 2025
@vegarsti vegarsti changed the title Make array_reverse faster for List and FixedSizeList Make array_reverse faster for List (~5x) and FixedSizeList (~2.5x) Nov 5, 2025
@vegarsti vegarsti changed the title Make array_reverse faster for List (~5x) and FixedSizeList (~2.5x) Make array_reverse faster for List and FixedSizeList Nov 5, 2025
@vegarsti vegarsti force-pushed the improve-perf-list-reverse branch from 7f84c0e to 38af34a November 5, 2025 19:55