-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(rust): Refactor compute kernels in polars-arrow to avoid using gather #19669
base: main
Are you sure you want to change the base?
refactor(rust): Refactor compute kernels in polars-arrow to avoid using gather #19669
Conversation
} | ||
let take_values = unsafe { | ||
crate::compute::take::take_unchecked(list.values().as_ref(), &indices.freeze()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Casting nullable List -> FixedSizeList, used a gather to ensure the width of the null slots - have updated this to use Growable
instead.
} | ||
|
||
let values = arr.values(); | ||
// SAFETY: | ||
// the indices we generate are in bounds | ||
unsafe { Ok(take_unchecked(&**values, &take_by)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list.get()
/ array.get()
were building selection indices and then calling gather with them - I've re-written them to use loops instead.
out = s.arr.get(100, null_on_oob=False) | ||
|
||
with pytest.raises(ComputeError, match="get index -3 is out of bounds"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by - print the oob index in error message
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #19669 +/- ##
=======================================
Coverage 79.72% 79.73%
=======================================
Files 1542 1542
Lines 212208 212232 +24
Branches 2449 2449
=======================================
+ Hits 169182 169220 +38
+ Misses 42472 42458 -14
Partials 554 554 ☔ View full report in Codecov by Sentry. |
There might be some performance implications in those rewrites. It's hard to tell. I think we actually should move the |
7e5705e
to
5dff00a
Compare
I tried to move the casting code, but I don't think it's possible as the cast is currently used by polars/crates/polars-arrow/src/ffi/array.rs Lines 111 to 114 in 3cdb7c2
From benchmarking, the PR as it is improves # DF
shape: (20_000_000, 2)
┌──────────┬─────────────────────────────────┐
│ i64 ┆ list │
│ --- ┆ --- │
│ i64 ┆ list[i64] │
╞══════════╪═════════════════════════════════╡
│ 6765403 ┆ [6765403, 6765403, … 6765403] │
│ 16059030 ┆ [16059030, 16059030, … 1605903… │
...
# This PR
cast list->array 0.7143477080389857
list.get(i64) 0.2533301250077784
arr.get(col(indices)) 0.860774208791554
# 1.12.0
cast list->array 1.0722814579494298
list.get(i64) 0.2003235830925405
arr.get(col(indices)) 0.5676086670719087 I believe for I think what I'll try next is to probably move |
Oh, that one we can match the dtype (it's either BinaryView, or Utf8View) and as_any().downcast_ref(). Get that type out and then call to_binview if it is Utf8View. |
Before we can move the gather logic to
polars-compute
we need to remove all uses of it inpolars-arrow
, as it will no longer be accessible inpolars-arrow
after the move.