forked from apache/datafusion
Commit c0efa37
Project record batches to avoid filtering unused columns in CASE evaluation (apache#18329)
## Which issue does this PR close?
- Closes apache#18056
- Part of apache#18075
## Rationale for this change
When `CaseExpr` needs to evaluate a `PhysicalExpr` for a subset of the
rows of the input `RecordBatch` it will first filter the record batch
using a selection vector. This step filters all columns of the
`RecordBatch`, including ones that may not be accessed by the
`PhysicalExpr`. For wide (many columns) record batches and narrow
expressions (few column references) it can be beneficial to project the
record batch first to reduce the amount of wasted filtering work.
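The effect of projecting before filtering can be illustrated with a simplified model. The sketch below does not use the Arrow `RecordBatch` API; the `Batch` type and its `project` and `filter` methods are hypothetical stand-ins that only show why filtering cost scales with the number of columns.

```rust
/// A toy stand-in for a record batch: each inner Vec is one column.
/// This is an illustrative model, not Arrow's `RecordBatch`.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<Vec<i64>>,
}

impl Batch {
    /// Keep only the columns at the given indices, in that order.
    /// (This toy version copies the column data to stay short.)
    fn project(&self, indices: &[usize]) -> Batch {
        Batch {
            columns: indices.iter().map(|&i| self.columns[i].clone()).collect(),
        }
    }

    /// Keep only the rows where `mask` is true. Every column in the
    /// batch is visited, so the work grows with the column count.
    fn filter(&self, mask: &[bool]) -> Batch {
        Batch {
            columns: self
                .columns
                .iter()
                .map(|col| {
                    col.iter()
                        .zip(mask)
                        .filter(|pair| *pair.1)
                        .map(|(&value, _)| value)
                        .collect()
                })
                .collect(),
        }
    }
}
```

In this model, an expression that reads only column 1 of a three-column batch can call `batch.project(&[1]).filter(&mask)` and filter one column instead of three, producing the same values for the column it actually uses.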
## What changes are included in this PR?
This PR attempts to reduce the amount of time spent filtering columns
unnecessarily by reducing the columns of the record batch prior to
filtering. Since this renumbers the columns, it is also required to
derive new versions of the `when`, `then`, and `else` expressions that
have corrected column references.
To make this more manageable, the child expressions of a `case`
expression are collected in a new struct named `CaseBody`. The
projection logic derives a projection vector and a projected `CaseBody`.
This logic is only used when the number of used columns (the length of
the projection vector) is less than the number of columns of the
incoming record batch.
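The derivation of a projection vector and a projected `CaseBody` might look roughly like the sketch below. It is not the code from this PR: the `Expr` enum is a hypothetical, minimal replacement for `PhysicalExpr`, and the field layout of `CaseBody` and the `project` signature are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical minimal expression type; not DataFusion's `PhysicalExpr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(usize),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Record every column index this expression references.
    fn collect_columns(&self, used: &mut BTreeSet<usize>) {
        match self {
            Expr::Column(i) => {
                used.insert(*i);
            }
            Expr::Literal(_) => {}
            Expr::Gt(l, r) => {
                l.collect_columns(used);
                r.collect_columns(used);
            }
        }
    }

    /// Rewrite column references to positions in the sorted `projection`.
    fn remap(&self, projection: &[usize]) -> Expr {
        match self {
            Expr::Column(i) => {
                let new_index = projection
                    .binary_search(i)
                    .expect("referenced column must be in the projection");
                Expr::Column(new_index)
            }
            Expr::Literal(v) => Expr::Literal(*v),
            Expr::Gt(l, r) => Expr::Gt(
                Box::new(l.remap(projection)),
                Box::new(r.remap(projection)),
            ),
        }
    }
}

/// Assumed layout: the `when`/`then` pairs and the optional `else`.
#[derive(Debug, Clone, PartialEq)]
struct CaseBody {
    when_then: Vec<(Expr, Expr)>,
    else_expr: Option<Expr>,
}

impl CaseBody {
    /// Returns the projection vector and a body with corrected column
    /// references, or `None` when projecting would not drop any column.
    fn project(&self, num_columns: usize) -> Option<(Vec<usize>, CaseBody)> {
        let mut used = BTreeSet::new();
        for (when, then) in &self.when_then {
            when.collect_columns(&mut used);
            then.collect_columns(&mut used);
        }
        if let Some(e) = &self.else_expr {
            e.collect_columns(&mut used);
        }
        // A BTreeSet iterates in ascending order, so the vector is sorted.
        let projection: Vec<usize> = used.into_iter().collect();
        if projection.len() >= num_columns {
            return None;
        }
        let body = CaseBody {
            when_then: self
                .when_then
                .iter()
                .map(|(w, t)| (w.remap(&projection), t.remap(&projection)))
                .collect(),
            else_expr: self.else_expr.as_ref().map(|e| e.remap(&projection)),
        };
        Some((projection, body))
    }
}
```

For a body that references only columns 2 and 7 of a ten-column batch, this sketch yields the projection vector `[2, 7]` and rewrites the references to columns 0 and 1. When every column of the batch is referenced, it returns `None` and the original body is used unchanged.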
Certain evaluation methods in `case` do not perform any filtering. These
remain unchanged and will never perform the projection logic since this
is only beneficial when filtering of record batches is required.
## Are these changes tested?
- Covered by existing tests
## Are there any user-facing changes?
No
---------
Co-authored-by: Raz Luvaton <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
1 parent e785613 commit c0efa37