Commit c0efa37

pepijnve, rluvaton, alamb
authored and committed
Project record batches to avoid filtering unused columns in CASE evaluation (apache#18329)
## Which issue does this PR close?

- Closes apache#18056
- Part of apache#18075

## Rationale for this change

When `CaseExpr` needs to evaluate a `PhysicalExpr` for a subset of the rows of the input `RecordBatch`, it first filters the record batch using a selection vector. This filtering step filters all columns of the `RecordBatch`, including ones that the `PhysicalExpr` may never access. For wide record batches (many columns) and narrow expressions (few column references), projecting the record batch first can reduce the amount of wasted filtering work.

## What changes are included in this PR?

This PR attempts to reduce the time spent filtering columns unnecessarily by removing unused columns from the record batch before filtering. Because this renumbers the columns, the PR also derives new versions of the `when`, `then`, and `else` expressions with corrected column references. To make this more manageable, the child expressions of a `case` expression are collected in a new struct named `CaseBody`. The projection logic derives a projection vector and a projected `CaseBody`.

This logic is only used when the number of used columns (the length of the projection vector) is less than the number of columns in the incoming record batch.

Certain evaluation methods in `case` do not perform any filtering. These remain unchanged and never apply the projection logic, since projecting only pays off when record batches need to be filtered.

## Are these changes tested?

- Covered by existing tests

## Are there any user-facing changes?

No

---------

Co-authored-by: Raz Luvaton <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
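The projection step described in the commit message can be sketched as follows. This is a minimal, stdlib-only Rust model under stated assumptions, not DataFusion's implementation: `Expr`, `CaseBody::project`, and `filter_projected` are hypothetical simplifications (DataFusion's expressions are `PhysicalExpr` trait objects evaluated against Arrow `RecordBatch`es), and columns are modeled as plain `Vec<i64>`. The sketch collects the column indices referenced by the `when`, `then`, and `else` expressions into a sorted projection vector, skips projection when every column is used, renumbers column references so they point into the projected batch, and then filters only the projected columns with a boolean selection vector.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, heavily simplified stand-in for a physical expression tree.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(usize),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical analogue of `CaseBody`: `when`/`then` pairs plus an optional `else`.
#[derive(Debug, Clone, PartialEq)]
struct CaseBody {
    when_then: Vec<(Expr, Expr)>,
    else_expr: Option<Expr>,
}

/// Records every column index referenced anywhere in `expr`.
fn collect_columns(expr: &Expr, out: &mut BTreeSet<usize>) {
    match expr {
        Expr::Column(i) => {
            out.insert(*i);
        }
        Expr::Literal(_) => {}
        Expr::Add(l, r) => {
            collect_columns(l, out);
            collect_columns(r, out);
        }
    }
}

/// Rewrites column references from old batch indices to projected indices.
fn remap(expr: &Expr, mapping: &HashMap<usize, usize>) -> Expr {
    match expr {
        Expr::Column(i) => Expr::Column(mapping[i]),
        Expr::Literal(v) => Expr::Literal(*v),
        Expr::Add(l, r) => Expr::Add(Box::new(remap(l, mapping)), Box::new(remap(r, mapping))),
    }
}

impl CaseBody {
    /// Returns the projection vector and a renumbered body, or `None` when
    /// the expressions use every column (projecting would save nothing).
    fn project(&self, num_columns: usize) -> Option<(Vec<usize>, CaseBody)> {
        let mut used = BTreeSet::new();
        for (w, t) in &self.when_then {
            collect_columns(w, &mut used);
            collect_columns(t, &mut used);
        }
        if let Some(e) = &self.else_expr {
            collect_columns(e, &mut used);
        }
        // BTreeSet iterates in ascending order, so the original column order is kept.
        let projection: Vec<usize> = used.into_iter().collect();
        if projection.len() >= num_columns {
            return None;
        }
        // Map each old column index to its position in the projected batch.
        let mapping: HashMap<usize, usize> =
            projection.iter().enumerate().map(|(new, &old)| (old, new)).collect();
        let body = CaseBody {
            when_then: self
                .when_then
                .iter()
                .map(|(w, t)| (remap(w, &mapping), remap(t, &mapping)))
                .collect(),
            else_expr: self.else_expr.as_ref().map(|e| remap(e, &mapping)),
        };
        Some((projection, body))
    }
}

/// Applies a selection vector to the projected columns only; unused columns are never copied.
fn filter_projected(columns: &[Vec<i64>], projection: &[usize], selection: &[bool]) -> Vec<Vec<i64>> {
    projection
        .iter()
        .map(|&c| {
            columns[c]
                .iter()
                .zip(selection)
                .filter(|(_, keep)| **keep)
                .map(|(v, _)| *v)
                .collect()
        })
        .collect()
}
```

For example, a `CaseBody` over a five-column batch whose `when` reads column 3 and whose `then` reads column 1 yields the projection vector `[1, 3]`, with those references renumbered to `Column(1)` and `Column(0)`. Keeping the projection sorted preserves the original column order, so renumbering is a simple lookup from old index to new position.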
1 parent e785613 commit c0efa37

2 files changed in datafusion (+361, -110 lines)