Avoid scatter operation in `ExpressionOrExpression` case evaluation method #18444

pepijnve · 2025-11-02T11:30:39Z

Which issue does this PR close?

Part of [EPIC] A collection of items to improve CASE performance #18075.

Rationale for this change

The ExpressionOrExpression case evaluation method currently uses zip to combine the then and else results for a batch. This requires a scatter operation to ensure the partial results are correctly lined up for the zip algorithm.

By using a custom merge algorithm, this scatter step can be avoided.
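To make the cost of the scatter step concrete, here is a minimal sketch over plain slices rather than Arrow arrays. The names `scatter`, `zip_select` and `case_via_scatter_and_zip` are hypothetical and only model the idea described above; they are not the DataFusion or arrow-rs functions. The then and else results are assumed to be compact, holding one value per row that took the corresponding branch.

```rust
/// Scatter: expand compact `values` to the length of `mask`, placing one
/// value at each `true` position and `None` everywhere else.
fn scatter<T: Clone>(mask: &[bool], values: &[T]) -> Vec<Option<T>> {
    let mut remaining = values.iter();
    mask.iter()
        .map(|&m| if m { remaining.next().cloned() } else { None })
        .collect()
}

/// zip-style select: picks by position, so both inputs must already be
/// aligned with the mask (same length as the mask).
fn zip_select<T: Clone>(
    mask: &[bool],
    truthy: &[Option<T>],
    falsy: &[Option<T>],
) -> Vec<Option<T>> {
    assert_eq!(mask.len(), truthy.len());
    assert_eq!(mask.len(), falsy.len());
    mask.iter()
        .enumerate()
        .map(|(i, &m)| if m { truthy[i].clone() } else { falsy[i].clone() })
        .collect()
}

/// Toy model of the scatter-then-zip approach: two extra full-length
/// intermediate vectors are built only to line the values up for zip.
fn case_via_scatter_and_zip<T: Clone>(
    mask: &[bool],
    then_values: &[T],
    else_values: &[T],
) -> Vec<Option<T>> {
    let inverted: Vec<bool> = mask.iter().map(|m| !m).collect();
    let truthy = scatter(mask, then_values);
    let falsy = scatter(&inverted, else_values);
    zip_select(mask, &truthy, &falsy)
}
```

In this model the two `scatter` calls are pure overhead: they copy every value once more before `zip_select` copies it again into the output.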

What changes are included in this PR?

Introduce a zip variant that does not require prealigning truthy and falsy result values with the mask array

Are these changes tested?

Covered by existing case tests

Are there any user-facing changes?

No

pepijnve · 2025-11-02T11:33:27Z

@alamb could you do your benchmark thing for this one as well? Looks like about -30% in my own local testing.

alamb · 2025-11-03T16:20:34Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing expr_or_expr (821e50d) to efcc216 diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=expr_or_expr
Results will be posted here when complete

pepijnve · 2025-11-03T16:48:23Z

Force push of 2b34d81 was just a squash to make this one easier to cherry-pick locally. No code changes.

alamb · 2025-11-03T17:17:27Z

🤖: Benchmark completed

Details

group                                                                                                                             expr_or_expr                           main
-----                                                                                                                             ------------                           ----
case_when 8192x100: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END                    1.00    133.9±2.01µs        ? ?/sec    1.00    133.8±1.75µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                                         1.00     56.3±0.28µs        ? ?/sec    1.02     57.2±0.20µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                       1.00     18.5±0.35µs        ? ?/sec    1.27     23.5±0.47µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                                   1.01      6.8±0.05µs        ? ?/sec    1.00      6.8±0.02µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END                           1.00     49.1±0.13ms        ? ?/sec    1.00     49.3±0.18ms        ? ?/sec
case_when 8192x100: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                                          1.00     54.6±0.60ms        ? ?/sec    1.00     54.3±0.26ms        ? ?/sec
case_when 8192x100: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                                     1.02     30.7±0.51µs        ? ?/sec    1.00     30.1±0.79µs        ? ?/sec
case_when 8192x100: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                                1.00    118.8±1.86µs        ? ?/sec    1.00    118.9±1.61µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END                      1.00    132.1±1.67µs        ? ?/sec    1.00    131.6±1.77µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                                           1.00     55.2±0.26µs        ? ?/sec    1.03     56.9±0.21µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                         1.00     18.8±0.42µs        ? ?/sec    1.21     22.8±0.41µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                                     1.00      6.8±0.04µs        ? ?/sec    1.00      6.8±0.05µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END                             1.00     48.6±0.15ms        ? ?/sec    1.00     48.5±0.19ms        ? ?/sec
case_when 8192x3: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                                            1.00     54.1±0.27ms        ? ?/sec    1.00     54.0±0.30ms        ? ?/sec
case_when 8192x3: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                                       1.06     31.3±0.48µs        ? ?/sec    1.00     29.7±0.47µs        ? ?/sec
case_when 8192x3: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                                  1.00    118.1±1.04µs        ? ?/sec    1.00    118.3±1.70µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END                     1.01    133.1±1.83µs        ? ?/sec    1.00    132.4±1.88µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                                          1.00     56.0±0.19µs        ? ?/sec    1.02     57.3±0.48µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                        1.00     19.0±0.31µs        ? ?/sec    1.23     23.2±0.47µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                                    1.01      6.8±0.02µs        ? ?/sec    1.00      6.7±0.03µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END                            1.00     48.7±0.19ms        ? ?/sec    1.01     49.0±0.15ms        ? ?/sec
case_when 8192x50: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                                           1.01     54.6±0.29ms        ? ?/sec    1.00     54.2±0.21ms        ? ?/sec
case_when 8192x50: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                                      1.04     31.4±0.42µs        ? ?/sec    1.00     30.2±0.57µs        ? ?/sec
case_when 8192x50: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                                 1.00    118.8±1.50µs        ? ?/sec    1.00    119.0±1.26µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.00    442.3±2.14µs        ? ?/sec    1.00    443.5±2.40µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.00    492.1±1.92µs        ? ?/sec    1.00    492.4±3.05µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.00    377.4±2.50µs        ? ?/sec    1.00    375.7±3.75µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.00    446.9±2.46µs        ? ?/sec    1.00    445.7±3.30µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.00    494.1±3.86µs        ? ?/sec    1.00    495.0±3.75µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.01    378.5±2.51µs        ? ?/sec    1.00    375.5±3.15µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.00    446.5±2.24µs        ? ?/sec    1.00    447.8±3.97µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.00    490.6±1.63µs        ? ?/sec    1.01    494.8±2.26µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.00    445.5±2.24µs        ? ?/sec    1.00    447.4±2.54µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.01    193.5±0.72µs        ? ?/sec    1.00    192.2±0.61µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.00    264.0±2.12µs        ? ?/sec    1.01    266.1±1.91µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.01    247.0±3.03µs        ? ?/sec    1.00    245.5±2.78µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.01    194.5±1.23µs        ? ?/sec    1.00    192.5±0.92µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.00    264.8±3.07µs        ? ?/sec    1.01    267.3±2.94µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.01    247.4±2.25µs        ? ?/sec    1.00    244.5±0.89µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.00    194.1±0.90µs        ? ?/sec    1.00    193.2±0.78µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.00    264.5±3.45µs        ? ?/sec    1.01    267.0±2.68µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.00    194.9±1.27µs        ? ?/sec    1.00    194.3±1.31µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.00    587.8±7.04µs        ? ?/sec    1.00    589.3±4.35µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.00    625.3±5.03µs        ? ?/sec    1.00    624.7±3.21µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.01    469.3±5.08µs        ? ?/sec    1.00    466.8±2.20µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.00    588.9±4.00µs        ? ?/sec    1.00    588.4±2.94µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.00    627.8±4.64µs        ? ?/sec    1.00    628.7±2.85µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.00    468.8±1.63µs        ? ?/sec    1.00    467.0±3.75µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.01    587.2±3.07µs        ? ?/sec    1.00    583.9±2.35µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.01    626.5±3.37µs        ? ?/sec    1.00    622.6±2.54µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.01    589.0±6.27µs        ? ?/sec    1.00    584.8±3.00µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.01    194.4±1.25µs        ? ?/sec    1.00    192.3±0.79µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.00    264.2±1.35µs        ? ?/sec    1.00    265.2±1.28µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.00    246.1±1.60µs        ? ?/sec    1.00    245.7±1.51µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.00    194.5±1.79µs        ? ?/sec    1.00    194.1±2.68µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.00    264.5±3.02µs        ? ?/sec    1.01    266.9±2.05µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.01    247.0±6.91µs        ? ?/sec    1.00    244.3±0.85µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.00    194.4±1.18µs        ? ?/sec    1.00    194.7±0.88µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.00    264.4±1.59µs        ? ?/sec    1.01    267.0±2.35µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.01    194.9±1.26µs        ? ?/sec    1.00    193.0±0.70µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0            1.01    325.3±1.21µs        ? ?/sec    1.00    321.7±1.59µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1          1.00    380.6±2.12µs        ? ?/sec    1.00    381.4±2.73µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5          1.01    306.6±1.54µs        ? ?/sec    1.00    304.0±3.20µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0            1.01    325.3±1.83µs        ? ?/sec    1.00    321.8±1.36µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1          1.00    382.0±2.61µs        ? ?/sec    1.00    383.2±2.55µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5          1.01    308.2±1.76µs        ? ?/sec    1.00    306.3±2.70µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0            1.01    326.0±1.62µs        ? ?/sec    1.00    321.9±1.72µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1          1.00    382.1±2.11µs        ? ?/sec    1.00    383.0±3.92µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0              1.01    326.6±1.05µs        ? ?/sec    1.00    323.0±1.57µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0       1.01    193.6±1.77µs        ? ?/sec    1.00    192.0±1.45µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1     1.00    264.0±1.03µs        ? ?/sec    1.00    264.8±2.14µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5     1.01    247.7±2.41µs        ? ?/sec    1.00    244.8±1.88µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0       1.01    194.2±0.53µs        ? ?/sec    1.00    192.4±0.50µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1     1.00    263.9±2.91µs        ? ?/sec    1.01    265.7±3.01µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5     1.01    247.3±1.51µs        ? ?/sec    1.00    245.6±0.63µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0       1.00    193.6±0.90µs        ? ?/sec    1.00    193.6±0.68µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1     1.00    265.3±3.08µs        ? ?/sec    1.00    266.6±2.95µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0         1.00    193.8±0.73µs        ? ?/sec    1.00    194.1±0.95µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.01    809.9±4.03µs        ? ?/sec    1.00    804.9±7.14µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.00    862.0±4.32µs        ? ?/sec    1.00    858.6±6.61µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.01    596.4±3.17µs        ? ?/sec    1.00    589.6±7.41µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.01    812.7±4.68µs        ? ?/sec    1.00    804.0±3.77µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.01    862.8±5.89µs        ? ?/sec    1.00    856.2±5.53µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.01    595.2±4.22µs        ? ?/sec    1.00    587.1±1.33µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.01    811.4±5.44µs        ? ?/sec    1.00    805.0±3.26µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.01    863.3±8.49µs        ? ?/sec    1.00    854.0±2.30µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.01    810.1±2.46µs        ? ?/sec    1.00    803.9±2.47µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.02    211.2±0.80µs        ? ?/sec    1.00    208.0±0.75µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.00    316.4±2.27µs        ? ?/sec    1.00    315.9±1.81µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.02    278.7±1.59µs        ? ?/sec    1.00    273.6±2.92µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.01    211.6±2.73µs        ? ?/sec    1.00    208.5±0.95µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.01    317.2±2.87µs        ? ?/sec    1.00    315.1±1.35µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.01    278.2±0.94µs        ? ?/sec    1.00    275.2±4.97µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.01    211.2±0.72µs        ? ?/sec    1.00    208.1±0.78µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.00    316.1±1.33µs        ? ?/sec    1.00    315.4±1.24µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.01    211.7±1.76µs        ? ?/sec    1.00    208.7±2.18µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.01   1171.6±7.63µs        ? ?/sec    1.00   1163.2±8.79µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.01   1172.9±4.37µs        ? ?/sec    1.00   1162.3±7.09µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.01    786.4±2.67µs        ? ?/sec    1.00    781.4±3.19µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.01   1171.2±8.32µs        ? ?/sec    1.00   1161.5±6.47µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.01   1172.9±4.52µs        ? ?/sec    1.00   1161.2±4.73µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.01    787.0±2.28µs        ? ?/sec    1.00    779.2±2.41µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.01   1171.7±7.59µs        ? ?/sec    1.00   1163.0±7.11µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.01   1172.9±6.37µs        ? ?/sec    1.00   1163.3±7.17µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.01   1172.1±6.28µs        ? ?/sec    1.00   1159.1±3.56µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.01    211.7±1.59µs        ? ?/sec    1.00    208.7±1.26µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.00    315.5±1.51µs        ? ?/sec    1.00    315.2±2.75µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.01    278.8±1.79µs        ? ?/sec    1.00    275.0±1.81µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.01    211.1±0.82µs        ? ?/sec    1.00    209.1±1.09µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.00    316.5±2.30µs        ? ?/sec    1.00    315.0±2.77µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.01    278.0±2.18µs        ? ?/sec    1.00    274.4±1.60µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.02    211.7±1.31µs        ? ?/sec    1.00    208.5±0.94µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.01    316.6±1.79µs        ? ?/sec    1.00    314.8±2.14µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.01    211.1±1.29µs        ? ?/sec    1.00    208.8±1.35µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0            1.01    504.3±3.48µs        ? ?/sec    1.00    501.6±3.47µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1          1.01    586.3±4.17µs        ? ?/sec    1.00    582.0±5.04µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5          1.00    434.9±2.55µs        ? ?/sec    1.00    435.4±3.16µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0            1.00    502.1±2.78µs        ? ?/sec    1.00    501.8±2.24µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1          1.00    584.4±4.38µs        ? ?/sec    1.00    582.2±3.95µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5          1.00    435.0±2.44µs        ? ?/sec    1.00    434.7±2.29µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0            1.00    501.2±1.46µs        ? ?/sec    1.00    502.2±3.10µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1          1.01    585.6±3.32µs        ? ?/sec    1.00    581.9±2.51µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0              1.00    501.7±2.08µs        ? ?/sec    1.00    504.0±1.65µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0       1.02    211.4±2.24µs        ? ?/sec    1.00    208.1±0.88µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1     1.01    315.6±1.26µs        ? ?/sec    1.00    314.0±1.08µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5     1.02    279.3±2.26µs        ? ?/sec    1.00    274.2±1.08µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0       1.02    212.2±2.13µs        ? ?/sec    1.00    208.7±1.46µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1     1.00    317.1±3.70µs        ? ?/sec    1.00    315.6±3.11µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5     1.01    277.6±1.11µs        ? ?/sec    1.00    274.2±0.74µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0       1.01    211.7±1.75µs        ? ?/sec    1.00    209.1±2.52µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1     1.01    318.3±2.93µs        ? ?/sec    1.00    315.2±2.87µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0         1.02    211.8±2.48µs        ? ?/sec    1.00    208.7±1.32µs        ? ?/sec

pepijnve · 2025-11-03T18:25:00Z

Benchmark results confirm a 20-27% improvement on the relevant benchmarks (the `CASE WHEN c1 <= 500 THEN c2 ELSE c3 END` cases). That's about the percentage of time I was seeing in the profiler assigned to the scatter operation.

alamb · 2025-11-03T20:41:25Z

For anyone else following along, here are the relevant benchmarks:

case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                       1.00     18.5±0.35µs        ? ?/sec    1.27     23.5±0.47µs        ? ?/sec

case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                         1.00     18.8±0.42µs        ? ?/sec    1.21     22.8±0.41µs        ? ?/sec

case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                        1.00     19.0±0.31µs        ? ?/sec    1.23     23.2±0.47µs        ? ?/sec

alamb

Thank you @pepijnve -- I think the code makes sense to me and the benchmark results check out ✅

The only thing I am not sure about is the new merge / zip kernel -- I think we may be able to reuse zip upstream in arrow-rs after the next arrow release

alamb · 2025-11-03T20:54:05Z

datafusion/physical-expr/src/expressions/case.rs

    filter.filter(array)
 }

+fn merge(


This looks like an implementation of zip to me (rather than merge). It seems like it would be better to use consistent terminology if this is indeed the case

This looks very similar to the fancy new code that @rluvaton added to arrow for zip with scalars (will be released in arrow 57.1.0):

perf: add optimized zip implementation for scalars arrow-rs#8653

If you agree it is the same, perhaps we can either avoid adding this method to DataFusion or else we can add a comment that says we can revert to using just zip once apache/arrow-rs#8653 is available

It's almost the same as zip, but different enough that it's necessary. Without this implementation you can't avoid the scatter step.

I've added a test case to show the difference. The short version is that merge([true, false, true], [A, C], [B]) will get you [A, B, C] while zip would return an error stating all arrays should have the same length.
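The example above can be sketched over plain slices. This is an illustration only: `zip_like` and `merge_like` are hypothetical stand-ins, not the arrow-rs `zip` kernel or the `merge` function added in this PR, and the error text is taken from the description in the comment rather than from the library.

```rust
/// zip-style select: all three inputs must have the same length, because
/// values are picked by their position in the mask.
fn zip_like<T: Clone>(mask: &[bool], truthy: &[T], falsy: &[T]) -> Result<Vec<T>, String> {
    if truthy.len() != mask.len() || falsy.len() != mask.len() {
        return Err("all arrays should have the same length".to_string());
    }
    Ok(mask
        .iter()
        .enumerate()
        .map(|(i, &m)| if m { truthy[i].clone() } else { falsy[i].clone() })
        .collect())
}

/// merge-style select: `truthy` holds one value per `true` in the mask and
/// `falsy` one value per `false`; each input is consumed in order.
fn merge_like<T: Clone>(mask: &[bool], truthy: &[T], falsy: &[T]) -> Vec<T> {
    let mut t = truthy.iter();
    let mut f = falsy.iter();
    mask.iter()
        .map(|&m| {
            let next = if m { t.next() } else { f.next() };
            next.expect("input is shorter than the mask requires").clone()
        })
        .collect()
}
```

With the mask `[true, false, true]`, `merge_like` accepts a two-element truthy input and a one-element falsy input, while `zip_like` rejects the same arguments because neither input has three elements.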

I agree that these two merge kernels would be better off in arrow-rs which is why I made PR apache/arrow-rs#8753.

@rluvaton's work on zip only covers the case of two scalar inputs BTW. That's why I chose to delegate to plain zip in that case. array/array, scalar/array and array/scalar still need the specific logic here.

The subtle difference between this and zip is in

    let falsy_length = start - filled;
    let falsy_end = falsy_offset + falsy_length;
    mutable.extend(1, falsy_offset, falsy_end);
    falsy_offset = falsy_end;

vs

    mutable.extend(1, filled, start);

Where zip uses the slice indices from the mask directly, merge only uses the length of the slices and tracks the amount taken from truthy and falsy separately.
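The offset bookkeeping can be sketched without Arrow as follows. This is a simplified model under stated assumptions: `true_runs` is a hypothetical helper standing in for iterating over the set slices of the mask, `Vec::extend_from_slice` stands in for `mutable.extend`, and nulls are ignored. The variable names `filled`, `start`, `falsy_offset` and `falsy_end` follow the snippet quoted above.

```rust
/// Hypothetical helper: half-open `(start, end)` ranges of consecutive
/// `true` values in the mask.
fn true_runs(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut i = 0;
    while i < mask.len() {
        if mask[i] {
            let start = i;
            while i < mask.len() && mask[i] {
                i += 1;
            }
            runs.push((start, i));
        } else {
            i += 1;
        }
    }
    runs
}

/// Run-based merge: mask positions give only the *length* of each run;
/// the read positions in `truthy` and `falsy` are tracked separately.
fn merge_runs<T: Clone>(mask: &[bool], truthy: &[T], falsy: &[T]) -> Vec<T> {
    let mut out = Vec::with_capacity(mask.len());
    let mut filled = 0; // number of output positions written so far
    let mut truthy_offset = 0; // values consumed from `truthy`
    let mut falsy_offset = 0; // values consumed from `falsy`

    for (start, end) in true_runs(mask) {
        // Gap of `false` rows before this run. A zip-style kernel would
        // copy falsy[filled..start]; here only the gap length is used.
        let falsy_length = start - filled;
        let falsy_end = falsy_offset + falsy_length;
        out.extend_from_slice(&falsy[falsy_offset..falsy_end]);
        falsy_offset = falsy_end;

        // The run of `true` rows, read from the compact truthy input.
        let truthy_end = truthy_offset + (end - start);
        out.extend_from_slice(&truthy[truthy_offset..truthy_end]);
        truthy_offset = truthy_end;

        filled = end;
    }

    // Trailing `false` rows after the last run.
    let remaining = mask.len() - filled;
    out.extend_from_slice(&falsy[falsy_offset..falsy_offset + remaining]);
    out
}
```

Because whole runs are copied at once, the work per run stays the same as in a zip-style kernel; the only change is which indices are used to read from the inputs.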

@alamb I realised this morning I had chosen a rather poor example in the arrow-rs PR. I've updated it to illustrate the truthy/falsy length difference.

alamb · 2025-11-03T20:57:07Z

datafusion/physical-expr/src/expressions/case.rs

+            }
+        };
+
+        let optimize_filter = batch.num_columns() > 1;


It might also be worth checking whether there are any nested types (e.g. StructArrays) and optimizing the filter in that case too. This is done elsewhere (maybe in the filter kernel itself 🤔).

Agreed. The logic that handles that isn't pub in arrow-rs unfortunately. I can duplicate it here if you like.

Fixed for now. Would it be useful to make this a method of DataType? I can prepare an arrow-rs PR for that.

A DataType method didn't make much sense to me after all since the predicate is very much tied to the actual filter implementation logic. I went for apache/arrow-rs#8782 instead.
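As a rough illustration of the decision discussed in this thread: precomputing an optimized filter tends to pay off when the same filter is applied to more than one underlying array, which can happen with several columns or with a single nested column. The sketch below uses a toy `ToyType` enum and hypothetical function names; it is not the arrow-rs `DataType`, and the actual predicate in arrow-rs (see apache/arrow-rs#8782) may differ.

```rust
/// Toy stand-in for a column type. This is NOT the arrow-rs `DataType`.
enum ToyType {
    Primitive,
    Struct(Vec<ToyType>),
}

/// Number of leaf arrays that a filter has to process for this type.
fn leaf_count(t: &ToyType) -> usize {
    match t {
        ToyType::Primitive => 1,
        ToyType::Struct(children) => children.iter().map(leaf_count).sum(),
    }
}

/// Assumed heuristic: optimize the filter only when it will be applied to
/// more than one leaf array, either across columns or inside a nested one.
fn should_optimize_filter(columns: &[ToyType]) -> bool {
    columns.iter().map(leaf_count).sum::<usize>() > 1
}
```

Under this heuristic a batch with a single primitive column skips the optimization, while a batch with a single struct column that has two children does not.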

alamb · 2025-11-05T16:32:30Z

Thanks again @pepijnve

github-actions bot added the physical-expr Changes to the physical-expr crates label Nov 2, 2025

pepijnve mentioned this pull request Nov 2, 2025

Project record batches to avoid filtering unused columns in CASE evaluation #18329

Merged

pepijnve mentioned this pull request Nov 2, 2025

Add merge and merge_n algorithms apache/arrow-rs#8753

Open

alamb added the performance Make DataFusion faster label Nov 3, 2025

Avoid scatter operation in expr_or_expr=

2b34d81

pepijnve force-pushed the expr_or_expr branch from 821e50d to 2b34d81 Compare November 3, 2025 16:29

pepijnve mentioned this pull request Nov 3, 2025

Release DataFusion 51.0.0 (Nov 2025) #17558

Open

33 tasks

pepijnve mentioned this pull request Nov 3, 2025

perf: optimize CASE WHEN lookup table (2.5-22.5 times faster) #18183

Open

alamb approved these changes Nov 3, 2025

View reviewed changes

pepijnve added 3 commits November 3, 2025 23:46

Add merge test case

bb1f8e0

Formatting

c9455fb

Refine filter optimization decision

5e5dae6

alamb added this pull request to the merge queue Nov 5, 2025

Merged via the queue into apache:main with commit 32d2618 Nov 5, 2025
32 checks passed
