@pepijnve pepijnve commented Nov 2, 2025

Which issue does this PR close?

Rationale for this change

The ExpressionOrExpression case evaluation method currently uses zip to combine the then and else results for a batch. This requires a scatter operation to ensure the partial results are correctly lined up for the zip algorithm.

By using a custom merge algorithm, this scatter step can be avoided.
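To make the scatter step concrete, here is a minimal sketch of the scatter-then-zip combination on plain Rust slices rather than Arrow arrays. The names `scatter`, `zip`, and `case_via_scatter_and_zip` are illustrative only and are not the DataFusion or arrow-rs APIs; nulls are modelled with `Option`.

```rust
/// Scatter: expand a compact result (one value per selected row) to the
/// full batch length, placing values where `mask` is true and `None` elsewhere.
fn scatter<T: Clone>(mask: &[bool], compact: &[T]) -> Vec<Option<T>> {
    let mut values = compact.iter();
    mask.iter()
        .map(|&selected| if selected { values.next().cloned() } else { None })
        .collect()
}

/// Zip: take row `i` from `truthy` when `mask[i]` is true, otherwise from `falsy`.
/// Both inputs must already be aligned with (as long as) the mask.
fn zip<T: Clone>(mask: &[bool], truthy: &[Option<T>], falsy: &[Option<T>]) -> Vec<Option<T>> {
    assert_eq!(mask.len(), truthy.len());
    assert_eq!(mask.len(), falsy.len());
    mask.iter()
        .enumerate()
        .map(|(i, &m)| if m { truthy[i].clone() } else { falsy[i].clone() })
        .collect()
}

/// Combine compact THEN and ELSE results by scattering each one to full
/// length first and zipping the two aligned intermediates afterwards.
fn case_via_scatter_and_zip<T: Clone>(
    mask: &[bool],
    then_vals: &[T],
    else_vals: &[T],
) -> Vec<Option<T>> {
    let inverted: Vec<bool> = mask.iter().map(|m| !m).collect();
    let then_full = scatter(mask, then_vals);
    let else_full = scatter(&inverted, else_vals);
    zip(mask, &then_full, &else_full)
}
```

With mask `[true, false, true]`, then-values `["A", "C"]`, and else-values `["B"]`, the two scattered intermediates are `[A, null, C]` and `[null, B, null]`. Those full-length intermediates are the work a merge-style kernel would not need to do.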

What changes are included in this PR?

  • Introduce a zip variant that does not require prealigning truthy and falsy result values with the mask array

Are these changes tested?

Covered by existing case tests

Are there any user-facing changes?

No

@pepijnve
Contributor Author

pepijnve commented Nov 2, 2025

@alamb could you do your benchmark thing for this one as well? Looks like about -30% in my own local testing.

@alamb
Contributor

alamb commented Nov 3, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing expr_or_expr (821e50d) to efcc216 diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=expr_or_expr
Results will be posted here when complete

@alamb alamb added the performance Make DataFusion faster label Nov 3, 2025
@pepijnve
Contributor Author

pepijnve commented Nov 3, 2025

Force push of 2b34d81 was just a squash to make this one easier to cherry-pick locally. No code changes.

@alamb
Contributor

alamb commented Nov 3, 2025

🤖: Benchmark completed

Details

group                                                                                                                             expr_or_expr                           main
-----                                                                                                                             ------------                           ----
case_when 8192x100: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END                    1.00    133.9±2.01µs        ? ?/sec    1.00    133.8±1.75µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                                         1.00     56.3±0.28µs        ? ?/sec    1.02     57.2±0.20µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                       1.00     18.5±0.35µs        ? ?/sec    1.27     23.5±0.47µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                                   1.01      6.8±0.05µs        ? ?/sec    1.00      6.8±0.02µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END                           1.00     49.1±0.13ms        ? ?/sec    1.00     49.3±0.18ms        ? ?/sec
case_when 8192x100: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                                          1.00     54.6±0.60ms        ? ?/sec    1.00     54.3±0.26ms        ? ?/sec
case_when 8192x100: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                                     1.02     30.7±0.51µs        ? ?/sec    1.00     30.1±0.79µs        ? ?/sec
case_when 8192x100: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                                1.00    118.8±1.86µs        ? ?/sec    1.00    118.9±1.61µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END                      1.00    132.1±1.67µs        ? ?/sec    1.00    131.6±1.77µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                                           1.00     55.2±0.26µs        ? ?/sec    1.03     56.9±0.21µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                         1.00     18.8±0.42µs        ? ?/sec    1.21     22.8±0.41µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                                     1.00      6.8±0.04µs        ? ?/sec    1.00      6.8±0.05µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END                             1.00     48.6±0.15ms        ? ?/sec    1.00     48.5±0.19ms        ? ?/sec
case_when 8192x3: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                                            1.00     54.1±0.27ms        ? ?/sec    1.00     54.0±0.30ms        ? ?/sec
case_when 8192x3: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                                       1.06     31.3±0.48µs        ? ?/sec    1.00     29.7±0.47µs        ? ?/sec
case_when 8192x3: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                                  1.00    118.1±1.04µs        ? ?/sec    1.00    118.3±1.70µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END                     1.01    133.1±1.83µs        ? ?/sec    1.00    132.4±1.88µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                                          1.00     56.0±0.19µs        ? ?/sec    1.02     57.3±0.48µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                        1.00     19.0±0.31µs        ? ?/sec    1.23     23.2±0.47µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                                    1.01      6.8±0.02µs        ? ?/sec    1.00      6.7±0.03µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END                            1.00     48.7±0.19ms        ? ?/sec    1.01     49.0±0.15ms        ? ?/sec
case_when 8192x50: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                                           1.01     54.6±0.29ms        ? ?/sec    1.00     54.2±0.21ms        ? ?/sec
case_when 8192x50: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                                      1.04     31.4±0.42µs        ? ?/sec    1.00     30.2±0.57µs        ? ?/sec
case_when 8192x50: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                                 1.00    118.8±1.50µs        ? ?/sec    1.00    119.0±1.26µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.00    442.3±2.14µs        ? ?/sec    1.00    443.5±2.40µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.00    492.1±1.92µs        ? ?/sec    1.00    492.4±3.05µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.00    377.4±2.50µs        ? ?/sec    1.00    375.7±3.75µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.00    446.9±2.46µs        ? ?/sec    1.00    445.7±3.30µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.00    494.1±3.86µs        ? ?/sec    1.00    495.0±3.75µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.01    378.5±2.51µs        ? ?/sec    1.00    375.5±3.15µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.00    446.5±2.24µs        ? ?/sec    1.00    447.8±3.97µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.00    490.6±1.63µs        ? ?/sec    1.01    494.8±2.26µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.00    445.5±2.24µs        ? ?/sec    1.00    447.4±2.54µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.01    193.5±0.72µs        ? ?/sec    1.00    192.2±0.61µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.00    264.0±2.12µs        ? ?/sec    1.01    266.1±1.91µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.01    247.0±3.03µs        ? ?/sec    1.00    245.5±2.78µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.01    194.5±1.23µs        ? ?/sec    1.00    192.5±0.92µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.00    264.8±3.07µs        ? ?/sec    1.01    267.3±2.94µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.01    247.4±2.25µs        ? ?/sec    1.00    244.5±0.89µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.00    194.1±0.90µs        ? ?/sec    1.00    193.2±0.78µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.00    264.5±3.45µs        ? ?/sec    1.01    267.0±2.68µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.00    194.9±1.27µs        ? ?/sec    1.00    194.3±1.31µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.00    587.8±7.04µs        ? ?/sec    1.00    589.3±4.35µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.00    625.3±5.03µs        ? ?/sec    1.00    624.7±3.21µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.01    469.3±5.08µs        ? ?/sec    1.00    466.8±2.20µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.00    588.9±4.00µs        ? ?/sec    1.00    588.4±2.94µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.00    627.8±4.64µs        ? ?/sec    1.00    628.7±2.85µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.00    468.8±1.63µs        ? ?/sec    1.00    467.0±3.75µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.01    587.2±3.07µs        ? ?/sec    1.00    583.9±2.35µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.01    626.5±3.37µs        ? ?/sec    1.00    622.6±2.54µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.01    589.0±6.27µs        ? ?/sec    1.00    584.8±3.00µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.01    194.4±1.25µs        ? ?/sec    1.00    192.3±0.79µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.00    264.2±1.35µs        ? ?/sec    1.00    265.2±1.28µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.00    246.1±1.60µs        ? ?/sec    1.00    245.7±1.51µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.00    194.5±1.79µs        ? ?/sec    1.00    194.1±2.68µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.00    264.5±3.02µs        ? ?/sec    1.01    266.9±2.05µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.01    247.0±6.91µs        ? ?/sec    1.00    244.3±0.85µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.00    194.4±1.18µs        ? ?/sec    1.00    194.7±0.88µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.00    264.4±1.59µs        ? ?/sec    1.01    267.0±2.35µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.01    194.9±1.26µs        ? ?/sec    1.00    193.0±0.70µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0            1.01    325.3±1.21µs        ? ?/sec    1.00    321.7±1.59µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1          1.00    380.6±2.12µs        ? ?/sec    1.00    381.4±2.73µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5          1.01    306.6±1.54µs        ? ?/sec    1.00    304.0±3.20µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0            1.01    325.3±1.83µs        ? ?/sec    1.00    321.8±1.36µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1          1.00    382.0±2.61µs        ? ?/sec    1.00    383.2±2.55µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5          1.01    308.2±1.76µs        ? ?/sec    1.00    306.3±2.70µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0            1.01    326.0±1.62µs        ? ?/sec    1.00    321.9±1.72µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1          1.00    382.1±2.11µs        ? ?/sec    1.00    383.0±3.92µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0              1.01    326.6±1.05µs        ? ?/sec    1.00    323.0±1.57µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0       1.01    193.6±1.77µs        ? ?/sec    1.00    192.0±1.45µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1     1.00    264.0±1.03µs        ? ?/sec    1.00    264.8±2.14µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5     1.01    247.7±2.41µs        ? ?/sec    1.00    244.8±1.88µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0       1.01    194.2±0.53µs        ? ?/sec    1.00    192.4±0.50µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1     1.00    263.9±2.91µs        ? ?/sec    1.01    265.7±3.01µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5     1.01    247.3±1.51µs        ? ?/sec    1.00    245.6±0.63µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0       1.00    193.6±0.90µs        ? ?/sec    1.00    193.6±0.68µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1     1.00    265.3±3.08µs        ? ?/sec    1.00    266.6±2.95µs        ? ?/sec
lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0         1.00    193.8±0.73µs        ? ?/sec    1.00    194.1±0.95µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.01    809.9±4.03µs        ? ?/sec    1.00    804.9±7.14µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.00    862.0±4.32µs        ? ?/sec    1.00    858.6±6.61µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.01    596.4±3.17µs        ? ?/sec    1.00    589.6±7.41µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.01    812.7±4.68µs        ? ?/sec    1.00    804.0±3.77µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.01    862.8±5.89µs        ? ?/sec    1.00    856.2±5.53µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.01    595.2±4.22µs        ? ?/sec    1.00    587.1±1.33µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.01    811.4±5.44µs        ? ?/sec    1.00    805.0±3.26µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.01    863.3±8.49µs        ? ?/sec    1.00    854.0±2.30µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.01    810.1±2.46µs        ? ?/sec    1.00    803.9±2.47µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.02    211.2±0.80µs        ? ?/sec    1.00    208.0±0.75µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.00    316.4±2.27µs        ? ?/sec    1.00    315.9±1.81µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.02    278.7±1.59µs        ? ?/sec    1.00    273.6±2.92µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.01    211.6±2.73µs        ? ?/sec    1.00    208.5±0.95µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.01    317.2±2.87µs        ? ?/sec    1.00    315.1±1.35µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.01    278.2±0.94µs        ? ?/sec    1.00    275.2±4.97µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.01    211.2±0.72µs        ? ?/sec    1.00    208.1±0.78µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.00    316.1±1.33µs        ? ?/sec    1.00    315.4±1.24µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.01    211.7±1.76µs        ? ?/sec    1.00    208.7±2.18µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0           1.01   1171.6±7.63µs        ? ?/sec    1.00   1163.2±8.79µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1         1.01   1172.9±4.37µs        ? ?/sec    1.00   1162.3±7.09µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5         1.01    786.4±2.67µs        ? ?/sec    1.00    781.4±3.19µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0           1.01   1171.2±8.32µs        ? ?/sec    1.00   1161.5±6.47µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1         1.01   1172.9±4.52µs        ? ?/sec    1.00   1161.2±4.73µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5         1.01    787.0±2.28µs        ? ?/sec    1.00    779.2±2.41µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0           1.01   1171.7±7.59µs        ? ?/sec    1.00   1163.0±7.11µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1         1.01   1172.9±6.37µs        ? ?/sec    1.00   1163.3±7.17µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0             1.01   1172.1±6.28µs        ? ?/sec    1.00   1159.1±3.56µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0      1.01    211.7±1.59µs        ? ?/sec    1.00    208.7±1.26µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1    1.00    315.5±1.51µs        ? ?/sec    1.00    315.2±2.75µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5    1.01    278.8±1.79µs        ? ?/sec    1.00    275.0±1.81µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0      1.01    211.1±0.82µs        ? ?/sec    1.00    209.1±1.09µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1    1.00    316.5±2.30µs        ? ?/sec    1.00    315.0±2.77µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5    1.01    278.0±2.18µs        ? ?/sec    1.00    274.4±1.60µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0      1.02    211.7±1.31µs        ? ?/sec    1.00    208.5±0.94µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1    1.01    316.6±1.79µs        ? ?/sec    1.00    314.8±2.14µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0        1.01    211.1±1.29µs        ? ?/sec    1.00    208.8±1.35µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0            1.01    504.3±3.48µs        ? ?/sec    1.00    501.6±3.47µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1          1.01    586.3±4.17µs        ? ?/sec    1.00    582.0±5.04µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5          1.00    434.9±2.55µs        ? ?/sec    1.00    435.4±3.16µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0            1.00    502.1±2.78µs        ? ?/sec    1.00    501.8±2.24µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1          1.00    584.4±4.38µs        ? ?/sec    1.00    582.2±3.95µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5          1.00    435.0±2.44µs        ? ?/sec    1.00    434.7±2.29µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0            1.00    501.2±1.46µs        ? ?/sec    1.00    502.2±3.10µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1          1.01    585.6±3.32µs        ? ?/sec    1.00    581.9±2.51µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0              1.00    501.7±2.08µs        ? ?/sec    1.00    504.0±1.65µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0       1.02    211.4±2.24µs        ? ?/sec    1.00    208.1±0.88µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1     1.01    315.6±1.26µs        ? ?/sec    1.00    314.0±1.08µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5     1.02    279.3±2.26µs        ? ?/sec    1.00    274.2±1.08µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0       1.02    212.2±2.13µs        ? ?/sec    1.00    208.7±1.46µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1     1.00    317.1±3.70µs        ? ?/sec    1.00    315.6±3.11µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5     1.01    277.6±1.11µs        ? ?/sec    1.00    274.2±0.74µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0       1.01    211.7±1.75µs        ? ?/sec    1.00    209.1±2.52µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1     1.01    318.3±2.93µs        ? ?/sec    1.00    315.2±2.87µs        ? ?/sec
lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0         1.02    211.8±2.48µs        ? ?/sec    1.00    208.7±1.32µs        ? ?/sec

@pepijnve
Contributor Author

pepijnve commented Nov 3, 2025

Benchmark results confirm a 20-27% improvement on the relevant benchmarks (main is 1.21x to 1.27x slower in the table above). That is about the share of time the profiler attributed to the scatter operation.

@alamb
Contributor

alamb commented Nov 3, 2025

For anyone else following along, here are the relevant benchmarks:

case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                       1.00     18.5±0.35µs        ? ?/sec    1.27     23.5±0.47µs        ? ?/sec

case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                         1.00     18.8±0.42µs        ? ?/sec    1.21     22.8±0.41µs        ? ?/sec

case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                                        1.00     19.0±0.31µs        ? ?/sec    1.23     23.2±0.47µs        ? ?/sec

Contributor

@alamb alamb left a comment

Thank you @pepijnve -- I think the code makes sense to me and the benchmark results check out ✅

The only thing I am not sure about is the new merge / zip kernel -- I think we may be able to reuse zip upstream in arrow-rs after the next arrow release

filter.filter(array)
}

fn merge(
Contributor

This looks like an implementation of zip to me (rather than merge). It seems like it would be better to use consistent terminology if this is indeed the case

This looks very similar to the fancy new code that @rluvaton added to arrow for zip with scalars (will be released in arrow 57.1.0):

If you agree it is the same, perhaps we can either avoid adding this method to DataFusion or else we can add a comment that says we can revert to using just zip once apache/arrow-rs#8653 is available

Contributor Author

@pepijnve pepijnve Nov 3, 2025

It's almost the same as zip, but different enough that it's necessary. Without this implementation you can't avoid the scatter step.

I've added a test case to show the difference. The short version is that merge([true, false, true], [A, C], [B]) will get you [A, B, C] while zip would return an error stating all arrays should have the same length.

I agree that these two merge kernels would be better off in arrow-rs which is why I made PR apache/arrow-rs#8753.
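The difference in contract can be sketched on plain slices. This is an illustration under simplifying assumptions (no nulls, no scalar inputs); `merge` and `zip_checked` are hypothetical stand-ins, not the signatures from this PR or from arrow-rs, and the error strings are made up for the example.

```rust
/// Merge: `truthy` holds one value per `true` in the mask and `falsy` one
/// value per `false`. Values are consumed in order from whichever side the
/// mask selects, so neither input needs to be as long as the mask.
fn merge<T: Clone>(mask: &[bool], truthy: &[T], falsy: &[T]) -> Result<Vec<T>, String> {
    let true_count = mask.iter().filter(|&&m| m).count();
    if truthy.len() != true_count || falsy.len() != mask.len() - true_count {
        return Err("input lengths must match the true/false counts of the mask".to_string());
    }
    let (mut t, mut f) = (truthy.iter(), falsy.iter());
    let out = mask
        .iter()
        .map(|&m| {
            let next = if m { t.next() } else { f.next() };
            next.cloned().expect("lengths were checked above")
        })
        .collect();
    Ok(out)
}

/// Zip: both inputs must be exactly as long as the mask, and row `i` of the
/// output is read from row `i` of the selected input.
fn zip_checked<T: Clone>(mask: &[bool], truthy: &[T], falsy: &[T]) -> Result<Vec<T>, String> {
    if truthy.len() != mask.len() || falsy.len() != mask.len() {
        return Err("all arrays should have the same length".to_string());
    }
    Ok(mask
        .iter()
        .enumerate()
        .map(|(i, &m)| if m { truthy[i].clone() } else { falsy[i].clone() })
        .collect())
}
```

Calling `merge(&[true, false, true], &["A", "C"], &["B"])` yields `Ok(["A", "B", "C"])`, while `zip_checked` with the same arguments returns an error because the inputs are shorter than the mask.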

Contributor Author

@rluvaton's work on zip only covers the case of two scalar inputs BTW. That's why I chose to delegate to plain zip in that case. The array/array, scalar/array and array/scalar cases still need the specific logic here.

The subtle difference between this and zip is in

let falsy_length = start - filled;
let falsy_end = falsy_offset + falsy_length;
mutable.extend(1, falsy_offset, falsy_end);
falsy_offset = falsy_end;

vs

mutable.extend(1, filled, start);

Where zip uses the slice indices from the mask directly, merge uses only the lengths of the slices and tracks the amounts taken from truthy and falsy separately.
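A self-contained sketch of that bookkeeping, assuming plain slices in place of the `mutable.extend(index, start, end)` calls quoted above and a hand-rolled `true_runs` helper in place of whatever run iterator the real kernel uses over the mask. All names are illustrative, and the function panics if an input is shorter than the mask requires.

```rust
/// Returns the half-open `[start, end)` ranges of consecutive `true` values.
fn true_runs(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut i = 0;
    while i < mask.len() {
        if mask[i] {
            let start = i;
            while i < mask.len() && mask[i] {
                i += 1;
            }
            runs.push((start, i));
        } else {
            i += 1;
        }
    }
    runs
}

/// Merge compact `truthy` and `falsy` inputs by copying whole runs.
/// Mask positions only provide run lengths; the read positions in the two
/// inputs are tracked by their own offsets.
fn merge_by_runs<T: Clone>(mask: &[bool], truthy: &[T], falsy: &[T]) -> Vec<T> {
    let mut out = Vec::with_capacity(mask.len());
    let mut filled = 0; // number of mask positions already written
    let mut truthy_offset = 0; // values consumed from `truthy` so far
    let mut falsy_offset = 0; // values consumed from `falsy` so far

    for (start, end) in true_runs(mask) {
        // Fill the gap of `false` positions that precedes this run.
        if start > filled {
            let falsy_length = start - filled;
            let falsy_end = falsy_offset + falsy_length;
            out.extend_from_slice(&falsy[falsy_offset..falsy_end]);
            falsy_offset = falsy_end;
        }
        // Copy the run of `true` positions from the compact truthy input.
        let truthy_end = truthy_offset + (end - start);
        out.extend_from_slice(&truthy[truthy_offset..truthy_end]);
        truthy_offset = truthy_end;
        filled = end;
    }

    // Any trailing `false` positions come from the rest of `falsy`.
    if filled < mask.len() {
        let falsy_end = falsy_offset + (mask.len() - filled);
        out.extend_from_slice(&falsy[falsy_offset..falsy_end]);
    }
    out
}
```

A zip-style loop would instead read `falsy[filled..start]` and `truthy[start..end]`, which is only valid when both inputs are as long as the mask.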

Contributor Author

@alamb I realised this morning I had chosen a rather poor example in the arrow-rs PR. I've updated it to illustrate the truthy/falsy length difference.

}
};

let optimize_filter = batch.num_columns() > 1;
Contributor

It might also be worth checking whether there are any nested types (e.g. StructArrays) and optimizing the filter in that case too -- this is done elsewhere (maybe in the filter kernel itself 🤔)

Contributor Author

Agreed. The logic that handles that isn't pub in arrow-rs unfortunately. I can duplicate it here if you like.
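As a rough illustration of the kind of predicate being discussed, the sketch below decides whether precomputing an optimized filter is likely to pay off: either the batch has more than one column, or a single column is a nested type that itself holds several child arrays. `ToyType`, `holds_multiple_arrays`, and `should_optimize_filter` are hypothetical names on a toy type model; this is not the arrow-rs logic, which may cover additional cases.

```rust
/// A toy stand-in for a column data type (not arrow's `DataType`).
#[derive(Debug, Clone)]
enum ToyType {
    Int32,
    Utf8,
    /// A struct column with one child array per field.
    Struct(Vec<ToyType>),
}

/// True when filtering a column of this type means filtering more than one
/// underlying array, so the cost of preparing the filter can be amortized.
fn holds_multiple_arrays(data_type: &ToyType) -> bool {
    match data_type {
        ToyType::Struct(children) => {
            children.len() > 1
                || (children.len() == 1 && holds_multiple_arrays(&children[0]))
        }
        _ => false,
    }
}

/// Optimize when there are several columns, or when any single column is
/// nested in a way that hides several arrays.
fn should_optimize_filter(column_types: &[ToyType]) -> bool {
    column_types.len() > 1 || column_types.iter().any(holds_multiple_arrays)
}
```

Under these assumptions a batch with a single `Int32` column is filtered without the extra preparation, while a batch with a single two-field struct column is treated like a two-column batch.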

Contributor Author

Fixed for now. Would it be useful to make this a method of DataType? I can prepare an arrow-rs PR for that.

Contributor Author

A DataType method didn't make much sense to me after all since the predicate is very much tied to the actual filter implementation logic. I went for apache/arrow-rs#8782 instead.

@alamb alamb added this pull request to the merge queue Nov 5, 2025
@alamb
Contributor

alamb commented Nov 5, 2025

Thanks again @pepijnve

Merged via the queue into apache:main with commit 32d2618 Nov 5, 2025
32 checks passed

Labels

performance Make DataFusion faster physical-expr Changes to the physical-expr crates
