fix: Eliminate consecutive repartitions #18521
Conversation
alamb left a comment:
This looks amazing @gene-bordegaray -- thank you 🙏
I kicked off some benchmarks to make sure it doesn't impact performance. Assuming it doesn't, I'll take a closer look.
@NGA-TRAN and @gabotechs could you also please help review this PR?
NGA-TRAN left a comment:
Wonderful work. Way to go @gene-bordegaray
 }
 Distribution::HashPartitioned(exprs) => {
-    if add_roundrobin {
+    if add_roundrobin && !hash_necessary {
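For illustration, here is a minimal, self-contained sketch of the decision the changed condition implements, using simplified stand-in types rather than the actual DataFusion API: a round-robin repartition is only worth inserting when a hash repartition will not be stacked directly on top of it.

#[derive(Debug, PartialEq)]
enum Partitioning {
    RoundRobin(usize),
    Hash(Vec<String>, usize),
}

/// Which repartitions to insert above a child, listed bottom-up
/// (hypothetical helper for illustration only).
fn plan_repartitions(
    add_roundrobin: bool,
    hash_necessary: bool,
    hash_exprs: Vec<String>,
    target_partitions: usize,
) -> Vec<Partitioning> {
    let mut inserted = Vec::new();
    // Before the fix this branch was `if add_roundrobin { .. }`, which could
    // stack a RoundRobin repartition directly beneath a Hash repartition.
    if add_roundrobin && !hash_necessary {
        inserted.push(Partitioning::RoundRobin(target_partitions));
    }
    if hash_necessary {
        inserted.push(Partitioning::Hash(hash_exprs, target_partitions));
    }
    inserted
}

fn main() {
    // The case that previously produced two consecutive repartitions:
    let plan = plan_repartitions(true, true, vec!["o_orderpriority".into()], 14);
    // Only the hash repartition remains.
    assert_eq!(plan, vec![Partitioning::Hash(vec!["o_orderpriority".into()], 14)]);
    println!("{plan:?}");
}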
Could we permalink to these in the code?
@adriangb: Do you mean adding a comment with links right at the fix for future reference?
If we link these, I would like to finish the docs so they are complete. If you notice, I cut off at the end of the explanation of enforce_distribution because it was thorough enough for this bug.
🤖: Benchmark completed
2010YOUY01 left a comment:
Thank you for the amazing work.
Removing unnecessary consecutive RepartitionExec makes sense to me. However, I have gone through half of the tests and found 2 plan changes that are not "2 RepartitionExec -> 1 RepartitionExec"; I'm wondering, is that expected? Do we have other plan changes besides removing one of the consecutive RepartitionExecs?
The first one: see the review comment.
The second one is tpch-q4. I saw a 20% speedup in the benchmark result, so I checked the query plan, and the difference is that it removes a round-robin repartition above the parquet reader:
// before
> explain select
o_orderpriority,
count(*) as order_count
from
orders
where
o_orderdate >= '1993-07-01'
and o_orderdate < date '1993-07-01' + interval '3' month
and exists (
select
*
from
lineitem
where
l_orderkey = o_orderkey
and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority;
+---------------+------------------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------------------+
| physical_plan | ┌───────────────────────────┐ |
| | │ SortPreservingMergeExec │ |
| | │ -------------------- │ |
| | │ o_orderpriority ASC NULLS │ |
| | │ LAST │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ SortExec │ |
| | │ -------------------- │ |
| | │ o_orderpriority@0 ASC │ |
| | │ NULLS LAST │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ ProjectionExec │ |
| | │ -------------------- │ |
| | │ o_orderpriority: │ |
| | │ o_orderpriority │ |
| | │ │ |
| | │ order_count: │ |
| | │ count(Int64(1)) │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ AggregateExec │ |
| | │ -------------------- │ |
| | │ aggr: count(1) │ |
| | │ │ |
| | │ group_by: │ |
| | │ o_orderpriority │ |
| | │ │ |
| | │ mode: │ |
| | │ FinalPartitioned │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ CoalesceBatchesExec │ |
| | │ -------------------- │ |
| | │ target_batch_size: │ |
| | │ 8192 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ RepartitionExec │ |
| | │ -------------------- │ |
| | │ partition_count(in->out): │ |
| | │ 14 -> 14 │ |
| | │ │ |
| | │ partitioning_scheme: │ |
| | │ Hash([o_orderpriority@0], │ |
| | │ 14) │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ AggregateExec │ |
| | │ -------------------- │ |
| | │ aggr: count(1) │ |
| | │ │ |
| | │ group_by: │ |
| | │ o_orderpriority │ |
| | │ │ |
| | │ mode: Partial │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ CoalesceBatchesExec │ |
| | │ -------------------- │ |
| | │ target_batch_size: │ |
| | │ 8192 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ HashJoinExec │ |
| | │ -------------------- │ |
| | │ join_type: RightSemi │ |
| | │ ├──────────────┐ |
| | │ on: │ │ |
| | │ (l_orderkey = o_orderkey) │ │ |
| | └─────────────┬─────────────┘ │ |
| | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
| | │ CoalescePartitionsExec ││ CoalesceBatchesExec │ |
| | │ ││ -------------------- │ |
| | │ ││ target_batch_size: │ |
| | │ ││ 8192 │ |
| | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
| | │ CoalesceBatchesExec ││ FilterExec │ |
| | │ -------------------- ││ -------------------- │ |
| | │ target_batch_size: ││ predicate: │ |
| | │ 8192 ││ o_orderdate >= 1993-07-01 │ |
| | │ ││ AND o_orderdate < 1993 │ |
| | │ ││ -10-01 │ |
| | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
| | │ FilterExec ││ RepartitionExec │ |
| | │ -------------------- ││ -------------------- │ |
| | │ predicate: ││ partition_count(in->out): │ |
| | │ l_receiptdate > ││ 1 -> 14 │ |
| | │ l_commitdate ││ │ |
| | │ ││ partitioning_scheme: │ |
| | │ ││ RoundRobinBatch(14) │ |
| | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
| | │ RepartitionExec ││ DataSourceExec │ |
| | │ -------------------- ││ -------------------- │ |
| | │ partition_count(in->out): ││ files: 1 │ |
| | │ 1 -> 14 ││ format: parquet │ |
| | │ ││ │ |
| | │ partitioning_scheme: ││ predicate: │ |
| | │ RoundRobinBatch(14) ││ o_orderdate >= 1993-07-01 │ |
| | │ ││ AND o_orderdate < 1993 │ |
| | │ ││ -10-01 │ |
| | └─────────────┬─────────────┘└───────────────────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ DataSourceExec │ |
| | │ -------------------- │ |
| | │ files: 1 │ |
| | │ format: parquet │ |
| | │ │ |
| | │ predicate: │ |
| | │ l_receiptdate > │ |
| | │ l_commitdate │ |
| | └───────────────────────────┘ |
| | |
+---------------+------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.011 seconds.
// PR
+---------------+------------------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------------------+
| physical_plan | ┌───────────────────────────┐ |
| | │ SortPreservingMergeExec │ |
| | │ -------------------- │ |
| | │ o_orderpriority ASC NULLS │ |
| | │ LAST │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ SortExec │ |
| | │ -------------------- │ |
| | │ o_orderpriority@0 ASC │ |
| | │ NULLS LAST │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ ProjectionExec │ |
| | │ -------------------- │ |
| | │ o_orderpriority: │ |
| | │ o_orderpriority │ |
| | │ │ |
| | │ order_count: │ |
| | │ count(Int64(1)) │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ AggregateExec │ |
| | │ -------------------- │ |
| | │ aggr: count(1) │ |
| | │ │ |
| | │ group_by: │ |
| | │ o_orderpriority │ |
| | │ │ |
| | │ mode: │ |
| | │ FinalPartitioned │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ CoalesceBatchesExec │ |
| | │ -------------------- │ |
| | │ target_batch_size: │ |
| | │ 8192 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ RepartitionExec │ |
| | │ -------------------- │ |
| | │ partition_count(in->out): │ |
| | │ 14 -> 14 │ |
| | │ │ |
| | │ partitioning_scheme: │ |
| | │ Hash([o_orderpriority@0], │ |
| | │ 14) │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ AggregateExec │ |
| | │ -------------------- │ |
| | │ aggr: count(1) │ |
| | │ │ |
| | │ group_by: │ |
| | │ o_orderpriority │ |
| | │ │ |
| | │ mode: Partial │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ CoalesceBatchesExec │ |
| | │ -------------------- │ |
| | │ target_batch_size: │ |
| | │ 8192 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ HashJoinExec │ |
| | │ -------------------- │ |
| | │ join_type: RightSemi │ |
| | │ ├──────────────┐ |
| | │ on: │ │ |
| | │ (l_orderkey = o_orderkey) │ │ |
| | └─────────────┬─────────────┘ │ |
| | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
| | │ CoalescePartitionsExec ││ CoalesceBatchesExec │ |
| | │ ││ -------------------- │ |
| | │ ││ target_batch_size: │ |
| | │ ││ 8192 │ |
| | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
| | │ CoalesceBatchesExec ││ FilterExec │ |
| | │ -------------------- ││ -------------------- │ |
| | │ target_batch_size: ││ predicate: │ |
| | │ 8192 ││ o_orderdate >= 1993-07-01 │ |
| | │ ││ AND o_orderdate < 1993 │ |
| | │ ││ -10-01 │ |
| | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
| | │ FilterExec ││ RepartitionExec │ |
| | │ -------------------- ││ -------------------- │ |
| | │ predicate: ││ partition_count(in->out): │ |
| | │ l_receiptdate > ││ 1 -> 14 │ |
| | │ l_commitdate ││ │ |
| | │ ││ partitioning_scheme: │ |
| | │ ││ RoundRobinBatch(14) │ |
| | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
| | │ DataSourceExec ││ DataSourceExec │ |
| | │ -------------------- ││ -------------------- │ |
| | │ files: 14 ││ files: 1 │ |
| | │ format: parquet ││ format: parquet │ |
| | │ ││ │ |
| | │ predicate: ││ predicate: │ |
| | │ l_receiptdate > ││ o_orderdate >= 1993-07-01 │ |
| | │ l_commitdate ││ AND o_orderdate < 1993 │ |
| | │ ││ -10-01 │ |
| | └───────────────────────────┘└───────────────────────────┘ |
| | |
+---------------+------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.013 seconds.
15)--------------ProjectionExec: expr=[a0@0 as a0, a@1 as a, b@2 as b, c@3 as c, d@4 as d, row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@5 as rn1]
16)----------------BoundedWindowAggExec: wdw=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": UInt64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted]
17)------------------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[a0, a, b, c, d], output_ordering=[a@1 ASC, b@2 ASC NULLS LAST, c@3 ASC NULLS LAST], file_type=csv, has_header=true
02)--SortMergeJoin: join_type=Inner, on=[(a@1, a@1)]
Here it removed a SortExec; is that expected?
See my three comments below. If the data in each corresponding partition/stream of the join input is already sorted, we can skip the re-sorting step and simply perform a merge join on each matching partition/stream.
Can you clarify the data you used to create the tables? Thank you.
09)------RepartitionExec: partitioning=Hash([a@1], 2), input_partitions=1
10)--------ProjectionExec: expr=[a0@0 as a0, a@1 as a, b@2 as b, c@3 as c, d@4 as d, row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@5 as rn1]
11)----------BoundedWindowAggExec: wdw=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": UInt64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted]
12)------------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[a0, a, b, c, d], output_ordering=[a@1 ASC, b@2 ASC NULLS LAST, c@3 ASC NULLS LAST], file_type=csv, has_header=true
Data is sorted on a, b, c
08)----CoalesceBatchesExec: target_batch_size=2
09)------RepartitionExec: partitioning=Hash([a@1], 2), input_partitions=1
10)--------ProjectionExec: expr=[a0@0 as a0, a@1 as a, b@2 as b, c@3 as c, d@4 as d, row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@5 as rn1]
11)----------BoundedWindowAggExec: wdw=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": UInt64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted]
Data is sorted on a, b, c, rn1
06)----------BoundedWindowAggExec: wdw=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": UInt64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted]
07)------------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[a0, a, b, c, d], output_ordering=[a@1 ASC, b@2 ASC NULLS LAST, c@3 ASC NULLS LAST], file_type=csv, has_header=true
08)----CoalesceBatchesExec: target_batch_size=2
09)------RepartitionExec: partitioning=Hash([a@1], 2), input_partitions=1
This is the key check: we need to confirm whether the data remains sorted after Hash Repartition. We're using it to split one partition/stream into two, and if we simply apply the hash function and stream each row forward, the data should stay sorted within each resulting partition/stream on a, b, c, rn1. I strongly suspect this holds, but we need to verify.
Could you test this across different datasets to confirm both correctness and sort order?
Most importantly:
i. Check whether the execution plan marks the data as sorted per partition after Hash Repartition. If it does, please file a new ticket to ensure we display the sort order post-repartition in future work.
ii. Investigate whether the data is actually sorted per partition, even when it's marked as such.
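To illustrate the intuition behind this check, here is a tiny self-contained simulation (a sketch, not DataFusion code): when one sorted input stream is split by a hash function and rows are forwarded in input order, each output partition is a subsequence of a sorted sequence and therefore stays sorted. This demonstrates only the single-input-partition case discussed here, not the actual RepartitionExec implementation.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Assign a row to one of `partitions` buckets by hashing its key.
fn bucket(key: i32, partitions: u64) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() % partitions) as usize
}

fn main() {
    let sorted_input: Vec<i32> = (0..1_000).collect();
    let mut parts: Vec<Vec<i32>> = vec![Vec::new(); 2];
    for row in sorted_input {
        // Stream each row forward, in input order, to its hash bucket.
        parts[bucket(row, 2)].push(row);
    }
    for (i, p) in parts.iter().enumerate() {
        // Each partition is a subsequence of the sorted input.
        assert!(p.windows(2).all(|w| w[0] <= w[1]));
        println!("partition {i} stays sorted ({} rows)", p.len());
    }
}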
And if this is the case, this would be another great optimization we have for free with this fix. I actually ran into this. See Even More Suboptimal Plan here.
Ok great, yes I can investigate this.
I have recreated the query from join.slt and checked the results. Here are the recreated commands:
-- These settings are set in join.slt prior to the query
SET datafusion.optimizer.prefer_hash_join = false;
SET datafusion.explain.format = 'indent';
CREATE EXTERNAL TABLE annotated_data (
a0 INTEGER,
a INTEGER,
b INTEGER,
c INTEGER,
d INTEGER
)
STORED AS CSV
WITH ORDER (a ASC NULLS FIRST, b ASC, c ASC)
LOCATION 'datafusion/core/tests/data/window_2.csv'
OPTIONS ('format.has_header' 'true');
EXPLAIN SELECT *
FROM (SELECT *, ROW_NUMBER() OVER() as rn1
FROM annotated_data) as l_table
JOIN (SELECT *, ROW_NUMBER() OVER() as rn1
FROM annotated_data) as r_table
ON l_table.a = r_table.a
ORDER BY l_table.a ASC NULLS FIRST, l_table.b, l_table.c, r_table.rn1;
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER() as rn1
FROM annotated_data) as l_table
JOIN (SELECT *, ROW_NUMBER() OVER() as rn1
FROM annotated_data) as r_table
ON l_table.a = r_table.a
ORDER BY l_table.a ASC NULLS FIRST, l_table.b, l_table.c, r_table.rn1
LIMIT 30;
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: l_table.a ASC NULLS FIRST, l_table.b ASC NULLS LAST, l_table.c ASC NULLS LAST, r_table.rn1 ASC NULLS LAST |
| | Inner Join: l_table.a = r_table.a |
| | SubqueryAlias: l_table |
| | Projection: annotated_data.a0, annotated_data.a, annotated_data.b, annotated_data.c, annotated_data.d, row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS rn1 |
| | WindowAggr: windowExpr=[[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]] |
| | TableScan: annotated_data projection=[a0, a, b, c, d] |
| | SubqueryAlias: r_table |
| | Projection: annotated_data.a0, annotated_data.a, annotated_data.b, annotated_data.c, annotated_data.d, row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS rn1 |
| | WindowAggr: windowExpr=[[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]] |
| | TableScan: annotated_data projection=[a0, a, b, c, d] |
| physical_plan | SortPreservingMergeExec: [a@1 ASC, b@2 ASC NULLS LAST, c@3 ASC NULLS LAST, rn1@11 ASC NULLS LAST] |
| | SortMergeJoin: join_type=Inner, on=[(a@1, a@1)] |
| | CoalesceBatchesExec: target_batch_size=2 |
| | RepartitionExec: partitioning=Hash([a@1], 2), input_partitions=1 |
| | ProjectionExec: expr=[a0@0 as a0, a@1 as a, b@2 as b, c@3 as c, d@4 as d, row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@5 as rn1] |
| | BoundedWindowAggExec: wdw=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": UInt64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted] |
| | DataSourceExec: file_groups={1 group: [[Users/gene.bordegaray/go/src/github.com/DataDog/datafusion/datafusion/core/tests/data/window_2.csv]]}, projection=[a0, a, b, c, d], output_ordering=[a@1 ASC, b@2 ASC NULLS LAST, c@3 ASC NULLS LAST], file_type=csv, has_header=true |
| | CoalesceBatchesExec: target_batch_size=2 |
| | RepartitionExec: partitioning=Hash([a@1], 2), input_partitions=1 |
| | ProjectionExec: expr=[a0@0 as a0, a@1 as a, b@2 as b, c@3 as c, d@4 as d, row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@5 as rn1] |
| | BoundedWindowAggExec: wdw=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": UInt64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted] |
| | DataSourceExec: file_groups={1 group: [[Users/gene.bordegaray/go/src/github.com/DataDog/datafusion/datafusion/core/tests/data/window_2.csv]]}, projection=[a0, a, b, c, d], output_ordering=[a@1 ASC, b@2 ASC NULLS LAST, c@3 ASC NULLS LAST], file_type=csv, has_header=true |
| | |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.003 seconds.
+----+---+---+---+---+-----+----+---+---+----+---+-----+
| a0 | a | b | c | d | rn1 | a0 | a | b | c | d | rn1 |
+----+---+---+---+---+-----+----+---+---+----+---+-----+
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 2 | 2 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 0 | 3 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 3 | 0 | 4 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 4 | 1 | 5 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 5 | 1 | 6 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 6 | 0 | 7 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 7 | 2 | 8 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 8 | 1 | 9 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 9 | 4 | 10 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 10 | 4 | 11 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 11 | 2 | 12 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 12 | 2 | 13 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 13 | 1 | 14 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 14 | 2 | 15 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 15 | 3 | 16 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 16 | 3 | 17 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 17 | 2 | 18 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 18 | 1 | 19 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 19 | 4 | 20 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 20 | 0 | 21 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 21 | 3 | 22 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 22 | 0 | 23 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 23 | 0 | 24 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 24 | 4 | 25 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 25 | 0 | 26 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 26 | 2 | 27 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 27 | 0 | 28 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 28 | 1 | 29 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 29 | 1 | 30 |
+----+---+---+---+---+-----+----+---+---+----+---+-----+
30 row(s) fetched.
Elapsed 0.003 seconds.
- The physical plan here has a Hash Repartition and no SortExec node above it. Despite this, as seen above, the results are still sorted, meaning the sort order was preserved.
- The metadata for when order is preserved is not shown in the above plan. The EXPLAIN output should display this but does not; it is tracked in the field maintains_input_order. I can create an issue for this and link it here.
- To recreate this I had to have explicit ordering in the WITH ORDER and ORDER BY clauses, and set the prefer_hash_join flag to false to force SortMergeJoin. This means that with the default config, if a user is using pre-sorted files they might miss speedups by keeping unneeded SortExec nodes. It might be worth further discussion whether there is a better way to handle this, or opening an issue to look into it further.
Let me know what you think
@2010YOUY01
Yes, this seems to be a different issue that is triggered by the change; it would be great to investigate in the future.
09)--------WindowAggr: windowExpr=[[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
10)----------TableScan: annotated_data projection=[a0, a, b, c, d]
physical_plan
01)SortPreservingMergeExec: [a@1 ASC, b@2 ASC NULLS LAST, c@3 ASC NULLS LAST, rn1@11 ASC NULLS LAST]
And because there are 2 sorted partitions/streams after the join, we need this SortPreservingMergeExec to merge them
This is the conclusion I came to as well based on intuition, but I will confirm 😄
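For reference, a tiny self-contained sketch (not the actual operator) of the k-way merge that SortPreservingMergeExec conceptually performs over already-sorted partitions:

use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Merge k already-sorted streams into one sorted stream.
fn merge_sorted(streams: Vec<Vec<i32>>) -> Vec<i32> {
    // Min-heap of (next value, stream index, position within that stream).
    let mut heap = BinaryHeap::new();
    for (s, stream) in streams.iter().enumerate() {
        if let Some(&v) = stream.first() {
            heap.push(Reverse((v, s, 0usize)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, s, i))) = heap.pop() {
        out.push(v);
        if let Some(&next) = streams[s].get(i + 1) {
            heap.push(Reverse((next, s, i + 1)));
        }
    }
    out
}

fn main() {
    let merged = merge_sorted(vec![vec![1, 4, 7], vec![2, 3, 9]]);
    assert_eq!(merged, vec![1, 2, 3, 4, 7, 9]);
}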
@2010YOUY01: The reason Gene asked you to verify your test data is in this explanation. It seems that on the build side your lineitem table is too small to repartition, so the plan after the fix does not repartition it. That is the reason for the speedup.
Yes, I suspect that it is due to the size of the tables, because the logic for repartitioning at the file level lives above, and is independent of, the repartitioning logic changes I have made.
That's likely the case: there's a predicate on the lineitem table, and we've pushed down a kind-of bloom filter to prune data during the scan. As a result, the dataset could be quite small. I wouldn't worry about this scenario; it's a reasonable outcome and one of the motivations behind the fix. Plus, it clearly improves query performance.
I'm using the tpch-sf0.1 dataset generated by https://github.com/clflushopt/tpchgen-rs/tree/main/tpchgen-cli. This PR collects the HJ build side into a single partition, while previously it didn't. I checked, and the statistics estimate the scan output row count to be small, so I think the result is good; nothing to worry about.
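If it helps reproduce this, the optimizer's row-count estimates can be surfaced directly in EXPLAIN output. A sketch, assuming the show_statistics option from the DataFusion configuration reference and the TPC-H lineitem table:

-- Show per-operator statistics estimates alongside the plan
SET datafusion.explain.show_statistics = true;
EXPLAIN SELECT l_orderkey
FROM lineitem
WHERE l_commitdate < l_receiptdate;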
2010YOUY01 left a comment:
LGTM, thank you.
I've reviewed all the test changes; everything looks good except several SortExec additions and removals, which appear to be redundant. It's not a correctness issue, but it could affect performance, so it might be worth investigating further in the future.
02)--ProjectionExec: expr=[c3@0 as c3, sum(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c3 DESC NULLS FIRST, aggregate_test_100.c9 DESC NULLS FIRST, aggregate_test_100.c2 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@2 as sum1, sum(aggregate_test_100.c9) PARTITION BY [aggregate_test_100.c3] ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@3 as sum2]
03)----BoundedWindowAggExec: wdw=[sum(aggregate_test_100.c9) PARTITION BY [aggregate_test_100.c3] ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { "sum(aggregate_test_100.c9) PARTITION BY [aggregate_test_100.c3] ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW": nullable UInt64 }, frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW], mode=[Sorted]
04)------SortExec: expr=[c3@0 ASC NULLS LAST, c9@1 DESC], preserve_partitioning=[true]
02)--SortExec: TopK(fetch=5), expr=[c3@0 ASC NULLS LAST], preserve_partitioning=[true]
Here it similarly adds a SortExec; this might also be worth investigating later.
BTW, I think this is a great opportunity to add a section to the tuning guide https://datafusion.apache.org/user-guide/configs.html#tuning-guide explaining the default repartitioning behavior and how related configuration options (such as the minimum file size for repartitioning, the minimum repartition size for hash joins, etc.) can affect it.
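For example, such a section could walk through settings like these (a sketch; the option names are taken from the DataFusion configuration reference, and defaults may vary by version):

-- Number of partitions the planner repartitions to
SET datafusion.execution.target_partitions = 14;
-- Whether round-robin repartitioning may be inserted at all
SET datafusion.optimizer.enable_round_robin_repartition = true;
-- Minimum file size (in bytes) before a scan is repartitioned
SET datafusion.optimizer.repartition_file_min_size = 10485760;
-- Below this estimated size (in bytes), a hash join build side is kept in one partition
SET datafusion.optimizer.hash_join_single_partition_threshold = 1048576;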
Which issue does this PR close?
Rationale for this change
There are cases where two RepartitionExec operators appear consecutively in the plan. This is unneeded overhead; eliminating it provides speedups.
Full Report: The Physical Optimizer and Fixing Consecutive Repartitions In the Enforce Distribution Rule.pdf
Issue Report: Fixing Consecutive Repartitions In the Enforce Distribution Rule.pdf
What changes are included in this PR?
- Change to the repartition-adding logic in enforce_distribution.rs
- A ton of test and bench updates to mirror the new behavior
Are these changes tested?
Yes, benchmarked and tested; check the report for benchmarks.
Are there any user-facing changes?