
Conversation

@alexeykudinkin
Contributor

Context

This change revises the HashShuffleAggregator protocol by:

  • Removing the global (per-aggregator) lock
  • Making the shard-accepting flow lock-free (sketched below)
  • Relocating all state from ShuffleAggregation into the Aggregator itself
  • Adding dynamic compaction (an exponentially increasing compaction period) to amortize compaction costs
  • Adding debugging state dumps
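
For illustration only, here is a minimal sketch of what the per-partition state described above could look like. `PartitionBucket`, `accept`, and the `Block` alias are placeholders, not the PR's actual code:

```python
import threading
from dataclasses import dataclass, field
from queue import Queue
from typing import Any

Block = Any  # illustrative stand-in for Ray Data's Block type


@dataclass
class PartitionBucket:
    """Per-partition accumulation state (sketch)."""

    # The lock only guards compaction/finalization of this partition,
    # never the whole aggregator.
    lock: threading.Lock = field(default_factory=threading.Lock)
    # Shards are submitted through a thread-safe queue, so the accept
    # path needs no explicit locking.
    queue: Queue = field(default_factory=Queue)

    def accept(self, block: Block) -> None:
        # Lock-free from the caller's perspective: Queue.put handles its
        # own internal synchronization.
        self.queue.put(block)
```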


@alexeykudinkin alexeykudinkin requested a review from a team as a code owner January 8, 2026 19:08
@alexeykudinkin alexeykudinkin added the go add ONLY when ready to merge, run all tests label Jan 8, 2026
@alexeykudinkin alexeykudinkin changed the base branch from master to ak/hsh-shfl-strm January 8, 2026 19:09
@gemini-code-assist bot left a comment

Code Review

This pull request is a significant and well-executed refactoring of the HashShuffleAggregator protocol. It moves from a stateful to a stateless aggregation model, which simplifies the aggregation logic and removes the need for state management within aggregation components. The introduction of per-partition locking and a lock-free queue for block submission is a major performance improvement, removing the global lock bottleneck. The addition of dynamic compaction with an exponentially increasing period is a clever optimization to amortize compaction costs. The new tests for HashShuffleAggregator are comprehensive and cover the new features well.

I've found a critical issue related to undefined constants in the new compaction logic that could lead to runtime errors. I've also identified a couple of minor issues: a typo in a new method name and a bug in an error message. Overall, this is a high-quality change that significantly improves the architecture of hash-based shuffling in Ray Data.

def clear(self, partition_id: int):
self._aggregated_blocks: List[Block] = []
if not blocks:
return

Empty partitions silently yield no output blocks

Medium Severity

The new implementation changes behavior for empty partitions. The old ReducingShuffleAggregation.finalize() always yielded a block (returning ArrowBlockAccessor._empty_table() for empty partitions), while the new code in both ReducingAggregation.finalize() and ConcatAggregation.finalize() has if not blocks: return which yields nothing. Additionally, HashShuffleAggregator.finalize() sets blocks = iter([]) when partition_shards_map is empty. This behavior change could break downstream code expecting a block per partition and causes the new test at line 129 to fail with IndexError when accessing results[0].
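
A minimal sketch of one possible fix, restoring the old one-block-per-partition contract. The finalize signature and the `_finalize_non_empty` helper are assumptions; only `ArrowBlockAccessor._empty_table()` is taken from the old code referenced above:

```python
def finalize(self, partition_id: int, blocks: List[Block]) -> Iterator[Block]:
    if not blocks:
        # Preserve the previous contract: every partition yields at least
        # one (possibly empty) block instead of yielding nothing.
        yield ArrowBlockAccessor._empty_table()
        return
    # Hypothetical helper standing in for the existing reduce/concat path.
    yield from self._finalize_non_empty(blocks)
```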


Base automatically changed from ak/hsh-shfl-strm to master January 8, 2026 19:44
if (
self._aggregation.is_compacting()
and bucket.queue.qsize()
>= self._current_compaction_thresholds[partition_id]

TypeError when compacting aggregation lacks compaction thresholds

Low Severity

When is_compacting() returns True but min_max_shards_compaction_thresholds is None, the comparison bucket.queue.qsize() >= self._current_compaction_thresholds[partition_id] will raise a TypeError because _current_compaction_thresholds is a defaultdict that returns None (the value of _min_num_blocks_compaction_threshold). Currently this doesn't trigger because only ReducingAggregation returns is_compacting() = True, and it's paired with HashAggregateOperator which provides thresholds. However, custom aggregations returning is_compacting() = True with operators not overriding _get_min_max_partition_shards_compaction_thresholds() would crash.
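
A sketch of a defensive guard; the `_compact` helper is a hypothetical name, and only the surrounding condition mirrors the quoted code:

```python
# .get() does not trigger the defaultdict's default factory, so a missing
# or None threshold simply disables compaction instead of raising TypeError.
threshold = self._current_compaction_thresholds.get(partition_id)
if (
    self._aggregation.is_compacting()
    and threshold is not None
    and bucket.queue.qsize() >= threshold
):
    self._compact(partition_id, bucket)  # hypothetical helper
```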


if (
self._aggregation.is_compacting()
and bucket.queue.qsize()
>= self._current_compaction_thresholds[partition_id]
Contributor

Copied over from an earlier conversation:

Does the dictionary also need to be protected by a lock?
Not really (because it's read-only + GIL).

Hmm, I understand the GIL; however, it's not read-only, because it's a defaultdict. Are you sure it's thread-safe to access an item in a defaultdict without a lock?
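
For what it's worth, one way to sidestep the question entirely is to pre-populate a plain dict so the hot path never inserts a default. This is only a sketch and assumes the number of output partitions is known up front as `num_partitions`:

```python
# Pre-populating turns every later lookup into a pure read: no
# __missing__/default insertion can happen from the accept path, so the
# GIL-only reasoning about concurrent reads holds.
self._current_compaction_thresholds: Dict[int, int] = {
    partition_id: self._min_num_blocks_compaction_threshold
    for partition_id in range(num_partitions)  # assumed to be known here
}
```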

# NOTE: We revise compaction thresholds for partition after every
# compaction to amortize the cost of compaction.
self._current_compaction_thresholds[partition_id] = min(
self._current_compaction_thresholds[partition_id] * 2,
Contributor

Copied over from an earlier conversation:

I'm a bit confused about why you are multiplying by 2.
Increasing the threshold.

I want to understand the motivation behind multiplying by 2. Why scale by 2 when you could keep it constant? It's not intuitive to me why 2 is used.

Contributor

@alexeykudinkin discussed offline; I think you should update your comment to refer only to Concat aggregations, as this does not affect sum or count.

Contributor Author

The motivation is the same as doubling the size of a list whenever it is resized while appending: it amortizes the cost of copying the existing data over N appends.

Similarly here for non-reducing aggregations (like AsList): because the compacted block is added back into the queue after compaction, we want to reduce the number of times the same data gets copied over and over again.
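
To make the amortization argument concrete, here is a toy model (the `shard_copies` helper and the numbers are purely illustrative) in which each compaction rewrites everything accumulated so far and puts the result back into the queue as a single block:

```python
def shard_copies(num_shards: int, threshold: int, doubling: bool) -> int:
    """Count shard-copies performed under a toy compaction model."""
    copied = 0
    accumulated = 0    # shards' worth of data currently held for this partition
    queued_blocks = 0  # blocks sitting in the partition's queue
    for _ in range(num_shards):
        accumulated += 1
        queued_blocks += 1
        if queued_blocks >= threshold:
            copied += accumulated  # compaction copies all data held so far
            queued_blocks = 1      # the compacted block re-enters the queue
            if doubling:
                threshold *= 2
    return copied


# shard_copies(1024, 8, doubling=False) -> ~75,000 shard-copies (quadratic growth)
# shard_copies(1024, 8, doubling=True)  -> ~2,000 shard-copies (roughly linear)
```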

Contributor Author

Let me update the comment

exec_stats_builder = BlockExecStats.builder()

# Collect partition shards from all input sequences for this partition
partition_shards_map: Dict[int, List[Block]] = {}
Contributor

Let's create a typevar to denote that this is a partition id.
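
Something along these lines, presumably; a `typing.NewType` (rather than a `TypeVar`) is probably the right fit here, since the id is always an int. Sketch only:

```python
from typing import Any, Dict, List, NewType

Block = Any  # stand-in for Ray Data's Block type
PartitionId = NewType("PartitionId", int)

# The mapping's keys are now self-describing at type-check time.
partition_shards_map: Dict[PartitionId, List[Block]] = {}
```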

)
self._target_max_block_size = target_max_block_size

self._current_compaction_thresholds: Dict[int, int] = defaultdict(
Contributor

Copied over from an earlier conversation:

Hmm, I feel that a more accurate way to trigger compaction would be based on object-store usage. Do you think it makes more sense to refactor PartitionState to accurately keep track of byte size too, and then compact based on that? I'm curious about your intention behind making it block-count based.
It's put in place to manage the performance of PyArrow table concatenation.
For aggregations (sum, etc.) it allows us to run most of the aggregation while the shuffle itself is ongoing (i.e., finalization is really fast).

Hmm, I understand the intention; it's more that if blocks are very large (1 GiB per block) or very small (1 KB per block), this heuristic isn't very strong. Or are you assuming block sizes are roughly 128 MiB?

Contributor

Discussed offline: this is the simplest approach, so he is going with it, but it can be experimented with later.
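
For reference, the byte-size-based alternative discussed above might look roughly like this. `queued_bytes` and `_compact` are assumed names; only `_target_max_block_size` appears in the PR:

```python
# Hypothetical byte-based trigger: compact once the queued shards for a
# partition exceed the target max block size, regardless of block count.
if (
    self._aggregation.is_compacting()
    and bucket.queued_bytes >= self._target_max_block_size
):
    self._compact(partition_id, bucket)
```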


# Per-sequence mapping of partition-id to `PartitionState` with individual
# locks for thread-safe block accumulation
self._input_seq_partition_buckets: Dict[
Contributor

Let's use typevars

@ray-gardener ray-gardener bot added the data Ray Data-related issues label Jan 9, 2026
lock: threading.Lock
queue: queue.Queue # Queue[Block]

def drain_queue(self) -> List[Block]:
Contributor

For associative reducing aggregations, you can fold block by block instead of holding onto references to all of the blocks.

def drain(self, fold_fn: Callable[[Block, Block], Block]) -> Optional[Block]:
    # Fold blocks as they are pulled off the queue, keeping only the running
    # accumulator alive instead of a list of all drained blocks.
    acc = None
    while True:
        try:
            block = self.queue.get(False)
            acc = block if acc is None else fold_fn(acc, block)
        except Empty:  # queue.Empty
            break
    return acc
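
For example, a `fold_fn` for a sum-style reduction could look like this; it assumes blocks are Arrow tables that are already pre-aggregated per shard, keyed on a hypothetical "key" column with a "value" column:

```python
import pyarrow as pa

def fold_fn(acc: pa.Table, block: pa.Table) -> pa.Table:
    # Concatenate the accumulator with the next block and re-aggregate, so
    # only one table per partition is kept alive at a time.
    combined = pa.concat_tables([acc, block])
    grouped = combined.group_by("key").aggregate([("value", "sum")])
    # Rename the aggregate column back to "value" so the fold stays
    # associative across successive applications.
    return grouped.select(["key", "value_sum"]).rename_columns(["key", "value"])

# combined_block = bucket.drain(fold_fn)
```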

"compaction_threshold": self._current_compaction_thresholds[
partition_id
],
}

Undefined attribute accessed in debug dump thread

High Severity

The _debug_dump method references self._current_compaction_thresholds[partition_id] which is never defined in HashShuffleAggregator.__init__. The compaction threshold is actually stored in each PartitionBucket.compaction_threshold (accessed via partition.compaction_threshold where partition is the loop variable). This causes the debug daemon thread to crash with AttributeError 10 seconds after aggregator initialization. The test file also references this undefined attribute and will fail.
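
A sketch of the likely fix, reading the threshold from the per-partition state the dump loop already iterates over; the `buckets`/`dump` variable names and the "queued_blocks" key are assumptions:

```python
for partition_id, partition in buckets.items():
    dump[partition_id] = {
        "queued_blocks": partition.queue.qsize(),
        # Read from the bucket itself rather than the non-existent
        # aggregator-level _current_compaction_thresholds attribute.
        "compaction_threshold": partition.compaction_threshold,
    }
```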

