-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[Data] Streamline HashShuffleAggregator protocol
#59973
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 19 commits
Commits
Show all changes
29 commits
Select commit
Hold shift + click to select a range
30898ae
Revisiting `HashShuffleAggregator` protocol to streamline, simplify a…
alexeykudinkin d813073
Added tests for `HashShuffleAggregator`
alexeykudinkin a1fe381
Rebased Join;
alexeykudinkin bce9f8d
Rebased Aggregate;
alexeykudinkin 8b17302
Allow operators to override compaction thresholds;
alexeykudinkin 38bd366
Missing changes
alexeykudinkin b3389f7
`lint`
alexeykudinkin 03218f6
Fixed refs
alexeykudinkin e9baa9b
`lint`
alexeykudinkin 1df562c
Tidying up
alexeykudinkin f8343df
Streamlined partition buckets init
alexeykudinkin b47ca98
Wired in `num_input_seqs`
alexeykudinkin b289924
Fixed invalid ref
alexeykudinkin 0f593a9
`lint`
alexeykudinkin 06af88b
Fixed refs
alexeykudinkin 2480037
Bump up `_shuffle_block` num_cpus to 1
alexeykudinkin a05dc27
Reverting task retries for _shuffle_block
alexeykudinkin d92688d
Missing bazel target
alexeykudinkin a30e114
`lint`
alexeykudinkin 608c10d
Reduce scope of the lock
alexeykudinkin 7bc588a
Relocated `compaction_threshold` into the `PartitionBucket`
alexeykudinkin 035730c
Updated compaction comment
alexeykudinkin e127c66
`lint`
alexeykudinkin a294375
Fixed tests
alexeykudinkin 24e23ac
Fixed invalid ref
alexeykudinkin 6175c8e
Fixed reported `incremental_resource_usage` for hash-shuffle
alexeykudinkin c247dd3
Fixed test
alexeykudinkin c5004f0
`lint`
alexeykudinkin 116f46d
Tidying up
alexeykudinkin File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty partitions silently yield no output blocks
Medium Severity
The new implementation changes behavior for empty partitions. The old
ReducingShuffleAggregation.finalize()always yielded a block (returningArrowBlockAccessor._empty_table()for empty partitions), while the new code in bothReducingAggregation.finalize()andConcatAggregation.finalize()hasif not blocks: returnwhich yields nothing. Additionally,HashShuffleAggregator.finalize()setsblocks = iter([])whenpartition_shards_mapis empty. This behavior change could break downstream code expecting a block per partition and causes the new test at line 129 to fail withIndexErrorwhen accessingresults[0].Additional Locations (2)
python/ray/data/_internal/execution/operators/hash_shuffle.py#L205-L207python/ray/data/_internal/execution/operators/hash_shuffle.py#L1757-L1763