-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[Data] Fixes to DownstreamCapacityBackpressuePolicy #59990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Fixes to DownstreamCapacityBackpressuePolicy #59990
Conversation
Signed-off-by: Srinath Krishnamachari <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the calculation of object store budget fractions to correctly account for downstream ineligible operators, which fixes an issue in DownstreamCapacityBackpressuePolicy. The logic is centralized into new helper functions, improving code clarity and reusability. The changes are well-tested and include necessary updates to test mocks. My main feedback is to address some code duplication that was introduced during the refactoring to further improve maintainability.
| def get_op_usage_object_store_with_downstream(self, op: PhysicalOperator) -> int: | ||
| """Get total object store usage of the given operator and its downstream ineligible ops.""" | ||
| total_usage = int(self.get_op_usage(op).object_store_memory) | ||
| for next_op in self._get_downstream_ineligible_ops(op): | ||
| total_usage += int(self.get_op_usage(next_op).object_store_memory) | ||
| return total_usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To improve maintainability and reduce code duplication, you can extract the logic for calculating the total object store usage of downstream ineligible operators into a private helper method. This logic is also used in get_op_outputs_object_store_usage_with_downstream.
After applying the suggestion below, you should also update get_op_outputs_object_store_usage_with_downstream to use _get_downstream_ineligible_ops_usage.
| def get_op_usage_object_store_with_downstream(self, op: PhysicalOperator) -> int: | |
| """Get total object store usage of the given operator and its downstream ineligible ops.""" | |
| total_usage = int(self.get_op_usage(op).object_store_memory) | |
| for next_op in self._get_downstream_ineligible_ops(op): | |
| total_usage += int(self.get_op_usage(next_op).object_store_memory) | |
| return total_usage | |
| def _get_downstream_ineligible_ops_usage(self, op: PhysicalOperator) -> int: | |
| """Get the total object store memory usage of downstream ineligible operators.""" | |
| usage = 0 | |
| for next_op in self._get_downstream_ineligible_ops(op): | |
| usage += int(self.get_op_usage(next_op).object_store_memory) | |
| return usage | |
| def get_op_usage_object_store_with_downstream(self, op: PhysicalOperator) -> int: | |
| """Get total object store usage of the given operator and its downstream ineligible ops.""" | |
| total_usage = int(self.get_op_usage(op).object_store_memory) | |
| total_usage += self._get_downstream_ineligible_ops_usage(op) | |
| return total_usage |
Description
[Data] Fixes to DownstreamCapacityBackpressuePolicy
Fixes
In DownstreamCapacityBackpressuePolicy, when calculating available/utilized object store ratio per Op, also include the downstream in-eligible Ops.
Related issues
Additional information