-
Notifications
You must be signed in to change notification settings - Fork 165
chore: improve sorting operation #1900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
1b647b1 to
6507949
Compare
|
@triceo The PR is ready for review and will rerun all previous model validations, just like we did when adding the sorting feature. |
|
There is an issue with the quickstart compilation job, and I will take a look. |
triceo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not excited about how sorting creates yet another completely separate path for some selectors, and another part for others.
ValueRangeManager needs to be the ultimate source of truth. Think of the future, where we are inevitably going to be asked to enable mutable value ranges on entities. (When variables change, so do value ranges.) This complexity simply cannot be managed, if value ranges are all over the place. This needs to be centralized.
This future is closer than you think; early next year I expect.
IMO we should start cutting off the paths that lead to these discrepancies. Deprecating, failing fast. Especially when we introduce new functionality, like here - with new functionality, there is no backwards compatibility. We can simply decide to make some things impossible. We can start to reduce the impossible cross-combinations of move selector configurations.
Also, you are far too attached to the ValueRange type. In the future, this type will not exist. Every ValueRange will just be a Collection. Make your future you's life easier, and stop treating the ValueRange interface as a building block for everything.
We only have a limited amount of time to spend on this, and there are good things in here. Let's discuss where we go from here.
core/src/main/java/ai/timefold/solver/core/api/domain/valuerange/ValueRange.java
Outdated
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/domain/valuerange/ValueRangeCache.java
Outdated
Show resolved
Hide resolved
...ava/ai/timefold/solver/core/impl/heuristic/selector/common/decorator/SelectionSetSorter.java
Outdated
Show resolved
Hide resolved
...a/ai/timefold/solver/core/impl/heuristic/selector/value/FromEntityPropertyValueSelector.java
Show resolved
Hide resolved
...rc/main/java/ai/timefold/solver/core/impl/heuristic/selector/value/ValueSelectorFactory.java
Outdated
Show resolved
Hide resolved
...rc/main/java/ai/timefold/solver/core/impl/heuristic/selector/value/ValueSelectorFactory.java
Outdated
Show resolved
Hide resolved
The Vert.x exception is a red herring. Your real problem is here: |
core/src/main/java/ai/timefold/solver/core/impl/score/director/ValueRangeManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/score/director/ValueRangeManager.java
Outdated
Show resolved
Hide resolved
|
I'm thinking we're approaching this all wrong. We're building this hierarchy of selectors, decorating with additional behaviors. But value ranges shouldn't work like that. Value ranges are meant to be the root selector. When sorting is necessary, it should be applied to the root selector directly, not via some decorator later. The root iteration should already happen in the sorted order. How do values come into the value range selector? Who puts them there? That is where we need to start - and that source of data needs to be the value range manager. Which will already have everything sorted. And then we can start building all the other functionality on top of that - the filter decorator etc., all of that can be layered on top of what's already sorted. This is a large refactor, and maybe we don't want to do it right now. But this is the way to make sure that value range manager is the only source of truth for data. And consequently, it is the only way how we can safely produce value ranges which change with their entity variables. |
That's the current approach. The base method
Yes, that's correct. However, the range filtering and the nearby nodes ignore the root selector sort order because they use different structures to fetch the values: reachable values and the distance matrix.
I have a solution to maintain the |
|
@triceo I'll begin reviewing the necessary changes to the enterprise repository, and This change is needed because both |
306bb78 to
43f59de
Compare
This reverts commit ca3c813.
f5a71e4 to
2670a63
Compare
triceo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already confirmed that the code works, so I'm only focusing on quality of the code itself.
This is a good proof of concept, but you cannot stop there - after you've proven that your idea works, you have to clean up the code. Turn it into something that will be easy to maintain going forward, and perhaps even by people who aren't you.
ValueRangeManager and related classes, as it stands now, are spaghetti code. It is hard to read, it is hard to understand, and it will be hard to maintain. No page-long methods, no 1000-line classes. I give some advice in comments.
...old/solver/core/impl/heuristic/selector/entity/decorator/FilteringEntityByValueSelector.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public void solvingEnded(SolverScope<Solution_> solverScope) { | ||
| super.solvingEnded(solverScope); | ||
| this.listVariableStateSupply = null; | ||
| } | ||
|
|
||
| @Override | ||
| public void phaseEnded(AbstractPhaseScope<Solution_> phaseScope) { | ||
| super.phaseEnded(phaseScope); | ||
| this.solverScope = null; | ||
| this.workingRandom = null; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are suddenly adding this to all the selectors, and I have to ask... why now? Why was it not a problem ever before, in the 10+ years that the selector code has existed?
This smells. We are introducing new behavior, IMO without fully understanding why it worked perfectly fine without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ListRuinRecreateMoveSelector isn't that old. The purpose of these changes is to release allocated resources and prevent memory leaks. I reviewed all nodes to identify non-final fields whose references we haven't cleared by the end of the phase and the solver.
For example, the CH repository PlacerBasedMoveRepository was not clearing the placementIterator reference, causing related resources such as value range caches and reachable value structures to never be reclaimed after the CH phase.
Also, these are private instances that are not used once they are cleared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See, this is the strange part to me.
Why would we clear the inside of something, when the outside should just get garbage collected? We should remove a reference to the thing, and everything inside the thing will get collected automatically.
Specifically in the case of PlacerBasedMoveRepository - why does that even survive the CH phase? Something is keeping it alive, and that is what we should be solving. What you're doing here is curing the symptoms, not the disease.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's revisit PlacerBasedMoveRepository. The CH phase instance remains with the AbstractSolver until the process completes. This means that DefaultConstructionHeuristicPhase contains a final field moveRepository that is not cleared. Since the final field references an iterator (placementIterator) that may also hold references to ReachableValues, the range cache and all internal reachable values structures will not be released, even if the ValueRangeManager is reset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can confirm what I'm telling you at AbstractSolver#runPhases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you. :-) But my point is - by removing the phase instance once it's no longer needed, you solve the root cause.
Whether we want to do it is another thing. But it is cleaner than resetting things and hoping that everyone always remembers to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the phases after the process finishes is already done. The point is to keep data from finished phases while another is still running.
core/src/main/java/ai/timefold/solver/core/impl/phase/AbstractPhase.java
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/score/director/AbstractScoreDirector.java
Outdated
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/score/director/ValueRangeManager.java
Outdated
Show resolved
Hide resolved
| return valueRange; | ||
| } | ||
|
|
||
| private static <T> BitSet getBitSetValueRange(CountableValueRange<T> valueRange, Map<Object, Integer> valueIndexMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method will be very expensive. It happens for every single value range, it iterates over it (possibly 20k items) and for each item it will do a hash lookup. If there is one place that would benefit from micro-optimization, it is this one. And then when the lookup is done, there will be a conversion from Integer to int.
My bet is that this is going to jump out in profiling, and by far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure how we can improve this. Remember that the index map, which includes all planning values and the ordinal IDs, is not available when the descriptors are created or when we extract the data.
The class CompositeCountableValueRange could benefit from access to the map of values and ordinal IDs, but the current logic doesn't support passing this information, and the solution to enable that seems worse to me. On the other hand, other range caches like ListValueRange need to read all the data to compute the BitSet.
Please let me know if you have any suggestions for improving it.
core/src/main/java/ai/timefold/solver/core/impl/score/director/ValueRangeManager.java
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/score/director/ValueRangeManager.java
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/score/director/ValueRangeManager.java
Show resolved
Hide resolved
core/src/main/java/ai/timefold/solver/core/impl/solver/recaller/BestSolutionRecaller.java
Show resolved
Hide resolved
|
triceo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're getting there.
I am making suggestions for how to break this class down into manageable subcomponents.
If you make each of them small, each will be easy to understand.
And ValueRangeManager will become just a conductor of an orchestra - and will become easy to understand too.
| // Left item from the newly created item | ||
| newItem.leftItem(), | ||
| newItem.leftSorter(), | ||
| // Current right item | ||
| item.rightItem(), | ||
| item.rightSorter()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice: This is code duplication. It is also not important at this point in time for me to know that this happens.
If you hide this in the ValueRangeItem constructor, this method immediately becomes easier to read. Hide complexity, separate concerns.
| // Current left item | ||
| item.leftItem(), | ||
| item.leftSorter(), | ||
| // Right item from the newly created item | ||
| newItem.leftItem(), | ||
| newItem.leftSorter()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slightly different than above, so we cannot simply share the code. But what we can do is to create two static constructors on ValueRangeItem which will hide this away.
| this.fromSolution = new ValueRangeItem[solutionDescriptor.getValueRangeDescriptorCount()]; | ||
| this.fromSolutionValueIndexMap = new Map[solutionDescriptor.getValueRangeDescriptorCount()]; | ||
| this.reachableValues = new ValueRangeItem[solutionDescriptor.getValueRangeDescriptorCount()]; | ||
| this.fromEntityBitSet = new HashMap[solutionDescriptor.getValueRangeDescriptorCount()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field is now being used from exactly one method. You know what that tells me?
The field and the method can be extracted to a separate type. Hide the complexity, expose a well-named type with a well-named method. Nobody needs to know that there is an array, and hashmaps etc. etc.
| this.reachableValues = new ReachableValues[solutionDescriptor.getValueRangeDescriptorCount()]; | ||
| this.fromSolution = new ValueRangeItem[solutionDescriptor.getValueRangeDescriptorCount()]; | ||
| this.fromSolutionValueIndexMap = new Map[solutionDescriptor.getValueRangeDescriptorCount()]; | ||
| this.reachableValues = new ValueRangeItem[solutionDescriptor.getValueRangeDescriptorCount()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All code dealing with this field is also hidden in just a few methods. They do not deal with much of the other state. This tells me they can be extracted into another class. Hide complexity, separate concerns, avoid massive classes.
| * For safety, prefer using {@link #of(SolutionDescriptor, Object)} to create an instance of this class | ||
| * with a solution already set. | ||
| */ | ||
| public ValueRangeManager(SolutionDescriptor<Solution_> solutionDescriptor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these arrays are based on value range descriptors.
Which means this class is effectively N instances within 1 class; 1 for each value range descriptor.
Arguably, if you refactor the logic so that each VRM works with exactly one VRD, the code will also get much simpler. (You will just have to have one "router" class to delegate requests to these instances based on the specific VRD used.) Something like valueRangeManager.getFor(valueRangeDescriptor).



This PR enables sorting the value ranges at the
ValueRangeManagerlevel.The sorting feature is likely useful during the CH phase, and the new approach can address the sorting requirement for the existing CH configurations. However, the existing selector sorting nodes remain available, as they are required for range filtering or nearby selectors. Also, these corner cases may only make sense for the LS phases.