scheduler: perform feasibility checks for system canaries before computing placements #26953
Conversation
Two groups on the same job cannot both have a static port assignment, but this ends up getting configured in the update block test for system deployments. This test setup bug has complicated landing the fix in #26953.
… state Previously we copied the behavior found in the generic scheduler, where we rely on reconciler results to decide whether enough placements have been made. In the system scheduler we always know exactly how many placements there should be, based on the DesiredTotal field of the deployment state, so a better way to check deployment completeness is to simplify the check and base it on dstate alone.
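For illustration, a minimal sketch of what a dstate-based completeness check could look like, using the `DesiredTotal`, `PlacedAllocs`, and `HealthyAllocs` fields of `structs.DeploymentState`; the helper name and exact condition here are assumptions, not taken from this PR:

```go
package scheduler

import "github.com/hashicorp/nomad/nomad/structs"

// groupDeploymentComplete reports whether a task group's deployment is
// done based on deployment state alone: the system scheduler knows
// exactly how many placements there should be from DesiredTotal.
// Hypothetical helper; the real check may also account for canaries.
func groupDeploymentComplete(dstate *structs.DeploymentState) bool {
	if dstate == nil {
		return false
	}
	// complete once every desired allocation has been placed and
	// reported healthy
	return dstate.PlacedAllocs >= dstate.DesiredTotal &&
		dstate.HealthyAllocs >= dstate.DesiredTotal
}
```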
…iler In contrast to the cluster reconciler, the node reconciler works node-by-node rather than alloc-by-alloc, so its state has to be managed differently. If we override old deployments on every run of `cancelUnneededDeployments`, we end up creating unnecessary deployments for job versions that already had them.
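A minimal sketch of the reuse guard this describes, under the assumption that the fix boils down to comparing `JobVersion` before replacing a tracked deployment; both helper names are hypothetical, only the field comparison is the point:

```go
package scheduler

import "github.com/hashicorp/nomad/nomad/structs"

// reuseOrCreateDeployment keeps the existing deployment when it already
// covers the job version being reconciled, so repeated reconciler runs
// don't spawn duplicate deployments for job versions that already have
// one. Hypothetical helper for illustration only.
func reuseOrCreateDeployment(job *structs.Job, existing *structs.Deployment) *structs.Deployment {
	if existing != nil && existing.JobVersion == job.Version {
		return existing // already tracking this job version
	}
	return newDeploymentForVersion(job)
}

// newDeploymentForVersion stands in for however the node reconciler
// constructs a fresh deployment (construction details elided).
func newDeploymentForVersion(job *structs.Job) *structs.Deployment {
	return &structs.Deployment{JobID: job.ID, JobVersion: job.Version}
}
```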
incorrect condition for detecting jobspec change on ineligible nodes
LGTM. Let's get this merged and then we can mop up any remaining minor issues.
```diff
  // computePlacements computes placements for allocations
- func (s *SystemScheduler) computePlacements(place []reconciler.AllocTuple, existingByTaskGroup map[string]bool) error {
+ func (s *SystemScheduler) computePlacements(
```
For cleanup later: this no longer matches the behavior of the other schedulers' `computePlacements` method. We should consider renaming this to something that describes what's happening here as distinct from that.
good point. would you say `placeAllocs` is more apt?
I guess I think of "place" as "find a node for this alloc" and we've already done that. But off the top of my head I don't have a better verb.
```go
// we should have an entry for every node that is looked
// up. if we don't, something must be wrong
s.logger.Error("failed to locate node feasibility information",
	"node-id", node.ID, "task_group", tgName)
// provide a stubbed metric to work with
metrics = &structs.AllocMetric{}
```
The comment confused me a bit with the `if option == nil` block below:
```diff
- // we should have an entry for every node that is looked
- // up. if we don't, something must be wrong
+ // we should have an entry for every node that is looked up
+ // (potentially with a nil value). if we don't, something must be wrong
  s.logger.Error("failed to locate node feasibility information",
  	"node-id", node.ID, "task_group", tgName)
  // provide a stubbed metric to work with
  metrics = &structs.AllocMetric{}
```
I checked coverage and there's no test case that hits this code path. Are we sure it's reachable?
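For what it's worth, a small sketch of the distinction the suggested comment draws: in Go, a map key present with a nil value is different from an absent key, and only the comma-ok form tells them apart. The map type and names here are illustrative, not taken from the PR:

```go
package scheduler

import "github.com/hashicorp/nomad/nomad/structs"

// lookupMetrics distinguishes "node looked up and found infeasible"
// (key present, nil value) from "node never looked up" (key absent),
// which the error branch above treats as a bug and stubs out.
func lookupMetrics(byNode map[string]*structs.AllocMetric, nodeID string) *structs.AllocMetric {
	metrics, ok := byNode[nodeID]
	if !ok {
		// entry missing entirely: indicates an upstream bug, so
		// substitute a stubbed metric to keep the scheduler going
		return &structs.AllocMetric{}
	}
	// metrics may legitimately be nil here: the node was checked
	// and found infeasible
	return metrics
}
```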
```go
type taskGroupNodes []*taskGroupNode

// feasible returns all taskGroupNode that are feasible for placement
func (t taskGroupNodes) feasible() (feasibleNodes []*taskGroupNode) {
```
The total number of nodes may be large (100s or 1000s). We should probably initialize the capacity here so that we're not performing a bunch of small reallocations (e.g. https://go.dev/play/p/EEMwoeGObjs).
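A sketch of the suggested preallocation; the `taskGroupNode` shape here is a hypothetical minimal stand-in for the PR's struct:

```go
package scheduler

// taskGroupNode pairs a node with its feasibility result (minimal
// hypothetical shape; the real struct in the PR carries more fields).
type taskGroupNode struct {
	isFeasible bool
}

type taskGroupNodes []*taskGroupNode

// feasible returns all entries that are feasible for placement. The
// capacity hint sized to len(t) avoids repeated small reallocations
// when clusters have hundreds or thousands of nodes; the result can
// never be larger than the input.
func (t taskGroupNodes) feasible() []*taskGroupNode {
	feasibleNodes := make([]*taskGroupNode, 0, len(t))
	for _, tgn := range t {
		if tgn.isFeasible {
			feasibleNodes = append(feasibleNodes, tgn)
		}
	}
	return feasibleNodes
}
```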
```go
	return result
}

// cancelUnneededServiceDeployments cancels any deployment that is not needed.
```
```diff
- // cancelUnneededServiceDeployments cancels any deployment that is not needed.
+ // cancelUnneededSystemDeployments cancels any deployment that is not needed.
```
|
👏👏
I was about to do some mopping up, but I saw Chris has some interesting ideas on how to refactor the whole feasibility-check-before-placement flow. I want to talk to him before we make any changes on main; perhaps it'll be a larger shift.
Canaries for system jobs are placed on a `tg.update.canary` percent of eligible nodes. Some of these nodes may not be feasible, and until now we removed infeasible nodes during placement computation. However, if the first eligible node we pick to place a canary on happens to be infeasible, the scheduler halts the deployment.

The solution presented here simplifies canary deployments: initially, system jobs that use canary updates get allocations placed on all eligible nodes, but before we start computing actual placements, a method called `evictUnneededCanaries` is called (much like `evictAndPlace` is for honoring `MaxParallel`), which removes those canary placements that are not needed. We also change the behavior of `computePlacements`, which no longer performs node feasibility checks, as these are now performed earlier for every allocation and node. This way we get accurate counts of all feasible nodes, which lets us correctly set the deployment state fields.

Fixes: #26885
Fixes: #26886
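For illustration, a rough sketch of the canary-count arithmetic the description refers to, with `tg.update.canary` treated as a percentage of eligible nodes for system jobs; the helper name and rounding choice are assumptions, not taken from the PR:

```go
package scheduler

import "math"

// numCanaryNodes computes how many eligible nodes should receive a
// canary allocation for a system job, given the update block's canary
// value interpreted as a percentage. Rounding up means any nonzero
// percentage places at least one canary.
func numCanaryNodes(canaryPercent, eligibleNodes int) int {
	if canaryPercent <= 0 || eligibleNodes <= 0 {
		return 0
	}
	n := int(math.Ceil(float64(eligibleNodes) * float64(canaryPercent) / 100.0))
	if n > eligibleNodes {
		n = eligibleNodes
	}
	return n
}
```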