Conversation

@pkazmierczak (Contributor) commented Oct 15, 2025

Canaries for system jobs are placed on tg.update.canary percent of eligible nodes. Some of these nodes may not be feasible, and until now we removed infeasible nodes during placement computation. However, if the first eligible node we pick for a canary happens to be infeasible, the scheduler halts the deployment.

The solution presented here simplifies canary deployments: initially, system jobs that use canary updates get allocations placed on all eligible nodes, but before we compute the actual placements, a method called evictUnneededCanaries is called (much like evictAndPlace is for honoring MaxParallel) to remove the canary placements that are not needed. We also change the behavior of computePlacements, which no longer performs node feasibility checks, as these are now performed earlier for every allocation and node. This way we get accurate counts of all feasible nodes, which lets us correctly set the deployment state fields.

Fixes: #26885
Fixes: #26886
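To illustrate the eviction step described above, here is a minimal, self-contained Go sketch. The function name mirrors evictUnneededCanaries, but the placement type, the rounding rule, and the keep-the-first-N policy are assumptions made for this example; the actual method operates on the node reconciler's results and the deployment state.

```go
package main

import (
	"fmt"
	"math"
)

// placement is a stand-in for a computed canary placement; the real
// scheduler works with reconciler results and node structs.
type placement struct {
	NodeID string
}

// evictUnneededCanaries keeps only enough canary placements to satisfy the
// configured canary percentage and drops the rest.
func evictUnneededCanaries(placements []placement, canaryPercent int) []placement {
	if canaryPercent <= 0 || len(placements) == 0 {
		return nil
	}

	// Round up so a non-zero percentage always yields at least one canary.
	want := int(math.Ceil(float64(len(placements)) * float64(canaryPercent) / 100.0))
	if want > len(placements) {
		want = len(placements)
	}

	// Placements beyond the desired count are evicted.
	return placements[:want]
}

func main() {
	all := []placement{{"node-a"}, {"node-b"}, {"node-c"}, {"node-d"}}
	kept := evictUnneededCanaries(all, 30) // 30% of 4 nodes, rounded up to 2
	fmt.Println(kept)                      // [{node-a} {node-b}]
}
```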

@pkazmierczak force-pushed the f-system-deployments-canaries-evict-refactor branch from 838bcd8 to e1234c1 on October 28, 2025 08:30
@pkazmierczak force-pushed the f-system-deployments-canaries-evict-refactor branch from e1234c1 to a6cd581 on October 28, 2025 17:33
tgross added a commit that referenced this pull request Oct 28, 2025
Two groups on the same job cannot both have a static port assignment, but this
ends up getting configured in the update block test for system deployments. This
test setup bug has complicated landing the fix in #26953.
@pkazmierczak force-pushed the f-system-deployments-canaries-evict-refactor branch from 697d0ff to 0c70ac8 on October 28, 2025 18:29
@pkazmierczak requested a review from jrasell on October 31, 2025 08:11
… state

Previously we copied the behavior found in the generic scheduler, where
we rely on reconciler results to decide whether enough placements have
been made. In the system scheduler we always know exactly how many
placements there should be, based on the DesiredTotal field of the
deployment state, so a better way to check completeness of the
deployment is to simplify the check and base it on dstate alone.
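As a rough illustration of checking completeness from deployment state alone, here is a sketch in Go. The field names are modeled on Nomad's per-task-group DeploymentState (DesiredTotal, PlacedAllocs, HealthyAllocs), but the surrounding types and the exact condition are simplified assumptions, not the scheduler's real logic.

```go
// deploymentState is a trimmed-down stand-in for Nomad's per-task-group
// DeploymentState; only the fields needed for this sketch are included.
type deploymentState struct {
	DesiredTotal  int
	PlacedAllocs  int
	HealthyAllocs int
}

// deploymentComplete reports whether every task group has placed and made
// healthy at least DesiredTotal allocations, deciding completeness from
// dstate alone rather than from reconciler results.
func deploymentComplete(taskGroups map[string]*deploymentState) bool {
	for _, dstate := range taskGroups {
		if dstate.PlacedAllocs < dstate.DesiredTotal ||
			dstate.HealthyAllocs < dstate.DesiredTotal {
			return false
		}
	}
	return true
}
```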
…iler

In contrast to the cluster reconciler, the node reconciler works
node-by-node rather than alloc-by-alloc, so the state of the reconciler
has to be managed differently. If we override old deployments on every
run of `cancelUnneededDeployments`, we end up with unnecessarily created
deployments for job versions that already had them.
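A simplified sketch of the keep-vs-cancel decision described above, using stub job and deployment types and a hypothetical helper name (splitDeployments); the real node reconciler tracks more state and also has to avoid creating a new deployment for a job version that already has one.

```go
// job and deployment are minimal stubs for this sketch.
type job struct{ Version uint64 }

type deployment struct {
	ID         string
	JobVersion uint64
	Active     bool
}

// splitDeployments returns the active deployment that matches the current
// job version (to be kept, not overridden) and the active deployments for
// older versions (to be cancelled).
func splitDeployments(current *job, deployments []*deployment) (keep *deployment, cancel []*deployment) {
	for _, d := range deployments {
		if !d.Active {
			continue
		}
		if d.JobVersion == current.Version {
			// A deployment for this job version already exists; reuse it
			// instead of creating a new one on every reconciler run.
			keep = d
			continue
		}
		cancel = append(cancel, d)
	}
	return keep, cancel
}
```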
@tgross (Member) left a comment

LGTM. Let's get this merged and then we can mop up any remaining minor issues.


 // computePlacements computes placements for allocations
-func (s *SystemScheduler) computePlacements(place []reconciler.AllocTuple, existingByTaskGroup map[string]bool) error {
+func (s *SystemScheduler) computePlacements(
Member

For cleanup later: this no longer matches the behavior of the other schedulers' computePlacements method. We should consider renaming this to something that describes what's happening here as distinct from that.

Contributor Author

good point. would you say placeAllocs is more apt?

Member

I guess I think of "place" as "find a node for this alloc" and we've already done that. But I don't off the top of my head have a better verb.

Comment on lines +544 to +549
// we should have an entry for every node that is looked
// up. if we don't, something must be wrong
s.logger.Error("failed to locate node feasibility information",
"node-id", node.ID, "task_group", tgName)
// provide a stubbed metric to work with
metrics = &structs.AllocMetric{}
Member

The comment confused me a bit with the if option == nil block below:

Suggested change
-// we should have an entry for every node that is looked
-// up. if we don't, something must be wrong
+// we should have an entry for every node that is looked up
+// (potentially with a nil value). if we don't, something must be wrong
 s.logger.Error("failed to locate node feasibility information",
 	"node-id", node.ID, "task_group", tgName)
 // provide a stubbed metric to work with
 metrics = &structs.AllocMetric{}

I checked coverage and there's no test case that hits this code path. Are we sure it's reachable?

type taskGroupNodes []*taskGroupNode

// feasible returns all taskGroupNode that are feasible for placement
func (t taskGroupNodes) feasible() (feasibleNodes []*taskGroupNode) {
Member

The total number of nodes may be large (100s or 1000s). We probably should initialize the capacity here so that we're not having to perform a bunch of small reallocations (ex https://go.dev/play/p/EEMwoeGObjs)
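A sketch of the suggested pre-sizing, using a stub taskGroupNode with an assumed IsFeasible field (the real struct stores richer feasibility information than a single boolean):

```go
type taskGroupNode struct {
	NodeID     string
	IsFeasible bool // stand-in for whatever feasibility data the real struct carries
}

type taskGroupNodes []*taskGroupNode

// feasible returns all nodes that passed feasibility checks. The result slice
// is allocated with capacity len(t) up front so appending never has to grow
// the backing array, even with thousands of nodes.
func (t taskGroupNodes) feasible() []*taskGroupNode {
	feasibleNodes := make([]*taskGroupNode, 0, len(t))
	for _, n := range t {
		if n.IsFeasible {
			feasibleNodes = append(feasibleNodes, n)
		}
	}
	return feasibleNodes
}
```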

return result
}

// cancelUnneededServiceDeployments cancels any deployment that is not needed.
Member

Suggested change
-// cancelUnneededServiceDeployments cancels any deployment that is not needed.
+// cancelUnneededSystemDeployments cancels any deployment that is not needed.

@pkazmierczak merged commit 00e69d9 into main on Nov 4, 2025
40 checks passed
@pkazmierczak deleted the f-system-deployments-canaries-evict-refactor branch on November 4, 2025 14:10
@ivan-kiselev

👏👏

@pkazmierczak (Contributor, Author)

> we can mop up any remaining minor issues

I was about to do some mopping but I saw Chris has some interesting ideas on how to refactor the whole feasibility-check-before-placement. I want to talk to him before we do any changes on main, perhaps it'll be a larger shift.
