[KEP-4816] Simple scoring for DRA Prioritized List feature #5633

mortent · 2025-10-07T20:56:09Z

One-line PR description: Add simple scoring for the DRA Prioritized List feature
Issue link: DRA: Prioritized Alternatives in Device Requests #4816

k8s-ci-robot · 2025-10-07T20:56:19Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mortent
Once this PR has been reviewed and has the lgtm label, please assign sanposhiho for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

keps/sig-scheduling/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

johnbelamaric · 2025-10-08T19:45:07Z

LGTM from the DRA perspective. No additional PRR approval needed, the existing one is sufficient. So, this just needs SIG Scheduling approval.

macsko · 2025-10-08T13:00:03Z

keps/sig-scheduling/4816-dra-prioritized-list/README.md

+We don't see a need to normalize the scores for now, but this might be needed when
+we implement more complicated scoring in the future.


If you don't normalize the score, it will result in low scores for the DRA plugin, and other plugins may have a much stronger influence on the placement decision.

Furthermore, what about the weight of the DRA plugin in the default scheduler config? Since this scoring is based on user preference, it might be worth setting the weight to >1.

Thanks for pointing this out. I've updated the design to describe how we should normalize the score and which weight we should use.

bart0sh · 2025-10-09T14:41:34Z

keps/sig-scheduling/4816-dra-prioritized-list/README.md

 DRA does not yet implement scoring, which means that
 the selected devices might not be optimal. For example, if a prioritized


Should this phrase be changed since we're going to implement limited scoring? Should it refer to the Scoring chapter you're adding?

Thanks for pointing this out. I've rewritten this section.

bart0sh · 2025-10-09T14:54:31Z

keps/sig-scheduling/4816-dra-prioritized-list/README.md

+all claims referenced by the Pod. Since the number of subrequests for each request
+is capped at 8, we will compute a score between 1 and 8 for each request, with 8
+being the best (i.e. the first option was chosen) and 1 if the 8th subrequest was
+chosen. We save the score of 0 in case we want to implement optional requests. Since


Should optional requests be given a score > 0, so that nodes with devices get a higher score than nodes without them?

I'm not sure I follow. The idea is that any choice that ends up selecting devices should get a higher score than deciding not to select devices at all. The suggestion here only works if we allow for optional subrequests though. If we add support for making any request optional, we need to revisit the scoring anyway, since it requires that we provide a score for every request, regardless of whether it uses Prioritized List or not.

I was confused by reserving 0 (which to me means no scoring) for something that we potentially want to influence the total node score. If we reserve something > 0 but don't use it for now, we can later use it by adding that value to the total node score. This would give a node a slightly higher score if it has the optional device. However, I'm still not sure I understand how we can use 0 in that case, since 0 doesn't contribute any value. Am I missing something?

So what this logic essentially does is create a score for a pod on every node, where a higher number is better. For every request that uses subrequests, we give it a score from 1 to 8, and then sum the scores across all requests and claims. The idea was that if we added support in the Prioritized List feature for optional requests (i.e. if none of the subrequests can be satisfied, we just don't allocate any), a score of zero would be worse than 1, which is the score for the last subrequest allowed. But this idea wasn't fully thought out, and we will need a different change to the scoring logic if we decide to implement this, since we would need to score requests that have been satisfied higher than those that have not, regardless of whether Prioritized List is in use. So I've removed the sentence about the score of zero from the design. If we ever decide to implement optional requests, addressing scoring will have to be part of that design.
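As a rough illustration of the scheme described above, here is a minimal Go sketch. It is not the scheduler plugin code: `requestScore`, `podScore`, and the slice of chosen indexes are hypothetical names made up for this example. It assumes each request that uses a prioritized list reports the zero-based index of the subrequest that was chosen on a given node.

```go
package main

import "fmt"

// maxSubRequests mirrors the cap of 8 subrequests per request mentioned in the thread.
const maxSubRequests = 8

// requestScore (hypothetical helper) maps the zero-based index of the chosen
// subrequest to a linear score: index 0 scores 8, index 7 scores 1.
func requestScore(chosenIndex int) (int, error) {
	if chosenIndex < 0 || chosenIndex >= maxSubRequests {
		return 0, fmt.Errorf("subrequest index %d out of range [0, %d)", chosenIndex, maxSubRequests)
	}
	return maxSubRequests - chosenIndex, nil
}

// podScore (hypothetical helper) sums the per-request scores over every
// request that uses a prioritized list, across all claims of the pod.
func podScore(chosenIndexes []int) (int, error) {
	total := 0
	for _, idx := range chosenIndexes {
		s, err := requestScore(idx)
		if err != nil {
			return 0, err
		}
		total += s
	}
	return total, nil
}

func main() {
	// Hypothetical node: the first option was chosen for one request and
	// the third option for another.
	score, err := podScore([]int{0, 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(score) // 8 + 6 = 14
}
```

Because every node is scored against the same set of requests, the raw sums are directly comparable between nodes for a single pod.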

bart0sh · 2025-10-10T08:18:35Z

keps/sig-scheduling/4816-dra-prioritized-list/README.md

+is capped at 8, we will compute a score between 1 and 8 for each request, with 8
+being the best (i.e. the first option was chosen) and 1 if the 8th subrequest was
+chosen. We save the score of 0 in case we want to implement optional requests. Since
+the score for every node is computed based on the same claims, we end up with a


Would it make sense to explain how the total node score is computed if a pod requests multiple claims/devices? Is it a sum of scores for each claim, a weighted sum, an average, etc.?

I've added a sentence about this. I think we should just do a sum, since the scores for a single pod are all computed from the same claims on every node. And we will do normalization anyway to make sure the score falls within the allowed boundaries in the scheduling framework.
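To make the normalization step concrete, here is a small Go sketch. The thread does not say which normalization the KEP settles on, so scaling relative to the highest raw score is an assumption made purely for illustration. `normalize` and the node names are hypothetical, and `maxNodeScore` is a local constant set to 100, the value of `MaxNodeScore` in the kube-scheduler framework.

```go
package main

import "fmt"

// maxNodeScore matches the upper bound of node scores in the kube-scheduler
// framework (MaxNodeScore = 100). It is redefined here to keep the sketch
// self-contained.
const maxNodeScore int64 = 100

// normalize (hypothetical helper) rescales raw per-node sums into the range
// [0, maxNodeScore], so the node with the highest raw score gets 100.
// Assumption: scaling relative to the maximum; the KEP may choose differently.
func normalize(raw map[string]int64) map[string]int64 {
	var highest int64
	for _, s := range raw {
		if s > highest {
			highest = s
		}
	}
	out := make(map[string]int64, len(raw))
	for node, s := range raw {
		if highest == 0 {
			out[node] = 0 // nothing to rank: every node scored zero
			continue
		}
		out[node] = s * maxNodeScore / highest // integer division truncates
	}
	return out
}

func main() {
	// Hypothetical raw sums for three nodes.
	raw := map[string]int64{"node-a": 16, "node-b": 14, "node-c": 9}
	scores := normalize(raw)
	for _, node := range []string{"node-a", "node-b", "node-c"} {
		fmt.Println(node, scores[node])
	}
	// Prints:
	// node-a 100
	// node-b 87
	// node-c 56
}
```

Without a step like this, a raw sum such as 16 would be small next to plugins that use the full 0 to 100 range, which is the concern raised earlier in the review about other plugins dominating the placement decision.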

bart0sh · 2025-10-10T08:37:39Z

keps/sig-scheduling/4816-dra-prioritized-list/README.md

+The allocation result for each node will be given a score based on the ranking of
+the chosen subrequests across all requests using the `FirstAvailable` field across
+all claims referenced by the Pod. Since the number of subrequests for each request
+is capped at 8, we will compute a score between 1 and 8 for each request, with 8


Linear ranking might not match user intent. Would it make sense to use exponential ranking here, giving more priority to the nodes with higher ranked devices?

Are you suggesting something like the lowest-ranked option getting a score of 1, then 2, 4, 8, 16, ...? I did think about other ways to do ranking, but none seemed clearly better than linear ranking. As an example, if I have a claim with two requests, each with three subrequests, would an allocation where the first subrequest gets allocated on the first request and the third on the second request be better than the second on both? I think linear has the benefit that it is pretty easy to understand and reason about.
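The example in this reply can be worked through numerically. The Go sketch below compares the two allocations under linear scoring and under one possible reading of the exponential suggestion (1, 2, 4, ... from the lowest-ranked slot upward). The function names are hypothetical and the exponential formula is an assumption, since the thread never pins one down.

```go
package main

import "fmt"

// maxSubRequests mirrors the cap of 8 subrequests per request.
const maxSubRequests = 8

// linear scores a zero-based subrequest index as 8, 7, ..., 1.
func linear(idx int) int { return maxSubRequests - idx }

// exponential scores a zero-based subrequest index as 128, 64, ..., 1.
// Assumed reading of the "1, 2, 4, 8, 16, ..." suggestion; valid for idx in [0, 7].
func exponential(idx int) int { return 1 << (maxSubRequests - 1 - idx) }

// total sums the per-request scores for the chosen subrequest indexes.
func total(score func(int) int, chosen []int) int {
	sum := 0
	for _, idx := range chosen {
		sum += score(idx)
	}
	return sum
}

func main() {
	a := []int{0, 2} // first subrequest on request 1, third on request 2
	b := []int{1, 1} // second subrequest on both requests

	fmt.Println(total(linear, a), total(linear, b))           // 14 14
	fmt.Println(total(exponential, a), total(exponential, b)) // 160 128
}
```

Under linear scoring the two allocations tie, while the exponential variant prefers the allocation that gets at least one first choice. Neither outcome is obviously the right one, which is the point made in the reply.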

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 7, 2025

k8s-ci-robot requested review from dom4ha and macsko October 7, 2025 20:56

k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Oct 7, 2025

github-project-automation bot added this to SIG Scheduling Oct 7, 2025

k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 7, 2025

mortent mentioned this pull request Oct 7, 2025

DRA: Prioritized Alternatives in Device Requests #4816

Open

8 tasks

[KEP-4816] Simple scoring for DRA Prioritized List feature

66b1bc9

mortent force-pushed the SimpleScoringForPrioritizedList branch from 80b8090 to 66b1bc9 Compare October 7, 2025 21:02

Update latest milestone

74588a8

macsko reviewed Oct 9, 2025

bart0sh reviewed Oct 9, 2025

bart0sh reviewed Oct 9, 2025

Addressed comments

6ae8e04

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 9, 2025

mortent requested a review from macsko October 9, 2025 20:34

bart0sh reviewed Oct 10, 2025

Addressed comments

59fb547


5 participants
