diff --git a/keps/sig-scheduling/4816-dra-prioritized-list/README.md b/keps/sig-scheduling/4816-dra-prioritized-list/README.md index 3cbe5895213..93db94618ab 100644 --- a/keps/sig-scheduling/4816-dra-prioritized-list/README.md +++ b/keps/sig-scheduling/4816-dra-prioritized-list/README.md @@ -91,6 +91,7 @@ tags, and then generate with `hack/update-toc.sh`. - [Risks and Mitigations](#risks-and-mitigations) - [Design Details](#design-details) - [Scheduler Implementation](#scheduler-implementation) + - [Scoring](#scoring) - [Test Plan](#test-plan) - [Prerequisite testing updates](#prerequisite-testing-updates) - [Unit tests](#unit-tests) @@ -290,15 +291,13 @@ type `DeviceSubRequest`. The `DeviceSubRequest` type is similar to available when providing multiple alternatives. The list provided in the `FirstAvailable` field is considered a priority order, such that the scheduler will use the first entry in the list that satisfies the -requirements. +requirements. -DRA does not yet implement scoring, which means that -the selected devices might not be optimal. For example, if a prioritized -list is provided in a request, DRA might choose entry number two on node A, -even though entry number one would meet the requirements on node B. This is -consistent with current behavior in DRA where the first match will be chosen. -Scoring is something that will be implemented later, with early discussions -in https://github.com/kubernetes/enhancements/issues/4970 +DRA does not yet implement full scoring (tracked in +https://github.com/kubernetes/enhancements/issues/4970), but we will implement +a limited form of scoring for this feature. This is to make sure nodes which +can satisfy a claim with higher ranked subrequests are preferred over others. The +details are described in the [Scoring](#scoring) section. ### User Stories @@ -619,13 +618,11 @@ type DeviceRequest struct { // // This field may only be set in the entries of DeviceClaim.Requests. // - // DRA does not yet implement scoring, so the scheduler will - // select the first set of devices that satisfies all the - // requests in the claim. And if the requirements can - // be satisfied on more than one node, other scheduling features - // will determine which node is chosen. This means that the set of - // devices allocated to a claim might not be the optimal set - // available to the cluster. Scoring will be implemented later. + // DRA does not yet implement full scoring, but it implements limited + // scoring so that nodes that can satisfy high ranked subrequests are + // preferred over others. The node ultimately chosen also depends on + // other scheduling features, so it is not guaranteed that the node + // preferred by DRA is chosen. // // +optional // +oneOf=deviceRequestType @@ -899,6 +896,35 @@ would need a higher score, which currently is planned for beta of this feature. For alpha, the scheduler may still pick a node with a less preferred device, if there are nodes with each type of device available. +#### Scoring + +Full support for scoring in DRA is not in scope for this feature, but we will +implement limited scoring to make sure that nodes which can satisfy claims with +higher ranked subrequests are preferred over others. + +We will implement this by letting the dynamicresources scheduler plugin implement +the `Score` and `NormalizeScore` interfaces. + +The allocation result for each node will be given a score based on the ranking of +the chosen subrequests across all requests using the `FirstAvailable` field across +all claims referenced by the Pod. Since the number of subrequests for each request +is capped at 8, we will compute a score between 1 and 8 for each request, with 8 +being the best (i.e. the first option was chosen) and 1 if the 8th subrequest was +chosen. If there are more than one request using the `FirstAvailable` field the score from +all of them will be added up to get the score for the pod on the node. +Since +the score for every node is computed based on the same claims, we end up with a +ranking of the results from all nodes. + +We will implement the `NormalizeScore` interface to normalize the results. We aim +to do this in a way such that the worst selection possible (where the last subrequest +is chosen for all requests specifying the `FirstAvailable` field) is given a score of +0, and the best selection possible (where the first subrequest is chosen for all requests +specifying the `FirstAvailable` field) is given a score of 100. This makes sure that the +score accurately reflects "how much better" one alternative is compared to another. + +We will give the plugin a weight of 2 since it reflects scoring based on user preference. + ### Test Plan