Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KNI] resync 20240624 #212

Merged
merged 13 commits into from
Jun 24, 2024
Merged

[KNI] resync 20240624 #212

merged 13 commits into from
Jun 24, 2024

Conversation

ffromani
Copy link
Member

resync upstream 20240624

googs1025 and others added 13 commits May 28, 2024 13:33
A upcoming change in the overreserve cache wants to
consume nodeconfig utilities, so in order to enable both
packages to consume this code, we move it in its own
package. Trivial code movement and only necessary interface
changes (renames/public <-> private symbols).

Signed-off-by: Francesco Romani <[email protected]>
create and require watchable client. The watchable client
is fully supported by the controller-runtime client and
it's a superset of the regular client, so this change
is fully backward compatible.

Signed-off-by: Francesco Romani <[email protected]>
In the NodeResourceTopologyMatch plugin we only need
to consume resources, never to change them.
So let's use a Reader instead of a full client.

Signed-off-by: Francesco Romani <[email protected]>
One of the key assumptions we took when designing the
NodeResourceTopology (NRT) plugin is that the kubelet
configuration of the worker node changes *very* rarely,
if at all, during the cluster lifetime.
As rule of thumb, it was expected to change with a
frequency of like once every quarter (3 months) or so,
and likely less often. So the event of changing during
a scheduling cycle was deemed extremely low.

However, we fail to notice kubelet configuration changes
(the bits we care reported by NRT data) and we update
them incidentally when resyncing the cache.

These updates are expected to be rare, but still failing
to noticing them is a  much worse issue: with out of
date configuration information, the scheduler plugin
will take wrong decisions until restarted.

Up until now, the mitigation was to restart the scheduler
once kubelet config changes; this works, but it is
unpractical and requires more orchestration.

We add the option to resync the NRT data if the attribute
change (and nothing else did) to overcome this limitation.
In the current framework, because of how controller-runtime/client
works, this will require a new connection to the apiserver to
watch the changes and react to them, adding the node
to the set of the one being resynced in the next resync iteration.

Even considering HA scheduler scenarios, this will cause
a very limited amount of new connections, and should not
cause any scalability concern. Nevertheless, we make
the feature opt-in.

Signed-off-by: Francesco Romani <[email protected]>
Switch the default and prefer correctness over
backward compatibility, because the previous
behavior was buggy (see: kubernetes-sigs#621)

Signed-off-by: Francesco Romani <[email protected]>
Convert update-codegen.sh to new codegen
…-updates

[noderesourcetopology] update NRT when attributes change
Signed-off-by: Francesco Romani <[email protected]>
@openshift-ci openshift-ci bot requested review from swatisehgal and yanirq June 24, 2024 12:18
Copy link

openshift-ci bot commented Jun 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2024
@yanirq
Copy link
Member

yanirq commented Jun 24, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 24, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 6833e49 into master Jun 24, 2024
7 checks passed
@ffromani ffromani deleted the resync-20240624 branch June 24, 2024 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants