-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add Controller Revision (Implementation of KEP #238) #277
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Edwinhr716 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round, I didn't look at tests.
log := ctrl.LoggerFrom(ctx).WithValues("leaderworkerset", klog.KObj(lws)) | ||
ctx = ctrl.LoggerInto(ctx, log) | ||
controllerHistory := history.NewHistory(k8sClient, ctx) | ||
selector, err := metav1.LabelSelectorAsSelector(&metav1.LabelSelector{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be selecting the revisions of this specific lws? An empty selector selects everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add the lws name as a label to the revision, and only select the revisions of that specific lws
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more comments on simplifying the history pkg
@@ -288,3 +293,24 @@ func MakeLeaderPodSpecWithTPUResource() corev1.PodSpec { | |||
Subdomain: "default", | |||
} | |||
} | |||
|
|||
func RawLWSTemplate(lws *leaderworkerset.LeaderWorkerSet) runtime.RawExtension { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we should be using a function like this one in our tests; our tests shouldn't rely on a copy the function that we partially try to test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case it is not needed, but there are cases where we do need to duplicate functions as otherwise a dependency cycle is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the commented out code please.
Addressed majority of comments, still need to add more tests and rebase to be ready to merge |
Great, i will look later today, please ensure we have sufficient integration test coverage |
…ented TruncateHistory and fixed getPatch
e674b72
to
8d3da19
Compare
@Edwinhr716: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
} | ||
|
||
func (r *LeaderWorkerSetReconciler) leaderWorkerSetUpdated(ctx context.Context, sts *appsv1.StatefulSet, lws *leaderworkerset.LeaderWorkerSet) (bool, error) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove empty line
return ctrl.Result{}, err | ||
} | ||
|
||
if err := r.createControllerRevisionIfNonExist(ctx, leaderSts, lws); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should return the revision that the function below leaderWorkerSetUpdated
and rollingUpdateParameters
will use.
// Creates a Controller Revision if the leader statefulset exists but no revisions have been created yet. This happens when updating from a version that doesn't | ||
// support controller revision | ||
func (r *LeaderWorkerSetReconciler) createControllerRevisionIfNonExist(ctx context.Context, sts *appsv1.StatefulSet, lws *leaderworkerset.LeaderWorkerSet) error { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove the empty line
return err | ||
} | ||
|
||
if !existingControllerRevisions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having an existing controller revision doesn't mean we shouldn't create, what if there is no one with a matching hash label?
return revisions[0], nil | ||
} | ||
|
||
func ExistingControllerRevisions(ctx context.Context, k8sClient client.Client, lws *leaderworkerset.LeaderWorkerSet) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this function, the function above that looks up a revision of a specific hash should be the one we use.
return ctrl.Result{}, err | ||
} | ||
|
||
templateHash := getLeaderWorkerTemplateHash(leaderSts, lws, lwsUpdated) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not create a template hash here, it should come from the revision returned as suggested in line 109 above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we still need a hashing function that treats NetworkConfig == nil
and NetworkConfig.SubdomainShared
as the same though?
When updating from a version that doesn't have networkConfig the steps would be:
Create a controller revision where the NetworkConfig si nil. When checking if LWS was updated, it will determine that it isn't because the hash is the same.
Another change is made, similar to #240 (comment) triggering the lws webhook, and defaulting NetworkConfig. A new controller revision will be created, as the data will now be different. Now, the hash won't match the hash that is in the leaderSts, triggering a rolling update when it isn't supposed to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not comparing hashes anymore though, the hash is just a key to the controller revision. We depend on semantic diff of the revisions.
if err := revisionutils.CreateLeaderWorkerSetRevision(ctx, r.Client, lws, templateHash); err != nil { | ||
log.Error(err, "Creating LWS Revision") | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't be creating here, the creation will be done at the beginning of the reconcile function via createControllerRevisionIfNonExist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this one + removing templateHash
requires changes to the way we set the templateHash
label in the revision. With these changes, we rely on leader sts to determine what templateHash to use. That is why createControllerRevisionIfNonExist
checks if the leaderSts exists before attempting to create a new one. When a leaderSts doesn't exist however, we need to determine what templateHash to use. But the templateHash depends on a revision being created.
So in the case where the leaderSts doesn't exist yet, we would need to set the value of templateHash
to the hashed controller revision, basically having two labels with the same value.
Or, we can change the controller.kubernetes.io/hash
label to be our selector instead. In the case where we pass a templateHash, we set that label equal to the templateHash value.
@@ -387,6 +410,9 @@ func (r *LeaderWorkerSetReconciler) updateConditions(ctx context.Context, lws *l | |||
conditions = append(conditions, makeCondition(leaderworkerset.LeaderWorkerSetUpgradeInProgress)) | |||
} else if updatedAndReadyCount == int(*lws.Spec.Replicas) { | |||
conditions = append(conditions, makeCondition(leaderworkerset.LeaderWorkerSetAvailable)) | |||
if err := revisionutils.TruncateHistory(ctx, r.Client, lws, templateHash); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a follow up, we can store the current and updated revisions for debugging purposes. The current one can be set here, while the updated one can be set in the if block above.
@@ -118,8 +119,12 @@ func (r *PodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.R | |||
log.V(2).Info("defer the creation of the worker statefulset because leader pod is not ready.") | |||
return ctrl.Result{}, nil | |||
} | |||
|
|||
statefulSet, err := constructWorkerStatefulSetApplyConfiguration(pod, leaderWorkerSet) | |||
currentRevision, err := revisionutils.GetLeaderWorkerSetRevisionFromTemplateHash(ctx, r.Client, &leaderWorkerSet, pod.Labels[leaderworkerset.TemplateRevisionHashKey]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, I would not call it current because it could be current or the updated one, call it revision.
@@ -288,3 +293,24 @@ func MakeLeaderPodSpecWithTPUResource() corev1.PodSpec { | |||
Subdomain: "default", | |||
} | |||
} | |||
|
|||
func RawLWSTemplate(lws *leaderworkerset.LeaderWorkerSet) runtime.RawExtension { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the commented out code please.
What type of PR is this?
/kind feature
What this PR does / why we need it
This PR adds controller revision. This allows the controller to store previous iterations of the LWS object, which then makes it possible to select the correct pod template spec to use if a replica is restarted during rolling update.
Which issue(s) this PR fixes
Fixes #238
Fixes #239
Fixes #240
Fixes #281
Special notes for your reviewer
Does this PR introduce a user-facing change?