Implement operator logic #25
Conversation
a40ba29 to 5038e32 (Compare)
Signed-off-by: Artyom Lukianov <[email protected]>
Our controller will operate on different types of components: machineconfigpool, machineconfig, kubeletconfig, featuregate, and tuned. This PR introduces helper methods to create, update and delete all required components. Signed-off-by: Artyom Lukianov <[email protected]>
The controller will watch these resources: PerformanceProfile, MachineConfigPool, MachineConfig, KubeletConfig, FeatureGate and Tuned. Once the PerformanceProfile is created, the controller will start the reconcile loop, create all resources needed for performance tuning, and keep maintaining the desired state of all components; the only way to update resources owned by the controller is to update the PerformanceProfile resource. Signed-off-by: Artyom Lukianov <[email protected]>
By default the operator-sdk creates the cache only for resources from the operator namespace, but we have cluster-wide resources. Signed-off-by: Artyom Lukianov <[email protected]>
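For illustration, a minimal sketch of how the manager cache can be widened beyond the operator namespace, assuming a controller-runtime version that ships cache.MultiNamespacedCacheBuilder (the newManager helper name is hypothetical, not code from this PR):

package main

import (
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// newManager builds a manager whose cache watches the given namespaces instead
// of only the operator namespace (sketch only, not the exact code of this PR).
func newManager(namespaces []string) (manager.Manager, error) {
	cfg, err := config.GetConfig()
	if err != nil {
		return nil, err
	}
	return manager.New(cfg, manager.Options{
		NewCache: cache.MultiNamespacedCacheBuilder(namespaces),
	})
}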
5038e32 to fd9054e (Compare)
/cc @davidvossel
Looks nice, a few minor questions inside.
}

if performanceProfile.Spec.CPU.NonIsolated == nil {
	return fmt.Errorf("you should provide non isolated CPU set")
Why? I mean, why can't we just compute the non-isolated set by difference? Or do we want to do it in a future PR?
@fromanirh welcome back:)
OK, in general we can have three different workflows on the host:
- OS workflows
- some user application workflows (not the k8s ones)
- container workflows
and for each of these workflows we can have a different CPU set (one CPU set can be a subset of another, but we cannot precisely calculate the values).
For example:
We set reserved to 0-3; it means these CPUs will not be used for container workflows, so they can be used either for OS workflows or for other user applications that require CPU isolation. Now we should separate the CPUs that will be used for OS workflows from those for other user applications, so we can have CPUs 0-1 for OS workflows (nonIsolated). That leaves us with the additional variable isolated, which we can set to 2-5; it means we have CPUs 2-3 for other user applications that require isolated CPUs and CPUs 4-5 for container workloads that require isolated CPUs.
I hope it was clear enough, let me know if you need additional information.
BTW, I am not saying that we cannot optimize it somehow, but that will definitely be part of another PR.
and now we should separate the CPUs that will be used for OS workflows from those for other user applications, so we can have CPUs 0-1 for OS workflows (nonIsolated). That leaves us with the additional variable isolated, which we can set to 2-5; it means we have CPUs 2-3 for other user applications that require isolated CPUs and CPUs 4-5 for container workloads that require isolated CPUs.
It seems like everything can be expressed with reserved and isolated. Example: an 8-core machine with 1 reserved isolated core, 1 reserved non-isolated core, and 6 isolated cores for container workloads.
reserved: "0-1"
isolated: "1-7"
If you wanted to leave some general cores for container workloads as well, you could do something like this.
reserved: "0-1"
isolated: "1-5"
That would be 1 isolated core for reserved, 1 non-isolated core for reserved, 4 isolated cores for container workloads, and 2 non-isolated cores for container workloads.
then non-isolated would be assumed to be "0,6-7"
would that optimization make sense?
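For illustration only, a tiny sketch of the "compute by difference" idea in plain Go (a hypothetical helper, not code from this PR):

package main

import "fmt"

// nonIsolated returns the CPU IDs in [0, total) that are not marked isolated,
// i.e. the set computed by difference as suggested above.
func nonIsolated(total int, isolated map[int]bool) []int {
	var cpus []int
	for cpu := 0; cpu < total; cpu++ {
		if !isolated[cpu] {
			cpus = append(cpus, cpu)
		}
	}
	return cpus
}

func main() {
	// 8-core machine with isolated "1-7": the non-isolated set comes out as [0]
	isolated := map[int]bool{1: true, 2: true, 3: true, 4: true, 5: true, 6: true, 7: true}
	fmt.Println(nonIsolated(8, isolated))
}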
At first glance, it looks fine. I will give it another thought and will reflect it under the PR :)
If you wanted to leave a general cores for container workloads as well, you could do something like this.
reserved: "0-1" isolated: "1-5"
That would be 1 isolated core for reserved, 1 non-isolated core for reserved, 4 isolated cores for container workloads, and 2 non-isolated cores for container workloads.
then non-isolated would be assumed to be "0,6-7"
On second thought, non-isolated in our context means that these CPUs will mostly be used by the operating system, so it is not good to use them for container workloads; you still need the non-isolated parameter to specify what part of 0,6-7 will be used for OS workflows.
so you still need the non-isolated parameter to specify what part of 0,6-7 will be used for OS workflows.
reserved is 0-1, so 0 would be used for the OS and 6-7 for containers. What am I missing here?
do we have any more thoughts here on removing non-isolated?
your suggestion makes sense to me on a first look, but I'm not familiar enough with isolated / reserved CPUs to have a strong opinion here...
OK, so our assumption is that nonIsolated CPUs should be a subset of reserved CPUs and should not intersect with isolated CPUs. I believe that will be correct in most cases; still, I feel unsure about this kind of calculation for all use cases.
@fromanirh @vladikr Guys, do you have some thoughts?
@davidvossel Can we leave this discussion for another PR, to avoid holding up this one?
My take: first, let's move this (important) discussion to another issue/PR to let this PR move forward.
Second: isolated and reserved in the current implementation must not overlap. So the question boils down to whether we actually need CPUs which are neither isolated nor reserved, which form a shared pool of cores on which container workloads can run.
After a bit more thought I agree that non-reserved, non-isolated cores are a worthy concept, but we should present it nicely.
I think the takeaway is this little topic deserves more attention and perhaps a better abstraction, which is another hint we should move this discussion to a separate ticket.
// 2. None namespace
namespaces := []string{
	components.NamespaceNodeTuningOperator,
	metav1.NamespaceNone,
Could you please add a one-line comment to remind our future selves what "None namespace" means in this context? (e.g. watch all the namespaces, even though IIRC this is not the case)
done
size: 3
cpu:
  isolated: "2-3"
  nonIsolated: "0"
Could you please clarify the state of CPU 1? How could it be neither isolated nor nonIsolated?
maybe even add a validation hook on that later on ?
it is just an example that will be injected into the CSV and is not a real case.
Regarding the validation webhook, I left a TODO.
performance-addon-operators/pkg/controller/performanceprofile/performanceprofile_controller.go (Line 229 in fd9054e):
// TODO: we need to check if all under performance profiles values != nil
Ack, good enough for me now
IMHO examples should be easy to understand, people will see and copy/paste this.
@slintes this CR example is injected into the CSV, so I did not add comments to it (unsure regarding parsing). What example do you prefer?
Signed-off-by: Artyom Lukianov <[email protected]>
Signed-off-by: Artyom Lukianov <[email protected]>
Signed-off-by: Artyom Lukianov <[email protected]>
Tests run the reconcile loop and verify: components creation, components deletion, finalizer adding, finalizer removing, and basic fields verification.
fd9054e to 4e7b6d9 (Compare)
W/A for https://bugzilla.redhat.com/show_bug.cgi?id=1787907 Signed-off-by: Artyom Lukianov <[email protected]>
ec3aa48 to 3bb4d65 (Compare)
It is a temporary W/A until https://bugzilla.redhat.com/show_bug.cgi?id=1788061 is fixed. Signed-off-by: Artyom Lukianov <[email protected]>
great start! Here are a few comments/questions
@@ -1,4 +0,0 @@
apiVersion: v1
I thought this was autogenerated. Does operator-sdk 0.13.0 not generate this automatically?
Technically it is required only for manual deployment (not via OLM); the CSV generator does not use service account resources at all. Let me know if you prefer to keep it.
ah, okay. I actually like that these manifests are being removed then if we aren't using them directly. It forces everyone down the CSV path, which is what we want at the moment.
hack/clean-deploy.sh
Outdated
@@ -32,3 +32,5 @@ spec:
publisher: Red Hat
sourceType: grpc
EOF

$OC_TOOL -n openshift-performance-addon delete csv --all
maybe we should be deleting the openshift-performance-addon namespace too here just to get rid of everything?
done
@cynepco3hahue I don't see the ns deletion, forgot to commit / push?
Yes, I waited for your review to commit all together:)
// TOOD: uncomment once https://bugzilla.redhat.com/show_bug.cgi?id=1788061 fixed
// FeatureGateLatencySensetiveName = "latency-sensetive"
FeatureGateLatencySensetiveName = "cluster"
does the cluster feature set enable topology manager? How is this a workaround for that bz?
fyi, it's latency-sensitive instead of latency-sensetive.
It is not a feature set, it is the name of the feature gate resource.
The initial idea was to create an additional feature gate resource that would enable the topology manager (managed by us), but because of the bug the kubelet controller does not render the correct machine config (it uses only the default cluster feature gate resource).
fixed typo
but because of the bug, the kubelet controller does not render correct machine config(it uses only default cluster feature gate resource)
oh, so does this mean we can't enable topology manager until that BZ is resolved?
Nope, it just means that we should update the default feature gate resource cluster instead of creating a new one (that could create conflicts with user configuration, so I wanted to avoid it); also it should be updated before the kubelet config creation, see https://bugzilla.redhat.com/show_bug.cgi?id=1788061#c3
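A rough sketch of that workaround, assuming the openshift/api config/v1 types and the usual imports (context, configv1, k8s.io/apimachinery/pkg/types, sigs.k8s.io/controller-runtime/pkg/client); the helper name and the exact field layout are my assumptions, not verified against this PR:

// enableLatencySensitive updates the default "cluster" FeatureGate instead of
// creating a new one, as described above (sketch only, assumed field names).
func enableLatencySensitive(ctx context.Context, c client.Client) error {
	fg := &configv1.FeatureGate{}
	if err := c.Get(ctx, types.NamespacedName{Name: "cluster"}, fg); err != nil {
		return err
	}
	if fg.Spec.FeatureSet == configv1.LatencySensitive {
		return nil // already set, avoid a redundant update
	}
	fg.Spec.FeatureSet = configv1.LatencySensitive
	return c.Update(ctx, fg)
}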
mcpNew := e.ObjectOld.(*mcov1.MachineConfigPool)
mcpOld := e.ObjectNew.(*mcov1.MachineConfigPool)

mcpNew.Spec.Paused = mcpOld.Spec.Paused
is this mutating a pointer that's going to be cached by the informer? We might need to do a deepcopy on the mcpNew object here. I'm not 100% certain though.
will add deep copy just to be sure:)
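A minimal sketch of the deep-copy-before-mutate idea for this predicate (illustrative; assumes the usual reflect, event, predicate and mcov1 imports):

p := predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		// copy first, so the objects cached by the informer are never mutated
		mcpOld := e.ObjectOld.(*mcov1.MachineConfigPool).DeepCopy()
		mcpNew := e.ObjectNew.(*mcov1.MachineConfigPool).DeepCopy()
		// ignore the paused field when deciding whether to reconcile
		mcpNew.Spec.Paused = mcpOld.Spec.Paused
		return !reflect.DeepEqual(mcpOld.Spec, mcpNew.Spec)
	},
}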
pod := newPodForCR(instance)
if instance.DeletionTimestamp != nil {
	// delete components
	if err := r.deleteComponents(instance); err != nil {
I think we should wait until both all components are deleted and removed from the cluster before removing the finalizer. Right now I believe we're just issuing the delete. We should also be waiting for the components to be removed.
Hmm, I do not remember any operator where we waited for the actual deletion, but if you insist I can add it.
Hmm, I do not remember any operator where we waited for the actual deletion, but if you insist I can add it.
Waiting for deletion guarantees us that we have 100% successfully cleaned up all dependent resources before the parent resource is completely removed. Without this, sometimes issues during tear down are masked. CI wouldn't necessarily catch this because the parent resource was deleted while dependent resources might still remain in a pending deletion type phase.
So basically, it ensures our teardown logic works; otherwise these issues go unnoticed and only manifest themselves in odd situations where people attempt to delete a resource and then re-post it again.
done
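A condensed sketch of the resulting deletion flow, reusing the deleteComponents and isComponentsExist helpers from this PR (the finalizer constant and the removeString helper are illustrative):

if instance.DeletionTimestamp != nil {
	// issue deletion of all owned components
	if err := r.deleteComponents(instance); err != nil {
		return reconcile.Result{}, err
	}
	// wait until everything is really gone before dropping the finalizer
	if r.isComponentsExist(instance) {
		klog.Info("deletion is pending on dependent resources being removed")
		return reconcile.Result{RequeueAfter: 10 * time.Second}, nil
	}
	// all components are removed, it is now safe to drop the finalizer
	instance.Finalizers = removeString(instance.Finalizers, finalizer)
	return reconcile.Result{}, r.client.Update(context.TODO(), instance)
}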
// for now let's assume that all parameters needed for assets scrips are required
if err := r.verifyPerformanceProfileParameters(instance); err != nil {
	// we do not want to reconcile again in case of error, because a user will need to update the PerformanceProfile anyway
	klog.Errorf("failed to reconcile: %v", err)
we need to set this error on a Condition in the Status section as well as create an Event. otherwise it won't be obvious to the user that their CR isn't being reconciled because of an error.
Can we leave events and status sections for following PRs? This PR is already too big.
Created:
sounds good 👍
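For the follow-up, a sketch of what surfacing the error could look like; the recorder field and the condition handling are assumptions, not part of this PR:

if err := r.verifyPerformanceProfileParameters(instance); err != nil {
	klog.Errorf("failed to reconcile: %v", err)
	// assumed: r.recorder is a record.EventRecorder wired in at controller setup
	r.recorder.Event(instance, corev1.EventTypeWarning, "ValidationFailed", err.Error())
	// assumed: a degraded-style condition would also be written to instance.Status here
	return reconcile.Result{}, nil
}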
// we will need to fetch the name of the machine config pool on runtime
updatedMcp, err := r.getMachineConfigPool(mcpName)
if errors.IsNotFound(err) {
	klog.Warning("failed to set pause, the machine config pool does not exist, probably it was deleted")
i'm not 100% sure, but I think since the MCP is being watched, that the GET request will result in looking up the MCP in cache. if the MCP was just created and then paused, then attempting to do the GET here might fail because the cache hasn't populated.
just something to be aware of if you encounter this.
}

// pause machine config pool
if err := r.pauseMachineConfigPool(mcp.Name, true); err != nil {
We don't want to issue API calls every time the reconcile loop pops unless something changes. We'll need to check whether modifications are going to occur and only pause/unpause as modifications happen.
Also, this pause/unpause logic seems like it could have a race condition. For example, what guarantees that the machine config operator sees that we paused an MCP before it sees that a new machine config was posted?
What do you mean by modifications? Modifications to the performance profile resource? We run a very small number of reconcile loops anyway because of our predicates (only when the spec or labels were updated), so we really pause only on important modifications.
Regarding the race condition, you are right that once we have MaxConcurrentReconciles bigger than one we will have trouble, but to be honest I do not have a good solution for it.
I can lock the applyComponents method with a mutex, but that is the same as running reconcile with only one thread, and you cannot really know if an additional update is in progress.
For now, I will add a lock, but maybe you have better ideas?
What do you mean by modifications? Modifications to the performance profile resource? We run a very small number of reconcile loops anyway because of our predicates (only when the spec or labels were updated), so we really pause only on important modifications.
What I'm suggesting is to determine up front whether any config changes will occur that involve a restart, and only perform the "pause/unpause" logic when it's detected that changes will occur.
so the flow would be something like this.
- look through the cache to determine if any machine configs, kubelet configs, etc. either need to be posted or updated
- If not, fast succeed and move on to status calculations since there's nothing to be done
- else pause, apply changes, unpause then move on to status calculations.
I know what I'm suggesting is kind of a pain, the issue is if we get into an error loop or some other unexpected reconcile loop spin, things like this mean we spam the api server with writes.
In practice we need to only GET or POST to the api server when it's absolutely necessary. This is why we do things like only posting an update to the status section when !reflect.DeepEqual(newStatus, oldStatus).
We're avoiding a post to the API server that would otherwise occur every reconcile loop if we didn't determine that the change wasn't necessary by looking at cache.
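The status-update pattern referenced here looks roughly like this (a sketch, not code from this PR):

// write the status only when it actually changed, to avoid spamming the API server
if !reflect.DeepEqual(instance.Status, newStatus) {
	instance.Status = newStatus
	if err := r.client.Status().Update(context.TODO(), instance); err != nil {
		return reconcile.Result{}, err
	}
}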
Regarding the race condition, you are right that once we have MaxConcurrentReconciles bigger than one we will have trouble, but to be honest I do not have a good solution for it.
I can lock the applyComponents method with a mutex, but that is the same as running reconcile with only one thread, and you cannot really know if an additional update is in progress.
For now, I will add a lock, but maybe you have better ideas?
And with regards to the race condition.
What I was referring to isn't related to our reconcile loop. Regardless of how many concurrent reconcile loops we allow, we are guaranteed a single key is only being acted on by a single reconcile execution. There's no parallel execution of the same key, only parallel execution of separate keys.
What I was referring to is actually the machine config operator's reconcile loop. Is there anything guaranteeing us that the operator will be notified that we paused a pool before it's notified we posted new machine configs or kubelet configs?
Basically, do we have any guarantees that the order we post changes in our reconcile loop matches the order another operator will receive them in? so, If I post resource A then resource B, is it possible for another operator to observe resource B's creation before resource A? if resources A and B are of different kinds, i think this might actually be possible.
so the flow would be something like this.
look through the cache to determine if any machine configs, kubelet configs, etc. either need to be posted or updated
- If not, fast succeed and move on to status calculations since there's nothing to be done
- else pause, apply changes, unpause then move on to status calculations.
@davidvossel But we already have this functionality because of the custom update predicates; it just does not enter the reconcile loop at all if it wasn't an important update.
And with regards to the race condition.
What I was referring to isn't related to our reconcile loop. Regardless of how many concurrent reconcile loops we allow, we are guaranteed a single key is only being acted on by a single reconcile execution. There's no parallel execution of the same key, only parallel execution of separate keys.
What I was referring to is actually the machine config operator's reconcile loop. Is there anything guaranteeing us that the operator will be notified that we paused a pool before it's notified we posted new machine configs or kubelet configs?
Basically, do we have any guarantees that the order we post changes in our reconcile loop matches the order another operator will receive them in? so, If I post resource A then resource B, is it possible for another operator to observe resource B's creation before resource A? if resources A and B are of different kinds, i think this might actually be possible.
Ah, I see, good point, but I am unsure how we can avoid it. I can divide the reconcile loop, something like:
- create or update the MCP and pause it (return reconcile.Result{Requeue: true, RequeueAfter: time.Second})
- before creating all other components, check that the MCP is paused; if not, return again reconcile.Result{Requeue: true, RequeueAfter: 10*time.Second}
- create all components
- unpause the MCP
@davidvossel WDT?
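Roughly, the phases listed above could look like this inside Reconcile (illustrative only; it reuses the pauseMachineConfigPool and applyComponents helpers from this PR):

// phase 1: make sure the MCP is paused and give the machine-config controllers time to observe it
if !mcp.Spec.Paused {
	if err := r.pauseMachineConfigPool(mcp.Name, true); err != nil {
		return reconcile.Result{}, err
	}
	return reconcile.Result{RequeueAfter: 10 * time.Second}, nil
}
// phase 2: apply all other components while the pool is paused
if err := r.applyComponents(instance); err != nil {
	return reconcile.Result{}, err
}
// phase 3: unpause only once nothing is left to create or update
return reconcile.Result{}, r.pauseMachineConfigPool(mcp.Name, false)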
Looks good, the only thing I'd add is to only unpause after you observe all dependent resources have been created.
Hm, we unpause just when all mutated objects are equal to nil:
performance-addon-operators/pkg/controller/performanceprofile/performanceprofile_controller.go (Line 366 in c3dd292):
if mcpMutated == nil &&
Great work 👍
Some comments inline.
cmd/manager/main.go
Outdated
printVersion()

namespace, err := k8sutil.GetWatchNamespace()
if err != nil {
	log.Error(err, "Failed to get watch namespace")
	klog.Error(err.Error())
nit: klog.Exit(...) is a shortcut (same below)
done
// we want to initate reconcile loop only on change under labels or spec of the object
p := predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		if e.MetaOld == nil {
Where did you get this from? I'm a bit surprised that this is necessary; looking at sigs.k8s.io/controller-runtime/pkg/source/internal/eventsource.go OnUpdate(), I think these fields are always filled?
I am unsure when we would get this, but the default update predicate has the same checks, so I left it as it is.
ack, works for me
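For reference, the general shape of such an update predicate (a sketch; the nil guards mirror the default predicate, and spec changes are detected via the generation):

p := predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		// keep the defensive nil checks used by the default update predicate
		if e.MetaOld == nil || e.MetaNew == nil {
			return false
		}
		// reconcile only when the spec (generation) or the labels changed
		return e.MetaOld.GetGeneration() != e.MetaNew.GetGeneration() ||
			!reflect.DeepEqual(e.MetaOld.GetLabels(), e.MetaNew.GetLabels())
	},
}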
	return err
}

// we do not want initiate reconcile loop on the pause of machine config pool
pause and configuration?
done
"app": cr.Name, | ||
func (r *ReconcilePerformanceProfile) applyComponents(profle *performancev1alpha1.PerformanceProfile) error { | ||
// deploy machine config pool | ||
mcp := machineconfigpool.NewPerformance(profle) |
maybe NewPerformancePool is a better name?
we already have machineconfigpool in the name of the package, see a nice link that @fedepaol sent me: https://blog.golang.org/package-names
Ah, I see, interesting read about package names and New(). Thanks!
But NewPerformance still sounds a bit wrong to me... 🤔
Maybe go in the other direction and make the func name even shorter, only New()?
Or sth different than New, e.g. FromProfile()?
WDYT?
Only New() works for me, will change it.
}

// deploy machine config
mc, err := machineconfig.NewPerformance(r.assetsDir, profle)
NewPerformanceConfig?
see the link above
}

// deploy kubelet config
kc := kubeletconfig.NewPerformance(profle)
NewPerformanceConfig?
same for following NewXyz() funcs...
see the link above
if err != nil {
	return err
}
if err != nil {
duplicate :)
done
	return r.client.Update(context.TODO(), updatedMcp)
}

func (r *ReconcilePerformanceProfile) verifyPerformanceProfileParameters(performanceProfile *performancev1alpha1.PerformanceProfile) error {
nit: not sure what the exact difference between the 2 words is, but mostly we talk about validate instead of verify (validation webhook) 🤔
changed
@davidvossel @slintes Thanks a lot for the review!
	return err
}

updatedMcp.Spec.Paused = pause
only perform the update here if updatedMcp.Spec.Paused does not already match the desired pause value.
done
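A sketch of the resulting check inside pauseMachineConfigPool (illustrative):

// skip the write when the pool is already in the desired pause state
if updatedMcp.Spec.Paused == pause {
	return nil
}
updatedMcp.Spec.Paused = pause
return r.client.Update(context.TODO(), updatedMcp)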
Signed-off-by: Artyom Lukianov <[email protected]>
3084bbe to c3dd292 (Compare)
much closer! a few more comments.
mutated.Spec = mcp.Spec

// do not update the pause and configuration fields
mutated.Spec.Paused = existing.Spec.Paused
I think we should remove this now. it shouldn't matter anymore. Reconciling based on whether the mcp is paused or not shouldn't impact us now with the new logic.
I am unsure it is correct. For example, our code will pause the machine config pool, but the MCP that we generate will have spec.paused false, so it will recognize the change and unpause.
That's a bug. The MCP we generate needs to reflect whether or not the pause is expected.
We can't prevent the reconcile loop from popping early; this logic sounds like we're depending on the reconcile loop not executing too early, which would result in an early unpause.
}

// update the machine config pool after that we paused it
if mcpMutated != nil && !created {
lets only update the mcp object once. If the mcp needs to be updated to be paused or updated for any other reason, that should be a single update.
writing to the mcp object to pause it, and then writing to it once again right afterwards for some other reason doesn't look right. Figure out what the state of the mcp object should be, and issue a single create or update.
done
},
},
},
mcpMutated, created, err := r.getMutatedMachineConfigPool(mcp)
The created value here is confusing: created=false when the MCP exists, and created=true when the MCP needs to be created.
Can we call created something like nonCreated, or reverse the meaning of created to reflect the status of the object being observed?
After merging pause and createOrUpdate of MCP I dropped it, I just overthought it X_X
}

// create machine config pool and pause it
if mcpMutated != nil && created {
if we know the mcp needs to be created here, and we're just going to immediately pause it after creation, then why not post the manifest with paused=true to begin with?
so
if mcpMutated != nil && needsToBeCreated {
// post a new paused mcp.
} else {
// get and pause existing mcp
}
done
networkLatencyTunedMutated == nil &&
realTimeKernelTunedMutated == nil {
// if nothing needed to be updated and machine config pool paused, unpause it
mcpUpdate, err := r.getMachineConfigPool(mcp.Name)
The mcp object has references to objects in the status section: the mcp.status.configuration.source list. Can we look at that to make sure the mcp picked up all our configs before unpausing it?
Hmm, the feature gate and kubelet config machine configs are always rendered with the same name, so I am unsure how we can check it, see https://github.com/openshift/machine-config-operator/blob/04cd2198cae247fabcd3154669618d74f124f27f/pkg/controller/kubelet-config/helpers.go#L60
So we'd only be able to detect that the new MC was applied?
Here's an issue #35, let's follow up on this discussion there and see if there's a way to detect when unpause is safe, or at least detect that the MC we posted has been picked up before unpausing.
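One way the follow-up in #35 could check this, assuming the MachineConfigPool status keeps its current shape (field names not verified against this PR; the helper is hypothetical):

// mcpPickedUpConfig reports whether the pool's rendered configuration already
// references the machine config we created.
func mcpPickedUpConfig(mcp *mcov1.MachineConfigPool, mcName string) bool {
	for _, source := range mcp.Status.Configuration.Source {
		if source.Name == mcName {
			return true
		}
	}
	return false
}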
I did one more pass. The main thing that caught my eye this time was the realtime repo URL that's in the api.
If we can take that out of the api and make realtimeKernel simply a true|false boolean, I think that would be best. if we need the url still for testing, keep the env var.
I also had a few more minor comments. I'll look at this first thing in the morning tomorrow so we can make more progress asap. I know there's pressure to get this merged.
}

// get mutated real time kernel tuned
realTimeKernelTuned, err := tuned.NewWorkerRealTimeKernel(r.assetsDir, profile)
Tunnelling through the code a bit, I found that each place this assetsDir is used, we're reading a file from disk. The way this is done means we're reading from disk every time the reconcile loop pops.
Can we either create a function to read all the assets into memory before starting the controller, or at least wrap the reads in some sort of sync.Once function, to ensure we aren't reading from disk every time this loop executes?
Creating an issue to follow up on this is fine as well if you don't want to address it in this PR.
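A small sketch of the sync.Once idea (a hypothetical helper, assuming io/ioutil, path/filepath and sync imports):

var (
	loadAssetsOnce sync.Once
	cachedAssets   map[string][]byte
	loadAssetsErr  error
)

// loadAssets reads every file under assetsDir exactly once and serves the
// cached bytes on every later reconcile.
func loadAssets(assetsDir string) (map[string][]byte, error) {
	loadAssetsOnce.Do(func() {
		cachedAssets = map[string][]byte{}
		files, err := ioutil.ReadDir(assetsDir)
		if err != nil {
			loadAssetsErr = err
			return
		}
		for _, f := range files {
			data, err := ioutil.ReadFile(filepath.Join(assetsDir, f.Name()))
			if err != nil {
				loadAssetsErr = err
				return
			}
			cachedAssets[f.Name()] = data
		}
	})
	return cachedAssets, loadAssetsErr
}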
Filed issue: #32
perfect, thanks
	return fmt.Errorf("you should provide non isolated CPU set")
}

if performanceProfile.Spec.RealTimeKernel == nil {
I thought realtimeKernel was going to be a boolean, and not a required option.
basically just, profile.Spec.Realtime=true/false
Also this RepoURL shouldn't be a part of the API. We don't expect users to supply the URL for the realtime kernel. If we need this for testing purposes for now, just use the environment variable or even an annotation if it needs to be associated with the profile object.
@cynepco3hahue @davidvossel heads up, I am already working on a follow up PR for making this a boolean, because with OCP 4.4 we don't need the repo url anymore at all, the RPMs are in the RHCOS image already. Also I remove validation for this and just enable the RT kernel systemd unit when the boolean is set and true.
@davidvossel According to Marc's comment, can we leave it for a following PR, because it will need changes under the scripts and the API?
ack, @slintes thanks for following up on this.
	return true, r.client.Update(context.TODO(), mcp)
}

func (r *ReconcilePerformanceProfile) validatePerformanceProfileParameters(performanceProfile *performancev1alpha1.PerformanceProfile) error {
I think the only required argument here should be nodeSelector. The rest of the features seem like they could be completely independent from one another.
I added all parameters that are needed for the scripts as required; otherwise we can have some unpredictable behavior in our scripts. To change it we will need to change our scripts, so I prefer to leave it for another PR, opened #33.
alrighty, as long as we're tracking this, thanks
}

if r.isComponentsExist(instance) {
	return reconcile.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil
can we get an info level log message here indicating that deletion is pending dependent resources being removed.
done
}

if updated {
	return &reconcile.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil
nit: when RequeueAfter is set, Requeue is true implicitly
Hm, wasn't aware of it, but I see it in the controller code:
} else if result.RequeueAfter > 0 {
// The result.RequeueAfter request will be lost, if it is returned
// along with a non-nil error. But this is intended as
// We need to drive to stable reconcile loops before queuing due
// to result.RequestAfter
c.Queue.Forget(obj)
c.Queue.AddAfter(req, result.RequeueAfter)
ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "requeue_after").Inc()
return true
}
Will remove it.
it's also in the doc of the Result struct ;)
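So the return above can simply become (sketch):

// Requeue is implied once RequeueAfter is non-zero
return &reconcile.Result{RequeueAfter: 10 * time.Second}, nil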
// we want to give time to the machine-config controllers to get updated values
if updated {
	return &reconcile.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil
nit: when RequeueAfter is set, Requeue is true implicitly
done
d3d46fd to 58e7782 (Compare)
According to the discussion: pause the machine config pool only when it is necessary, and split the reconcile loop into phases. Creation: add the finalizer; check if the MCP exists and create it if not; pause the MCP and wait for the time specified in the code to give the machine-config controllers time to get the update; if needed, update the MCP and all other resources; after creation or update, wait again; unpause the MCP. Deletion: check if the MCP exists; if yes, pause the MCP and wait some time; delete all components; wait until all components are deleted; remove the finalizer. All wait cycles are implemented via requeue with `Result{Requeue: true, RequeueAfter: 10 * time.Second}`. Signed-off-by: Artyom Lukianov <[email protected]>
58e7782 to 6be8f31 (Compare)
/lgtm
there are some optimizations being done with the requeueAfter logic that we may find don't always produce consistent results for us, but I think this is a good start.
great work @cynepco3hahue 👍
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cynepco3hahue, davidvossel The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@cynepco3hahue: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Have a custom pool that matches two out of three nodes.
The controller will watch resources:
PerformanceProfile
MachineConfigPool
MachineConfig
KubeletConfig
FeatureGate
Tuned
Once the PerformanceProfile is created, the controller will start to reconcile
and will create all resources needed for performance tuning, and it will continue
to maintain the desired state of all components; the only way to update resources
owned by the controller is to update the PerformanceProfile resource.
I tried to make the controller as quiet as possible; that is the reason why I provided custom predicates for all resources except PerformanceProfile. It is due to the fact that for all other resources we have OpenShift controllers that can update different fields of these resources. An additional thing worth mentioning: I changed the scope of PerformanceProfile from namespaced to cluster, because we cannot set an owner reference on cluster-scoped resources if the owner has namespaced scope.
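For context, the owner reference is typically set with controller-runtime's helper, which works here only because both the (now cluster-scoped) PerformanceProfile and the MachineConfigPool are cluster-scoped (a sketch, not a verbatim excerpt from this PR):

// make the PerformanceProfile the controller-owner of the generated MCP so it
// is garbage-collected when the profile is deleted
if err := controllerutil.SetControllerReference(profile, mcp, r.scheme); err != nil {
	return err
}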