Implement operator logic #25

Conversation

cynepco3hahue
Copy link
Contributor

@cynepco3hahue cynepco3hahue commented Dec 31, 2019

The controller will watch resources:

  • PerformanceProfile
  • MachineConfigPool
  • MachineConfig
  • KubeletConfig
  • FeatureGate
  • Tuned

Once the PerformanceProfile is created, the controller will start to reconcile,
create all resources needed for performance tuning, and keep preserving the
desired state of all components; the only way to update resources
owned by the controller is to update the PerformanceProfile resource.

I tried to make the controller as quiet as possible; that is why I provided custom predicates for all resources except PerformanceProfile: for all the other resources there are OpenShift controllers that can update different fields of these resources.

One additional thing worth mentioning: I changed the scope of PerformanceProfile from namespaced to cluster, because we cannot set an owner reference on cluster-scoped resources if the owner has namespaced scope.
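
For context, a minimal sketch of how such watches can be wired with controller-runtime's builder API (illustrative only, not necessarily the exact code in this PR; the package name, the import paths for the profile and MCO APIs, and the addWatches/quiet names are assumptions, and only two of the owned types are shown):

package performanceprofile

import (
    mcov1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/builder"
    "sigs.k8s.io/controller-runtime/pkg/predicate"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"

    performancev1alpha1 "github.com/openshift-kni/performance-addon-operators/pkg/apis/performance/v1alpha1"
)

// addWatches wires the controller: reconcile on PerformanceProfile changes and on
// changes to owned resources, filtered by a custom predicate so that updates made
// by other OpenShift controllers do not wake us up.
func addWatches(mgr ctrl.Manager, r reconcile.Reconciler, quiet predicate.Predicate) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&performancev1alpha1.PerformanceProfile{}).
        Owns(&mcov1.MachineConfigPool{}, builder.WithPredicates(quiet)).
        Owns(&mcov1.MachineConfig{}, builder.WithPredicates(quiet)).
        Complete(r)
}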

  • test creation of all components
  • test deletion of all components
  • provide unit tests
  • integrate CI with the operator to deploy all components (I believe I will do it in an additional PR)
  • provide functional tests (I believe I will do it in an additional PR)

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 31, 2019
@cynepco3hahue cynepco3hahue mentioned this pull request Jan 2, 2020
6 tasks
@cynepco3hahue cynepco3hahue force-pushed the implement_operator_logic branch from a40ba29 to 5038e32 Compare January 2, 2020 14:35
Artyom Lukianov added 4 commits January 2, 2020 16:40
Our controller will operate on different types of components:
- machineconfigpool
- machineconfig
- kubeletconfig
- featuregate
- tuned

This PR introduces helper methods to create, update and delete all required components.

Signed-off-by: Artyom Lukianov <[email protected]>
The controller will watch resources:

- PerformanceProfile
- MachineConfigPool
- MachineConfig
- KubeletConfig
- FeatureGate
- Tuned

Once the PerformanceProfile is created, the controller will start the reconcile loop
and will create all resources needed for performance tuning, and it will continue
to preserve the desired state of all components; the only way to update resources
owned by the controller is to update the PerformanceProfile resource.

Signed-off-by: Artyom Lukianov <[email protected]>
By default the operator-sdk creates the cache only for resources
from the operator namespace, but we have wide cluster resources.

Signed-off-by: Artyom Lukianov <[email protected]>
@cynepco3hahue cynepco3hahue force-pushed the implement_operator_logic branch from 5038e32 to fd9054e Compare January 2, 2020 14:42
@cynepco3hahue
Copy link
Contributor Author

/cc @davidvossel

Copy link
Member

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks nice, a few minor questions inside

}

if performanceProfile.Spec.CPU.NonIsolated == nil {
return fmt.Errorf("you should provide non isolated CPU set")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I mean, why can't we just compute the non-isolated set by difference? Or do we want to do it in a future PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fromanirh welcome back:)
Ok, in general, we can have 3 different workflows under the host:

  • OS workflows
  • some user application workflows(not the k8s one)
  • containers workflows

and for each of these workflows we can have different CPU sets (one CPU set can be a subset of another, but we cannot precisely calculate the values).

For example:
We set reserved to 0-3, which means these CPUs will not be used for container workloads, so they can be used either for OS workflows or for other user applications that require CPU isolation. Now we need to separate the CPUs that will be used for OS workflows from those used for other user applications: we can have CPUs 0-1 for OS workflows (nonIsolated), which leaves us with the additional variable isolated that we can set to 2-5, meaning that CPUs 2-3 go to other user applications that require isolated CPUs and CPUs 4-5 go to container workloads that require isolated CPUs.

I hope that was clear enough; let me know if you need additional information.

BTW, I am not saying that we cannot optimize it somehow, but that will surely be part of another PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we need to separate the CPUs that will be used for OS workflows from those used for other user applications: we can have CPUs 0-1 for OS workflows (nonIsolated), which leaves us with the additional variable isolated that we can set to 2-5, meaning that CPUs 2-3 go to other user applications that require isolated CPUs and CPUs 4-5 go to container workloads that require isolated CPUs.

It seems like everything can be expressed with reserved and isolated

Example: an 8-core machine with 1 reserved isolated core, 1 reserved non-isolated core, and 6 isolated cores for container workloads.

reserved: "0-1"
isolated: "1-7"

If you wanted to leave some general cores for container workloads as well, you could do something like this.

reserved: "0-1"
isolated: "1-5"

That would be 1 isolated core for reserved, 1 non-isolated core for reserved, 4 isolated cores for container workloads, and 2 non-isolated cores for container workloads.

then non-isolated would be assumed to be "0,6-7"

would that optimization make sense?
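
A small sketch of the derivation being proposed here, assuming a cpuset helper such as k8s.io/utils/cpuset and the 8-CPU machine from the example (both assumptions for illustration): non-isolated would simply be the complement of isolated.

package main

import (
    "fmt"

    "k8s.io/utils/cpuset"
)

func main() {
    // 8-core example from the discussion: CPUs 0-7
    all := cpuset.New(0, 1, 2, 3, 4, 5, 6, 7)

    reserved, _ := cpuset.Parse("0-1")
    isolated, _ := cpuset.Parse("1-5")

    // non-isolated as the complement of isolated: "0,6-7"
    nonIsolated := all.Difference(isolated)

    // reserved CPUs that are not isolated (OS housekeeping): "0"
    osCPUs := reserved.Difference(isolated)

    fmt.Println(nonIsolated.String(), osCPUs.String())
}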

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first glance it looks fine. I will give it another thought and reflect it in the PR :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanted to leave some general cores for container workloads as well, you could do something like this.

reserved: "0-1"
isolated: "1-5"

That would be 1 isolated core for reserved, 1 non-isolated core for reserved, 4 isolated cores for container workloads, and 2 non-isolated cores for container workloads.

then non-isolated would be assumed to be "0,6-7"

On second thought, non-isolated in our context means that these CPUs will mostly be used by the operating system, so it is not good to use them for container workloads; you still need the non-isolated parameter to specify which part of 0,6-7 will be used for OS workflows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you still need the non-isolated parameter to specify which part of 0,6-7 will be used for OS workflows.

Reserved is 0-1, so 0 would be used for the OS and 6-7 for containers. What am I missing here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have any more thoughts here on removing non-isolated?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your suggestion makes sense to me on a first look, but I'm not familiar enough with isolated / reserved CPUs to have a strong opinion here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so our assumption is that the nonIsolated CPUs should be a subset of the reserved CPUs and have no intersection with the isolated CPUs. I believe that will be correct in most cases; still, I feel unsure about this kind of calculation for all use cases.
@fromanirh @vladikr Guys, do you have some thoughts?

@davidvossel Can we leave this discussion for another PR, to avoid blocking the merge of this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My take: first, let's move this (important) discussion to another issue/PR to let this PR move forward.

Second: isolated and reserved in the current implementation must not overlap. So the question boils down to whether we actually need CPUs which are neither isolated nor reserved, and which would form a shared pool of cores on which container workloads can run.
After a bit more thought I agree that non-reserved, non-isolated cores are a worthy concept, but we should present it nicely.

I think the takeaway is that this little topic deserves more attention and perhaps a better abstraction, which is another hint that we should move this discussion to a separate ticket.

// 2. None namespace
namespaces := []string{
components.NamespaceNodeTuningOperator,
metav1.NamespaceNone,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a one-line comment to remind our future selves what "None namespace" means in this context? (e.g. watch all the namespaces, even though IIRC this is not the case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

size: 3
cpu:
isolated: "2-3"
nonIsolated: "0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify the state of CPU 1? how could it be neither isolated nor nonIsolated?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe even add a validation hook on that later on ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just an example that will be injected into the CSV and is not a real case.
Regarding the validation webhook, I left a TODO:

// TODO: we need to check if all under performance profiles values != nil

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, good enough for me now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO examples should be easy to understand, people will see and copy/paste this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slintes this CR example is injected into the CSV, so I did not add comments to it (unsure regarding parsing). What example do you prefer?

Artyom Lukianov added 4 commits January 5, 2020 10:31
Signed-off-by: Artyom Lukianov <[email protected]>
Tests run reconcile loop and verify:
- components creation
- components deletion
- finalizer adding
- finalizer removing
- basic fields verification
@cynepco3hahue cynepco3hahue force-pushed the implement_operator_logic branch from fd9054e to 4e7b6d9 Compare January 5, 2020 08:32
@cynepco3hahue cynepco3hahue force-pushed the implement_operator_logic branch from ec3aa48 to 3bb4d65 Compare January 5, 2020 14:14
Copy link
Member

@davidvossel davidvossel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great start! Here are a few comments/questions

@@ -1,4 +0,0 @@
apiVersion: v1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought this was autogenerated. does 0.13.0 operator-sdk not generate this automatically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it is required only for manual deployment (not via OLM); the CSV generator does not use the service account resources at all. Let me know if you prefer to keep them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, okay. I actually like that these manifests are being removed then if we aren't using them directly. It forces everyone down the CSV path, which is what we want at the moment.

@@ -32,3 +32,5 @@ spec:
publisher: Red Hat
sourceType: grpc
EOF

$OC_TOOL -n openshift-performance-addon delete csv --all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should be deleting the openshift-performance-addon namespace too here just to get rid of everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cynepco3hahue I don't see the ns deletion, forgot to commit / push?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I waited for your review to commit everything together :)

// TODO: uncomment once https://bugzilla.redhat.com/show_bug.cgi?id=1788061 is fixed
// FeatureGateLatencySensetiveName = "latency-sensetive"
FeatureGateLatencySensetiveName = "cluster"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the cluster feature set enable topology manager? How is this a workaround for that bz?

fyi, it's latency-sensitive instead of latency-sensetive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a feature set, it is the name of the feature gate resource.
The initial idea was to create an additional feature gate resource that would enable the topology manager (managed by us), but because of the bug the kubelet controller does not render the correct machine config (it uses only the default cluster feature gate resource).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed typo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but because of the bug the kubelet controller does not render the correct machine config (it uses only the default cluster feature gate resource)

oh, so does this mean we can't enable topology manager until that BZ is resolved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it just means that we should update the default feature gate resource cluster instead of creating a new one (which could create conflicts with user configuration, so I wanted to avoid it); it should also be updated before the kubelet config creation, see https://bugzilla.redhat.com/show_bug.cgi?id=1788061#c3

mcpNew := e.ObjectOld.(*mcov1.MachineConfigPool)
mcpOld := e.ObjectNew.(*mcov1.MachineConfigPool)

mcpNew.Spec.Paused = mcpOld.Spec.Paused
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this mutating a pointer that's going to be cached by the informer? We might need to do a deepcopy on the mcpNew object here. I'm not 100% certain though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add deep copy just to be sure:)
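
A small sketch of the safer pattern inside the predicate's UpdateFunc (illustrative, not the PR's exact code): take deep copies of the cached objects before normalizing the Paused field for the comparison.

UpdateFunc: func(e event.UpdateEvent) bool {
    // the informer cache owns these objects, so mutate copies only
    mcpOld := e.ObjectOld.(*mcov1.MachineConfigPool).DeepCopy()
    mcpNew := e.ObjectNew.(*mcov1.MachineConfigPool).DeepCopy()

    // align Paused on the copies so a pause/unpause alone does not trigger a reconcile
    mcpNew.Spec.Paused = mcpOld.Spec.Paused

    return !reflect.DeepEqual(mcpOld.Spec, mcpNew.Spec)
},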

pod := newPodForCR(instance)
if instance.DeletionTimestamp != nil {
// delete components
if err := r.deleteComponents(instance); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should wait until all components are deleted and removed from the cluster before removing the finalizer. Right now I believe we're just issuing the delete. We should also be waiting for the components to be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I do not remember any operator where we waited for the real deletion, but if you insist I can add it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I do not remember any operator where we waited for the real deletion, but if you insist I can add it.

Waiting for deletion guarantees us that we have 100% successfully cleaned up all dependent resources before the parent resource is completely removed. Without this, sometimes issues during tear down are masked. CI wouldn't necessarily catch this because the parent resource was deleted while dependent resources might still remain in a pending deletion type phase.

So basically, it ensures our teardown logic works; otherwise these issues go unnoticed and only manifest themselves in odd situations where people attempt to delete a resource and then re-post it.

Copy link
Contributor Author

@cynepco3hahue cynepco3hahue Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done
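
The deletion path then looks roughly like this (a sketch; deleteComponents and isComponentsExist appear in the diff, while removeString and finalizerName are illustrative helpers):

if instance.DeletionTimestamp != nil {
    // issue deletes for all owned components
    if err := r.deleteComponents(instance); err != nil {
        return reconcile.Result{}, err
    }

    // keep requeueing until everything owned is actually gone
    if r.isComponentsExist(instance) {
        return reconcile.Result{RequeueAfter: 10 * time.Second}, nil
    }

    // only now is it safe to drop the finalizer
    instance.Finalizers = removeString(instance.Finalizers, finalizerName)
    return reconcile.Result{}, r.client.Update(context.TODO(), instance)
}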

// for now let's assume that all parameters needed for the asset scripts are required
if err := r.verifyPerformanceProfileParameters(instance); err != nil {
// we do not want to reconcile again in case of error, because a user will need to update the PerformanceProfile anyway
klog.Errorf("failed to reconcile: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to set this error on a Condition in the Status section as well as create an Event. otherwise it won't be obvious to the user that their CR isn't being reconciled because of an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave events and status sections for following PRs? This PR is already too big.
Created:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good 👍
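
For the follow-up, one possible shape (a sketch under assumptions: the recorder field, the "ValidationFailed" reason, and the setDegradedCondition helper are illustrative and not code from this PR):

if err := r.validatePerformanceProfileParameters(instance); err != nil {
    // surface the problem as an event so `oc describe` explains why reconciliation stopped
    r.recorder.Event(instance, corev1.EventTypeWarning, "ValidationFailed", err.Error())

    // and mirror it into a status condition (illustrative helper)
    r.setDegradedCondition(instance, err.Error())

    klog.Errorf("failed to reconcile: %v", err)
    return reconcile.Result{}, nil
}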

// we will need to fetch the name of the machine config pool at runtime
updatedMcp, err := r.getMachineConfigPool(mcpName)
if errors.IsNotFound(err) {
klog.Warning("failed to set pause, the machine config pool does not exist, probably it was deleted")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure, but I think since the MCP is being watched, the GET request will look up the MCP in the cache. If the MCP was just created and then paused, attempting the GET here might fail because the cache hasn't been populated yet.

just something to be aware of if you encounter this.

}

// pause machine config pool
if err := r.pauseMachineConfigPool(mcp.Name, true); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't want to issue API calls every time the reconcile loop pops unless something changes. We'll need to check whether modifications are going to occur and only pause/unpause as modifications happen.

Also, this pause/unpause logic seems like it could have a race condition. For example, what guarantees that the machine config operator sees that we paused an MCP before it sees that a new machine config was posted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by modifications? Modifications to the performance profile resource? In any case we run a very small number of reconcile loops because of our predicates (only when the spec or labels are updated), so we really pause only on important modifications.

Regarding the race condition, you are right: once we have MaxConcurrentReconciles bigger than one we will have trouble, but to be honest I do not have a good solution for it.
I can lock the applyComponents method with a mutex, but that is the same as running reconcile with only one thread, and you cannot really know whether an additional update is in progress.
For now I will add a lock, but maybe you have better ideas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by modifications? Modifications to the performance profile resource? In any case we run a very small number of reconcile loops because of our predicates (only when the spec or labels are updated), so we really pause only on important modifications.

What I'm suggesting is to determine up front whether any config changes will occur that involve a restart, and only perform the "pause/unpause" logic when it's detected that changes will occur.

so the flow would be something like this.

  • look through the cache to determine if any machine configs, kubelet configs, etc. either need to be posted or updated.
    • If not, fast succeed and move on to status calculations since there's nothing to be done
    • else pause, apply changes, unpause then move on to status calculations.

I know what I'm suggesting is kind of a pain; the issue is that if we get into an error loop or some other unexpected reconcile loop spin, things like this mean we spam the API server with writes.

In practice we need to GET or POST to the API server only when it's absolutely necessary. This is why we do things like only posting an update to the status section when !reflect.DeepEqual(newStatus, oldStatus). We're avoiding a post to the API server that would otherwise occur every reconcile loop if we didn't determine from the cache that the change wasn't necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the race condition, you are right: once we have MaxConcurrentReconciles bigger than one we will have trouble, but to be honest I do not have a good solution for it.
I can lock the applyComponents method with a mutex, but that is the same as running reconcile with only one thread, and you cannot really know whether an additional update is in progress.
For now I will add a lock, but maybe you have better ideas?

And with regards to the race condition.

What I was referring to isn't related to our reconcile loop. Regardless of how many concurrent reconcile loops we allow, we are guaranteed a single key is only being acted on by a single reconcile execution. There's no parallel execution of the same key, only parallel execution of separate keys.

What I was referring to is actually the machine config operator's reconcile loop. Is there anything guaranteeing us that the operator will be notified that we paused a pool before it's notified we posted new machine configs or kubelet configs?

Basically, do we have any guarantees that the order we post changes in our reconcile loop matches the order another operator will receive them in? so, If I post resource A then resource B, is it possible for another operator to observe resource B's creation before resource A? if resources A and B are of different kinds, i think this might actually be possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the flow would be something like this.

  • look through the cache to determine if any machine configs, kubelet configs, etc. either need to be posted or updated.

    • If not, fast succeed and move on to status calculations since there's nothing to be done
    • else pause, apply changes, unpause then move on to status calculations.

@davidvossel But we already have this functionality because of the custom update predicates; we just do not enter the reconcile loop at all if it was not an important update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And with regards to the race condition.

What I was referring to isn't related to our reconcile loop. Regardless of how many concurrent reconcile loops we allow, we are guaranteed a single key is only being acted on by a single reconcile execution. There's no parallel execution of the same key, only parallel execution of separate keys.

What I was referring to is actually the machine config operator's reconcile loop. Is there anything guaranteeing us that the operator will be notified that we paused a pool before it's notified we posted new machine configs or kubelet configs?

Basically, do we have any guarantees that the order we post changes in our reconcile loop matches the order another operator will receive them in? so, If I post resource A then resource B, is it possible for another operator to observe resource B's creation before resource A? if resources A and B are of different kinds, i think this might actually be possible.

Ah, I see, good point, but I am unsure how we can avoid it. I can divide the reconcile loop into something like:

  • create or update the MCP and pause it (return reconcile.Result{Requeue: true, RequeueAfter: time.Second})
  • before creating all other components, check that the MCP is paused; if not, return again reconcile.Result{Requeue: true, RequeueAfter: 10*time.Second}
  • create all components
  • unpause the MCP
    @davidvossel WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, the only thing I'd add is to only unpause after you observe all dependent resources have been created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we unpause only when all mutated objects are equal to nil, which means that the object we get from the cache and the object we expect to have are the same, so all objects already exist. Or am I missing something?

Copy link
Member

@slintes slintes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work 👍
Some comments inline.

printVersion()

namespace, err := k8sutil.GetWatchNamespace()
if err != nil {
log.Error(err, "Failed to get watch namespace")
klog.Error(err.Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: klog.Exit(...) is a shortcut (same below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// we want to initiate the reconcile loop only on changes to the labels or spec of the object
p := predicate.Funcs{
UpdateFunc: func(e event.UpdateEvent) bool {
if e.MetaOld == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did you get this from? I'm a bit surprised that this is necessary, looking at sigs.k8s.io/controller-runtime/pkg/source/internal/eventsource.go OnUpdate() I think these fields are always filled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure when we would get this, but the default update predicate has the same checks, so I left it as it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, works for me
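
For reference, a minimal sketch of such a predicate using the older controller-runtime event API that the quoted code relies on (MetaOld/MetaNew); illustrative, not the PR's exact code:

p := predicate.Funcs{
    UpdateFunc: func(e event.UpdateEvent) bool {
        // mirror the default predicate's nil checks
        if e.MetaOld == nil || e.MetaNew == nil {
            return false
        }
        // generation bumps on spec changes (assuming the status subresource is enabled),
        // so reconcile only on spec or label updates
        return e.MetaOld.GetGeneration() != e.MetaNew.GetGeneration() ||
            !reflect.DeepEqual(e.MetaOld.GetLabels(), e.MetaNew.GetLabels())
    },
}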

return err
}

// we do not want to initiate the reconcile loop on the pause of the machine config pool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pause and configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"app": cr.Name,
func (r *ReconcilePerformanceProfile) applyComponents(profle *performancev1alpha1.PerformanceProfile) error {
// deploy machine config pool
mcp := machineconfigpool.NewPerformance(profle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe NewPerformancePool is a better name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have machineconfigpool in the name of the package, see the nice link that @fedepaol sent me: https://blog.golang.org/package-names

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I see, interesting read about package names and New(). Thanks!
But NewPerformance still sounds a bit wrong to me... 🤔
Maybe go in the other direction and make the func name even shorter, only New()?
Or sth different than New, e.g. FromProfile()?
WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only New() works for me, will change it

}

// deploy machine config
mc, err := machineconfig.NewPerformance(r.assetsDir, profle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewPerformanceConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the link above

}

// deploy kubelet config
kc := kubeletconfig.NewPerformance(profle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewPerformanceConfig?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for following NewXyz() funcs...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the link above

if err != nil {
return err
}
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return r.client.Update(context.TODO(), updatedMcp)
}

func (r *ReconcilePerformanceProfile) verifyPerformanceProfileParameters(performanceProfile *performancev1alpha1.PerformanceProfile) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not sure what the exact difference between the 2 words is, but mostly we talk about validate instead of verify (validation webhook) 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@cynepco3hahue
Copy link
Contributor Author

@davidvossel @slintes A lot of thanks for the review!

return err
}

updatedMcp.Spec.Paused = pause
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only perform the update here if updatedMcp.Spec.Paused does not already match the desired pause value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done
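
The resulting guard could look like this (a sketch built from the helpers quoted above; the exact body in the PR may differ):

// skip the API write when the pool already has the desired pause value
func (r *ReconcilePerformanceProfile) pauseMachineConfigPool(mcpName string, pause bool) error {
    updatedMcp, err := r.getMachineConfigPool(mcpName)
    if err != nil {
        return err
    }

    if updatedMcp.Spec.Paused == pause {
        // nothing to do, avoid spamming the API server
        return nil
    }

    updatedMcp.Spec.Paused = pause
    return r.client.Update(context.TODO(), updatedMcp)
}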

Signed-off-by: Artyom Lukianov <[email protected]>
@cynepco3hahue cynepco3hahue force-pushed the implement_operator_logic branch from 3084bbe to c3dd292 Compare January 8, 2020 17:27
Copy link
Member

@davidvossel davidvossel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much closer! a few more comments.

mutated.Spec = mcp.Spec

// do not update the pause and configuration fields
mutated.Spec.Paused = existing.Spec.Paused
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove this now. it shouldn't matter anymore. Reconciling based on whether the mcp is paused or not shouldn't impact us now with the new logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that is correct. For example, our code will pause the machine config pool, but the MCP that we generate will have spec.paused false, so it will recognize the change and unpause.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bug. The MCP we generate needs to reflect whether or not the pause is expected.

We can't prevent the reconcile loop from popping early; this logic sounds like we're depending on the reconcile loop not executing too early, which would result in an early unpause.

}

// update the machine config pool after that we paused it
if mcpMutated != nil && !created {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's only update the MCP object once. If the MCP needs to be updated to be paused, or updated for any other reason, that should be a single update.

Writing to the MCP object to pause it, and then writing to it again right afterwards for some other reason, doesn't look right. Figure out what the state of the MCP object should be, and issue a single create or update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

},
},
},
mcpMutated, created, err := r.getMutatedMachineConfigPool(mcp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the created value here is confusing. created=false when mcp exists. and created=true when mcp needs to be created.

can we call created something like nonCreated, or reverse the meaning of created to reflect the status of the object being observed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After merging pause and createOrUpdate of MCP I dropped it, I just overthought it X_X

}

// create machine config pool and pause it
if mcpMutated != nil && created {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we know the mcp needs to be created here, and we're just going to immediately pause it after creation, then why not post the manifest with paused=true to begin with?

so

if mcpMutated != nil && needsToBeCreated {
// post a new paused mcp. 
} else {
// get and pause existing mcp
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

networkLatencyTunedMutated == nil &&
realTimeKernelTunedMutated == nil {
// if nothing needed to be updated and machine config pool paused, unpause it
mcpUpdate, err := r.getMachineConfigPool(mcp.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mcp object has references to objects in the status section.

the mcp.status.configuration.source list. Can we look at that to make sure the mcp picked up all our configs before unpausing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the feature gate and kubelet config machine configs are always rendered with the same name, so I am unsure how we can check it, see https://github.com/openshift/machine-config-operator/blob/04cd2198cae247fabcd3154669618d74f124f27f/pkg/controller/kubelet-config/helpers.go#L60

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we'd only be able to detect that the new MC was applied?

Here's an issue: #35. Let's follow up on this discussion there and see if there's a way to detect when unpausing is safe, or at least detect that the MC we posted has been picked up before unpausing.
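
A sketch of the kind of check discussed for #35 (field names follow the MCO API; matching by name only helps for our own MachineConfig, not for the identically named rendered kubelet/feature-gate configs):

// mcpPickedUpMachineConfig reports whether the pool's target configuration
// already references the MachineConfig we created
func mcpPickedUpMachineConfig(mcp *mcov1.MachineConfigPool, mcName string) bool {
    for _, source := range mcp.Status.Configuration.Source {
        if source.Name == mcName {
            return true
        }
    }
    return false
}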

Copy link
Member

@davidvossel davidvossel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did one more pass. The main thing that caught my eye this time was the realtime repo URL that's in the api.

If we can take that out of the api and make realtimeKernel simply a true|false boolean, I think that would be best. if we need the url still for testing, keep the env var.

I also had a few more minor comments. I'll look at this first thing in the morning tomorrow so we can make more progress asap. I know there's pressure to get this merged.

}

// get mutated real time kernel tuned
realTimeKernelTuned, err := tuned.NewWorkerRealTimeKernel(r.assetsDir, profile)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tunnelling through the code a bit, I found that each place this assetsDir is used we're reading a file from disk. The way this is done means we're reading from disk every time the reconcile loop pops.

Can we either create a function to read all the assets into memory before starting the controller, or at least wrap the reads in some sort of sync.Once function to ensure we aren't reading from disk every time this loop executes.

creating an issue to follow up on this is fine as well if you don't want to address it in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed issue: #32

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect, thanks
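
A sketch of what #32 could look like, assuming a hypothetical assets helper package (names and layout are illustrative): read the asset files once and serve every later reconcile from memory.

package assets

import (
    "os"
    "path/filepath"
    "sync"
)

var (
    loadOnce sync.Once
    loadErr  error
    cache    map[string][]byte
)

// Load reads every file under assetsDir exactly once; later calls return the
// in-memory copy, so the reconcile loop never touches the disk again.
func Load(assetsDir string) (map[string][]byte, error) {
    loadOnce.Do(func() {
        cache = map[string][]byte{}
        entries, err := os.ReadDir(assetsDir)
        if err != nil {
            loadErr = err
            return
        }
        for _, entry := range entries {
            if entry.IsDir() {
                continue
            }
            data, err := os.ReadFile(filepath.Join(assetsDir, entry.Name()))
            if err != nil {
                loadErr = err
                return
            }
            cache[entry.Name()] = data
        }
    })
    return cache, loadErr
}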

return fmt.Errorf("you should provide non isolated CPU set")
}

if performanceProfile.Spec.RealTimeKernel == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought realtimeKernel was going to be a boolean, and not a required option.

basically just, profile.Spec.Realtime=true/false

Also this RepoURL shouldn't be a part of the API. We don't expect users to supply the URL for the realtime kernel. If we need this for testing purposes for now, just use the environment variable or even an annotation if it needs to be associated with the profile object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cynepco3hahue @davidvossel heads up, I am already working on a follow-up PR to make this a boolean, because with OCP 4.4 we don't need the repo URL at all anymore; the RPMs are already in the RHCOS image. I also remove the validation for this and just enable the RT kernel systemd unit when the boolean is set and true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidvossel According to Marc's comment, can we leave it for a following PR, because it will need changes to the scripts and the API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, @slintes thanks for following up on this.

return true, r.client.Update(context.TODO(), mcp)
}

func (r *ReconcilePerformanceProfile) validatePerformanceProfileParameters(performanceProfile *performancev1alpha1.PerformanceProfile) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only required argument here should be nodeSelector. The rest of the features seem like they could be completely independent from one another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added all parameters needed by the scripts as required; otherwise we can get unpredictable behavior in our scripts. To change it we will need to change our scripts, so I prefer to leave it for another PR; opened #33

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alrighty, as long as we're tracking this, thanks

}

if r.isComponentsExist(instance) {
return reconcile.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get an info-level log message here indicating that deletion is pending while dependent resources are being removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

if updated {
return &reconcile.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: when RequeueAfter is set, Requeue is true implicitly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wasn't aware of it, but I see in the controller code:

} else if result.RequeueAfter > 0 {
		// The result.RequeueAfter request will be lost, if it is returned
		// along with a non-nil error. But this is intended as
		// We need to drive to stable reconcile loops before queuing due
		// to result.RequestAfter
		c.Queue.Forget(obj)
		c.Queue.AddAfter(req, result.RequeueAfter)
		ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "requeue_after").Inc()
		return true
	}

Will remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's also in the doc of the Result struct ;)


// we want to give time to the machine-config controllers to get updated values
if updated {
return &reconcile.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: when RequeueAfter is set, Requeue is true implicitly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


According to the discussion,
pause the machine config pool only when it is necessary.

Split the reconcile loop into phases.
Creation:
- add finalizer
- check if the MCP exists; if not, create it
- pause the MCP and wait for some time specified in the code
to give the machine-config controllers time to get the update
- if needed, update the MCP and all other resources
- after creation or update, wait again
- unpause the MCP

Deletion:
- check if the MCP exists
- if yes, pause the MCP and wait some time
- delete all components
- wait until all components are deleted
- remove finalizer

All wait cycles are implemented via requeue with `Result{Requeue: true, RequeueAfter: 10 * time.Second}`

Signed-off-by: Artyom Lukianov <[email protected]>
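
A rough skeleton of that phased flow (a sketch only; createOrUpdatePausedMachineConfigPool, createOrUpdateOwnedComponents, and profileMachineConfigPoolName are illustrative names, and the real signatures in the PR differ):

func (r *ReconcilePerformanceProfile) applyComponents(profile *performancev1alpha1.PerformanceProfile) (*reconcile.Result, error) {
    // 1. make sure the MCP exists and is paused before any other component lands
    updated, err := r.createOrUpdatePausedMachineConfigPool(profile)
    if err != nil {
        return nil, err
    }
    if updated {
        // give the machine-config controllers time to observe the pause
        return &reconcile.Result{RequeueAfter: 10 * time.Second}, nil
    }

    // 2. create or update machine config, kubelet config, feature gate and tuned
    updated, err = r.createOrUpdateOwnedComponents(profile)
    if err != nil {
        return nil, err
    }
    if updated {
        // wait again so the rendered configs are picked up while the pool is still paused
        return &reconcile.Result{RequeueAfter: 10 * time.Second}, nil
    }

    // 3. nothing left to change: safe to unpause the pool
    return nil, r.pauseMachineConfigPool(profileMachineConfigPoolName(profile), false)
}
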
@cynepco3hahue cynepco3hahue force-pushed the implement_operator_logic branch from 58e7782 to 6be8f31 Compare January 9, 2020 15:07
Copy link
Member

@davidvossel davidvossel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

there are some optimizations being done with the requeueAfter logic that we may find don't always produce consistent results for us, but I think this is a good start.

great work @cynepco3hahue 👍

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 9, 2020
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cynepco3hahue, davidvossel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cynepco3hahue,davidvossel]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit bb2157f into openshift-kni:master Jan 9, 2020
@openshift-ci-robot
Copy link
Collaborator

@cynepco3hahue: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-gcp 6be8f31 link /test e2e-gcp

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

dshchedr pushed a commit to dshchedr/performance-addon-operators that referenced this pull request Mar 26, 2020
Have a custom pool that matches two out of three nodes.