Implement operator logic #25
Conversation
a40ba29 to 5038e32 (Compare)
Signed-off-by: Artyom Lukianov <[email protected]>
Our controller will operate on different types of components: machineconfigpool, machineconfig, kubeletconfig, featuregate, and tuned. This PR introduces helper methods to create, update and delete all required components. Signed-off-by: Artyom Lukianov <[email protected]>
The controller will watch these resources: PerformanceProfile, MachineConfigPool, MachineConfig, KubeletConfig, FeatureGate and Tuned. Once the PerformanceProfile is created, the controller will start the reconcile loop, create all resources needed for performance tuning, and keep maintaining the desired state of all components; the only way to update resources owned by the controller is to update the PerformanceProfile resource. Signed-off-by: Artyom Lukianov <[email protected]>
By default the operator-sdk creates the cache only for resources from the operator namespace, but we have cluster-wide resources. Signed-off-by: Artyom Lukianov <[email protected]>
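For illustration, a minimal sketch of how the manager cache can be widened beyond the operator namespace, assuming a controller-runtime version that ships cache.MultiNamespacedCacheBuilder (the newManager helper name is hypothetical, not code from this PR):

package main

import (
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// newManager builds a manager whose cache watches the given namespaces instead
// of only the operator namespace (sketch only, not the exact code of this PR).
func newManager(namespaces []string) (manager.Manager, error) {
	cfg, err := config.GetConfig()
	if err != nil {
		return nil, err
	}
	return manager.New(cfg, manager.Options{
		NewCache: cache.MultiNamespacedCacheBuilder(namespaces),
	})
}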
5038e32 to fd9054e (Compare)
/cc @davidvossel
Looks nice, a few minor questions inside.
}

if performanceProfile.Spec.CPU.NonIsolated == nil {
	return fmt.Errorf("you should provide non isolated CPU set")
Why? I mean, why can't we just compute the non-isolated set by difference? Or do we want to do it in a future PR?
@fromanirh welcome back:)
OK, in general we can have three different workflows on the host:
- OS workflows
- some user application workflows (not the k8s ones)
- container workflows
and for each of these workflows we can have a different CPU set (one CPU set can be a subset of another, but we cannot precisely calculate the values).
For example:
We set reserved to 0-3; it means these CPUs will not be used for container workflows, so they can be used either for OS workflows or for other user applications that require CPU isolation. Now we should separate the CPUs that will be used for OS workflows from those for other user applications, so we can have CPUs 0-1 for OS workflows (nonIsolated). That leaves us with the additional variable isolated, which we can set to 2-5; it means we have CPUs 2-3 for other user applications that require isolated CPUs and CPUs 4-5 for container workloads that require isolated CPUs.
I hope it was clear enough, let me know if you need additional information.
BTW, I am not saying that we cannot optimize it somehow, but that will definitely be part of another PR.
and now we should separate the CPUs that will be used for OS workflows from those for other user applications, so we can have CPUs 0-1 for OS workflows (nonIsolated). That leaves us with the additional variable isolated, which we can set to 2-5; it means we have CPUs 2-3 for other user applications that require isolated CPUs and CPUs 4-5 for container workloads that require isolated CPUs.
It seems like everything can be expressed with reserved and isolated. Example: an 8-core machine with 1 reserved isolated core, 1 reserved non-isolated core, and 6 isolated cores for container workloads.
reserved: "0-1"
isolated: "1-7"
If you wanted to leave some general cores for container workloads as well, you could do something like this.
reserved: "0-1"
isolated: "1-5"
That would be 1 isolated core for reserved, 1 non-isolated core for reserved, 4 isolated cores for container workloads, and 2 non-isolated cores for container workloads.
then non-isolated would be assumed to be "0,6-7"
would that optimization make sense?
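For illustration only, a tiny sketch of the "compute by difference" idea in plain Go (a hypothetical helper, not code from this PR):

package main

import "fmt"

// nonIsolated returns the CPU IDs in [0, total) that are not marked isolated,
// i.e. the set computed by difference as suggested above.
func nonIsolated(total int, isolated map[int]bool) []int {
	var cpus []int
	for cpu := 0; cpu < total; cpu++ {
		if !isolated[cpu] {
			cpus = append(cpus, cpu)
		}
	}
	return cpus
}

func main() {
	// 8-core machine with isolated "1-7": the non-isolated set comes out as [0]
	isolated := map[int]bool{1: true, 2: true, 3: true, 4: true, 5: true, 6: true, 7: true}
	fmt.Println(nonIsolated(8, isolated))
}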
At first glance, it looks fine. I will give it another thought and will reflect it under the PR :)
If you wanted to leave a general cores for container workloads as well, you could do something like this.
reserved: "0-1" isolated: "1-5"
That would be 1 isolated core for reserved, 1 non-isolated core for reserved, 4 isolated cores for container workloads, and 2 non-isolated cores for container workloads.
then non-isolated would be assumed to be "0,6-7"
On second thought, non-isolated in our context means that these CPUs will mostly be used by the operating system, so it is not good to use them for container workloads; you still need the non-isolated parameter to specify what part of 0,6-7 will be used for OS workflows.
so you still need the non-isolated parameter to specify what part of 0,6-7 will be used for OS workflows.
reserved is 0-1, so 0 would be used for the OS and 6-7 for containers. What am I missing here?
do we have any more thoughts here on removing non-isolated?
your suggestion makes sense to me on a first look, but I'm not familiar enough with isolated / reserved CPUs to have a strong opinion here...
OK, so our assumption is that nonIsolated CPUs should be a subset of reserved CPUs and should not intersect with isolated CPUs. I believe that will be correct in most cases; still, I feel unsure about this kind of calculation for all use cases.
@fromanirh @vladikr Guys, do you have some thoughts?
@davidvossel Can we leave this discussion for another PR, to avoid holding up this one?
My take: first, let's move this (important) discussion to another issue/PR to let this PR move forward.
Second: isolated and reserved in the current implementation must not overlap. So the question boils down to whether we actually need CPUs which are neither isolated nor reserved, which form a shared pool of cores on which container workloads can run.
After a bit more thought I agree that non-reserved, non-isolated cores are a worthy concept, but we should present it nicely.
I think the takeaway is this little topic deserves more attention and perhaps a better abstraction, which is another hint we should move this discussion to a separate ticket.
// 2. None namespace
namespaces := []string{
	components.NamespaceNodeTuningOperator,
	metav1.NamespaceNone,
Could you please add a one-line comment to remind our future selves what "None namespace" means in this context? (e.g. watch all the namespaces, even though IIRC this is not the case)
done
size: 3
cpu:
  isolated: "2-3"
  nonIsolated: "0"
Could you please clarify the state of CPU 1? How could it be neither isolated nor nonIsolated?
maybe even add a validation hook on that later on ?
it is just an example that will be injected into the CSV and is not a real case.
Regarding the validation webhook, I left a TODO.
performance-addon-operators/pkg/controller/performanceprofile/performanceprofile_controller.go (Line 229 in fd9054e):
// TODO: we need to check if all under performance profiles values != nil
Ack, good enough for me now
IMHO examples should be easy to understand, people will see and copy/paste this.
@slintes this CR example is injected into the CSV, so I did not add comments to it (unsure regarding parsing). What example do you prefer?
Signed-off-by: Artyom Lukianov <[email protected]>
Signed-off-by: Artyom Lukianov <[email protected]>
Signed-off-by: Artyom Lukianov <[email protected]>
Tests run the reconcile loop and verify: components creation, components deletion, finalizer adding, finalizer removing, and basic fields verification.
fd9054e to 4e7b6d9 (Compare)
W/A for https://bugzilla.redhat.com/show_bug.cgi?id=1787907 Signed-off-by: Artyom Lukianov <[email protected]>
ec3aa48 to 3bb4d65 (Compare)
It is a temporary W/A until https://bugzilla.redhat.com/show_bug.cgi?id=1788061 is fixed. Signed-off-by: Artyom Lukianov <[email protected]>
great start! Here are a few comments/questions
@@ -1,4 +0,0 @@
apiVersion: v1
I thought this was autogenerated. Does operator-sdk 0.13.0 not generate this automatically?
Technically it is required only for manual deployment (not via OLM); the CSV generator does not use service account resources at all. Let me know if you prefer to keep it.
ah, okay. I actually like that these manifests are being removed then if we aren't using them directly. It forces everyone down the CSV path, which is what we want at the moment.
hack/clean-deploy.sh
Outdated
@@ -32,3 +32,5 @@ spec:
publisher: Red Hat
sourceType: grpc
EOF

$OC_TOOL -n openshift-performance-addon delete csv --all
maybe we should be deleting the openshift-performance-addon namespace too here just to get rid of everything?
done
@cynepco3hahue I don't see the ns deletion, forgot to commit / push?
Yes, I waited for your review to commit all together:)
// TOOD: uncomment once https://bugzilla.redhat.com/show_bug.cgi?id=1788061 fixed
// FeatureGateLatencySensetiveName = "latency-sensetive"
FeatureGateLatencySensetiveName = "cluster"
does the cluster feature set enable topology manager? How is this a workaround for that bz?
fyi, it's latency-sensitive instead of latency-sensetive.
It is not a feature set, it is the name of the feature gate resource.
The initial idea was to create an additional feature gate resource that would enable the topology manager (managed by us), but because of the bug the kubelet controller does not render the correct machine config (it uses only the default cluster feature gate resource).
fixed typo
but because of the bug, the kubelet controller does not render correct machine config(it uses only default cluster feature gate resource)
oh, so does this mean we can't enable topology manager until that BZ is resolved?
Nope, it just means that we should update the default feature gate resource cluster instead of creating a new one (that could create conflicts with user configuration, so I wanted to avoid it); also it should be updated before the kubelet config creation, see https://bugzilla.redhat.com/show_bug.cgi?id=1788061#c3
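A rough sketch of that workaround, assuming the openshift/api config/v1 types and the usual imports (context, configv1, k8s.io/apimachinery/pkg/types, sigs.k8s.io/controller-runtime/pkg/client); the helper name and the exact field layout are my assumptions, not verified against this PR:

// enableLatencySensitive updates the default "cluster" FeatureGate instead of
// creating a new one, as described above (sketch only, assumed field names).
func enableLatencySensitive(ctx context.Context, c client.Client) error {
	fg := &configv1.FeatureGate{}
	if err := c.Get(ctx, types.NamespacedName{Name: "cluster"}, fg); err != nil {
		return err
	}
	if fg.Spec.FeatureSet == configv1.LatencySensitive {
		return nil // already set, avoid a redundant update
	}
	fg.Spec.FeatureSet = configv1.LatencySensitive
	return c.Update(ctx, fg)
}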
mcpNew := e.ObjectOld.(*mcov1.MachineConfigPool)
mcpOld := e.ObjectNew.(*mcov1.MachineConfigPool)

mcpNew.Spec.Paused = mcpOld.Spec.Paused
is this mutating a pointer that's going to be cached by the informer? We might need to do a deepcopy on the mcpNew object here. I'm not 100% certain though.
will add deep copy just to be sure:)
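A minimal sketch of the deep-copy-before-mutate idea for this predicate (illustrative; assumes the usual reflect, event, predicate and mcov1 imports):

p := predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		// copy first, so the objects cached by the informer are never mutated
		mcpOld := e.ObjectOld.(*mcov1.MachineConfigPool).DeepCopy()
		mcpNew := e.ObjectNew.(*mcov1.MachineConfigPool).DeepCopy()
		// ignore the paused field when deciding whether to reconcile
		mcpNew.Spec.Paused = mcpOld.Spec.Paused
		return !reflect.DeepEqual(mcpOld.Spec, mcpNew.Spec)
	},
}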
pod := newPodForCR(instance)
if instance.DeletionTimestamp != nil {
	// delete components
	if err := r.deleteComponents(instance); err != nil {
I think we should wait until both all components are deleted and removed from the cluster before removing the finalizer. Right now I believe we're just issuing the delete. We should also be waiting for the components to be removed.
Hmm, I do not remember any operator where we waited for the actual deletion, but if you insist I can add it.
Hmm, I do not remember any operator where we waited for the actual deletion, but if you insist I can add it.
Waiting for deletion guarantees us that we have 100% successfully cleaned up all dependent resources before the parent resource is completely removed. Without this, sometimes issues during tear down are masked. CI wouldn't necessarily catch this because the parent resource was deleted while dependent resources might still remain in a pending deletion type phase.
So basically, it ensures our teardown logic works; otherwise these issues go unnoticed and only manifest themselves in odd situations where people attempt to delete a resource and then re-post it again.
done
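A condensed sketch of the resulting deletion flow, reusing the deleteComponents and isComponentsExist helpers from this PR (the finalizer constant and the removeString helper are illustrative):

if instance.DeletionTimestamp != nil {
	// issue deletion of all owned components
	if err := r.deleteComponents(instance); err != nil {
		return reconcile.Result{}, err
	}
	// wait until everything is really gone before dropping the finalizer
	if r.isComponentsExist(instance) {
		klog.Info("deletion is pending on dependent resources being removed")
		return reconcile.Result{RequeueAfter: 10 * time.Second}, nil
	}
	// all components are removed, it is now safe to drop the finalizer
	instance.Finalizers = removeString(instance.Finalizers, finalizer)
	return reconcile.Result{}, r.client.Update(context.TODO(), instance)
}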
// for now let's assume that all parameters needed for assets scrips are required
if err := r.verifyPerformanceProfileParameters(instance); err != nil {
	// we do not want to reconcile again in case of error, because a user will need to update the PerformanceProfile anyway
	klog.Errorf("failed to reconcile: %v", err)
we need to set this error on a Condition in the Status section as well as create an Event. otherwise it won't be obvious to the user that their CR isn't being reconciled because of an error.
Can we leave events and status sections for following PRs? This PR is already too big.
Created:
sounds good 👍
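For the follow-up, a sketch of what surfacing the error could look like; the recorder field and the condition handling are assumptions, not part of this PR:

if err := r.verifyPerformanceProfileParameters(instance); err != nil {
	klog.Errorf("failed to reconcile: %v", err)
	// assumed: r.recorder is a record.EventRecorder wired in at controller setup
	r.recorder.Event(instance, corev1.EventTypeWarning, "ValidationFailed", err.Error())
	// assumed: a degraded-style condition would also be written to instance.Status here
	return reconcile.Result{}, nil
}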
// we will need to fetch the name of the machine config pool on runtime
updatedMcp, err := r.getMachineConfigPool(mcpName)
if errors.IsNotFound(err) {
	klog.Warning("failed to set pause, the machine config pool does not exist, probably it was deleted")
i'm not 100% sure, but I think since the MCP is being watched, that the GET request will result in looking up the MCP in cache. if the MCP was just created and then paused, then attempting to do the GET here might fail because the cache hasn't populated.
just something to be aware of if you encounter this.
}

// pause machine config pool
if err := r.pauseMachineConfigPool(mcp.Name, true); err != nil {
We don't want to issue API calls every time the reconcile loop pops unless something changes. We'll need to check whether modifications are going to occur and only pause/unpause as modifications happen.
Also, this pause/unpause logic seems like it could have a race condition. For example, what guarantees that the machine config operator sees that we paused an MCP before it sees that a new machine config was posted?
What do you mean by modifications? Modifications to the performance profile resource? We run a very small number of reconcile loops anyway because of our predicates (only when the spec or labels were updated), so we really pause only on important modifications.
Regarding the race condition, you are right that once we have MaxConcurrentReconciles bigger than one we will have trouble, but to be honest I do not have a good solution for it.
I can lock the applyComponents method with a mutex, but that is the same as running reconcile with only one thread, and you cannot really know if an additional update is in progress.
For now, I will add a lock, but maybe you have better ideas?
What do you mean by modifications? Modifications to the performance profile resource? We run a very small number of reconcile loops anyway because of our predicates (only when the spec or labels were updated), so we really pause only on important modifications.
What I'm suggesting is to determine up front whether any config changes will occur that involve a restart, and only perform the "pause/unpause" logic when it's detected that changes will occur.
so the flow would be something like this.
- look through the cache to determine if any machine configs, kubelet configs, etc. either need to be posted or updated
- If not, fast succeed and move on to status calculations since there's nothing to be done
- else pause, apply changes, unpause then move on to status calculations.
I know what I'm suggesting is kind of a pain, the issue is if we get into an error loop or some other unexpected reconcile loop spin, things like this mean we spam the api server with writes.
In practice we need to only GET or POST to the api server when it's absolutely necessary. This is why we do things like only posting an update to the status section when !reflect.DeepEqual(newStatus, oldStatus).
We're avoiding a post to the API server that would otherwise occur every reconcile loop if we didn't determine that the change wasn't necessary by looking at cache.
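The status-update pattern referenced here looks roughly like this (a sketch, not code from this PR):

// write the status only when it actually changed, to avoid spamming the API server
if !reflect.DeepEqual(instance.Status, newStatus) {
	instance.Status = newStatus
	if err := r.client.Status().Update(context.TODO(), instance); err != nil {
		return reconcile.Result{}, err
	}
}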
Regarding the race condition, you are right that once we have MaxConcurrentReconciles bigger than one we will have trouble, but to be honest I do not have a good solution for it.
I can lock the applyComponents method with a mutex, but that is the same as running reconcile with only one thread, and you cannot really know if an additional update is in progress.
For now, I will add a lock, but maybe you have better ideas?
And with regards to the race condition.
What I was referring to isn't related to our reconcile loop. Regardless of how many concurrent reconcile loops we allow, we are guaranteed a single key is only being acted on by a single reconcile execution. There's no parallel execution of the same key, only parallel execution of separate keys.
What I was referring to is actually the machine config operator's reconcile loop. Is there anything guaranteeing us that the operator will be notified that we paused a pool before it's notified we posted new machine configs or kubelet configs?
Basically, do we have any guarantees that the order we post changes in our reconcile loop matches the order another operator will receive them in? so, If I post resource A then resource B, is it possible for another operator to observe resource B's creation before resource A? if resources A and B are of different kinds, i think this might actually be possible.
so the flow would be something like this.
look through the cache to determine if any machine configs, kubelet configs, etc. either need to be posted or updated
- If not, fast succeed and move on to status calculations since there's nothing to be done
- else pause, apply changes, unpause then move on to status calculations.
@davidvossel But we already have this functionality because of the custom update predicates; it just does not enter the reconcile loop at all if it wasn't an important update.
And with regards to the race condition.
What I was referring to isn't related to our reconcile loop. Regardless of how many concurrent reconcile loops we allow, we are guaranteed a single key is only being acted on by a single reconcile execution. There's no parallel execution of the same key, only parallel execution of separate keys.
What I was referring to is actually the machine config operator's reconcile loop. Is there anything guaranteeing us that the operator will be notified that we paused a pool before it's notified we posted new machine configs or kubelet configs?
Basically, do we have any guarantees that the order we post changes in our reconcile loop matches the order another operator will receive them in? so, If I post resource A then resource B, is it possible for another operator to observe resource B's creation before resource A? if resources A and B are of different kinds, i think this might actually be possible.
Ah, I see, good point, but I am unsure how we can avoid it. I can divide the reconcile loop, something like:
- create or update the MCP and pause it (return reconcile.Result{Requeue: true, RequeueAfter: time.Second})
- before creating all other components, check that the MCP is paused; if not, return again reconcile.Result{Requeue: true, RequeueAfter: 10*time.Second}
- create all components
- unpause the MCP
@davidvossel WDT?
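Roughly, the phases listed above could look like this inside Reconcile (illustrative only; it reuses the pauseMachineConfigPool and applyComponents helpers from this PR):

// phase 1: make sure the MCP is paused and give the machine-config controllers time to observe it
if !mcp.Spec.Paused {
	if err := r.pauseMachineConfigPool(mcp.Name, true); err != nil {
		return reconcile.Result{}, err
	}
	return reconcile.Result{RequeueAfter: 10 * time.Second}, nil
}
// phase 2: apply all other components while the pool is paused
if err := r.applyComponents(instance); err != nil {
	return reconcile.Result{}, err
}
// phase 3: unpause only once nothing is left to create or update
return reconcile.Result{}, r.pauseMachineConfigPool(mcp.Name, false)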
Looks good, the only thing I'd add is to only unpause after you observe all dependent resources have been created.
Hm, we unpause just when all mutated objects are equal to nil:
performance-addon-operators/pkg/controller/performanceprofile/performanceprofile_controller.go (Line 366 in c3dd292):
if mcpMutated == nil &&
Great work 👍
Some comments inline.
cmd/manager/main.go
Outdated
printVersion()

namespace, err := k8sutil.GetWatchNamespace()
if err != nil {
	log.Error(err, "Failed to get watch namespace")
	klog.Error(err.Error())
nit: klog.Exit(...) is a shortcut (same below)
done
// we want to initate reconcile loop only on change under labels or spec of the object
p := predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		if e.MetaOld == nil {
Where did you get this from? I'm a bit surprised that this is necessary; looking at sigs.k8s.io/controller-runtime/pkg/source/internal/eventsource.go OnUpdate(), I think these fields are always filled?
I am unsure when we would get this, but the default update predicate has the same checks, so I left it as it is.
ack, works for me
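For reference, the general shape of such an update predicate (a sketch; the nil guards mirror the default predicate, and spec changes are detected via the generation):

p := predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		// keep the defensive nil checks used by the default update predicate
		if e.MetaOld == nil || e.MetaNew == nil {
			return false
		}
		// reconcile only when the spec (generation) or the labels changed
		return e.MetaOld.GetGeneration() != e.MetaNew.GetGeneration() ||
			!reflect.DeepEqual(e.MetaOld.GetLabels(), e.MetaNew.GetLabels())
	},
}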
	return err
}

// we do not want initiate reconcile loop on the pause of machine config pool
pause and configuration?
done
"app": cr.Name, | ||
func (r *ReconcilePerformanceProfile) applyComponents(profle *performancev1alpha1.PerformanceProfile) error { | ||
// deploy machine config pool | ||
mcp := machineconfigpool.NewPerformance(profle) |
maybe NewPerformancePool is a better name?
we already have machineconfigpool in the name of the package, see a nice link that @fedepaol sent me: https://blog.golang.org/package-names
Ah, I see, interesting read about package names and New(). Thanks!
But NewPerformance still sounds a bit wrong to me... 🤔
Maybe go in the other direction and make the func name even shorter, only New()?
Or sth different than New, e.g. FromProfile()?
WDYT?
Only New() works for me, will change it.
}

// deploy machine config
mc, err := machineconfig.NewPerformance(r.assetsDir, profle)
NewPerformanceConfig?
see the link above
}

// deploy kubelet config
kc := kubeletconfig.NewPerformance(profle)
NewPerformanceConfig?
same for following NewXyz() funcs...
see the link above
if err != nil {
	return err
}
if err != nil {
duplicate :)
done
	return r.client.Update(context.TODO(), updatedMcp)
}

func (r *ReconcilePerformanceProfile) verifyPerformanceProfileParameters(performanceProfile *performancev1alpha1.PerformanceProfile) error {
nit: not sure what the exact difference between the 2 words is, but mostly we talk about validate instead of verify (validation webhook) 🤔
changed
@davidvossel @slintes Thanks a lot for the review!
	return err
}

updatedMcp.Spec.Paused = pause
only perform the update here if updatedMcp.Spec.Paused does not already match the desired pause value.
done
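A sketch of the resulting check inside pauseMachineConfigPool (illustrative):

// skip the write when the pool is already in the desired pause state
if updatedMcp.Spec.Paused == pause {
	return nil
}
updatedMcp.Spec.Paused = pause
return r.client.Update(context.TODO(), updatedMcp)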
Signed-off-by: Artyom Lukianov <[email protected]>
3084bbe to c3dd292 (Compare)
much closer! a few more comments.
mutated.Spec = mcp.Spec

// do not update the pause and configuration fields
mutated.Spec.Paused = existing.Spec.Paused
I think we should remove this now. it shouldn't matter anymore. Reconciling based on whether the mcp is paused or not shouldn't impact us now with the new logic.
I am unsure it is correct. For example, our code will pause the machine config pool, but the MCP that we generate will have spec.paused false, so it will recognize the change and unpause.
That's a bug. The MCP we generate needs to reflect whether or not the pause is expected.
We can't prevent the reconcile loop from popping early; this logic sounds like we're depending on the reconcile loop not executing too early, which would result in an early unpause.
}

// update the machine config pool after that we paused it
if mcpMutated != nil && !created {
lets only update the mcp object once. If the mcp needs to be updated to be paused or updated for any other reason, that should be a single update.
writing to the mcp object to pause it, and then writing to it once again right afterwards for some other reason doesn't look right. Figure out what the state of the mcp object should be, and issue a single create or update.
done
},
},
},
mcpMutated, created, err := r.getMutatedMachineConfigPool(mcp)
The created value here is confusing: created=false when the MCP exists, and created=true when the MCP needs to be created.
Can we call created something like nonCreated, or reverse the meaning of created to reflect the status of the object being observed?
After merging pause and createOrUpdate of MCP I dropped it, I just overthought it X_X
}

// create machine config pool and pause it
if mcpMutated != nil && created {
if we know the mcp needs to be created here, and we're just going to immediately pause it after creation, then why not post the manifest with paused=true to begin with?
so
if mcpMutated != nil && needsToBeCreated {
// post a new paused mcp.
} else {
// get and pause existing mcp
}
done
networkLatencyTunedMutated == nil &&
realTimeKernelTunedMutated == nil {
// if nothing needed to be updated and machine config pool paused, unpause it
mcpUpdate, err := r.getMachineConfigPool(mcp.Name)
The mcp object has references to objects in the status section: the mcp.status.configuration.source list. Can we look at that to make sure the mcp picked up all our configs before unpausing it?
Hmm, the feature gate and kubelet config machine configs are always rendered with the same name, so I am unsure how we can check it, see https://github.com/openshift/machine-config-operator/blob/04cd2198cae247fabcd3154669618d74f124f27f/pkg/controller/kubelet-config/helpers.go#L60
So we'd only be able to detect that the new MC was applied?
Here's an issue #35, let's follow up on this discussion there and see if there's a way to detect when unpause is safe, or at least detect that the MC we posted has been picked up before unpausing.
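One way the follow-up in #35 could check this, assuming the MachineConfigPool status keeps its current shape (field names not verified against this PR; the helper is hypothetical):

// mcpPickedUpConfig reports whether the pool's rendered configuration already
// references the machine config we created.
func mcpPickedUpConfig(mcp *mcov1.MachineConfigPool, mcName string) bool {
	for _, source := range mcp.Status.Configuration.Source {
		if source.Name == mcName {
			return true
		}
	}
	return false
}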
I did one more pass. The main thing that caught my eye this time was the realtime repo URL that's in the api.
If we can take that out of the api and make realtimeKernel simply a true|false boolean, I think that would be best. if we need the url still for testing, keep the env var.
I also had a few more minor comments. I'll look at this first thing in the morning tomorrow so we can make more progress asap. I know there's pressure to get this merged.
}

// get mutated real time kernel tuned
realTimeKernelTuned, err := tuned.NewWorkerRealTimeKernel(r.assetsDir, profile)
Tunnelling through the code a bit, I found that each place this assetsDir is used, we're reading a file from disk. The way this is done means we're reading from disk every time the reconcile loop pops.
Can we either create a function to read all the assets into memory before starting the controller, or at least wrap the reads in some sort of sync.Once function, to ensure we aren't reading from disk every time this loop executes?
Creating an issue to follow up on this is fine as well if you don't want to address it in this PR.
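A small sketch of the sync.Once idea (a hypothetical helper, assuming io/ioutil, path/filepath and sync imports):

var (
	loadAssetsOnce sync.Once
	cachedAssets   map[string][]byte
	loadAssetsErr  error
)

// loadAssets reads every file under assetsDir exactly once and serves the
// cached bytes on every later reconcile.
func loadAssets(assetsDir string) (map[string][]byte, error) {
	loadAssetsOnce.Do(func() {
		cachedAssets = map[string][]byte{}
		files, err := ioutil.ReadDir(assetsDir)
		if err != nil {
			loadAssetsErr = err
			return
		}
		for _, f := range files {
			data, err := ioutil.ReadFile(filepath.Join(assetsDir, f.Name()))
			if err != nil {
				loadAssetsErr = err
				return
			}
			cachedAssets[f.Name()] = data
		}
	})
	return cachedAssets, loadAssetsErr
}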
Filed issue: #32
perfect, thanks
	return fmt.Errorf("you should provide non isolated CPU set")
}

if performanceProfile.Spec.RealTimeKernel == nil {
I thought realtimeKernel was going to be a boolean, and not a required option.
basically just, profile.Spec.Realtime=true/false
Also this RepoURL shouldn't be a part of the API. We don't expect users to supply the URL for the realtime kernel. If we need this for testing purposes for now, just use the environment variable or even an annotation if it needs to be associated with the profile object.
@cynepco3hahue @davidvossel heads up, I am already working on a follow up PR for making this a boolean, because with OCP 4.4 we don't need the repo url anymore at all, the RPMs are in the RHCOS image already. Also I remove validation for this and just enable the RT kernel systemd unit when the boolean is set and true.
@davidvossel According to Marc's comment, can we leave it for a following PR, because it will need changes under the scripts and the API?
ack, @slintes thanks for following up on this.
	return true, r.client.Update(context.TODO(), mcp)
}

func (r *ReconcilePerformanceProfile) validatePerformanceProfileParameters(performanceProfile *performancev1alpha1.PerformanceProfile) error {
I think the only required argument here should be nodeSelector. The rest of the features seem like they could be completely independent from one another.
I added all parameters that are needed for the scripts as required; otherwise we can have some unpredictable behavior in our scripts. To change it we will need to change our scripts, so I prefer to leave it for another PR, opened #33.
alrighty, as long as we're tracking this, thanks
}

if r.isComponentsExist(instance) {
	return reconcile.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil
can we get an info level log message here indicating that deletion is pending dependent resources being removed.
done
}

if updated {
	return &reconcile.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil
nit: when RequeueAfter is set, Requeue is true implicitly
Hm, wasn't aware of it, but I see it in the controller code:
} else if result.RequeueAfter > 0 {
// The result.RequeueAfter request will be lost, if it is returned
// along with a non-nil error. But this is intended as
// We need to drive to stable reconcile loops before queuing due
// to result.RequestAfter
c.Queue.Forget(obj)
c.Queue.AddAfter(req, result.RequeueAfter)
ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "requeue_after").Inc()
return true
}
Will remove it.
it's also in the doc of the Result struct ;)
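So the return above can simply become (sketch):

// Requeue is implied once RequeueAfter is non-zero
return &reconcile.Result{RequeueAfter: 10 * time.Second}, nil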
// we want to give time to the machine-config controllers to get updated values
if updated {
	return &reconcile.Result{Requeue: true, RequeueAfter: 10 * time.Second}, nil
nit: when RequeueAfter is set, Requeue is true implicitly
done
d3d46fd to 58e7782 (Compare)
According to the discussion: pause the machine config pool only when it is necessary, and split the reconcile loop into phases. Creation: add the finalizer; check if the MCP exists and create it if not; pause the MCP and wait for the time specified in the code to give the machine-config controllers time to get the update; if needed, update the MCP and all other resources; after creation or update, wait again; unpause the MCP. Deletion: check if the MCP exists; if yes, pause the MCP and wait some time; delete all components; wait until all components are deleted; remove the finalizer. All wait cycles are implemented via requeue with `Result{Requeue: true, RequeueAfter: 10 * time.Second}`. Signed-off-by: Artyom Lukianov <[email protected]>
58e7782 to 6be8f31 (Compare)
/lgtm
there are some optimizations being done with the requeueAfter logic that we may find don't always produce consistent results for us, but I think this is a good start.
great work @cynepco3hahue 👍
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cynepco3hahue, davidvossel The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@cynepco3hahue: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Have a custom pool that matches two out of three nodes.
The controller will watch resources:
PerformanceProfile
MachineConfigPool
MachineConfig
KubeletConfig
FeatureGate
Tuned
Once the PerformanceProfile is created, the controller will start to reconcile
and will create all resources needed for performance tuning, and it will continue
to maintain the desired state of all components; the only way to update resources
owned by the controller is to update the PerformanceProfile resource.
I tried to make the controller as quiet as possible; that is the reason why I provided custom predicates for all resources except PerformanceProfile. It is due to the fact that for all other resources we have OpenShift controllers that can update different fields of these resources. An additional thing worth mentioning: I changed the scope of PerformanceProfile from namespaced to cluster, because we cannot set an owner reference on cluster-scoped resources if the owner has namespaced scope.
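For context, the owner reference is typically set with controller-runtime's helper, which works here only because both the (now cluster-scoped) PerformanceProfile and the MachineConfigPool are cluster-scoped (a sketch, not a verbatim excerpt from this PR):

// make the PerformanceProfile the controller-owner of the generated MCP so it
// is garbage-collected when the profile is deleted
if err := controllerutil.SetControllerReference(profile, mcp, r.scheme); err != nil {
	return err
}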