-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARO Operator - Reduce unnecessary reconciles by limiting watched resources/changes #3292
Conversation
a015b39
to
6ee7bae
Compare
Please rebase pull request. |
6ee7bae
to
92dc9fc
Compare
/azp run e2e |
Azure Pipelines successfully started running 1 pipeline(s). |
92dc9fc
to
08bf84c
Compare
I'm validating any flake impact to e2e on this, and will include any required fixups on this PR. Adding hold until that is complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. 2 thoughts that don't block this at all
- A quick grep shows me that we have 29 instances of
NewControllerManagedBy
in the repo and this PR will make more than half use theGenerationChangedPredicate
. I don't have a great sense of how big a problem all these extra calls were/are/will be but did we ever consider wrapping the controllerBuilder and changing the default from "load" to "quiet"? - I don't usually like this pattern but what would you say about wrapping
GenerationChangedPredicate
in something that lives with the otherpredicates
for discoverability? Since the default with the controllers is to shout and we need to add this force them to only fire on "real" changes making that option very clear seems like it might be worth it. What do you think?
// Any changes will trigger reconcile, but only for that config. | ||
return b. | ||
Named(ControllerName). | ||
// Controller adds ControllerManagedBy to KubeletConfit created by this controller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Controller adds ControllerManagedBy to KubeletConfit created by this controller. | |
// Controller adds ControllerManagedBy to KubeletConfig created by this controller. |
I assume
Watches(&source.Kind{Type: &machinev1beta1.Machine{}}, &handler.EnqueueRequestForObject{}, builder.WithPredicates(predicates.MachineRoleMaster)). // to reconcile on master machine replacement | ||
Watches(&source.Kind{Type: &machinev1beta1.MachineSet{}}, &handler.EnqueueRequestForObject{}, builder.WithPredicates(predicates.MachineRoleWorker)). // to reconcile on worker machines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Borderline Rant - Not Blocking Anything 🙃 ]
These 2 lines are very long. I know Go doesn't have a max line length but we live in a physical world with fixed screens. A quick grep '.\{120\}' -r ./pkg/*/*.go | wc -l
tells me we have > 1000 lines longer than 120.
- Have we ever considered adding a max line length?
- Has this ever bothered anyone else?
- Do I need to stop worrying and learn to love some line wrapping plugin???
I'm not sure if it's possible or even desirable to make this change a "default", since GenerationChangedPredicate here is being applied specifically to the ARO cluster resource, not all resources. I also personally prefer to keep the composition of watch rules within each controller, so you can refer to them at-a-glance when making changes to them, rather than having to refer to multiple layers of abstraction to put together the rules. I think your second point is probably more preferable for reusability and deduplication here.
Along the same lines as my personal preference above, I think just extracting out the
|
Sorry I wasn't clear. I was suggesting something like a
Yeah I buy this. I thought after this PR we just used Thanks for explaining that and FWIW your suggestions make sense to me. |
This changes look great. What's stopping us merging it, since the E2E tests passed? Would we like to run the E2E pipeline against it to get a larger sample size? If so we have a new joiner and this would be a nice task for them to see some parts of the system. What do you think @tsatam? |
I think this change is mergeable, there are some future follow-up ideas in the description of the PR but they can be done later. This work was ultimately de-prioritized due to being fairly low on the impact/effort ratio, we ended up remediating the issue this PR seeked to reduce through more direct changes, but it is still a net positive IMO. |
Which issue this PR addresses:
Another proposed fix for ARO-4632
May help with ARO-5216 as well
What this PR does / why we need it:
Various changes to how the ARO operator watches resources:
Test plan for issue:
Note - I am expecting some E2E tests to become more flaky because of this change, as we have many E2E tests that are dependent on the operator reconciling more often than is necessary, and expecting an operator controller to perform changes without actually modifying any of the resources that controller is actually watching. I am planning to fix up any observed occurrences of such test flakes with individual commits on this branch for each test/controller, and each commit will likely contain changes to both the tests as well as the watched resources for that controller.
Is there any documentation that needs to be updated for this PR?
N/A
Additional ideas not implemented in this PR (yet?)