OCPBUGS-62325: Updates InfraMachine watch_filters for MachineSync controller #371
Walkthrough
Adds terminal-configuration error detection and handling to the machine and machineset sync controllers, introduces ResolveCAPIMachineFromInfraMachine to enqueue owning CAPI Machines from Infra ownerRefs, replaces klog with controller-runtime/GinkgoLogr logging across utilities and tests, updates some reconciler signatures, and rewires many tests for ownerReference semantics.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Infra as InfraMachine / InfraTemplate
participant Watcher as ResolveCAPIMachineFromInfraMachine
participant Reconciler as Machine/MachineSet Sync Reconciler
participant API as Kubernetes API
Note over Watcher: parses ownerRefs (APIVersion/Kind) and\nenqueues owning CAPI Machine reconcile
Infra->>Watcher: ownerRef event (create/update/delete)
Watcher->>Reconciler: Enqueue reconcile request for CAPI Machine
Reconciler->>API: Get CAPI Machine / MachineSet
Reconciler->>API: Get MAPI Machine / MachineSet
Reconciler->>API: Get Infra resource (ref)
alt terminal invalid infra refs
Reconciler->>Reconciler: detect terminalConfiguration error
Reconciler-->>API: Emit warning SynchronizationError event
Reconciler-->>Reconciler: log terminalConfigurationErrorLog, return no requeue
else normal sync
Reconciler->>Reconciler: ensureSyncFinalizer(...) -> (shouldRequeue)
alt deletion path
Reconciler->>Reconciler: reconcile deletion -> (shouldRequeue, err)
else ensure/create infra resources and update status
Reconciler->>API: create/patch infra resources, update status/conditions
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (1 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.5.0)
Error: can't load config: unsupported version of the configuration: ""
See https://golangci-lint.run/docs/product/migration-guide for migration instructions |
|
Skipping CI for Draft Pull Request. |
|
@theobarberbany: This pull request references Jira Issue OCPBUGS-62325, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
4b03d42 to a3b9582
a3b9582 to 6c6813f
d360bd1 to 5e6d8d4
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@theobarberbany: This pull request references Jira Issue OCPBUGS-62325, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/controllers/machinesync/machine_sync_controller.go (2)
1039-1046: Do not wrap a nil error when removing the MAPI sync finalizer.
This path now always returns an error (fmt.Errorf(... %w, err)) even when err is nil, breaking the deletion flow.
Use this fix:
- _, err := util.RemoveFinalizer(ctx, r.Client, mapiMachine, SyncFinalizer)
-
- return false, fmt.Errorf("failed to remove finalizer: %w", err)
+ changed, err := util.RemoveFinalizer(ctx, r.Client, mapiMachine, SyncFinalizer)
+ if err != nil {
+     return false, fmt.Errorf("failed to remove finalizer: %w", err)
+ }
+ return changed, nil
1161-1167: Likewise, avoid wrapping a nil error when pruning the CAPI sync finalizer.
Here too, fmt.Errorf(... %w, err) returns a non-nil error even when err is nil, so reconciliation always fails instead of continuing.
Patch it like this:
- _, err := util.RemoveFinalizer(ctx, r.Client, capiMachine, SyncFinalizer)
-
- return false, fmt.Errorf("failed to remove finalizer: %w", err)
+ changed, err := util.RemoveFinalizer(ctx, r.Client, capiMachine, SyncFinalizer)
+ if err != nil {
+     return false, fmt.Errorf("failed to remove finalizer: %w", err)
+ }
+ return changed, nil
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (4)
pkg/controllers/machinesync/machine_sync_controller.go (8 hunks)
pkg/controllers/machinesync/machine_sync_controller_test.go (14 hunks)
pkg/controllers/machinesync/suite_test.go (2 hunks)
pkg/util/watch_filters.go (4 hunks)
|
/test unit |
5e6d8d4 to db0597d
856d7b6 to 125151e
|
@theobarberbany: This pull request references Jira Issue OCPBUGS-62325, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
72dbdfd to f66a0a0
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/controllers/machinesetsync/machineset_sync_controller.go (1)
239-255: Don’t treat an empty InfrastructureRef namespace as a terminal error
Cluster API leaves MachineSet.Spec.Template.Spec.InfrastructureRef.Namespace unset by default, meaning “same namespace as the MachineSet”. With the new guard we now return errInvalidInfraMachineTemplateReference and short-circuit reconciliation for perfectly valid manifests (including the ones created by Cluster API itself). That turns routine syncs into permanent no-ops.
Please keep the Name check, but default an empty namespace to capiMachineSet.Namespace instead of treating it as fatal, then build the infraMachineTemplateKey from that value.
Apply this diff to normalize the namespace instead of failing:
- infraMachineTemplateRef := capiMachineSet.Spec.Template.Spec.InfrastructureRef
- infraMachineTemplateKey := client.ObjectKey{
-     Namespace: infraMachineTemplateRef.Namespace,
-     Name: infraMachineTemplateRef.Name,
- }
+ infraMachineTemplateRef := capiMachineSet.Spec.Template.Spec.InfrastructureRef
+ if infraMachineTemplateRef.Name == "" {
+     return nil, nil, fmt.Errorf("machine set %s/%s: %w",
+         capiMachineSet.Namespace, capiMachineSet.Name, errInvalidInfraMachineTemplateReference)
+ }
+
+ infraMachineTemplateNamespace := infraMachineTemplateRef.Namespace
+ if infraMachineTemplateNamespace == "" {
+     infraMachineTemplateNamespace = capiMachineSet.Namespace
+ }
+ infraMachineTemplateKey := client.ObjectKey{
+     Namespace: infraMachineTemplateNamespace,
+     Name: infraMachineTemplateRef.Name,
+ }
-
- if capiMachineSet.Spec.ClusterName == "" {
-     return nil, nil, fmt.Errorf("machine set %s/%s: %w",
-         capiMachineSet.Namespace, capiMachineSet.Name, errInvalidInfraClusterReference)
- }
-
- if infraMachineTemplateRef.Name == "" || infraMachineTemplateRef.Namespace == "" {
-     return nil, nil, fmt.Errorf("machine %s/%s: %w",
-         capiMachineSet.Namespace, capiMachineSet.Name, errInvalidInfraMachineTemplateReference)
- }
+ if capiMachineSet.Spec.ClusterName == "" {
+     return nil, nil, fmt.Errorf("machine set %s/%s: %w",
+         capiMachineSet.Namespace, capiMachineSet.Name, errInvalidInfraClusterReference)
+ }
pkg/controllers/machinesync/machine_sync_controller.go (1)
979-994: Treat missing InfrastructureRef namespace as “same namespace” instead of failing
Machine.Spec.InfrastructureRef.Namespace is optional in Cluster API; it defaults to the Machine’s namespace and is omitted by upstream controllers. The new check classifies that common case as errInvalidInfraMachineReference, marks it terminal, and prevents us from reconciling standard Cluster API machines.
Please keep enforcing that the Name is present, but if the namespace is empty, substitute capiMachine.Namespace before building the lookup key rather than returning a terminal error.
A minimal fix looks like:
- infraMachineRef := capiMachine.Spec.InfrastructureRef
- infraMachineKey := client.ObjectKey{
-     Namespace: infraMachineRef.Namespace,
-     Name: infraMachineRef.Name,
- }
+ infraMachineRef := capiMachine.Spec.InfrastructureRef
+ if infraMachineRef.Name == "" {
+     return nil, nil, fmt.Errorf("machine %s/%s: %w",
+         capiMachine.Namespace, capiMachine.Name, errInvalidInfraMachineReference)
+ }
+
+ infraMachineNamespace := infraMachineRef.Namespace
+ if infraMachineNamespace == "" {
+     infraMachineNamespace = capiMachine.Namespace
+ }
+ infraMachineKey := client.ObjectKey{
+     Namespace: infraMachineNamespace,
+     Name: infraMachineRef.Name,
+ }
-
- if infraMachineRef.Name == "" || infraMachineRef.Namespace == "" {
-     return nil, nil, fmt.Errorf("machine %s/%s: %w",
-         capiMachine.Namespace, capiMachine.Name, errInvalidInfraMachineReference)
- }
🧹 Nitpick comments (1)
pkg/controllers/machinesetsync/machineset_sync_controller_test.go (1)
283-292: Factor out repeated synchronized-condition assertions
The block that verifies the Synchronized condition (type/status/reason/message tuple) shows up verbatim in dozens of contexts. Pulling it into a small helper (e.g., expectSynchronizedCondition(k, mapiMachineSet, status, reason, message)) would make the tests easier to read and update when the phrasing or reason codes change.
📒 Files selected for processing (9)
pkg/controllers/machinemigration/suite_test.go (2 hunks)
pkg/controllers/machinesetmigration/suite_test.go (2 hunks)
pkg/controllers/machinesetsync/machineset_sync_controller.go (6 hunks)
pkg/controllers/machinesetsync/machineset_sync_controller_test.go (3 hunks)
pkg/controllers/machinesetsync/suite_test.go (2 hunks)
pkg/controllers/machinesync/machine_sync_controller.go (10 hunks)
pkg/controllers/machinesync/machine_sync_controller_test.go (13 hunks)
pkg/controllers/machinesync/suite_test.go (2 hunks)
pkg/util/watch_filters.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/controllers/machinesync/suite_test.go
🔇 Additional comments (1)
pkg/controllers/machinemigration/suite_test.go (1)
33-33: LGTM! Logger setup now consistent.
Adding ctrl.SetLogger(GinkgoLogr) alongside the existing logf.SetLogger ensures both controller-runtime and its log package route output through Ginkgo's test infrastructure, improving log consistency across the test suite.
Also applies to: 57-58
Actionable comments posted: 1
♻️ Duplicate comments (1)
pkg/controllers/machinesync/machine_sync_controller_test.go (1)
193-195: Fix manager logger reference
GinkgoLogr is not defined in this package, so the test suite won’t compile. Use the exported logger from the komega package instead (we already import it). Line 194.
- Logger: GinkgoLogr,
+ Logger: komega.GinkgoLogr,
📒 Files selected for processing (9)
pkg/controllers/machinemigration/suite_test.go (2 hunks)
pkg/controllers/machinesetmigration/suite_test.go (2 hunks)
pkg/controllers/machinesetsync/machineset_sync_controller.go (6 hunks)
pkg/controllers/machinesetsync/machineset_sync_controller_test.go (3 hunks)
pkg/controllers/machinesetsync/suite_test.go (2 hunks)
pkg/controllers/machinesync/machine_sync_controller.go (10 hunks)
pkg/controllers/machinesync/machine_sync_controller_test.go (13 hunks)
pkg/controllers/machinesync/suite_test.go (2 hunks)
pkg/util/watch_filters.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/controllers/machinesync/suite_test.go
// Validate that required references are not empty to avoid nil pointer issues later.
// These are terminal configuration errors that require user intervention.
if capiMachineSet.Spec.ClusterName == "" {
Out of curiosity: is this a real case? The field is required in CAPI AFAIK :-)
(no need to change anything, just thinking about if we need all the "terminal failure handling" if this is something which should never happen)
See #371 (comment)
capiMachine.Namespace, capiMachine.Name, errInvalidInfraClusterReference)
}

if infraMachineRef.Name == "" || infraMachineRef.Namespace == "" {
At least with v1beta2, these fields are required to be set.
Ah right, I hit a case where this was causing issues when debugging tests where I'd forgotten to set it :) So at the moment we can (at least in the test suite) get to this state.
The ClusterName was added for completeness, as we require the cluster too. I didn't realise it was required by the CAPI API :)
/lgtm
/hold for your convenience
|
Scheduling tests matching the |
f66a0a0 to f3dc56e
|
@theobarberbany: This pull request references Jira Issue OCPBUGS-62325, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
|
/lgtm |
|
Scheduling tests matching the |
Actionable comments posted: 0
♻️ Duplicate comments (2)
pkg/controllers/machinesync/machine_sync_controller_test.go (1)
193-195: Use qualified komega.GinkgoLogr reference.
The unqualified GinkgoLogr identifier may cause test build failures. The komega package must be used to reference GinkgoLogr correctly.
Apply this diff:
  mgr, err = ctrl.NewManager(controllerCfg, ctrl.Options{
      Scheme: testScheme,
-     Logger: GinkgoLogr,
+     Logger: komega.GinkgoLogr,
      Controller: config.Controller{
          SkipNameValidation: ptr.To(true),
      },
Based on past review feedback.
pkg/controllers/machinesync/machine_sync_controller.go (1)
1039-1041: Propagate the finalizer removal requeue signal
We’re still discarding the changed flag from util.RemoveFinalizer here. That means we always return (false, nil) whenever the call succeeds, even if we just removed our sync finalizer, which drops the requeue signal we rely on to drive the rest of the deletion flow. Please capture the bool and only wrap when an error is present:
- _, err := util.RemoveFinalizer(ctx, r.Client, mapiMachine, SyncFinalizer)
-
- return false, fmt.Errorf("failed to remove finalizer: %w", err)
+ changed, err := util.RemoveFinalizer(ctx, r.Client, mapiMachine, SyncFinalizer)
+ if err != nil {
+     return false, fmt.Errorf("failed to remove finalizer: %w", err)
+ }
+ return changed, nil
🧹 Nitpick comments (1)
pkg/util/watch_filters.go (1)
96-113: Consider early return and nil for empty result.
Based on past review feedback, InfraMachines typically have at most one Machine owner reference. Consider returning immediately when a match is found, and returning nil instead of an empty slice when no matches are found.
Apply this diff:
- requests := []reconcile.Request{}
-
  for _, ref := range obj.GetOwnerReferences() {
      gv, err := schema.ParseGroupVersion(ref.APIVersion)
      if err != nil {
          logger.Error(err, "Failed to parse GroupVersion", "APIVersion", ref.APIVersion)
          continue
      }
      if ref.Kind == "Machine" && gv.Group == clusterv1.GroupVersion.Group {
-         requests = append(requests, reconcile.Request{
+         return []reconcile.Request{{
              NamespacedName: client.ObjectKey{Namespace: namespace, Name: ref.Name},
-         })
+         }}
      }
  }
- return requests
+ return nil
📒 Files selected for processing (9)
pkg/controllers/machinemigration/suite_test.go (2 hunks)
pkg/controllers/machinesetmigration/suite_test.go (2 hunks)
pkg/controllers/machinesetsync/machineset_sync_controller.go (6 hunks)
pkg/controllers/machinesetsync/machineset_sync_controller_test.go (3 hunks)
pkg/controllers/machinesetsync/suite_test.go (2 hunks)
pkg/controllers/machinesync/machine_sync_controller.go (10 hunks)
pkg/controllers/machinesync/machine_sync_controller_test.go (13 hunks)
pkg/controllers/machinesync/suite_test.go (2 hunks)
pkg/util/watch_filters.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/controllers/machinemigration/suite_test.go
- pkg/controllers/machinesync/suite_test.go
- pkg/controllers/machinesetmigration/suite_test.go
|
@theobarberbany: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@huali9 PTAL :) |
|
/unhold |
|
@sunzhaohua2 @huali9 I don't think this needs manual verification. This is mostly a unit test structure & watch filters change, we have good coverage with e2es and unit tests :) Feel free to do any additional verification, but don't feel like you have to prioritise it :) |
|
/verified by e2es & units |
|
@theobarberbany: This PR has been marked as verified by In response to this:
|
@theobarberbany: Jira Issue Verification Checks: Jira Issue OCPBUGS-62325 Jira Issue OCPBUGS-62325 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 In response to this:
|
Fix included in accepted release 4.21.0-0.nightly-2025-11-05-234508 |
Summary by CodeRabbit
New Features
Refactor
Tests
Chores