
@theobarberbany
Contributor

@theobarberbany theobarberbany commented Sep 29, 2025

Summary by CodeRabbit

  • New Features

    • Resolver added to map infra resources to Machine reconciles.
    • New terminal errors to surface invalid infrastructure or template references.
  • Refactor

    • Centralized terminal-error handling prevents futile requeues and emits warnings/events.
    • Reconciler and finalizer flows adjusted for clearer sync/deletion outcomes.
    • Structured logging standardized across controllers and tests.
  • Tests

    • Hardened ownership wiring, sentinel resources, and Ginkgo/controller-runtime logging.
  • Chores

    • Watch/filter utilities and logging integration updated.

@coderabbitai

coderabbitai bot commented Sep 29, 2025

Walkthrough

Adds terminal-configuration error detection/handling to machine and machineset sync controllers, introduces ResolveCAPIMachineFromInfraMachine to enqueue owning CAPI Machines from Infra ownerRefs, replaces klog with controller-runtime/GinkgoLogr logging across utilities and tests, updates some reconciler signatures, and rewires many tests for ownerReference semantics.

Changes

Cohort / File(s) Summary
Machine sync controller
pkg/controllers/machinesync/machine_sync_controller.go
Add terminal error vars (errInvalidInfraClusterReference, errInvalidInfraMachineReference), terminalConfigurationErrorLog, and isTerminalConfigurationError; pre-validate infra refs in fetch; avoid requeue on terminal errors with warning events; change signatures of reconcileMAPItoCAPIMachineDeletion and ensureSyncFinalizer to return (shouldRequeue bool, err error); logging cleanup.
Machine sync tests
pkg/controllers/machinesync/machine_sync_controller_test.go
Add Logger: GinkgoLogr to manager options; rewire test setup and ownerReference wiring; replace direct infra creations with owner-referenced resources; update assertions and sequencing to reflect ownership and policy/admission behavior.
Watch filter utilities
pkg/util/watch_filters.go
Replace klog with controller-runtime structured logging; add ResolveCAPIMachineFromInfraMachine(namespace) to parse ownerRefs (APIVersion/Kind) and enqueue reconcile.Requests for owning CAPI Machine; preserve namespace filtering and improve structured logs.
Machine suite logging
pkg/controllers/machinesync/suite_test.go
Replace klog textlogger with GinkgoLogr; call logf.SetLogger(GinkgoLogr) and ctrl.SetLogger(GinkgoLogr) in BeforeSuite; update imports.
MachineSet sync controller
pkg/controllers/machinesetsync/machineset_sync_controller.go
Add terminal error vars (errInvalidInfraClusterReference, errInvalidInfraMachineTemplateReference), terminalConfigurationErrorLog, and isTerminalConfigurationError; validate cluster name and infra template refs early in fetch; avoid requeue on terminal errors and emit synchronization events; logging alignment.
MachineSet sync tests
pkg/controllers/machinesetsync/machineset_sync_controller_test.go
Large test restructuring: add contexts for authority/ownership, ownerReference wiring, scenarios for template existence, status propagation, finalizers, deletion flows, and policy-related sentinel resources; update expectations and sequencing.
MachineSet suite logging
pkg/controllers/machinesetsync/suite_test.go
Replace klog textlogger with GinkgoLogr and register with controller-runtime via ctrl.SetLogger(GinkgoLogr); update imports.
Other test suites
pkg/controllers/machinemigration/suite_test.go, pkg/controllers/machinesetmigration/suite_test.go
Replace klog textlogger with GinkgoLogr, call ctrl.SetLogger(GinkgoLogr) in BeforeSuite, and update imports.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Infra as InfraMachine / InfraTemplate
  participant Watcher as ResolveCAPIMachineFromInfraMachine
  participant Reconciler as Machine/MachineSet Sync Reconciler
  participant API as Kubernetes API
  Note over Watcher: parses ownerRefs (APIVersion/Kind) and\nenqueues owning CAPI Machine reconcile

  Infra->>Watcher: ownerRef event (create/update/delete)
  Watcher->>Reconciler: Enqueue reconcile request for CAPI Machine

  Reconciler->>API: Get CAPI Machine / MachineSet
  Reconciler->>API: Get MAPI Machine / MachineSet
  Reconciler->>API: Get Infra resource (ref)

  alt terminal invalid infra refs
    Reconciler->>Reconciler: detect terminalConfiguration error
    Reconciler-->>API: Emit warning SynchronizationError event
    Reconciler-->>Reconciler: log terminalConfigurationErrorLog, return no requeue
  else normal sync
    Reconciler->>Reconciler: ensureSyncFinalizer(...) -> (shouldRequeue)
    alt deletion path
      Reconciler->>Reconciler: reconcile deletion -> (shouldRequeue, err)
    else ensure/create infra resources and update status
      Reconciler->>API: create/patch infra resources, update status/conditions
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay attention to terminal-error classification and consistent handling at all fetch/reconcile sites.
  • Verify updated function signatures (reconcileMAPItoCAPIMachineDeletion, ensureSyncFinalizer) are propagated and callers/tests updated.
  • Review ResolveCAPIMachineFromInfraMachine APIVersion parsing and namespace handling for edge cases.
  • Inspect extensive test rewiring for correct ownerReference wiring, authority values, and event/assertion timing.

Poem

I nibble ownerRefs by lantern-light,
I hop through logs to set things right.
Finalizers snug, no needless queue,
Terminal errors flagged — we bid adieu.
A rabbit cheers: syncs now sleek and bright. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Title check: ⚠️ Warning
    Explanation: The title partially relates to the changeset by mentioning InfraMachine watch_filters and MachineSync controller, but does not reflect the main changes, which involve terminal configuration error handling, method signature updates, and logging refactoring across multiple controllers and files.
    Resolution: Consider a more comprehensive title that captures the primary changes, such as: 'Add terminal configuration error handling and logging refactor for machine controllers' or 'OCPBUGS-62325: Implement terminal error handling in MachineSync and related controllers'
✅ Passed checks (1 passed)
  • Description check: ✅ Passed
    Explanation: Check skipped - CodeRabbit’s high-level summary is enabled.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 29, 2025
@openshift-ci
Contributor

openshift-ci bot commented Sep 29, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@theobarberbany theobarberbany changed the title Updates InfraMachine watch_filters OCPBUGS-62325: Updates InfraMachine watch_filters Sep 29, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 29, 2025
@openshift-ci-robot

@theobarberbany: This pull request references Jira Issue OCPBUGS-62325, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@theobarberbany theobarberbany force-pushed the tb/fix-machinesync-watches branch 2 times, most recently from 4b03d42 to a3b9582 on September 29, 2025 15:14
@theobarberbany theobarberbany changed the title OCPBUGS-62325: Updates InfraMachine watch_filters OCPBUGS-62325: Updates InfraMachine watch_filters for MachineSync controller Sep 29, 2025
@theobarberbany theobarberbany force-pushed the tb/fix-machinesync-watches branch from a3b9582 to 6c6813f on October 1, 2025 16:41
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 1, 2025
@theobarberbany theobarberbany force-pushed the tb/fix-machinesync-watches branch 2 times, most recently from d360bd1 to 5e6d8d4 on October 2, 2025 21:31
@theobarberbany
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Oct 2, 2025

✅ Actions performed

Full review triggered.

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 2, 2025
@openshift-ci-robot

@theobarberbany: This pull request references Jira Issue OCPBUGS-62325, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sunzhaohua2

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Summary by CodeRabbit

  • Refactor

    • Improved reconciliation timing with a unified requeue interval for smoother, more predictable sync behavior.
    • Refined event handling to trigger reconciles from infrastructure ownership relationships, enhancing responsiveness and correctness.
    • Streamlined logging for clearer, more consistent operational insights.
  • Tests

    • Expanded and hardened test scenarios around ownership wiring and reconciliation ordering to ensure reliability.
    • Unified test logging configuration for better diagnostic output.
  • Chores

    • Standardized watch/filter utilities and logging prefixes for consistent observability across components.


@openshift-ci openshift-ci bot requested a review from sunzhaohua2 October 2, 2025 21:33

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/controllers/machinesync/machine_sync_controller.go (2)

1039-1046: Do not wrap a nil error when removing the MAPI sync finalizer.

This path now always returns an error: fmt.Errorf("failed to remove finalizer: %w", err) produces a non-nil error even when err is nil, which breaks the deletion flow.

Use this fix:

-		_, err := util.RemoveFinalizer(ctx, r.Client, mapiMachine, SyncFinalizer)
-
-		return false, fmt.Errorf("failed to remove finalizer: %w", err)
+		changed, err := util.RemoveFinalizer(ctx, r.Client, mapiMachine, SyncFinalizer)
+		if err != nil {
+			return false, fmt.Errorf("failed to remove finalizer: %w", err)
+		}
+		return changed, nil

1161-1167: Likewise, avoid wrapping a nil error when pruning the CAPI sync finalizer.

Here too, fmt.Errorf(... %w, err) returns a non-nil error even when err is nil, so reconciliation always fails instead of continuing.

Patch it like this:

-		_, err := util.RemoveFinalizer(ctx, r.Client, capiMachine, SyncFinalizer)
-
-		return false, fmt.Errorf("failed to remove finalizer: %w", err)
+		changed, err := util.RemoveFinalizer(ctx, r.Client, capiMachine, SyncFinalizer)
+		if err != nil {
+			return false, fmt.Errorf("failed to remove finalizer: %w", err)
+		}
+		return changed, nil
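The unconditional return always fails because fmt.Errorf never returns nil, even when the %w operand is nil. A minimal stdlib sketch of both shapes, where removeFinalizer is a hypothetical stand-in for util.RemoveFinalizer that always succeeds and reports whether the finalizer was present:

```go
package main

import (
	"errors"
	"fmt"
)

// removeFinalizer is a hypothetical stand-in for util.RemoveFinalizer: it
// succeeds and reports whether the finalizer was present (and so removed).
func removeFinalizer(present bool) (changed bool, err error) {
	return present, nil
}

// buggy mirrors the flagged pattern: it wraps err unconditionally.
func buggy(present bool) (bool, error) {
	_, err := removeFinalizer(present)
	return false, fmt.Errorf("failed to remove finalizer: %w", err)
}

// fixed only wraps a real error and propagates the changed flag.
func fixed(present bool) (bool, error) {
	changed, err := removeFinalizer(present)
	if err != nil {
		return false, fmt.Errorf("failed to remove finalizer: %w", err)
	}
	return changed, nil
}

func main() {
	_, err := buggy(true)
	// err is non-nil although the removal succeeded; nothing is actually wrapped.
	fmt.Println(err != nil, errors.Unwrap(err) == nil)

	changed, err := fixed(true)
	fmt.Println(changed, err)
}
```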
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 5d63584 and 5e6d8d4.

📒 Files selected for processing (4)
  • pkg/controllers/machinesync/machine_sync_controller.go (8 hunks)
  • pkg/controllers/machinesync/machine_sync_controller_test.go (14 hunks)
  • pkg/controllers/machinesync/suite_test.go (2 hunks)
  • pkg/util/watch_filters.go (4 hunks)

@theobarberbany
Contributor Author

/test unit

@theobarberbany theobarberbany force-pushed the tb/fix-machinesync-watches branch from 5e6d8d4 to db0597d on October 3, 2025 11:49
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 3, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 11, 2025
@theobarberbany theobarberbany force-pushed the tb/fix-machinesync-watches branch 3 times, most recently from 856d7b6 to 125151e on October 16, 2025 14:05
@openshift-ci-robot

@theobarberbany: This pull request references Jira Issue OCPBUGS-62325, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sunzhaohua2

In response to this:

Summary by CodeRabbit

  • New Features

    • Added resolver to map infra resources to machine reconciles.
    • New terminal errors to indicate invalid infra references.
  • Refactor

    • Terminal-error handling added to avoid futile requeues and emit warnings.
    • Reconciler and finalizer flows adjusted for more predictable sync and deletion handling.
    • Standardized structured logging across controllers for clearer operational output.
  • Tests

    • Hardened ownership wiring, sentinel resources, and switched tests to controller-runtime/Ginkgo logging.
  • Chores

    • Updated watch/filter utilities and logging integration.


@theobarberbany theobarberbany force-pushed the tb/fix-machinesync-watches branch from 72dbdfd to f66a0a0 on November 4, 2025 15:52

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/controllers/machinesetsync/machineset_sync_controller.go (1)

239-255: Don’t treat an empty InfrastructureRef namespace as a terminal error

Cluster API leaves MachineSet.Spec.Template.Spec.InfrastructureRef.Namespace unset by default, meaning “same namespace as the MachineSet”. With the new guard we now return errInvalidInfraMachineTemplateReference and short-circuit reconciliation for perfectly valid manifests (including the ones created by Cluster API itself). That turns routine syncs into permanent no-ops.

Please keep the Name check, but default an empty namespace to capiMachineSet.Namespace instead of treating it as fatal, then build the infraMachineTemplateKey from that value.

Apply this diff to normalize the namespace instead of failing:

-	infraMachineTemplateRef := capiMachineSet.Spec.Template.Spec.InfrastructureRef
-	infraMachineTemplateKey := client.ObjectKey{
-		Namespace: infraMachineTemplateRef.Namespace,
-		Name:      infraMachineTemplateRef.Name,
-	}
+	infraMachineTemplateRef := capiMachineSet.Spec.Template.Spec.InfrastructureRef
+	if infraMachineTemplateRef.Name == "" {
+		return nil, nil, fmt.Errorf("machine set %s/%s: %w",
+			capiMachineSet.Namespace, capiMachineSet.Name, errInvalidInfraMachineTemplateReference)
+	}
+
+	infraMachineTemplateNamespace := infraMachineTemplateRef.Namespace
+	if infraMachineTemplateNamespace == "" {
+		infraMachineTemplateNamespace = capiMachineSet.Namespace
+	}
+	infraMachineTemplateKey := client.ObjectKey{
+		Namespace: infraMachineTemplateNamespace,
+		Name:      infraMachineTemplateRef.Name,
+	}
-
-	if capiMachineSet.Spec.ClusterName == "" {
-		return nil, nil, fmt.Errorf("machine set %s/%s: %w",
-			capiMachineSet.Namespace, capiMachineSet.Name, errInvalidInfraClusterReference)
-	}
-
-	if infraMachineTemplateRef.Name == "" || infraMachineTemplateRef.Namespace == "" {
-		return nil, nil, fmt.Errorf("machine %s/%s: %w",
-			capiMachineSet.Namespace, capiMachineSet.Name, errInvalidInfraMachineTemplateReference)
-	}
+	if capiMachineSet.Spec.ClusterName == "" {
+		return nil, nil, fmt.Errorf("machine set %s/%s: %w",
+			capiMachineSet.Namespace, capiMachineSet.Name, errInvalidInfraClusterReference)
+	}
pkg/controllers/machinesync/machine_sync_controller.go (1)

979-994: Treat missing InfrastructureRef namespace as “same namespace” instead of failing

Machine.Spec.InfrastructureRef.Namespace is optional in Cluster API; it defaults to the Machine’s namespace and is omitted by upstream controllers. The new check classifies that common case as errInvalidInfraMachineReference, marks it terminal, and prevents us from reconciling standard Cluster API machines.

Please keep enforcing that the Name is present, but if the namespace is empty, substitute capiMachine.Namespace before building the lookup key rather than returning a terminal error.

A minimal fix looks like:

-	infraMachineRef := capiMachine.Spec.InfrastructureRef
-	infraMachineKey := client.ObjectKey{
-		Namespace: infraMachineRef.Namespace,
-		Name:      infraMachineRef.Name,
-	}
+	infraMachineRef := capiMachine.Spec.InfrastructureRef
+	if infraMachineRef.Name == "" {
+		return nil, nil, fmt.Errorf("machine %s/%s: %w",
+			capiMachine.Namespace, capiMachine.Name, errInvalidInfraMachineReference)
+	}
+
+	infraMachineNamespace := infraMachineRef.Namespace
+	if infraMachineNamespace == "" {
+		infraMachineNamespace = capiMachine.Namespace
+	}
+	infraMachineKey := client.ObjectKey{
+		Namespace: infraMachineNamespace,
+		Name:      infraMachineRef.Name,
+	}
-
-	if infraMachineRef.Name == "" || infraMachineRef.Namespace == "" {
-		return nil, nil, fmt.Errorf("machine %s/%s: %w",
-			capiMachine.Namespace, capiMachine.Name, errInvalidInfraMachineReference)
-	}
🧹 Nitpick comments (1)
pkg/controllers/machinesetsync/machineset_sync_controller_test.go (1)

283-292: Factor out repeated synchronized-condition assertions
The block that verifies the Synchronized condition (type/status/reason/message tuple) shows up verbatim in dozens of contexts. Pulling it into a small helper (e.g., expectSynchronizedCondition(k, mapiMachineSet, status, reason, message)) would make the tests easier to read and update when the phrasing or reason codes change.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 72dbdfd and f66a0a0.

📒 Files selected for processing (9)
  • pkg/controllers/machinemigration/suite_test.go (2 hunks)
  • pkg/controllers/machinesetmigration/suite_test.go (2 hunks)
  • pkg/controllers/machinesetsync/machineset_sync_controller.go (6 hunks)
  • pkg/controllers/machinesetsync/machineset_sync_controller_test.go (3 hunks)
  • pkg/controllers/machinesetsync/suite_test.go (2 hunks)
  • pkg/controllers/machinesync/machine_sync_controller.go (10 hunks)
  • pkg/controllers/machinesync/machine_sync_controller_test.go (13 hunks)
  • pkg/controllers/machinesync/suite_test.go (2 hunks)
  • pkg/util/watch_filters.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/controllers/machinesync/suite_test.go
🔇 Additional comments (1)
pkg/controllers/machinemigration/suite_test.go (1)

33-33: LGTM! Logger setup now consistent.

Adding ctrl.SetLogger(GinkgoLogr) alongside the existing logf.SetLogger ensures both controller-runtime and its log package route output through Ginkgo's test infrastructure, improving log consistency across the test suite.

Also applies to: 57-58


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/controllers/machinesync/machine_sync_controller_test.go (1)

193-195: Fix manager logger reference

GinkgoLogr is not defined in this package, so the test suite won’t compile. Use the exported logger from the komega package instead (we already import it). Line 194.

-			Logger: GinkgoLogr,
+			Logger: komega.GinkgoLogr,


// Validate that required references are not empty to avoid nil pointer issues later.
// These are terminal configuration errors that require user intervention.
if capiMachineSet.Spec.ClusterName == "" {
Contributor


Out of curiosity: is this a real case? The field is required in CAPI AFAIK :-)

(No need to change anything; I'm just wondering whether we need all the "terminal failure handling" if this is something that should never happen.)

Contributor Author


capiMachine.Namespace, capiMachine.Name, errInvalidInfraClusterReference)
}

if infraMachineRef.Name == "" || infraMachineRef.Namespace == "" {
Contributor


At least with v1beta2, these fields are required to be set.

Contributor Author


Ah right, I hit a case where this was causing issues while debugging tests where I'd forgotten to set it :) So at the moment we can (at least in a test suite) get to this state.

The ClusterName was added for completeness, as we require the cluster too. I didn't realise it was required by the CAPI API :)

Contributor

@chrischdi chrischdi left a comment


/lgtm

/hold for your convenience

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Nov 4, 2025
@openshift-ci-robot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-techpreview
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@theobarberbany theobarberbany force-pushed the tb/fix-machinesync-watches branch from f66a0a0 to f3dc56e on November 4, 2025 16:35
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 4, 2025
@openshift-ci-robot

@theobarberbany: This pull request references Jira Issue OCPBUGS-62325, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sunzhaohua2


@chrischdi
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 4, 2025
@openshift-ci-robot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-techpreview
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
pkg/controllers/machinesync/machine_sync_controller_test.go (1)

193-195: Use qualified komega.GinkgoLogr reference.

The unqualified GinkgoLogr identifier may cause test build failures. The komega package must be used to reference GinkgoLogr correctly.

Apply this diff:

 		mgr, err = ctrl.NewManager(controllerCfg, ctrl.Options{
 			Scheme: testScheme,
-			Logger: GinkgoLogr,
+			Logger: komega.GinkgoLogr,
 			Controller: config.Controller{
 				SkipNameValidation: ptr.To(true),
 			},

Based on past review feedback.

pkg/controllers/machinesync/machine_sync_controller.go (1)

1039-1041: Propagate the finalizer removal requeue signal

We’re still discarding the changed flag from util.RemoveFinalizer here. That means we always return (false, nil) whenever the call succeeds—even if we just removed our sync finalizer—which drops the requeue signal we rely on to drive the rest of the deletion flow. Please capture the bool and only wrap when an error is present:

-	_, err := util.RemoveFinalizer(ctx, r.Client, mapiMachine, SyncFinalizer)
-
-	return false, fmt.Errorf("failed to remove finalizer: %w", err)
+	changed, err := util.RemoveFinalizer(ctx, r.Client, mapiMachine, SyncFinalizer)
+	if err != nil {
+		return false, fmt.Errorf("failed to remove finalizer: %w", err)
+	}
+	return changed, nil
🧹 Nitpick comments (1)
pkg/util/watch_filters.go (1)

96-113: Consider early return and nil for empty result.

Based on past review feedback, InfraMachines typically have at most one Machine owner reference. Consider returning immediately when a match is found, and returning nil instead of an empty slice when no matches are found.

Apply this diff:

-	requests := []reconcile.Request{}
-
 	for _, ref := range obj.GetOwnerReferences() {
 		gv, err := schema.ParseGroupVersion(ref.APIVersion)
 		if err != nil {
 			logger.Error(err, "Failed to parse GroupVersion", "APIVersion", ref.APIVersion)
 			continue
 		}
 
 		if ref.Kind == "Machine" && gv.Group == clusterv1.GroupVersion.Group {
-			requests = append(requests, reconcile.Request{
+			return []reconcile.Request{{
 				NamespacedName: client.ObjectKey{Namespace: namespace, Name: ref.Name},
-			})
+			}}
 		}
 	}
 
-	return requests
+	return nil
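A stdlib-only sketch of the resolver with the early return applied. ownerReference and request stand in for metav1.OwnerReference and reconcile.Request, and parseGroup approximates schema.ParseGroupVersion; all three are simplifications for illustration. Matching on the group as well as the Kind matters because a Machine API Machine (group machine.openshift.io) shares the Kind "Machine" with a Cluster API Machine (group cluster.x-k8s.io). Reusing the InfraMachine's namespace is safe because a namespaced owner must live in the same namespace as its dependent.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins (assumptions) for metav1.OwnerReference and reconcile.Request.
type ownerReference struct {
	APIVersion string
	Kind       string
	Name       string
}

type request struct {
	Namespace string
	Name      string
}

// clusterAPIGroup is the Cluster API group.
const clusterAPIGroup = "cluster.x-k8s.io"

// parseGroup approximates schema.ParseGroupVersion: "v1" is the core group,
// "group/version" splits on the slash, and more than one slash is rejected.
func parseGroup(apiVersion string) (string, error) {
	switch strings.Count(apiVersion, "/") {
	case 0:
		return "", nil
	case 1:
		return strings.SplitN(apiVersion, "/", 2)[0], nil
	default:
		return "", fmt.Errorf("unexpected GroupVersion string: %q", apiVersion)
	}
}

// resolveOwningMachine returns a request for the first Cluster API Machine
// owner, or nil when there is none.
func resolveOwningMachine(namespace string, refs []ownerReference) []request {
	for _, ref := range refs {
		group, err := parseGroup(ref.APIVersion)
		if err != nil {
			continue // the diff above logs the parse error and skips the ref
		}
		if ref.Kind == "Machine" && group == clusterAPIGroup {
			return []request{{Namespace: namespace, Name: ref.Name}}
		}
	}
	return nil
}

func main() {
	refs := []ownerReference{
		{APIVersion: "machine.openshift.io/v1beta1", Kind: "Machine", Name: "mapi-worker"},
		{APIVersion: "cluster.x-k8s.io/v1beta1", Kind: "Machine", Name: "capi-worker"},
	}
	fmt.Println(resolveOwningMachine("openshift-cluster-api", refs))
}
```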
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between f66a0a0 and f3dc56e.

📒 Files selected for processing (9)
  • pkg/controllers/machinemigration/suite_test.go (2 hunks)
  • pkg/controllers/machinesetmigration/suite_test.go (2 hunks)
  • pkg/controllers/machinesetsync/machineset_sync_controller.go (6 hunks)
  • pkg/controllers/machinesetsync/machineset_sync_controller_test.go (3 hunks)
  • pkg/controllers/machinesetsync/suite_test.go (2 hunks)
  • pkg/controllers/machinesync/machine_sync_controller.go (10 hunks)
  • pkg/controllers/machinesync/machine_sync_controller_test.go (13 hunks)
  • pkg/controllers/machinesync/suite_test.go (2 hunks)
  • pkg/util/watch_filters.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/controllers/machinemigration/suite_test.go
  • pkg/controllers/machinesync/suite_test.go
  • pkg/controllers/machinesetmigration/suite_test.go

@openshift-ci
Contributor

openshift-ci bot commented Nov 4, 2025

@theobarberbany: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sunzhaohua2
Contributor

@huali9 PTAL :)

@theobarberbany
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 5, 2025
@theobarberbany
Contributor Author

theobarberbany commented Nov 5, 2025

@sunzhaohua2 @huali9 I don't think this needs manual verification. This is mostly a unit test structure & watch filters change, and we have good coverage with e2es and unit tests :)

Feel free to do any additional verification, but don't feel like you have to prioritise it :)

@theobarberbany
Contributor Author

/verified by e2es & units

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Nov 5, 2025
@openshift-ci-robot

@theobarberbany: This PR has been marked as verified by e2es & units.

In response to this:

/verified by e2es & units


@openshift-merge-bot openshift-merge-bot bot merged commit 4106e8b into openshift:main Nov 5, 2025
26 checks passed
@openshift-ci-robot

@theobarberbany: Jira Issue Verification Checks: Jira Issue OCPBUGS-62325
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-62325 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓


@theobarberbany theobarberbany deleted the tb/fix-machinesync-watches branch November 5, 2025 10:16
@openshift-merge-robot
Contributor

Fix included in accepted release 4.21.0-0.nightly-2025-11-05-234508


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria


8 participants