Conversation

@RadekManak
Contributor

@RadekManak RadekManak commented Aug 27, 2025

Summary by CodeRabbit

  • New Features

    • Runtime-configurable CAPI/MAPI namespaces; AWS infra now derives and populates primary + optional secondary control‑plane load balancers from provider specs and exposes conversions for LB types.
  • Bug Fixes

    • Worker machines rejected for declaring LBs; control‑plane machines may use LBs. Improved control‑plane endpoint port handling and degraded-condition reporting when control plane is missing.
  • Refactor

    • Provider-spec retrieval and namespace usage unified under controller context across platforms; namespaces propagated into created resources and logs.
  • Tests

    • Expanded tests for namespaces, control‑plane discovery, LB ordering/validation, MachineSync conversion, and CAPI↔MAPI LB translations.
  • Chores

    • Operator cache includes Machine objects; test env adds ControlPlaneMachineSet CRD.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 27, 2025
@openshift-ci
Contributor

openshift-ci bot commented Aug 27, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2025
@damdo
Member

damdo commented Aug 28, 2025

@RadekManak hey LMK/ping me when this is ready for review, thanks!

@RadekManak RadekManak force-pushed the convert-loadbalancers branch from a8276f1 to 6950c52 on September 1, 2025 14:51
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2025
@RadekManak RadekManak marked this pull request as ready for review September 1, 2025 14:55
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 1, 2025
@openshift-ci openshift-ci bot requested review from damdo and mdbooth September 1, 2025 14:56
@RadekManak RadekManak changed the title from "Implement MAPI2CAPI conversion of loadbalancers" to "OCPCLOUD-2709: Implement MAPI2CAPI conversion of loadbalancers" on Sep 1, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 1, 2025
@openshift-ci-robot

openshift-ci-robot commented Sep 1, 2025

@RadekManak: This pull request references OCPCLOUD-2709 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@RadekManak RadekManak force-pushed the convert-loadbalancers branch from 6950c52 to d626052 on September 2, 2025 08:58
@RadekManak
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Sep 9, 2025

✅ Actions performed

Full review triggered.

@coderabbitai

coderabbitai bot commented Sep 9, 2025

Walkthrough

Adds controller-scoped CAPINamespace/MAPINamespace fields; centralizes MAPI provider-spec retrieval (preferring CPMS or newest control-plane Machine); implements AWS MAPI-driven AWSCluster creation with load-balancer extraction and mapping; adds AWS-specific MAPI→CAPI load-balancer validation; updates platform controllers, tests, and envtest CRDs.
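
As a rough illustration of the runtime-configurable namespaces mentioned above, here is a minimal sketch; the struct shape, default values, and helper name are assumptions based on this summary, not the PR's actual code.

package main

import "fmt"

// Assumed defaults; per the summary, the PR derives these in SetupWithManager when the fields are unset.
const (
	defaultCAPINamespace = "openshift-cluster-api"
	defaultMAPINamespace = "openshift-machine-api"
)

// InfraClusterController sketches the new controller-scoped namespace fields.
type InfraClusterController struct {
	CAPINamespace string // where CAPI resources such as AWSCluster are created
	MAPINamespace string // where MAPI Machines and the CPMS are read from
}

// defaultNamespaces mirrors the "default when unset, override in tests" pattern.
func (r *InfraClusterController) defaultNamespaces() {
	if r.CAPINamespace == "" {
		r.CAPINamespace = defaultCAPINamespace
	}
	if r.MAPINamespace == "" {
		r.MAPINamespace = defaultMAPINamespace
	}
}

func main() {
	r := &InfraClusterController{}
	r.defaultNamespaces()
	fmt.Println(r.CAPINamespace, r.MAPINamespace)
}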

Changes

Cohort / File(s) Summary
InfraCluster core & tests
pkg/controllers/infracluster/infracluster_controller.go, pkg/controllers/infracluster/infracluster_controller_test.go, pkg/test/crdbuilder.go, pkg/test/envtest.go, pkg/util/controlplane.go
Adds public MAPINamespace and CAPINamespace fields; moves raw provider-spec retrieval to controller methods that prefer active CPMS or derive from newest control-plane Machine; adds control-plane detection helper; installs fake CPMS CRD and updates test wiring/namespaces and envtest CRDs (see the retrieval sketch after this table).
AWS InfraCluster
pkg/controllers/infracluster/aws.go
Adds MAPI-driven AWSCluster creation: fetch/unmarshal AWSMachineProviderConfig, extract/validate 0–2 control-plane load balancers (with primary selection), map MAPI LB types to CAPA, build AWSCluster (ControlPlaneEndpoint, LB fields, namespace), and expose related errors/helpers.
Azure InfraCluster
pkg/controllers/infracluster/azure.go
Converts provider-spec helpers to receiver methods, uses r.CAPINamespace for AzureCluster/Secrets/Identity, updates callers and log messages to runtime CAPINamespace.
GCP InfraCluster
pkg/controllers/infracluster/gcp.go
Moves provider-spec getter to controller method r.getGCPMAPIProviderSpec, uses r.CAPINamespace for GCPCluster creation, and updates callers/logs.
Metal3 InfraCluster
pkg/controllers/infracluster/metal3.go
Replaces defaultCAPINamespace with r.CAPINamespace for Metal3Cluster creation and logs.
PowerVS InfraCluster
pkg/controllers/infracluster/powervs.go
Converts provider-spec getter to controller method delegating to generic helper, uses r.CAPINamespace, and removes unused YAML import.
VSphere InfraCluster
pkg/controllers/infracluster/vsphere.go
Converts provider-spec getter to controller method delegating to generic helper, replaces defaultCAPINamespace with r.CAPINamespace for clusters/secrets, and updates logs/callers.
MachineSync AWS validation & tests
pkg/controllers/machinesync/aws.go, pkg/controllers/machinesync/aws_test.go, pkg/controllers/machinesync/machine_sync_controller.go
Adds AWS-specific validation for MAPI→CAPI conversion: parse MAPI AWS providerSpec, derive expected LBs from AWSCluster, convert CAPA→MAPI LB defs, compare names/types, aggregate field errors; integrates validation gate into conversion flow and patches MAPI Machine synchronized condition on failure; adds unit/integration tests.
Conversion CAPI→MAPI (AWS)
pkg/conversion/capi2mapi/aws.go, pkg/conversion/capi2mapi/aws_test.go
Translates CAPA ControlPlaneLoadBalancer(s) into MAPI LoadBalancerReference entries (including secondary), adds ConvertAWSLoadBalancerToMAPI helper, and extends tests for Classic/NLB and secondary LB scenarios.
Conversion MAPI→CAPI (AWS)
pkg/conversion/mapi2capi/aws.go, pkg/conversion/mapi2capi/aws_test.go
Allows loadBalancers on control-plane MAPI machines (detected via CPMS ownerRef or master label); rejects loadBalancers for non-control-plane machines with updated error message; adds control-plane test cases.
Env & main
pkg/test/envtest.go, cmd/cluster-capi-operator/main.go
Adds fake ControlPlaneMachineSet CRD to envtest and a cache ByObject entry for mapiv1beta1.Machine in the operator's default cache options.
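
The provider-spec retrieval order summarized in the first row of the table above (prefer an active CPMS, otherwise the newest control-plane Machine) could look roughly like this; the types and the rawProviderSpec helper below are simplified stand-ins, not the PR's actual code.

package main

import (
	"fmt"
	"sort"
	"time"
)

// machine is a stand-in for a MAPI control-plane Machine.
type machine struct {
	Name              string
	CreationTimestamp time.Time
	ProviderSpecRaw   []byte
}

// rawProviderSpec prefers the CPMS template when one exists, otherwise falls back to
// the newest control-plane Machine, mirroring the retrieval order described above.
func rawProviderSpec(cpmsSpec []byte, controlPlaneMachines []machine) ([]byte, error) {
	if len(cpmsSpec) > 0 {
		return cpmsSpec, nil
	}
	if len(controlPlaneMachines) == 0 {
		return nil, fmt.Errorf("no active CPMS and no control plane machines found")
	}
	// Sort newest first by creation timestamp.
	sort.Slice(controlPlaneMachines, func(i, j int) bool {
		return controlPlaneMachines[i].CreationTimestamp.After(controlPlaneMachines[j].CreationTimestamp)
	})
	newest := controlPlaneMachines[0]
	if len(newest.ProviderSpecRaw) == 0 {
		return nil, fmt.Errorf("control plane machine %q has an empty providerSpec", newest.Name)
	}
	return newest.ProviderSpecRaw, nil
}

func main() {
	spec, err := rawProviderSpec(nil, []machine{
		{Name: "master-0", CreationTimestamp: time.Now().Add(-time.Hour), ProviderSpecRaw: []byte(`{}`)},
		{Name: "master-1", CreationTimestamp: time.Now(), ProviderSpecRaw: []byte(`{"placement":{}}`)},
	})
	fmt.Println(string(spec), err)
}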

Sequence Diagram(s)

sequenceDiagram
    participant InfraCtrl as InfraClusterController
    participant API as Kubernetes API / MAPI
    note right of InfraCtrl #D6EAF8: Derive providerSpec then build AWSCluster
    InfraCtrl->>API: list ControlPlaneMachineSets in MAPINamespace
    alt CPMS found
        API-->>InfraCtrl: active CPMS
        InfraCtrl->>API: getRawMAPIProviderSpec(from CPMS)
    else none
        InfraCtrl->>API: list Machines with control-plane selector in MAPINamespace
        API-->>InfraCtrl: Machines
        InfraCtrl->>InfraCtrl: pick newest Machine, getRawMAPIProviderSpec
    end
    API-->>InfraCtrl: providerSpec (raw)
    InfraCtrl->>InfraCtrl: extractLoadBalancerConfigFromMAPIAWSProviderSpec
    alt 1–2 LBs valid
        InfraCtrl->>API: create/update AWSCluster (ControlPlaneEndpoint + LB(s), Namespace=r.CAPINamespace)
        API-->>InfraCtrl: AWSCluster created/updated
    else invalid
        InfraCtrl-->>API: return error (no/invalid LB config)
    end
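
A minimal sketch of the load-balancer extraction step in the diagram above, assuming 1 or 2 load balancers and the "-int" primary preference; the stand-in types and function names are illustrative, not the actual MAPI/CAPA definitions.

package main

import (
	"fmt"
	"strings"
)

// Stand-ins for the MAPI LoadBalancerReference and CAPA AWSLoadBalancerSpec types.
type mapiLB struct {
	Name string
	Type string // "classic" or "network" in MAPI terms
}

type capaLB struct {
	Name string
	Type string // "classic" or "nlb" in CAPA terms
}

func toCAPAType(t string) string {
	if t == "network" {
		return "nlb"
	}
	return "classic"
}

// extractControlPlaneLBs returns the primary and optional secondary load balancer,
// preferring the "-int" (internal) one as primary when two are present.
func extractControlPlaneLBs(lbs []mapiLB) (*capaLB, *capaLB, error) {
	switch len(lbs) {
	case 1:
		return &capaLB{Name: lbs[0].Name, Type: toCAPAType(lbs[0].Type)}, nil, nil
	case 2:
		first, second := lbs[0], lbs[1]
		if strings.HasSuffix(second.Name, "-int") && !strings.HasSuffix(first.Name, "-int") {
			first, second = second, first
		}
		return &capaLB{Name: first.Name, Type: toCAPAType(first.Type)},
			&capaLB{Name: second.Name, Type: toCAPAType(second.Type)}, nil
	default:
		return nil, nil, fmt.Errorf("expected 1 or 2 control plane load balancers, got %d", len(lbs))
	}
}

func main() {
	primary, secondary, _ := extractControlPlaneLBs([]mapiLB{
		{Name: "mycluster-ext", Type: "network"},
		{Name: "mycluster-int", Type: "network"},
	})
	fmt.Printf("primary=%s secondary=%s\n", primary.Name, secondary.Name)
}

In the actual controller the resulting specs populate the AWSCluster's ControlPlaneLoadBalancer and SecondaryControlPlaneLoadBalancer fields.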
sequenceDiagram
    participant MachineSync as MachineSyncReconciler
    participant API as Kubernetes API
    participant AWSCluster as AWSCluster
    note right of MachineSync #FDEBD0: Validate platform specifics before conversion
    MachineSync->>API: fetch MAPI Machine providerSpec
    MachineSync->>API: get AWSCluster
    API-->>MachineSync: AWSCluster
    MachineSync->>MachineSync: validateMAPIToCAPIPlatfromSpecifics (AWS -> validateMachineAWSLoadBalancers)
    alt validation passes
        MachineSync->>API: proceed with MAPI→CAPI conversion
    else validation fails
        MachineSync->>API: patch MAPI Machine SynchronizedCondition=False
        MachineSync-->>API: return conversion error
    end
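
The validation gate in this diagram can be sketched roughly as follows; the reconciler and platform types are simplified stand-ins, and the AWS branch is left as a placeholder for the real load-balancer comparison.

package main

import (
	"context"
	"fmt"
)

type platformType string

const awsPlatformType platformType = "AWS"

type machineSyncReconciler struct {
	platform platformType
}

// In the PR, the AWS branch compares the MAPI machine's loadBalancers against the
// AWSCluster's ControlPlaneLoadBalancer/SecondaryControlPlaneLoadBalancer.
func (r *machineSyncReconciler) validateMachineAWSLoadBalancers(ctx context.Context, machineName string) error {
	return nil // placeholder for the real comparison
}

// validateMAPIToCAPIPlatformSpecifics runs platform-specific checks before MAPI->CAPI
// conversion; non-AWS platforms are a no-op.
func (r *machineSyncReconciler) validateMAPIToCAPIPlatformSpecifics(ctx context.Context, machineName string) error {
	switch r.platform {
	case awsPlatformType:
		return r.validateMachineAWSLoadBalancers(ctx, machineName)
	default:
		return nil
	}
}

func main() {
	r := &machineSyncReconciler{platform: awsPlatformType}
	fmt.Println(r.validateMAPIToCAPIPlatformSpecifics(context.Background(), "master-0"))
}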

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

"I hopped through yamls with whiskers twitching bright,
parsed load balancers by moon and morning light.
Namespaces aligned, validations in tune,
I thumped my foot and sang to the infra moon. 🐇"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 62.50%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title concisely describes the implementation of MAPI-to-CAPI load balancer conversion, matching the PR's stated objective and capturing the primary feature change without extraneous detail.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
pkg/controllers/machinesync/machine_sync_controller.go (1)

1275-1283: Bug: comparing providerSpecs uses a.Spec twice (never compares against b)
This will mask differences and can prevent updates from being applied.

Apply this diff:

- ps2, err := mapi2capi.AWSProviderSpecFromRawExtension(a.Spec.ProviderSpec.Value)
+ ps2, err := mapi2capi.AWSProviderSpecFromRawExtension(b.Spec.ProviderSpec.Value)
pkg/controllers/infracluster/vsphere.go (1)

80-83: Bug: wrapping a nil/irrelevant error.

err here is unrelated and likely nil, producing “...: ”. Return a static error without %w.

- if r.Infra.Status.PlatformStatus == nil {
-   return nil, fmt.Errorf("infrastructure PlatformStatus should not be nil: %w", err)
- }
+ if r.Infra.Status.PlatformStatus == nil {
+   return nil, fmt.Errorf("infrastructure PlatformStatus should not be nil")
+ }
pkg/controllers/infracluster/gcp.go (1)

62-64: Bug: wrapping a nil/irrelevant error.

Same issue as vSphere: err is unrelated and likely nil here.

- if r.Infra.Status.PlatformStatus == nil {
-   return nil, fmt.Errorf("infrastructure PlatformStatus should not be nil: %w", err)
- }
+ if r.Infra.Status.PlatformStatus == nil {
+   return nil, fmt.Errorf("infrastructure PlatformStatus should not be nil")
+ }
pkg/controllers/infracluster/aws.go (2)

72-75: Stop wrapping a nil error for PlatformStatus check

Err is unrelated here and likely nil; the resulting message will be "...: ". Return a concrete error without %w.

- if r.Infra.Status.PlatformStatus == nil {
-   return nil, fmt.Errorf("infrastructure PlatformStatus should not be nil: %w", err)
- }
+ if r.Infra.Status.PlatformStatus == nil {
+   return nil, errors.New("infrastructure PlatformStatus should not be nil")
+ }

67-71: Handle empty or missing API URL port; default to 6443

apiURL.Port() can be empty (default scheme). Fall back to 6443 to avoid creation failure.

- port, err := strconv.ParseInt(apiURL.Port(), 10, 32)
- if err != nil {
-   return nil, fmt.Errorf("failed to parse apiURL port: %w", err)
- }
+ var port64 int64
+ if ps := apiURL.Port(); ps == "" {
+   port64 = 6443
+ } else {
+   var perr error
+   port64, perr = strconv.ParseInt(ps, 10, 32)
+   if perr != nil {
+     return nil, fmt.Errorf("failed to parse apiURL port %q: %w", ps, perr)
+   }
+ }
- awsCluster, err = r.newAWSCluster(providerSpec, apiURL, int32(port))
+ awsCluster, err = r.newAWSCluster(providerSpec, apiURL, int32(port64))

Also applies to: 81-83

pkg/controllers/infracluster/azure.go (1)

178-195: Align AzureClusterIdentity with CAPINamespace

AllowedNamespaces and ClientSecret.Namespace still reference defaultCAPINamespace, which will block usage when r.CAPINamespace differs and reference the wrong secret location.

 Spec: azurev1.AzureClusterIdentitySpec{
   Type:              azurev1.ServicePrincipal,
-  AllowedNamespaces: &azurev1.AllowedNamespaces{NamespaceList: []string{defaultCAPINamespace}},
+  AllowedNamespaces: &azurev1.AllowedNamespaces{NamespaceList: []string{r.CAPINamespace}},
   ClientID:          string(azureClientID),
   TenantID:          string(azureTenantID),
-  ClientSecret:      corev1.SecretReference{Name: clusterSecretName, Namespace: defaultCAPINamespace},
+  ClientSecret:      corev1.SecretReference{Name: clusterSecretName, Namespace: r.CAPINamespace},
 },
🧹 Nitpick comments (28)
pkg/controllers/machinesync/aws.go (6)

35-40: GoDoc claims “-int/-ext” suffix enforcement, but code doesn’t enforce it

Either enforce the naming or update the comment to reflect current behavior.

Apply this doc update to avoid misleading guidance:

-//   - Control plane machines must define an internal load balancer with "-int" suffix.
-//   - Control plane machines can define an secondary external load balancer with "-ext" suffix.
-//   - MAPI machine's load balancers must match the AWSCluster load balancers.
-//   - Worker machines must not define load balancers.
+//   - Control plane machines must include the AWSCluster control plane load balancer by name.
+//   - Control plane machines may optionally include the AWSCluster secondary control plane load balancer.
+//   - The set (name + type) must exactly match what AWSCluster specifies; no extras are allowed.
+//   - Worker machines must not define load balancers.

62-66: Use field.Required for missing CP LB on AWSCluster

Semantically this is a required field rather than an invalid value.

Apply:

- return field.ErrorList{field.Invalid(lbfieldPath, providerSpec.LoadBalancers, "no control plane load balancer configured on AWSCluster")}.ToAggregate()
+ return field.ErrorList{field.Required(lbfieldPath, "no control plane load balancer configured on AWSCluster")}.ToAggregate()

74-76: Guard secondary LB: error if Name is nil/empty

Without this, you produce a confusing “must include load balancer named ""” error.

Apply:

- if awsCluster.Spec.SecondaryControlPlaneLoadBalancer != nil {
-   errs = append(errs, ensureExpectedLoadBalancer(lbfieldPath, &providerSpec, loadBalancersCopy, awsCluster.Spec.SecondaryControlPlaneLoadBalancer)...)
- }
+ if awsCluster.Spec.SecondaryControlPlaneLoadBalancer != nil {
+   if n := ptr.Deref(awsCluster.Spec.SecondaryControlPlaneLoadBalancer.Name, ""); n == "" {
+     errs = append(errs, field.Invalid(lbfieldPath, providerSpec.LoadBalancers, "secondary control plane load balancer has empty name on AWSCluster"))
+   } else {
+     errs = append(errs, ensureExpectedLoadBalancer(lbfieldPath, &providerSpec, loadBalancersCopy, awsCluster.Spec.SecondaryControlPlaneLoadBalancer)...)
+   }
+ }

125-133: Prefer field.Required when the expected LB is missing

This conveys the intent better than Invalid.

Apply:

- if t, found := remainingLoadBalancers[expectedLBName]; !found {
-   errList = append(errList, field.Invalid(lbfieldPath, providerConfig.LoadBalancers, fmt.Sprintf("must include load balancer named %q", expectedLBName)))
+ if t, found := remainingLoadBalancers[expectedLBName]; !found {
+   errList = append(errList, field.Required(lbfieldPath, fmt.Sprintf("must include load balancer named %q", expectedLBName)))
 } else if t != expectedLBType {
   errList = append(errList, field.Invalid(lbfieldPath, providerConfig.LoadBalancers, fmt.Sprintf("load balancer %q must be of type %q to match AWSCluster", expectedLBName, expectedLBType)))
 }

140-150: Defaulting unknown CAPI LB type to Classic may hide misconfig

Consider failing fast (error) if an unexpected type appears to avoid silent mismatches.

Proposed change:

- default:
-   return machinev1beta1.ClassicLoadBalancerType
+ default:
+   // unexpected type; force validation to fail upstream
+   return machinev1beta1.AWSLoadBalancerType("")

And update ensureExpectedLoadBalancer to treat empty expected type as a mismatch with a clear message.


48-55: Workers: reject LB config — good; add explicit test for non-empty list

Minor: ensure there’s a test that a worker with any LoadBalancers fails with the exact error text.

I can add a unit test case covering this and the “empty secondary LB name” path above.

pkg/controllers/infracluster/metal3.go (2)

81-81: Nit: unify success log formatting.

Consider using klog.KObj(target) for consistency with other controllers, but optional.

- log.Info(fmt.Sprintf("InfraCluster '%s/%s' successfully created", r.CAPINamespace, r.Infra.Status.InfrastructureName))
+ log.Info(fmt.Sprintf("InfraCluster %s successfully created", klog.KRef(r.CAPINamespace, r.Infra.Status.InfrastructureName)))

78-78: Typo in error message ("creat").

Minor but user-facing.

- return nil, fmt.Errorf("failed to creat InfraCluster: %w", err)
+ return nil, fmt.Errorf("failed to create InfraCluster: %w", err)
pkg/conversion/mapi2capi/aws_test.go (1)

110-133: Control-plane LB test — good coverage; consider isolating signals.

You set both CPMS ownerRef and role label. Add one case with only ownerRef and another with only role label to exercise both paths of IsControlPlaneMAPIMachine.

+       Entry("With LoadBalancers on control plane machine with only CPMS ownerRef", awsMAPI2CAPIConversionInput{
+         machineBuilder: awsMAPIMachineBase.
+           WithOwnerReferences([]metav1.OwnerReference{{APIVersion: mapiv1.GroupVersion.String(), Kind: "ControlPlaneMachineSet", Name: "cluster", UID: "cpms-uid"}}).
+           WithProviderSpecBuilder(awsBaseProviderSpec.WithLoadBalancers([]mapiv1.LoadBalancerReference{{Name: "lb", Type: mapiv1.NetworkLoadBalancerType}})),
+         infra:            infra,
+         expectedErrors:   []string{},
+         expectedWarnings: []string{},
+       }),
+       Entry("With LoadBalancers on control plane machine with only role label", awsMAPI2CAPIConversionInput{
+         machineBuilder: awsMAPIMachineBase.
+           WithLabels(map[string]string{"machine.openshift.io/cluster-api-machine-role": "master"}).
+           WithProviderSpecBuilder(awsBaseProviderSpec.WithLoadBalancers([]mapiv1.LoadBalancerReference{{Name: "lb", Type: mapiv1.NetworkLoadBalancerType}})),
+         infra:            infra,
+         expectedErrors:   []string{},
+         expectedWarnings: []string{},
+       }),
pkg/conversion/capi2mapi/aws_test.go (2)

256-265: Worker machine ignores CP LB — test OK.

Consider asserting that no LoadBalancers are emitted in the resulting MAPI providerSpec to strengthen the guarantee.

- expectedErrors:    []string{},
- expectedWarnings:  []string{},
+ expectedErrors:    []string{},
+ expectedWarnings:  []string{},
+ // Additionally decode the returned Machine's ProviderSpec and assert .LoadBalancers == nil/empty.

267-305: Control-plane LB tests — solid; add value assertions.

Add checks on the produced MAPI providerSpec.LoadBalancers (names/types) for each case, not only absence of errors/warnings.

+ // After conversion, unmarshal the ProviderSpec and verify:
+ // - exactly 1 or 2 entries as applicable
+ // - types map Classic->classic, NLB->network
pkg/controllers/infracluster/powervs.go (1)

92-93: Nit: missing colon before wrapped error.

Improves readability of error messages.

- return nil, fmt.Errorf("unable to get PowerVS network %w", err)
+ return nil, fmt.Errorf("unable to get PowerVS network: %w", err)
pkg/util/controlplane.go (1)

32-47: CP detection logic — OK; consider accepting “control-plane” role value.

Some environments may evolve from “master” to “control-plane”. Consider treating both values as CP to be future-proof.

- if role, exists := machine.Labels[MachineRoleLabelName]; exists && role == "master" {
+ if role, exists := machine.Labels[MachineRoleLabelName]; exists && (role == "master" || role == "control-plane") {
   return true
 }
pkg/controllers/infracluster/gcp.go (1)

88-89: Guard against empty NetworkInterfaces to avoid panic.

Indexing [0] may panic if unset; add validation and error.

- Name: &providerSpec.NetworkInterfaces[0].Network,
+ Name: func() *string {
+   if len(providerSpec.NetworkInterfaces) == 0 {
+     return nil
+   }
+   return &providerSpec.NetworkInterfaces[0].Network
+ }(),

And return a clear error earlier if len(providerSpec.NetworkInterfaces)==0.

pkg/conversion/capi2mapi/aws.go (4)

38-39: Improve error specificity for diagnostics.

errUnsupportedLoadBalancerType is generic. Wrap it with the actual type value at the call site to aid debugging.

Apply this diff in convertAWSLoadBalancerToMAPI’s default branch:

- default:
-   return mapiv1.LoadBalancerReference{}, errUnsupportedLoadBalancerType
+ default:
+   return mapiv1.LoadBalancerReference{}, fmt.Errorf("%w: %q", errUnsupportedLoadBalancerType, loadBalancer.LoadBalancerType)

118-121: Surface all LB conversion errors and align field paths.

  • Only a single error is returned from convertAWSClusterLoadBalancersToMAPI; if both primary and secondary are invalid, only the first is surfaced. Consider returning field.ErrorList to aggregate.
  • The field path used for errors points to spec.controlPlaneLoadBalancer while the target object here is the AWSMachine’s provider spec. Consider pointing to spec.loadBalancers with a message noting the AWSCluster source field for clarity.

If you decide to aggregate, a minimal refactor looks like:

- mapiLoadBalancers, err := convertAWSClusterLoadBalancersToMAPI(fldPath, m.machine, m.awsCluster)
- if err != nil {
-   errors = append(errors, err)
- }
+ mapiLoadBalancers, lbErrs := convertAWSClusterLoadBalancersToMAPI(fldPath, m.machine, m.awsCluster)
+ if len(lbErrs) > 0 {
+   errors = append(errors, lbErrs...)
+ }

And update convertAWSClusterLoadBalancersToMAPI to return (… , field.ErrorList).

Also applies to: 157-157


582-610: LB conversion logic is correct; consider minor hardening.

  • Logic and ordering (primary then secondary) look good.
  • Optional hardening: short-circuit when machine == nil to avoid future misuse outside current guarded call sites.
 func convertAWSClusterLoadBalancersToMAPI(fldPath *field.Path, machine *clusterv1.Machine, awsCluster *awsv1.AWSCluster) ([]mapiv1.LoadBalancerReference, *field.Error) {
-  var loadBalancers []mapiv1.LoadBalancerReference
+  var loadBalancers []mapiv1.LoadBalancerReference
+  if machine == nil {
+    return nil, nil
+  }

611-637: Type mapping LGTM; add name presence check (optional).

Mapping Classic/ELB→classic and NLB→network, with Classic default, is correct. Optionally validate that Name is non-empty to avoid emitting unnamed LB refs.

 case awsv1.LoadBalancerTypeNLB:
   return mapiv1.LoadBalancerReference{
-    Name: ptr.Deref(loadBalancer.Name, ""),
+    Name: ptr.Deref(loadBalancer.Name, ""),
     Type: mapiv1.NetworkLoadBalancerType,
   }, nil
@@
- case "":
+ case "":
   // Default is classic in CAPI
-  return mapiv1.LoadBalancerReference{
-    Name: ptr.Deref(loadBalancer.Name, ""),
-    Type: mapiv1.ClassicLoadBalancerType,
-  }, nil
+  name := ptr.Deref(loadBalancer.Name, "")
+  if name == "" {
+    return mapiv1.LoadBalancerReference{}, field.Invalid(field.NewPath("spec", "controlPlaneLoadBalancer", "name"), name, "name must be set")
+  }
+  return mapiv1.LoadBalancerReference{Name: name, Type: mapiv1.ClassicLoadBalancerType}, nil
pkg/controllers/infracluster/infracluster_controller.go (2)

182-189: Set degraded condition consistently across providers.

Currently only the AWS path sets a degraded condition on ensure* failures. Consider doing the same for other providers for consistent operator UX.


355-357: Label selector matches typical masters; consider future-proofing.

Selecting by role/type=master is fine today. If/when these labels evolve, consider centralizing selectors (e.g., util.IsControlPlaneMAPIMachine) to avoid drift.

pkg/controllers/machinesync/aws_test.go (3)

142-144: Remove redundant builder call.

awsClusterBuilder.WithControlPlaneLoadBalancer(loadBalancerSpec) is called twice back-to-back.

- awsClusterBuilder.WithControlPlaneLoadBalancer(loadBalancerSpec)
  awsCluster := awsClusterBuilder.WithControlPlaneLoadBalancer(loadBalancerSpec).Build()

124-135: Cleanup looks adequate; consider using DeferCleanup for robustness.

Switching to DeferCleanup for namespace and object cleanup can simplify teardown and reduce flake risk if a test fails mid-way.


171-198: Test coverage is strong; add one more case for classic default (optional).

Consider a case where CAPA LB type is empty (defaults to classic) and the machine provides ClassicLoadBalancerType, to assert defaulting behavior.

I can add that test using the existing builders if you’d like.

Also applies to: 200-241, 243-285

pkg/controllers/infracluster/aws.go (2)

18-25: Avoid taking pointers to providerSpec fields; allocate stable string pointers

Using &lbX.Name captures a pointer to a field of a temp struct. Prefer ptr.To(...) for clarity and to avoid accidental aliasing.

 import (
   "context"
   "errors"
   "fmt"
   "net/url"
   "strconv"
   "strings"
+  ptr "k8s.io/utils/ptr"

-   return &awsv1.AWSLoadBalancerSpec{
-     Name:             &lbPrimary.Name,
-     LoadBalancerType: convertMAPILoadBalancerTypeToCAPI(lbPrimary.Type),
-   }, nil, nil
+   return &awsv1.AWSLoadBalancerSpec{
+     Name:             ptr.To(lbPrimary.Name),
+     LoadBalancerType: convertMAPILoadBalancerTypeToCAPI(lbPrimary.Type),
+   }, nil, nil

-   return &awsv1.AWSLoadBalancerSpec{
-       Name:             &lbFirst.Name,
-       LoadBalancerType: convertMAPILoadBalancerTypeToCAPI(lbFirst.Type),
-     }, &awsv1.AWSLoadBalancerSpec{
-       Name:             &lbSecond.Name,
-       LoadBalancerType: convertMAPILoadBalancerTypeToCAPI(lbSecond.Type),
-     }, nil
+   return &awsv1.AWSLoadBalancerSpec{
+       Name:             ptr.To(lbFirst.Name),
+       LoadBalancerType: convertMAPILoadBalancerTypeToCAPI(lbFirst.Type),
+     }, &awsv1.AWSLoadBalancerSpec{
+       Name:             ptr.To(lbSecond.Name),
+       LoadBalancerType: convertMAPILoadBalancerTypeToCAPI(lbSecond.Type),
+     }, nil

Also applies to: 33-35, 163-167, 175-181


60-61: Use structured logging fields instead of Sprintf

Improves log queryability and consistency.

- log.Info(fmt.Sprintf("AWSCluster %s/%s does not exist, creating it", awsCluster.Namespace, awsCluster.Name))
+ log.Info("AWSCluster does not exist, creating it", "namespace", awsCluster.Namespace, "name", awsCluster.Name)

- log.Info(fmt.Sprintf("InfraCluster '%s/%s' successfully created", r.CAPINamespace, r.Infra.Status.InfrastructureName))
+ log.Info("InfraCluster successfully created", "namespace", r.CAPINamespace, "name", r.Infra.Status.InfrastructureName)

Also applies to: 90-93

pkg/controllers/infracluster/azure.go (2)

221-229: Nit: consistent variable in error messages

Use “apiURL” in messages for consistency with variable name.

- return fmt.Errorf("failed to parse apiUrl: %w", err)
+ return fmt.Errorf("failed to parse apiURL: %w", err)

- return fmt.Errorf("failed to parse apiUrl port: %w", err)
+ return fmt.Errorf("failed to parse apiURL port: %w", err)

166-167: Typo in comment

“createNewAzureClusterIdenity” → “createAzureClusterIdentity”.

pkg/controllers/infracluster/infracluster_controller_test.go (1)

169-176: Reduce flakiness from time.Sleep-based ordering

Sleeping to influence CreationTimestamp can be flaky under load. Prefer deriving the source provider spec deterministically (e.g., by using CPMS when present, or selecting by Name/UID ordering) or poll until the intended object is observed as youngest.

I can sketch a helper that waits for the desired youngest Machine by comparing CreationTimestamp after all creates complete.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 82bcb80 and d626052.

📒 Files selected for processing (18)
  • pkg/controllers/infracluster/aws.go (4 hunks)
  • pkg/controllers/infracluster/azure.go (11 hunks)
  • pkg/controllers/infracluster/gcp.go (4 hunks)
  • pkg/controllers/infracluster/infracluster_controller.go (6 hunks)
  • pkg/controllers/infracluster/infracluster_controller_test.go (8 hunks)
  • pkg/controllers/infracluster/metal3.go (3 hunks)
  • pkg/controllers/infracluster/powervs.go (4 hunks)
  • pkg/controllers/infracluster/vsphere.go (5 hunks)
  • pkg/controllers/machinesync/aws.go (1 hunks)
  • pkg/controllers/machinesync/aws_test.go (1 hunks)
  • pkg/controllers/machinesync/machine_sync_controller.go (1 hunks)
  • pkg/conversion/capi2mapi/aws.go (4 hunks)
  • pkg/conversion/capi2mapi/aws_test.go (1 hunks)
  • pkg/conversion/mapi2capi/aws.go (1 hunks)
  • pkg/conversion/mapi2capi/aws_test.go (1 hunks)
  • pkg/test/crdbuilder.go (1 hunks)
  • pkg/test/envtest.go (1 hunks)
  • pkg/util/controlplane.go (1 hunks)
🔇 Additional comments (37)
pkg/test/envtest.go (1)

70-70: Add CPMS CRD to envtest — looks good

The fake ControlPlaneMachineSet CRD is correctly included; machine/v1 is already added to the scheme in init(), so installation should succeed.

pkg/test/crdbuilder.go (1)

79-87: Introduce MAPI v1 CPMS fake CRD — OK

Group/version/kind and pluralization via flect are correct; aligns with envtest addition.

pkg/controllers/machinesync/machine_sync_controller.go (1)

451-458: Early platform validation gate before MAPI→CAPI conversion — good placement

Short-circuits on validation errors and patches condition with the correct reason. Matches the controller’s existing error-handling pattern.

pkg/controllers/machinesync/aws.go (2)

57-60: Assumption: AWSCluster name == Infrastructure.Status.InfrastructureName

If this invariant can break (custom installs or future changes), consider a fallback lookup (e.g., by label/cluster name) or document the assumption clearly in code. At minimum, log the looked-up name to aid debugging.

Would you like a follow-up patch that tolerates NotFound by attempting a label-based query for AWSCluster?


153-160: Platform routing hook — OK

AWS path wired correctly; other platforms safely no-op.

pkg/controllers/infracluster/metal3.go (2)

36-37: Use dynamic CAPINamespace — looks good.

Switching to r.CAPINamespace avoids hard-coding. Please ensure r.CAPINamespace is always initialized (controller setup) to prevent resources landing in the default namespace.


61-67: Consistent namespace and ManagedBy annotation — OK.

Matches the pattern used across providers.

pkg/conversion/mapi2capi/aws.go (1)

330-334: Allowing LBs only for control-plane machines — correct guard.

Validation now matches the intended scope. Verify that control-plane LB refs are subsequently surfaced to the right CAPI resource (typically AWSCluster.{ControlPlaneLoadBalancer,SecondaryControlPlaneLoadBalancer}) so information isn’t dropped during conversion.

pkg/conversion/mapi2capi/aws_test.go (1)

98-109: Updated worker LB test — OK.

Matches the new validation message and behavior.

pkg/controllers/infracluster/vsphere.go (5)

59-59: Use r.CAPINamespace — good.


88-88: Namespace on created resource — good.


111-111: Success log updated — OK.


117-129: Refactor to method receiver — OK.

Aligns with other providers; delegates to r.getRawMAPIProviderSpec.


137-137: Secret namespace switched to r.CAPINamespace — correct.

pkg/controllers/infracluster/powervs.go (4)

51-52: Use r.CAPINamespace — good.


80-83: ProviderSpec retrieval via method — good.


98-99: Namespace on created resource — good.


125-137: Refactor to method receiver — OK.

pkg/util/controlplane.go (1)

24-30: Constants — OK.

Clear and appropriately scoped.

pkg/controllers/infracluster/gcp.go (6)

41-41: Use r.CAPINamespace — good.


71-74: ProviderSpec retrieval via method — good.


79-80: Namespace on created resource — good.


103-103: Success log updated — OK.


109-121: New method receiver for MAPI ProviderSpec — OK.


127-128: Call site updated — OK.

pkg/conversion/capi2mapi/aws.go (1)

32-32: Control-plane detection import looks appropriate.

Using capiutil.IsControlPlaneMachine is the right choice for gating LB emission to control-plane machines. No issues spotted.

pkg/controllers/infracluster/infracluster_controller.go (6)

66-70: New error vars are sensible and readable.

Names and messages are clear; no concerns.


232-232: Using r.CAPINamespace for OpenStack lookup is correct.

Prevents coupling to hard-coded namespaces. LGTM.


244-261: Degraded condition helper LGTM; message includes root cause.

Helper is straightforward and uses standard operatorstatus utilities. No issues.


285-296: Namespace defaulting in SetupWithManager LGTM.

Good defaults with test-time override support.


387-401: CPMS lookup scoping LGTM.

Name “cluster” and namespacing with r.MAPINamespace are correct for OCP. No issues.

Also applies to: 390-390


75-82: No direct usage of MAPINamespace/CAPINamespace outside SetupWithManager
Confirmed via ripgrep that neither field is referenced elsewhere, so relying on SetupWithManager to default zero-values is safe.

pkg/controllers/machinesync/aws_test.go (1)

104-121: Manager wiring and RBAC for tests LGTM.

Using a dedicated user and dynamic namespaces is clean and isolates the suite. No issues.

pkg/controllers/infracluster/aws.go (1)

111-128: Null-safety: check AWS PlatformStatus sub-struct

If invoked on a non-AWS platform accidentally, r.Infra.Status.PlatformStatus.AWS could be nil.

Would you like me to add a defensive nil check on r.Infra.Status.PlatformStatus.AWS and return a clear error?

pkg/controllers/infracluster/infracluster_controller_test.go (3)

92-110: Good move to dynamic namespaces and passing them into the manager

This makes tests realistic and avoids hard-coded assumptions.

Also applies to: 302-334


139-156: Solid assertions for LB mapping and ordering

Verifies primary “-int” and nil secondary when only one LB is configured. Matches conversion intent.

Also applies to: 182-189


195-204: Operator degraded-path coverage looks good

Covers no CPMS/no masters scenario; helps guard regressions.

@damdo
Member

damdo commented Sep 9, 2025

/assign @damdo @nrb @theobarberbany

@damdo
Member

damdo commented Sep 9, 2025

/assign @racheljpg

Contributor

@chrischdi chrischdi left a comment


Some comments. Not sure if they all make sense.


// Delete infraCluster for it to be recreated after all the machines are created.
// In real cluster the machines will be already present when the InfraCluster Controller is started.
Expect(cl.Delete(ctx, bareInfraCluster)).To(Succeed())
Contributor


Just a note, don't want to change it.

A good example where a real unit test with fake client may be better:

  • we could build the state as we already expect
  • including ensuring creationtimestamps are set to something specific

Relying on this deletion behaviour to produce the state the test needs could change over time.

return nil, fmt.Errorf("failed to get InfraCluster: %w", err)
} else if err == nil {
return target, nil
return awsCluster, nil
Contributor


So we only sync CAPI 2 MAPI for the load balancers right?

When someone switches the authoritative side back to MAPI, the AWSCluster won't get any updates? (or are there simply none to apply?)

Contributor Author


We sync LBs from the MAPI Machine -> AWSCluster once.
From that point the AWSCluster LBs are the source of truth: CAPI->MAPI Machines source their LBs from the AWSCluster, and MAPI->CAPI Machines are not allowed to convert if their LBs don't match the AWSCluster LBs. This prevents changing LBs after conversion.
Machine API cannot create or delete load balancers, only associate machines with them. Changing an LB would require creating it manually in the AWS console, which is not supported.

loadBalancersCopy[lb.Name] = lb.Type
}

errs := ensureExpectedLoadBalancer(lbfieldPath, &providerSpec, loadBalancersCopy, awsCluster.Spec.ControlPlaneLoadBalancer)
Contributor


Could it be that in this case we don't "ensure" from the newest MAPI machine (which is something we do on "creation")? I'd have expected to:

  • have CAPI2MAPI part for load balancer sync in machinesync controller (target is MAPI Machine)
  • have MAPI2CAPI part for load balancer sync in infracluster controller (target is the Infra Cluster)

Contributor Author


We create the InfraCluster LBs by converting the MAPI LBs from the newest machine or the CPMS. This function ensures that, when a MAPI machine is converted to CAPI, it has the same LoadBalancers as it will get in CAPI from the AWSCluster relation.
I don't expect the load balancer configuration to differ across the control plane machines. If it does, the conversion fails for that machine until an SRE replaces the machines with mismatched load balancers via CPMS.

The AWSCluster is not synchronized after creation. This is intentional, as some of the load balancer fields are immutable.
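
As a rough sketch of what that per-machine check amounts to, assuming matching is by load balancer name and type with no extras allowed; the helper and types below are illustrative, not the PR's exact code.

package main

import "fmt"

type lbRef struct {
	Name string
	Type string
}

// validateMachineLBs checks that every expected AWSCluster control-plane LB appears
// in the machine's loadBalancers with the same type, and that nothing extra is listed.
func validateMachineLBs(machineLBs, expected []lbRef) []error {
	remaining := map[string]string{}
	for _, lb := range machineLBs {
		remaining[lb.Name] = lb.Type
	}

	var errs []error
	for _, want := range expected {
		got, found := remaining[want.Name]
		switch {
		case !found:
			errs = append(errs, fmt.Errorf("must include load balancer named %q", want.Name))
		case got != want.Type:
			errs = append(errs, fmt.Errorf("load balancer %q must be of type %q to match AWSCluster", want.Name, want.Type))
		}
		delete(remaining, want.Name)
	}

	for name := range remaining {
		errs = append(errs, fmt.Errorf("load balancer %q is not configured on the AWSCluster", name))
	}

	return errs
}

func main() {
	errs := validateMachineLBs(
		[]lbRef{{Name: "mycluster-int", Type: "network"}},
		[]lbRef{{Name: "mycluster-int", Type: "network"}, {Name: "mycluster-ext", Type: "network"}},
	)
	fmt.Println(errs)
}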

lbFirst := providerSpec.LoadBalancers[0]
lbSecond := providerSpec.LoadBalancers[1]
// Prefer the load balancer with "-int" suffix as primary when two are present.
if strings.HasSuffix(lbSecond.Name, "-int") && !strings.HasSuffix(lbFirst.Name, "-int") {
Contributor


What happens if neither, or both have -int suffix? I know this should never happen, but do we want to explicitly handle this?

Contributor Author


I don't want to restrict this unnecessarily.

The controller always adds both of them together; they are just two fields on the AWSCluster instead of an array. Worst case they are swapped, but it still works.

Contributor


Yeah, ideally if/when CAPA's networking types are revisited, the load balancers can be either an array as Radek says, or we can have better field names that communicate internal/external.

IIRC I initially named the field being added something like InternalLoadBalancer, but we decided to make it more generic upstream.

I think the actual consequence would not be their name, but whether the configuration on AWS is incorrect, like having 2 LBs that can't route externally. I'm not sure it's worth validating that here or not; that should be something the installer creates correctly and we shouldn't be modifying them. If someone or something else modifies them out of band, they'll likely have a broken cluster.

@RadekManak RadekManak force-pushed the convert-loadbalancers branch from d626052 to f2c26a3 on September 29, 2025 14:35
@damdo
Member

damdo commented Oct 13, 2025

/retest

@sunzhaohua2
Contributor

@huali9 This pr can be tested now.

@huali9
Contributor

huali9 commented Oct 14, 2025

@huali9 This pr can be tested now.

Thank you @sunzhaohua2 I'll take a look.

@damdo
Member

damdo commented Oct 14, 2025

Thanks @huali9
@theobarberbany keep us posted on your thoughts, ty!

@damdo damdo requested a review from theobarberbany October 14, 2025 09:32
@theobarberbany
Contributor

/unhold

LGTM! :)

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 14, 2025
Core Changes:
Infrastructure Cluster Controller (AWS)
- Extracts load balancer configuration from MAPI control plane machines during AWSCluster creation
- Supports both single and dual load balancer configurations (internal and external)
- Automatically prefers load balancers with "-int" suffix as primary when two are present
- Validates load balancer types (Classic ELB and Network Load Balancer) during extraction
- Populates AWSCluster's ControlPlaneLoadBalancer and SecondaryControlPlaneLoadBalancer fields

Machine Sync Controller
- Adds pre-conversion validation to ensure MAPI machines are compatible with existing AWSCluster load balancer configuration
- Validates that control plane machines reference the correct load balancers with matching types
- Prevents conversion when load balancer mismatches are detected

Conversion Libraries
- MAPI→CAPI: Allows load balancers on control plane machines; rejects them on worker machines
- CAPI→MAPI: Converts AWSCluster load balancer specs to MAPI LoadBalancerReferences on control plane machines only
- Supports conversion of Classic and Network load balancer types in both directions

Additional Changes:
- Added MAPI Machine type to controller manager cache configuration for load balancer lookups
- Refactored Azure and GCP provider spec retrieval to use generic helper function
- Added comprehensive test coverage for load balancer validation and conversion scenarios
- Improved error messages and logging for load balancer-related operations
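
As a rough illustration of the CAPI→MAPI type mapping described in the Conversion Libraries notes above (Classic/ELB to classic, NLB to network), with stand-in types instead of the real CAPA and MAPI definitions:

package main

import "fmt"

// capaLBSpec stands in for the CAPA control-plane load balancer spec.
type capaLBSpec struct {
	Name *string
	Type string // "classic", "elb", or "nlb" in CAPA terms
}

// mapiLBRef stands in for the MAPI LoadBalancerReference.
type mapiLBRef struct {
	Name string
	Type string // "classic" or "network" in MAPI terms
}

// convertLBToMAPI maps a CAPA control-plane load balancer spec to a MAPI reference,
// rejecting types it does not recognize.
func convertLBToMAPI(lb capaLBSpec) (mapiLBRef, error) {
	name := ""
	if lb.Name != nil {
		name = *lb.Name
	}

	switch lb.Type {
	case "nlb":
		return mapiLBRef{Name: name, Type: "network"}, nil
	case "classic", "elb":
		return mapiLBRef{Name: name, Type: "classic"}, nil
	default:
		return mapiLBRef{}, fmt.Errorf("unsupported load balancer type %q", lb.Type)
	}
}

func main() {
	name := "mycluster-int"
	ref, _ := convertLBToMAPI(capaLBSpec{Name: &name, Type: "nlb"})
	fmt.Printf("%+v\n", ref)
}

Per the notes above, this mapping is only applied to control-plane machines; worker machines get no load balancer references.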
@RadekManak RadekManak force-pushed the convert-loadbalancers branch from e958c81 to 8c7833e on October 14, 2025 12:01
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 14, 2025
Member

@damdo damdo left a comment


Re-adding

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 14, 2025
@openshift-ci-robot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-techpreview
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@openshift-ci-robot

openshift-ci-robot commented Oct 14, 2025

@RadekManak: This pull request references OCPCLOUD-2709 which is a valid jira issue.

In response to this:

Summary by CodeRabbit

  • New Features

  • Runtime-configurable CAPI/MAPI namespaces; AWS infra now derives and populates primary + optional secondary control‑plane load balancers from provider specs and exposes conversions for LB types.

  • Bug Fixes

  • Worker machines rejected for declaring LBs; control‑plane machines may use LBs. Improved control‑plane endpoint port handling and degraded-condition reporting when control plane is missing.

  • Refactor

  • Provider-spec retrieval and namespace usage unified under controller context across platforms; namespaces propagated into created resources and logs.

  • Tests

  • Expanded tests for namespaces, control‑plane discovery, LB ordering/validation, MachineSync conversion, and CAPI↔MAPI LB translations.

  • Chores

  • Operator cache includes Machine objects; test env adds ControlPlaneMachineSet CRD.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/controllers/infracluster/gcp.go (1)

86-88: Guard against empty NetworkInterfaces to avoid panic

providerSpec.NetworkInterfaces[0] is accessed without a length check; this can panic. Compute the network name safely before constructing the object.

Apply this diff:

- target = &gcpv1.GCPCluster{
+ // Derive network name from MAPI providerSpec
+ if len(providerSpec.NetworkInterfaces) == 0 || providerSpec.NetworkInterfaces[0].Network == "" {
+   return nil, fmt.Errorf("unable to derive network: missing network interface/network in MAPI providerSpec")
+ }
+ networkName := providerSpec.NetworkInterfaces[0].Network
+ target = &gcpv1.GCPCluster{
   ObjectMeta: metav1.ObjectMeta{
     Name:      r.Infra.Status.InfrastructureName,
     Namespace: r.CAPINamespace,
   },
   Spec: gcpv1.GCPClusterSpec{
     Network: gcpv1.NetworkSpec{
-      Name: &providerSpec.NetworkInterfaces[0].Network,
+      Name: &networkName,
     },
pkg/controllers/infracluster/azure.go (1)

167-184: AzureClusterIdentity uses defaultCAPINamespace; breaks when CAPINamespace is customized

AllowedNamespaces and ClientSecret Namespace must reference r.CAPINamespace; otherwise CAPZ won’t find the secret in non-default setups.

-       Spec: azurev1.AzureClusterIdentitySpec{
+       Spec: azurev1.AzureClusterIdentitySpec{
          Type:              azurev1.ServicePrincipal,
-         AllowedNamespaces: &azurev1.AllowedNamespaces{NamespaceList: []string{defaultCAPINamespace}},
+         AllowedNamespaces: &azurev1.AllowedNamespaces{NamespaceList: []string{r.CAPINamespace}},
          ClientID:          string(azureClientID),
          TenantID:          string(azureTenantID),
-         ClientSecret:      corev1.SecretReference{Name: clusterSecretName, Namespace: defaultCAPINamespace},
+         ClientSecret:      corev1.SecretReference{Name: clusterSecretName, Namespace: r.CAPINamespace},
        },
♻️ Duplicate comments (5)
pkg/conversion/mapi2capi/aws_test.go (1)

134-155: Thanks for adding the CPMS ownerRef/no-role case

This addresses prior feedback to cover all IsControlPlaneMAPIMachine branches.

pkg/controllers/machinesync/aws.go (1)

2-2: Update copyright year

Header says 2024; should be 2025.

- Copyright 2024 Red Hat, Inc.
+ Copyright 2025 Red Hat, Inc.
pkg/controllers/infracluster/azure.go (1)

110-112: Validate azure_client_secret before creating CAPZ Secret

Avoid creating a Secret with empty clientSecret.

- if err := r.createNewAzureSecret(ctx, r.Client, capzManagerBootstrapSecret.Data["azure_client_secret"]); err != nil {
+ secretBytes, ok := capzManagerBootstrapSecret.Data["azure_client_secret"]
+ if !ok || len(secretBytes) == 0 {
+   return fmt.Errorf("azure_client_secret not found or empty in %q secret", capzManagerBootstrapCredentials)
+ }
+ if err := r.createNewAzureSecret(ctx, r.Client, secretBytes); err != nil {
   return fmt.Errorf("failed to create Azure Cluster secret: %w", err)
 }
pkg/controllers/infracluster/infracluster_controller.go (1)

343-381: Guard against nil ProviderSpec.Value to avoid panics.

Both the CPMS and "newest control-plane machine" branches assume ProviderSpec.Value is non-nil, which can panic if a Machine or CPMS template has an external reference or is malformed.

Apply this diff to add nil checks:

     sortedMachines := sortMachinesByCreationTimeDescending(machineList.Items)
     newest := sortedMachines[0]
 
-    return newest.Spec.ProviderSpec.Value.Raw, nil
+    if newest.Spec.ProviderSpec.Value == nil || len(newest.Spec.ProviderSpec.Value.Raw) == 0 {
+      return nil, fmt.Errorf("control plane Machine %q has empty providerSpec", newest.GetName())
+    }
+    return newest.Spec.ProviderSpec.Value.Raw, nil
   }
 
   // Devise providerSpec via CPMS.
-  return cpms.Spec.Template.OpenShiftMachineV1Beta1Machine.Spec.ProviderSpec.Value.Raw, nil
+  if cpms.Spec.Template.OpenShiftMachineV1Beta1Machine.Spec.ProviderSpec.Value == nil ||
+     len(cpms.Spec.Template.OpenShiftMachineV1Beta1Machine.Spec.ProviderSpec.Value.Raw) == 0 {
+    return nil, fmt.Errorf("CPMS template providerSpec is empty")
+  }
+  return cpms.Spec.Template.OpenShiftMachineV1Beta1Machine.Spec.ProviderSpec.Value.Raw, nil
pkg/controllers/infracluster/aws.go (1)

76-76: Fix incorrect error wrapping.

The error message uses %w to wrap err, but err refers to the URL parsing error from Line 65, not a PlatformStatus validation error. This produces misleading error messages.

Apply this diff:

   if r.Infra.Status.PlatformStatus == nil {
-    return nil, fmt.Errorf("infrastructure PlatformStatus should not be nil: %w", err)
+    return nil, fmt.Errorf("infrastructure PlatformStatus should not be nil")
   }
🧹 Nitpick comments (9)
pkg/conversion/mapi2capi/aws_test.go (1)

98-109: Make the error assertion less brittle

Rely only on the stable substring to avoid failures from struct formatting drift.

-			expectedErrors: []string{
-				"spec.providerSpec.value.loadBalancers: Invalid value: []v1beta1.LoadBalancerReference{v1beta1.LoadBalancerReference{Name:\"a\", Type:\"classic\"}}: loadBalancers are not supported for non-control plane machines",
-			},
+			expectedErrors: []string{
+				"loadBalancers are not supported for non-control plane machines",
+			},
pkg/controllers/infracluster/infracluster_controller_test.go (2)

117-124: Delete test namespaces in AfterEach to prevent leaks

Namespaces are generated per test but not removed. Delete them to avoid buildup across tests.

  AfterEach(func() {
    // Stop Manager.
    stopManager(mgrCancel, mgrDone)
    // Cleanup Resources.
    testutils.CleanupResources(Default, ctx, cfg, cl, "", &configv1.ClusterOperator{})
    testutils.CleanupResources(Default, ctx, cfg, cl, capiNamespace.Name, &awsv1.AWSCluster{})
    testutils.CleanupResources(Default, ctx, cfg, cl, mapiNamespace.Name, &mapiv1beta1.Machine{})
+   Expect(cl.Delete(ctx, mapiNamespace)).To(Succeed())
+   Expect(cl.Delete(ctx, capiNamespace)).To(Succeed())
  })

169-176: Avoid time.Sleep in tests

Sleeps make tests slow and flaky. Prefer deterministic ordering (e.g., create “old” first, then “young” in quick succession, or explicitly set a tie-breaker field used by your selection logic).

pkg/controllers/machinesync/aws_test.go (1)

66-78: Simplify startManager signature

No need to pass a pointer to manager; pass the value and call Start directly.

- startManager := func(mgr *manager.Manager) (context.CancelFunc, chan struct{}) {
+ startManager := func(mgr manager.Manager) (context.CancelFunc, chan struct{}) {
   mgrCtx, mgrCancel := context.WithCancel(context.Background())
   mgrDone := make(chan struct{})

   go func() {
     defer GinkgoRecover()
     defer close(mgrDone)

-    Expect((*mgr).Start(mgrCtx)).To(Succeed())
+    Expect(mgr.Start(mgrCtx)).To(Succeed())
   }()
   return mgrCancel, mgrDone
 }

And update the call site:

- mgrCancel, mgrDone = startManager(&mgr)
+ mgrCancel, mgrDone = startManager(mgr)
pkg/controllers/machinesync/machine_sync_controller.go (1)

454-461: Fix method name typo: Platfrom → Platform

Name the method correctly and update the call site for readability/searchability.

- if err := r.validateMAPIToCAPIPlatfromSpecifics(ctx, sourceMAPIMachine); err != nil {
+ if err := r.validateMAPIToCAPIPlatformSpecifics(ctx, sourceMAPIMachine); err != nil {
-// validateMAPIToCAPIPlatfromSpecifics verifies that shared CAPI resources are compatible before converting from MAPI -> CAPI.
-func (r *MachineSyncReconciler) validateMAPIToCAPIPlatfromSpecifics(ctx context.Context, mapiMachine *mapiv1beta1.Machine) error {
+// validateMAPIToCAPIPlatformSpecifics verifies that shared CAPI resources are compatible before converting from MAPI -> CAPI.
+func (r *MachineSyncReconciler) validateMAPIToCAPIPlatformSpecifics(ctx context.Context, mapiMachine *mapiv1beta1.Machine) error {
   switch r.Platform {
   case configv1.AWSPlatformType:
     return r.validateMachineAWSLoadBalancers(ctx, mapiMachine)
   default:
     return nil
   }
 }

Also applies to: 570-579

pkg/controllers/infracluster/powervs.go (1)

79-83: Drop redundant client param on getPowerVSMAPIProviderSpec

Method already has access to r.Client; simplify signature and call.

- machineSpec, err := r.getPowerVSMAPIProviderSpec(ctx, r.Client)
+ machineSpec, err := r.getPowerVSMAPIProviderSpec(ctx)
-func (r *InfraClusterController) getPowerVSMAPIProviderSpec(ctx context.Context, cl client.Client) (*mapiv1.PowerVSMachineProviderConfig, error) {
-	return getMAPIProviderSpec[mapiv1.PowerVSMachineProviderConfig](ctx, cl, r.getRawMAPIProviderSpec)
+func (r *InfraClusterController) getPowerVSMAPIProviderSpec(ctx context.Context) (*mapiv1.PowerVSMachineProviderConfig, error) {
+	return getMAPIProviderSpec[mapiv1.PowerVSMachineProviderConfig](ctx, r.Client, r.getRawMAPIProviderSpec)
 }

Also applies to: 123-126

pkg/controllers/infracluster/gcp.go (2)

61-63: Avoid wrapping nil error in message

This wraps a nil err with %w, producing a confusing message.

- if r.Infra.Status.PlatformStatus == nil {
-   return nil, fmt.Errorf("infrastructure PlatformStatus should not be nil: %w", err)
- }
+ if r.Infra.Status.PlatformStatus == nil {
+   return nil, fmt.Errorf("infrastructure PlatformStatus should not be nil")
+ }

116-121: Clarify error when providerSpec is nil

If machineSpec is nil and err is nil, the current message embeds a nil error.

- machineSpec, err := r.getGCPMAPIProviderSpec(ctx, r.Client)
- if err != nil || machineSpec == nil {
-   return "", fmt.Errorf("unable to get GCP MAPI ProviderSpec: %w", err)
- }
+ machineSpec, err := r.getGCPMAPIProviderSpec(ctx, r.Client)
+ if err != nil {
+   return "", fmt.Errorf("unable to get GCP MAPI ProviderSpec: %w", err)
+ }
+ if machineSpec == nil {
+   return "", fmt.Errorf("unable to get GCP MAPI ProviderSpec: providerSpec not found")
+ }
pkg/controllers/infracluster/vsphere.go (1)

79-81: Avoid wrapping nil error in message

Same pattern as GCP: %w with nil.

- if r.Infra.Status.PlatformStatus == nil {
-   return nil, fmt.Errorf("infrastructure PlatformStatus should not be nil: %w", err)
- }
+ if r.Infra.Status.PlatformStatus == nil {
+   return nil, fmt.Errorf("infrastructure PlatformStatus should not be nil")
+ }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between e958c81 and 8c7833e.

📒 Files selected for processing (19)
  • cmd/cluster-capi-operator/main.go (1 hunks)
  • pkg/controllers/infracluster/aws.go (4 hunks)
  • pkg/controllers/infracluster/azure.go (10 hunks)
  • pkg/controllers/infracluster/gcp.go (3 hunks)
  • pkg/controllers/infracluster/infracluster_controller.go (6 hunks)
  • pkg/controllers/infracluster/infracluster_controller_test.go (8 hunks)
  • pkg/controllers/infracluster/metal3.go (3 hunks)
  • pkg/controllers/infracluster/powervs.go (4 hunks)
  • pkg/controllers/infracluster/vsphere.go (4 hunks)
  • pkg/controllers/machinesync/aws.go (1 hunks)
  • pkg/controllers/machinesync/aws_test.go (1 hunks)
  • pkg/controllers/machinesync/machine_sync_controller.go (2 hunks)
  • pkg/conversion/capi2mapi/aws.go (5 hunks)
  • pkg/conversion/capi2mapi/aws_test.go (2 hunks)
  • pkg/conversion/mapi2capi/aws.go (1 hunks)
  • pkg/conversion/mapi2capi/aws_test.go (1 hunks)
  • pkg/test/crdbuilder.go (1 hunks)
  • pkg/test/envtest.go (1 hunks)
  • pkg/util/controlplane.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/controllers/infracluster/metal3.go
  • pkg/conversion/mapi2capi/aws.go
  • pkg/test/crdbuilder.go
  • pkg/util/controlplane.go
  • pkg/conversion/capi2mapi/aws_test.go
🔇 Additional comments (14)
cmd/cluster-capi-operator/main.go (1)

422-426: LGTM! Cache entry correctly configured for Machine objects.

The addition of the cache entry for mapiv1beta1.Machine is consistent with the existing patterns for ControlPlaneMachineSet and MachineSet. This enables the operator to efficiently watch and cache Machine objects from the MAPI namespace, which is necessary for implementing the MAPI→CAPI load balancer conversion functionality.
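For context, a minimal sketch of what a namespace-scoped cache entry for MAPI Machines can look like, assuming a recent controller-runtime release (the `cache.ByObject`/`Namespaces` options) and a placeholder namespace value; this is not a copy of the operator's actual configuration:

```go
package main

import (
	mapiv1beta1 "github.com/openshift/api/machine/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// cacheOptionsFor restricts caching/watching of MAPI Machines to a single
// namespace, mirroring the pattern described above. The namespace value and
// function name are placeholders for illustration only.
func cacheOptionsFor(mapiNamespace string) cache.Options {
	return cache.Options{
		ByObject: map[client.Object]cache.ByObject{
			&mapiv1beta1.Machine{}: {
				Namespaces: map[string]cache.Config{
					mapiNamespace: {},
				},
			},
		},
	}
}

func main() {
	_ = cacheOptionsFor("openshift-machine-api")
}
```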

pkg/test/envtest.go (1)

70-70: LGTM! Adds necessary CPMS CRD for test environment.

The addition of fakeControlPlaneMachineSetCRD to the test environment is appropriate and supports the MAPI→CAPI conversion logic that references ControlPlaneMachineSet resources. The placement logically groups it with other MAPI-related CRDs.

pkg/conversion/mapi2capi/aws_test.go (1)

110-133: Good coverage for control-plane with role+CPMS ownerRef

Scenarios are clear and validate the intended acceptance paths.

pkg/conversion/capi2mapi/aws.go (1)

583-611: LB conversion wiring looks solid

  • Non-control-plane short-circuit is correct.
  • Error surfacing via field.Invalid is appropriate.
  • Classic/ELB and NLB mapping matches CAPA types; no implicit defaulting (aligns with prior discussion).

Also applies to: 613-633
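As a rough illustration of that mapping, here is a sketch that uses plain strings in place of the typed CAPA and MAPI constants (the exact constant names are not claimed here) and returns an error for anything it does not recognise:

```go
package main

import "fmt"

// capiToMAPILoadBalancerType maps a CAPA load balancer type string to the
// MAPI equivalent. The string values stand in for the typed constants used
// by the actual conversion code.
func capiToMAPILoadBalancerType(capiType string) (string, error) {
	switch capiType {
	case "classic", "elb": // CAPA uses both spellings for the classic ELB
		return "classic", nil
	case "nlb":
		return "network", nil
	default:
		return "", fmt.Errorf("unsupported load balancer type %q", capiType)
	}
}

func main() {
	t, err := capiToMAPILoadBalancerType("nlb")
	fmt.Println(t, err) // network <nil>
}
```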

pkg/controllers/machinesync/aws.go (1)

39-87: Validation logic is correct and defensive

  • Enforces no LBs on workers.
  • Requires non-empty CP LB name; optional secondary.
  • Compares names/types against AWSCluster; clear field paths; aggregates via ToAggregate.

Also applies to: 90-121
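The aggregation pattern is the standard apimachinery one; a simplified, hypothetical version of such a check might look like the following, where the loadBalancer struct and field paths are stand-ins rather than the controller's real types:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation/field"
)

// loadBalancer is a placeholder for the provider-spec load balancer entry.
type loadBalancer struct{ Name, Type string }

// validateLoadBalancers collects field errors and aggregates them: workers
// must not declare load balancers, and control-plane machines must name a
// primary one.
func validateLoadBalancers(isControlPlane bool, lbs []loadBalancer) error {
	var errs field.ErrorList
	path := field.NewPath("providerSpec", "value", "loadBalancers")

	if !isControlPlane && len(lbs) > 0 {
		errs = append(errs, field.Forbidden(path, "worker machines must not declare load balancers"))
	}
	if isControlPlane && (len(lbs) == 0 || lbs[0].Name == "") {
		errs = append(errs, field.Required(path.Index(0).Child("name"), "control-plane machines must name a primary load balancer"))
	}

	return errs.ToAggregate()
}

func main() {
	// A worker declaring a load balancer is rejected.
	fmt.Println(validateLoadBalancers(false, []loadBalancer{{Name: "lb", Type: "network"}}))
}
```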

pkg/controllers/infracluster/infracluster_controller.go (5)

268-276: LGTM!

The namespace defaulting logic is clean and allows for test flexibility while ensuring production defaults are applied when not explicitly set.


287-299: LGTM!

The watch setup is correct and well-commented. Both CPMS and control plane Machines appropriately use MAPINamespace filtering, and the comments clearly explain why these watches are necessary for reconciliation.


383-396: LGTM!

The generic helper function provides a clean abstraction for obtaining and unmarshaling provider specs. Good use of Go generics for type safety.
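A simplified sketch of what such a generics-based helper can look like, with the context and client parameters omitted and a hypothetical raw-spec getter supplied by the caller:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// getProviderSpec unmarshals a raw provider spec into the requested type.
// This is an illustration of the pattern, not the controller's helper.
func getProviderSpec[T any](getRaw func() ([]byte, error)) (*T, error) {
	raw, err := getRaw()
	if err != nil {
		return nil, fmt.Errorf("unable to get raw provider spec: %w", err)
	}
	if raw == nil {
		return nil, nil
	}

	spec := new(T)
	if err := json.Unmarshal(raw, spec); err != nil {
		return nil, fmt.Errorf("unable to unmarshal provider spec: %w", err)
	}

	return spec, nil
}

// awsProviderConfig is a trimmed-down stand-in for the real provider type.
type awsProviderConfig struct {
	Placement struct {
		Region string `json:"region"`
	} `json:"placement"`
}

func main() {
	spec, err := getProviderSpec[awsProviderConfig](func() ([]byte, error) {
		return []byte(`{"placement":{"region":"us-east-1"}}`), nil
	})
	fmt.Println(spec.Placement.Region, err) // us-east-1 <nil>
}
```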


398-411: LGTM!

The sorting logic correctly orders machines by descending creation timestamp with name as a deterministic tie-breaker. This addresses the concern about machines created simultaneously.
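A stand-alone sketch of that ordering, where the machine struct is a stand-in for the MAPI type: newest machine first, with the name breaking ties so machines created at the same instant sort deterministically.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

type machine struct {
	Name    string
	Created time.Time
}

// sortNewestFirst orders machines by descending creation timestamp, using the
// name as a deterministic tie-breaker.
func sortNewestFirst(machines []machine) {
	sort.Slice(machines, func(i, j int) bool {
		if !machines[i].Created.Equal(machines[j].Created) {
			return machines[i].Created.After(machines[j].Created)
		}
		return machines[i].Name < machines[j].Name
	})
}

func main() {
	now := time.Now()
	ms := []machine{{"master-0", now}, {"master-2", now.Add(time.Minute)}, {"master-1", now}}
	sortNewestFirst(ms)
	fmt.Println(ms[0].Name, ms[1].Name, ms[2].Name) // master-2 master-0 master-1
}
```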


414-429: LGTM!

The conversion to a receiver method is correct, and the use of r.MAPINamespace properly aligns with the namespace-aware pattern introduced in this PR.

pkg/controllers/infracluster/aws.go (4)

49-96: LGTM!

The refactored ensureAWSCluster properly integrates MAPI-derived load balancer configuration and uses the controller's CAPINamespace field. The enhanced logging with klog.KObj improves observability.


98-135: LGTM!

The newAWSCluster function correctly constructs an AWSCluster from MAPI provider spec data, properly handling both primary and optional secondary load balancers extracted from the MAPI configuration.


141-192: LGTM!

The load balancer extraction logic correctly handles 0, 1, and 2 load balancers with appropriate error handling. The preference for "-int" suffix as primary aligns with OpenShift installer conventions.
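An illustrative version of that split, assuming the "-int" suffix convention described above; the struct and error message are placeholders rather than the controller's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

type loadBalancer struct{ Name string }

// splitLoadBalancers picks the primary and optional secondary load balancer:
// with two entries, the one whose name ends in "-int" is treated as primary,
// matching the installer's internal/external naming convention.
func splitLoadBalancers(lbs []loadBalancer) (primary, secondary *loadBalancer, err error) {
	switch len(lbs) {
	case 0:
		return nil, nil, nil
	case 1:
		return &lbs[0], nil, nil
	case 2:
		if strings.HasSuffix(lbs[0].Name, "-int") {
			return &lbs[0], &lbs[1], nil
		}
		return &lbs[1], &lbs[0], nil
	default:
		return nil, nil, fmt.Errorf("expected at most two load balancers, got %d", len(lbs))
	}
}

func main() {
	p, s, _ := splitLoadBalancers([]loadBalancer{{"mycluster-ext"}, {"mycluster-int"}})
	fmt.Println(p.Name, s.Name) // mycluster-int mycluster-ext
}
```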


194-204: LGTM!

The type conversion function correctly maps MAPI load balancer types to their CAPI equivalents and returns a descriptive error for unsupported types.

@damdo
Copy link
Member

damdo commented Oct 14, 2025

/test e2e-aws-ovn-techpreview-upgrade

@damdo
Copy link
Member

damdo commented Oct 15, 2025

Permafailing

/override ci/prow/e2e-openstack-ovn-techpreview

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 15, 2025

@damdo: Overrode contexts on behalf of damdo: ci/prow/e2e-openstack-ovn-techpreview

In response to this:

Permafailing

/override ci/prow/e2e-openstack-ovn-techpreview

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@damdo
Copy link
Member

damdo commented Oct 15, 2025

/retest-required

@huali9
Copy link
Contributor

huali9 commented Oct 15, 2025

/verified later @huali9
I still need time to thoroughly test this feature, so I'd like to do post-merge testing to unblock some merges.

@damdo
Copy link
Member

damdo commented Oct 15, 2025

Fix for Openstack CAPI hasn't merged yet #381 so overriding this for now

/override ci/prow/e2e-openstack-capi-techpreview

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 15, 2025

@damdo: Overrode contexts on behalf of damdo: ci/prow/e2e-openstack-capi-techpreview

In response to this:

Fix for Openstack CAPI hasn't merged yet #381 so overriding this for now

/override ci/prow/e2e-openstack-capi-techpreview

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@huali9
Copy link
Contributor

huali9 commented Oct 15, 2025

/verified later @huali9

@openshift-ci-robot openshift-ci-robot added verified-later verified Signifies that the PR passed pre-merge verification criteria labels Oct 15, 2025
@openshift-ci-robot
Copy link

@huali9: This PR has been marked to be verified later by @huali9.

In response to this:

/verified later @huali9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot openshift-merge-bot bot merged commit d19c37c into openshift:main Oct 15, 2025
25 of 26 checks passed
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 15, 2025

@RadekManak: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn-techpreview | 8c7833e | link | false | /test e2e-azure-ovn-techpreview |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

