vr4manta
Contributor

@vr4manta vr4manta commented Sep 18, 2025

SPLAT-2206

Changes

  • Added static Dedicated Host support for AWS machines
  • Updated feature gate owner to rvanderp3 and component to splat

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 18, 2025
@openshift-ci-robot

openshift-ci-robot commented Sep 18, 2025

@vr4manta: This pull request references SPLAT-2206 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

SPLAT-2206

Changes

  • Added static Dedicated Host support for AWS machines
  • Updated feature gate owner to rvanderp3 and component to splat

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Sep 18, 2025

Hello @vr4manta! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Sep 18, 2025
@JoelSpeed
Contributor

Does this API already exist upstream in CAPA?

@openshift-ci openshift-ci bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 18, 2025
@vr4manta
Contributor Author

Does this API already exist upstream in CAPA?

@JoelSpeed Yes, this is already merged and pulled into OpenShift. I'm working on only the static version, since dynamic support is not finished upstream.

@everettraven
Contributor

/assign

@vr4manta vr4manta changed the title SPLAT-2206: Added AWS dedicated host support [WIP] SPLAT-2206: Added AWS dedicated host support Sep 19, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 19, 2025
Contributor

openshift-ci bot commented Sep 19, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from everettraven. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vr4manta vr4manta force-pushed the SPLAT-2206 branch 2 times, most recently from 0fcff1c to b088b27 Compare September 19, 2025 13:33
// +kubebuilder:validation:MaxLength=19
// +openshift:enable:FeatureGate=AWSDedicatedHosts
// +optional
HostID *string `json:"hostID,omitempty"`
Contributor

What is the difference between setting this to "" and omitting the field entirely?

Contributor Author

There should be no difference. I would assume this field is not set if the user does not intend to place instances on a dedicated host.

Contributor

If there is no difference, this should not be a pointer and should have a minimum length of 1. This is probably what the linter is complaining about.

Contributor

Since this is validated by Go-based webhooks and not by OpenAPI, the linter is wrong on this one.

If we make this not a pointer, then the Go code has no way to know if this was deliberately set to "" or not. We don't want "" to be valid, so this needs to be a pointer so that we can check that.
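A minimal sketch of the pointer argument, assuming a Go webhook validates hostID: nil means the optional field was omitted, while a non-nil empty string can be rejected explicitly. The helper name validateHostID and its error messages are hypothetical; only the pattern is taken from the hostID godoc in this PR.

```go
package main

import (
	"errors"
	"regexp"
)

// hostIDPattern mirrors the pattern from the hostID godoc under review:
// "h-" followed by 17 lowercase hexadecimal characters.
var hostIDPattern = regexp.MustCompile(`^h-[0-9a-f]{17}$`)

// validateHostID is a hypothetical webhook helper. hostID is a pointer so that
// "omitted" (nil) can be told apart from "explicitly set to the empty string".
func validateHostID(hostID *string) error {
	if hostID == nil {
		// Field omitted: allowed, because hostID is optional.
		return nil
	}
	if *hostID == "" {
		// Field present but empty: only detectable because of the pointer.
		return errors.New("hostID must not be empty when set")
	}
	if !hostIDPattern.MatchString(*hostID) {
		return errors.New("hostID must start with 'h-' followed by 17 lowercase hexadecimal characters")
	}
	return nil
}
```

With this shape, validateHostID(nil) returns nil, a pointer to "" or to an uppercase ID returns an error, and "h-012a3456b7890cdef" passes.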

In this case (and future cases like this in these providerspec APIs) we will want to make exceptions to the serialization rules on the linter.

We may even want to disable the serialization rules on these particular APIs somehow 🤔

Contributor

Ah, I went into standard API review mode here and forgot this API is webhook validation 🤦

Thanks for catching that!

We may even want to disable the serialization rules on these particular APIs somehow

Can we do this via codegen configurations?

Contributor

Can we do this via codegen configurations?

No, but we should be able to disable them using the .golangci-lint.yaml config. Ideally we could have a different config for the APIs that behave like this; these MAPI ones aren't the only ones (e.g. the aggregated APIs we support too).

Contributor

@everettraven everettraven left a comment

A couple small comments.

I may have more comments pending the results of discussions on what the appropriate behaviors are when this is set to Host or AnyAvailable.

@vr4manta vr4manta changed the title [WIP] SPLAT-2206: Added AWS dedicated host support SPLAT-2206: Added AWS dedicated host support Oct 2, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2025
Comment on lines 111 to 131
// hostID specifies the Dedicated Host on which the instance must be started.
// This field is mutually exclusive with DynamicHostAllocation.
// When set, the value must be a valid AWS Dedicated Host ID in the form
// "h-" followed by 17 lowercase hexadecimal characters.
// The maximum length is 19 characters, and the field may be omitted.
// +kubebuilder:validation:XValidation:rule="self == null || self.matches('^h-[0-9a-f]{17}$')",message="hostID must start with 'h-' and end in 17 alphanumeric characters"
// +kubebuilder:validation:MaxLength=19
// +openshift:enable:FeatureGate=AWSDedicatedHosts
// +optional
HostID *string `json:"hostID,omitempty"`

// hostAffinity specifies the dedicated host affinity setting for the instance.
// Valid values are "AnyAvailable", "Host", and omitted.
// When HostAffinity is set to "Host", an instance started onto a specific host always restarts on the same host if stopped.
// When HostAffinity is set to "AnyAvailable", and you stop and restart the instance, it can be restarted on any available host.
// When HostAffinity is omitted and HostID is defined, the instance is started onto the specified host.
// When HostAffinity is defined, HostID is required.
// +kubebuilder:validation:MaxLength=64
// +openshift:enable:FeatureGate=AWSDedicatedHosts
// +optional
HostAffinity *HostAffinity `json:"hostAffinity,omitempty"`
Contributor

The more I look at this, the more I wonder if this should be nested another level and follow the discriminated union pattern.

It might make it easier to express configuration options like:

...
dedicatedHost:
  affinity: Host
  host:
    id: h-017afcd

and

...
dedicatedHost:
  affinity: AnyAvailable

Then we can enforce requirements like dedicatedHost.host.id being required when setting dedicatedHost.affinity to Host, and forbidding it otherwise.

Contributor Author

Sure, I discussed with @rvanderp3 and we can make it this way.

@vr4manta vr4manta force-pushed the SPLAT-2206 branch 2 times, most recently from 06d98ae to 9589e77 Compare October 7, 2025 16:16
Contributor

@everettraven everettraven left a comment

Thanks for making the change to a discriminated union - I definitely like this direction better!

// hostAffinity selects how the instance is placed on a dedicated host.
// Valid values are "AnyAvailable", "Host", and omitted.
// - AnyAvailable: the platform selects any available dedicated host; do not set host.
// - Host: the instance must run on a specific host; set host.hostID.
Contributor

I'm wondering if Host makes the most sense here or if it is too stutter-y.

As an example:

dedicatedHost:
  hostAffinity: Host
  host:
    hostID: h-1326af

That has "host" 5 times in 4 lines.

Maybe something like:

dedicatedHost:
  affinity: Specific
  specific:
    id: h-1326af

is better here?

Contributor Author

Well, above we said to remove "host" from affinity, but I do not like "specific" being used; that feels very odd. This pushes me back toward not using a discriminated union, since it is now looking more complex. With this, affinity states whether the instance is to be assigned to a dedicated host, and then we just need a field to specify the host ID. Currently we have no other information about the host to provide, so a union may start to feel like overkill.

Contributor

Maybe taking a step back here would be helpful.

Maybe I misunderstood, but my interpretation was that the original hostAffinity value was meant to only be used with dedicated hosts. Is that incorrect?

Contributor Author

No, you understand correctly. The complexity is coming from the discriminated union additions, and I just want to find verbiage that makes sense and is clear. So allow me to explain the history and my ideas.

In the AWS API (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/moving-instances-dedicated-hosts.html), the documentation describes the placement logic for an instance. It's confusing there too. Here is an example:

aws ec2 modify-instance-placement \
    --instance-id i-1234567890abcdef0 \
    --affinity host \
    --tenancy host \
    --host-id h-012a3456b7890cdef
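As a rough illustration of how the two affinity modes relate to the EC2 placement parameters in the command above, the sketch below maps an affinity value (using the AnyAvailable and DedicatedHost names this PR settles on) plus an optional host ID onto --affinity, --tenancy and --host-id. The type names and the mapping are assumptions for illustration, not the CAPA or mapi2capi implementation; it relies on EC2's affinity values default (an instance can restart on any available host) and host (it restarts on the same host).

```go
package main

import "errors"

// HostAffinity values as discussed in this PR (assumed final names).
type HostAffinity string

const (
	HostAffinityAnyAvailable  HostAffinity = "AnyAvailable"
	HostAffinityDedicatedHost HostAffinity = "DedicatedHost"
)

// ec2Placement is a hypothetical stand-in for the placement parameters passed to
// `aws ec2 modify-instance-placement` (--affinity, --tenancy, --host-id).
type ec2Placement struct {
	Affinity string // "default" or "host"
	Tenancy  string // "host" for Dedicated Hosts
	HostID   string // empty unless pinned to a specific host
}

// toEC2Placement sketches how an affinity and optional host ID could map onto
// EC2 placement parameters. It is an illustration, not the mapi2capi code.
func toEC2Placement(affinity HostAffinity, hostID string) (ec2Placement, error) {
	switch affinity {
	case HostAffinityDedicatedHost:
		if hostID == "" {
			return ec2Placement{}, errors.New("a host ID is required for DedicatedHost affinity")
		}
		// Pin the instance to one host; it restarts on the same host if stopped.
		return ec2Placement{Affinity: "host", Tenancy: "host", HostID: hostID}, nil
	case HostAffinityAnyAvailable:
		// Let EC2 choose any available Dedicated Host on restart.
		return ec2Placement{Affinity: "default", Tenancy: "host"}, nil
	default:
		return ec2Placement{}, errors.New("unsupported affinity: " + string(affinity))
	}
}
```

For example, toEC2Placement(HostAffinityDedicatedHost, "h-012a3456b7890cdef") yields the same affinity, tenancy and host ID as the CLI call above.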

What we added in CAPA upstream was this: kubernetes-sigs/cluster-api-provider-aws#5548

For MAPI, we are adding just enough to support the mapi2capi conversion logic so that we can create these resources. Ideally I was hoping to keep our API similar to upstream for simplicity (the original concept at the beginning of this PR), but I am happy to have it follow OCP conventions (I did similar things for CAPV with multi-disk support and others). With all of this in mind, I do like the idea of using the discriminated union pattern. Building on our ideas, with some thought for what comes next, we could combine them as follows:

[Any Host Placement]

hostPlacement:
  hostAffinity: AnyAvailable

[Static Placement]

hostPlacement:
  hostAffinity: DedicatedHost
  DedicatedHost:
    id: h-abcdef0123456789a

[Dynamic Host Placement]

hostPlacement:
  hostAffinity: DynamicHost
  DynamicHost:
    tags:
    - app: myApp
    - department: whatever

Maybe this is more in line with what you are thinking. I know @rvanderp3 is working on adding dynamic host support upstream at the moment and that will require some additional data or changed fields. Maybe this better sets us up for that.

What are your thoughts?

Contributor

Thanks for the clarifications here - with more context what you've shared makes a lot of sense to me. Let's go with what you've proposed (presumably dropping the DynamicHost support until that is added upstream).

Only changes I'd suggest here are:

  • hostAffinity -> affinity
  • DedicatedHost -> Dedicated
  • In the future, DynamicHost -> Dynamic

Because it is already nested under hostPlacement dropping "host" from those field names should still logically make sense and reduce the repetition of the "host" term.

Contributor Author

Sounds good; I'll mock up those changes.

@vr4manta vr4manta force-pushed the SPLAT-2206 branch 4 times, most recently from c36f492 to c503f95 Compare October 14, 2025 16:19
// When Affinity is set to AnyAvailable, and you stop and restart the instance, it can be restarted on any available host.
// +required
// +unionDiscriminator
Affinity HostAffinity `json:"affinity,omitempty"`
Contributor

Because this is validated in a go webhook, do you need this to be a pointer so you can explicitly distinguish between not set and intentionally set to the empty string value ("") and return the appropriate field error (i.e required vs invalid value)?

Contributor Author

@vr4manta vr4manta Oct 17, 2025

I am confused. If I make it a pointer, I get:

machine/v1beta1/types_awsprovider.go:415:2: requiredfields: field Affinity does not allow the zero value. The field does not need to be a pointer. (kubeapilinter)
        Affinity *HostAffinity `json:"affinity,omitempty"`

If it's required according to the godoc, I'd assume we are forcing the user to set this field since it is the discriminator. With the parent hostPlacement being optional, having it required makes the most sense. I was planning on having the webhook look at it and if "", I would just set it to the default (which I cannot set in the godoc since above you said it needs to be removed for discriminators).

All of the changes for this API are being processed under the hood and the godoc is going to be ignored, but we are making it match what we want it to be.

Contributor

I am confused. If I make it a pointer, I get:

machine/v1beta1/types_awsprovider.go:415:2: requiredfields: field Affinity does not allow the zero value. The field does not need to be a pointer. (kubeapilinter)
Affinity *HostAffinity `json:"affinity,omitempty"`

Yeah, this is a known issue with our linter (that should actually be fixed more recently - you might need to rebase to pick up the fix on your branch). It should be safe to ignore for this case - I can override the check if this is the only failure.

If it's required according to the godoc, I'd assume we are forcing the user to set this field since it is the discriminator. With the parent hostPlacement being optional, having it required makes the most sense. I was planning on having the webhook look at it and if "", I would just set it to the default (which I cannot set in the godoc since above you said it needs to be removed for discriminators).

All of the changes for this API are being processed under the hood and the godoc is going to be ignored, but we are making it match what we want it to be.

Yes, if it is required we will be forcing the user to specify the field but we will have to do that through the webhook validation logic.

If you'd like to use the empty string ("") to signal that this is invalid you can, but that imposes a limitation: you can't determine whether it is the empty string because it was not provided or because an end-user explicitly set it to the empty string.

Being able to distinguish between not provided vs explicitly set to the empty string by an end-user allows you to return more specific error messages.

For example, if you made this a pointer you could do something like:

// return an error message explicitly stating that the field is required and must be specified
if affinity == nil {
    return field.Required(...)
}

// return an error message explicitly stating that the provided value is invalid and must match an allowed value
if *affinity == "" {
    return field.Invalid(...)
}

Without a pointer, you may be limited to returning only an error message stating that the value is invalid, which could confuse a user who didn't set the field.

An alternative here if you want to keep the non-pointer approach is to always return an error that says something like "affinity is required and must be set to one of {allowedValues}".
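A minimal sketch of that alternative, assuming the discriminator stays a non-pointer string type: since "" cannot distinguish "omitted" from "explicitly empty", one combined message covers both cases. The function name validateAffinity and the message wording are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

type HostAffinity string

// allowedAffinities lists the discriminator values discussed in this PR.
var allowedAffinities = []HostAffinity{"AnyAvailable", "DedicatedHost"}

// validateAffinity is a hypothetical check for a non-pointer discriminator.
// It returns one combined message for both the omitted and the empty case,
// because the two are indistinguishable without a pointer.
func validateAffinity(affinity HostAffinity) error {
	for _, allowed := range allowedAffinities {
		if affinity == allowed {
			return nil
		}
	}
	names := make([]string, len(allowedAffinities))
	for i, a := range allowedAffinities {
		names[i] = string(a)
	}
	return fmt.Errorf("affinity is required and must be one of [%s], got %q",
		strings.Join(names, ", "), string(affinity))
}
```

Here validateAffinity("") returns the error `affinity is required and must be one of [AnyAvailable, DedicatedHost], got ""`, which reads sensibly whether or not the user set the field. Note that no default is applied, in line with not defaulting the discriminator.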

I was planning on having the webhook look at it and if "", I would just set it to the default (which I cannot set in the godoc since above you said it needs to be removed for discriminators)

Wanted to call this out specifically - don't default the discriminator at all. Make the user make an explicit choice here.

Contributor

openshift-ci bot commented Oct 17, 2025

@vr4manta: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

// +unionDiscriminator
Affinity HostAffinity `json:"affinity,omitempty"`

// dedicatedHost specifies a particular dedicated host when required by affinity set to DedicatedHost.
Contributor

What about:

Suggested change
// dedicatedHost specifies a particular dedicated host when required by affinity set to DedicatedHost.
// dedicatedHost specifies the exact host that an instance should be restarted on if stopped.

?

)

// HostPlacement is the type that will be used to configure the placement of AWS instances.
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.affinity == 'DedicatedHost' ? has(self.dedicated) : !has(self.dedicated)",message="dedicated is required when affinity is DedicatedHost, and forbidden otherwise"
Contributor

Suggested change
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.affinity == 'DedicatedHost' ? has(self.dedicated) : !has(self.dedicated)",message="dedicated is required when affinity is DedicatedHost, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.affinity == 'DedicatedHost' ? has(self.dedicatedHost) : !has(self.dedicatedHost)",message="dedicatedHost is required when affinity is DedicatedHost, and forbidden otherwise"
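For comparison, a sketch of the same cross-field requirement expressed as Go webhook logic, which is where this API is actually validated. The struct shapes and the DedicatedHost.ID field are simplified assumptions. The sketch keys off affinity directly; the quoted rule's has(self.type) guard refers to a field named type, which does not appear among the fields discussed here.

```go
package main

import "errors"

type HostAffinity string

const (
	HostAffinityAnyAvailable  HostAffinity = "AnyAvailable"
	HostAffinityDedicatedHost HostAffinity = "DedicatedHost"
)

// DedicatedHost identifies a specific host; the field name is an assumption.
type DedicatedHost struct {
	ID string `json:"id"`
}

// HostPlacement is a simplified, hypothetical version of the type under review.
type HostPlacement struct {
	Affinity      HostAffinity   `json:"affinity,omitempty"`
	DedicatedHost *DedicatedHost `json:"dedicatedHost,omitempty"`
}

// validateHostPlacement enforces the intent of the XValidation rule:
// dedicatedHost is required when affinity is DedicatedHost, forbidden otherwise.
func validateHostPlacement(p HostPlacement) error {
	if p.Affinity == HostAffinityDedicatedHost {
		if p.DedicatedHost == nil {
			return errors.New("dedicatedHost is required when affinity is DedicatedHost")
		}
		return nil
	}
	if p.DedicatedHost != nil {
		return errors.New("dedicatedHost is forbidden when affinity is not DedicatedHost")
	}
	return nil
}
```

Because the union member is a pointer, "not set" is simply nil, so the required/forbidden checks need no sentinel values.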

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

4 participants