Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
apiVersion: apiextensions.k8s.io/v1 # Hack because controller-gen complains if we don't have this
name: "[TechPreview] InternalReleaseImage"
crdName: machineconfignodes.machineconfiguration.openshift.io
featureGates:
- MachineConfigNodes
- NoRegistryClusterOperations
tests:
onUpdate:
- name: Should be able to update a MachineConfigNode with a minimal internalReleaseImage status field.
initial: |
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigNode
metadata:
name: foobar
spec:
node:
name: foobar
pool:
name: master
configVersion:
desired: rendered-master-abc
updated: |
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigNode
metadata:
name: foobar
spec:
node:
name: foobar
pool:
name: master
configVersion:
desired: rendered-master-abc
status:
internalReleaseImage:
installedReleases:
- name: ocp-release-bundle-4.18.0-x86_64
image: example.com/example/openshift-release-dev@sha256:d98795f7932441b30bb8bcfbbf05912875383fad1f2b3be08a22ec148d68607f
expected: |
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigNode
metadata:
name: foobar
spec:
node:
name: foobar
pool:
name: master
configVersion:
desired: rendered-master-abc
status:
internalReleaseImage:
installedReleases:
- name: ocp-release-bundle-4.18.0-x86_64
image: example.com/example/openshift-release-dev@sha256:d98795f7932441b30bb8bcfbbf05912875383fad1f2b3be08a22ec148d68607f
74 changes: 74 additions & 0 deletions machineconfiguration/v1/types_machineconfignode.go
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,80 @@ type MachineConfigNodeStatus struct {
// +kubebuilder:validation:MaxItems=32
// +optional
IrreconcilableChanges []IrreconcilableChangeDiff `json:"irreconcilableChanges,omitempty"`
// internalReleaseImage describes the status of the release payloads stored in the node.
// When specified, an internalReleaseImage custom resource exists on the cluster, and the specified images will be made available on the control plane nodes.
// This field will reflect the actual on-disk state of those release images.
// +openshift:enable:FeatureGate=NoRegistryClusterOperations
// +optional
InternalReleaseImage *MachineConfigNodeStatusInternalReleaseImage `json:"internalReleaseImage,omitempty"`
}

// MachineConfigNodeStatusInternalReleaseImage holds information about the current and discovered release bundles for the observed machine
// config node.
type MachineConfigNodeStatusInternalReleaseImage struct {
// conditions represent the observations of an internal release image current state.
// +listType=map
// +listMapKey=type
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=256
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question for #2510 (comment) kinda applies here as well, but maybe we're ok with just having a few general status's for the node to indicate if anything is failing. In that case, I think 256 is a bit overkill. Maybe we can shorten this and then similar to https://github.com/openshift/api/blob/master/machineconfiguration/v1/types_machineosbuild.go#L81-L90 have a list of what we'd expect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, there are three main activities/tasks that I 'd expect to be reported here (affecting the current node):

  • A new release is added
  • An exsiting release is removed
  • A new release is discovered

For each of the above tasks, I'd expect to report when it's started/progressing/completed (with success or failure). This means roughly from 3 to 5 conditions entries for each release, for each task (assuming that the progressing could be detailed a little bit more than just one condition).
In a sunny day scenario, where the user initially discover a release, then adds it to the cluster and then removes it in future after an upgrade, a total of ~10/15 conditions entry will be consumed for a single release.

I'm not sure what happens when the maxItems limit will be reached (older entries will be purged?), but in such case probable we could size it to keep track of the events affecting the last 5 release bundles managed (so something around ~128)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I that's your approach I think you may need to reconsider how to declare the field, cause I guess you will have 5 types of conditions but you won't be able to "duplicate" them by release as the field is using listMapKey=type.

Copy link
Contributor

@yuqi-zhang yuqi-zhang Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Pablo. The method we have now via the standard conditions would make it hard to support this, unless you want to make your own type for it.

If we were to use conditions out of the box, let's say I have 2 releases, "v1" and "v2". Where v1 is available and v2 is progressing but failing on this node, I'd see:

  1. started: true, message is that v2 has started
  2. progressing: false, no progress is being made for either
  3. completed: true: v1 is completed
  4. failing: true: v2 is failing

That... would be hard to parse back up to the overall conditions. I think the more I think about this the more I'd lean towards not using conditions but instead have a subfield under MachineConfigNodeStatusInternalReleaseImageDetailedRef that has the status. So using the above example, I'd see:

MachineConfigNodeStatusInternalReleaseImage:
  AvailableReleases:
    - name: v1
    - name: v2
  InstalledReleases:
    - name: v1
      image: v1image
      status: completed
      message: image has been pulled and is available on node
    - name: v2
      image: imagev2
      status: failing
      message: failing to pull image: not enough space on disk

As opposed to:

MachineConfigNodeStatusInternalReleaseImage:
  AvailableReleases:
    - name: v1
    - name: v2
  InstalledReleases:
    - name: v1
      image: v1image
    - name: v2
      image: imagev2
  Conditions:
    - Type: "Available"
      Status: True
      Message: v1 is available
    - Type: "Failing"
      Status: True
      Message: v2 is failing

or:

MachineConfigNodeStatusInternalReleaseImage:
  AvailableReleases:
    - name: v1
    - name: v2
  InstalledReleases:
    - name: v1
      image: v1image
    - name: v2
      image: imagev2
  Conditions:
    - Type: "v1Available"
      Status: True
      Message: Image is available
    - Type: "v1Failing"
      Status: False
      Message: Image is not failing
    - Type: "v2Available"
      Status: False
      Message: Image is not available
    - Type: "v2Failing"
      Status: True
      Message: Image is failing

Both of which would be harder to parse and is confusing to look at

Copy link
Contributor

@pablintino pablintino Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuqi-zhang that's exactly the approach I had in my head when during last's week sync when I said I wasn't expecting conditions to be used here. I can see a few ones that summarize global node status, like "AllReleasesReady" or "ReleasesProgressing" or something like that, but putting in a condition all the details of a release and create N-copies of the conditions per release, no matter if the CR is aimed to be read by a human or not was, and still is, to me awkward.
An extra point agains the approach I've thought. What happens if in N releases we need to report a new thing per release? We cannot add a field, cause it's a condition, so the only way out would be: more conditions, and it will be unmanageable

Copy link
Contributor

@yuqi-zhang yuqi-zhang Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the conversation we had today, we could instead flip the fields, and have the fields be the higher level status, something like this:

MachineConfigNodeStatusInternalReleaseImage:
  Releases:
    - name: v1
      available: True
      installed: True
      conditions:
      - Type: "Available"
        Status: True
        Message: Image is available
    - name: v2
      available: True
      installed: False
      conditions:
      - Type: "Available"
        Status: True
        Message: Image is available
      - Type: "Degraded"
        Status: True
        Message: Image cannot be pulled

Or alternatively, @andfasano mentioned that we don't expect users to necessarily read this, and instead the IRI object aggregates all of the actual status. Based on the minimum amount of information we'd need for this, it can look something like this:

MachineConfigNodeStatusInternalReleaseImage:
  Releases:
    - name: v1
      available: True
      installed: True
      degraded: False
      error: ""
    - name: v2
      available: True
      installed: False
      degraded: True
      error: "out of space on disk"

or

MachineConfigNodeStatusInternalReleaseImage:
  Releases:
    - name: v1
      conditions:
      - Type: "Available"
        Status: True
        Message: v1 is available
      - Type: "Installed"
        Status: False
        Message: v1 is installed
      - Type: "Degraded"
        Status: False
        Message: v1 is not failing
    - name: v2
      - Type: "Available"
        Status: True
        Message: v2 is available
      - Type: "Installed"
        Status: False
        Message: v2 is not installed
      - Type: "Degraded"
        Status: True
        Message: node is out of disk

(or even more simplified)

MachineConfigNodeStatusInternalReleaseImage:
  Releases:
    - name: v1
      state: installed
      message: ""
    - name: v2
      state: degraded
      message: "out of space on disk"

// +optional
Conditions []metav1.Condition `json:"conditions,omitempty"`

// availableReleases is a list of release bundle identifiers currently detected
// from the ISO attached to one of the control plane nodes. Any reported identifier can
// be used to amend the `spec.Releases` field to add a new release bundle to the cluster.
// An empty value indicates that no ISOs are currently being detected on any control plane
// node.
// Must not exceed 5 entries.
// +listType=map
// +listMapKey=name
// +kubebuilder:validation:MaxItems=5
// +optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud, maybe we should have this be required (if the IRI status exists) so at it would always reflect what the daemon is currently detecting? And if we don't detect anything we just have an empty list?

AvailableReleases []MachineConfigNodeStatusInternalReleaseImageRef `json:"availableReleases,omitempty"`

// installedReleases is a list of the release bundles currently owned and managed by the
// cluster, indicating that their images can be safely pulled by any cluster entity
// requiring them.
// This field can contain between 1 and 5 entries.
// +listType=map
// +listMapKey=name
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=5
// +optional
InstalledReleases []MachineConfigNodeStatusInternalReleaseImageDetailedRef `json:"installedReleases,omitempty"`
}

// MachineConfigNodeStatusInternalReleaseImageRef is used to provide a simple reference for a release
// bundle. Currently it contains only the name field.
// +openshift:enable:FeatureGate=NoRegistryClusterOperations
type MachineConfigNodeStatusInternalReleaseImageRef struct {
// name indicates the desired release bundle identifier. This field is required and must be between 1 and 64 characters long.
// +required
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=64
Name string `json:"name,omitempty"`
}

// MachineConfigNodeStatusInternalReleaseImageDetailedRef is used to provide a more detailed reference for
// a release bundle.
// +openshift:enable:FeatureGate=NoRegistryClusterOperations
type MachineConfigNodeStatusInternalReleaseImageDetailedRef struct {
// name indicates the desired release bundle identifier. This field is required and must be between 1 and 64 characters long.
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=64
// +required
Name string `json:"name,omitempty"`

// image is an OCP release image referenced by digest.
// The format of the image pull spec is: host[:port][/namespace]/name@sha256:<digest>,
// where the digest must be 64 characters long, and consist only of lowercase hexadecimal characters, a-f and 0-9.
// The length of the whole spec must be between 1 to 447 characters.
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=447
// +kubebuilder:validation:XValidation:rule=`(self.split('@').size() == 2 && self.split('@')[1].matches('^sha256:[a-f0-9]{64}$'))`,message="the OCI Image reference must end with a valid '@sha256:<digest>' suffix, where '<digest>' is 64 characters long"
// +kubebuilder:validation:XValidation:rule=`(self.split('@')[0].matches('^([a-zA-Z0-9-]+\\.)+[a-zA-Z0-9-]+(:[0-9]{2,5})?/([a-zA-Z0-9-_]{0,61}/)?[a-zA-Z0-9-_.]*?$'))`,message="the OCI Image name should follow the host[:port][/namespace]/name format, resembling a valid URL without the scheme"
// +required
Image string `json:"image,omitempty"`
}

// IrreconcilableChangeDiff holds an individual diff between the initial install-time MachineConfig
Expand Down
Loading