
(fr-6) Use dedicated startup probe with higher failure threshold #604

Open
Deydra71 wants to merge 1 commit into openstack-k8s-operators:18.0-fr6 from Deydra71:fix-startup-probe


@Deydra71 (Contributor)

The startup probe shared the same configuration as the liveness/readiness probes, giving Horizon only ~40s to start. In resource-constrained environments, startup exceeds this window, causing CrashLoopBackOff.

Introduce formatStartupProbe() with FailureThreshold: 12 (allowing ~120s for startup), matching the pattern used by the Cinder, Glance, and Manila operators.
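As a rough illustration of where the ~120s figure comes from, here is a minimal Go sketch of the startup-window arithmetic. It assumes PeriodSeconds: 10 (inferred from "FailureThreshold: 12 → ~120s", not stated in the patch) and uses a hypothetical probeTiming struct as a simplified stand-in for the timing fields of corev1.Probe, not the real type.

```go
package main

import "fmt"

// probeTiming is a hypothetical, simplified stand-in for the timing fields
// of k8s.io/api/core/v1.Probe; it is not the real corev1 type.
type probeTiming struct {
	InitialDelaySeconds int32
	PeriodSeconds       int32
	TimeoutSeconds      int32
	FailureThreshold    int32
}

// startupBudgetSeconds approximates how long the kubelet keeps retrying a
// startup probe before restarting the container: the initial delay plus
// FailureThreshold probe periods. Per-probe timeouts are ignored.
func startupBudgetSeconds(p probeTiming) int32 {
	return p.InitialDelaySeconds + p.PeriodSeconds*p.FailureThreshold
}

func main() {
	// Assumed values: PeriodSeconds=10 is inferred from the PR text;
	// TimeoutSeconds=5 matches the diff excerpt under review.
	startup := probeTiming{PeriodSeconds: 10, TimeoutSeconds: 5, FailureThreshold: 12}
	fmt.Printf("startup probe budget: ~%ds\n", startupBudgetSeconds(startup))
}
```

A dedicated startup probe is the natural place for this slack: Kubernetes disables liveness and readiness checks until the startup probe succeeds, so a generous startup threshold does not slow down failure detection once Horizon is running.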

Assisted-by: Claude Opus 4.6

Note: this is a recurring issue in SKMO CI runs that deploy Horizon in the main region.

Signed-off-by: Veronika Fisarova <vfisarov@redhat.com>
Assisted-by: Claude Opus 4.6
openshift-ci Bot requested a review from dprince on June 24, 2026 07:24
openshift-ci Bot (Contributor) commented Jun 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Deydra71
Once this PR has been reviewed and has the lgtm label, please assign mcgonago for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci Bot requested a review from mcgonago on June 24, 2026 07:24
Deydra71 requested a review from abays on June 24, 2026 07:24
Deydra71 changed the title from "Use dedicated startup probe with higher failure threshold" to "(fr-6) Use dedicated startup probe with higher failure threshold" on Jun 24, 2026
@centosinfra-prod-github-app


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/a84714ee3ae2425bbdbc8fc4bb5569f0

openstack-k8s-operators-content-provider FAILURE in 9m 45s
⚠️ horizon-operator-kuttl SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider (non-voting)

@Deydra71 (Contributor, Author)

recheck

	func formatStartupProbe() *corev1.Probe {
		return &corev1.Probe{
			TimeoutSeconds: 5,
			// ... (rest of the diff hunk not shown)

Contributor


@Deydra71 o/
We recently introduced a new module [1] that provides an interface for probes.
Using cinder as a simple example [2] of how to use the interface, you can basically create a ProbeSet struct with something like:

	apiProbes, err := probes.CreateProbeSet(
		int32(cinder.CinderPublicPort),
		&scheme,
		instance.Spec.Override.Probes,
		cinder.GetDefaultProbesAPI(timeout),
	)

or, for more advanced usage, as mariadb does [3], it is also possible to pass a command and the handler type.
In general, I think we could improve this code in main and adopt the probe interface there as well, so we can take advantage of the override in case we need to tune this statefulset in production.
Let me know if that aligns with the goal of this patch; otherwise we can create a dedicated follow-up that enhances the interface in main and then backport it to fr6 as well.

[1] github.com/openstack-k8s-operators/lib-common/modules/common/probes
[2] https://github.com/openstack-k8s-operators/cinder-operator/blob/main/internal/cinderapi/statefuleset.go#L56C2-L62C3
[3] https://github.com/openstack-k8s-operators/mariadb-operator/blob/main/internal/mariadb/statefulset.go#L138
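To illustrate the override idea in isolation, here is a minimal, stdlib-only Go sketch of the general pattern: operator defaults for each probe, optionally overridden field by field from the CR spec. All names (probeConfig, probeOverride, probeSet, createProbeSet, int32Ptr) and all numeric defaults are hypothetical; this does not reproduce lib-common's actual probes API, for which see [1].

```go
package main

import "fmt"

// probeConfig is a hypothetical, simplified probe description (timing only);
// it is neither corev1.Probe nor a lib-common type.
type probeConfig struct {
	PeriodSeconds    int32
	TimeoutSeconds   int32
	FailureThreshold int32
}

// probeOverride models an optional override taken from the CR spec:
// a nil field means "keep the operator default".
type probeOverride struct {
	PeriodSeconds    *int32
	TimeoutSeconds   *int32
	FailureThreshold *int32
}

// probeSet groups the liveness, readiness and startup probes of one container.
type probeSet struct {
	Liveness, Readiness, Startup probeConfig
}

// applyOverride returns def with every non-nil field of o applied on top.
func applyOverride(def probeConfig, o *probeOverride) probeConfig {
	if o == nil {
		return def
	}
	if o.PeriodSeconds != nil {
		def.PeriodSeconds = *o.PeriodSeconds
	}
	if o.TimeoutSeconds != nil {
		def.TimeoutSeconds = *o.TimeoutSeconds
	}
	if o.FailureThreshold != nil {
		def.FailureThreshold = *o.FailureThreshold
	}
	return def
}

// createProbeSet is a hypothetical helper (not lib-common's CreateProbeSet):
// it merges per-probe overrides, keyed by probe name, into the defaults.
func createProbeSet(defaults probeSet, overrides map[string]*probeOverride) probeSet {
	return probeSet{
		Liveness:  applyOverride(defaults.Liveness, overrides["liveness"]),
		Readiness: applyOverride(defaults.Readiness, overrides["readiness"]),
		Startup:   applyOverride(defaults.Startup, overrides["startup"]),
	}
}

func int32Ptr(v int32) *int32 { return &v }

func main() {
	// Placeholder defaults: shared liveness/readiness timing plus a dedicated,
	// more tolerant startup probe (values are illustrative only).
	shared := probeConfig{PeriodSeconds: 10, TimeoutSeconds: 5, FailureThreshold: 3}
	defaults := probeSet{
		Liveness:  shared,
		Readiness: shared,
		Startup:   probeConfig{PeriodSeconds: 10, TimeoutSeconds: 5, FailureThreshold: 12},
	}

	// A production override that only raises the startup failure threshold.
	overrides := map[string]*probeOverride{
		"startup": {FailureThreshold: int32Ptr(30)},
	}

	effective := createProbeSet(defaults, overrides)
	fmt.Printf("%+v\n", effective.Startup)
}
```

Pointer fields let the merge tell "not set" apart from an explicit zero, which is why optional override fields in a CRD are typically pointers; it is also why exposing such an override is a CRD change rather than a pure code change.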

@Deydra71 (Contributor, Author)

I see, I overlooked the lib-common module :/

Looking at the examples, I think it's fairly straightforward to update this in main, then create a backport only from that new change (and close this one).

I can work on it this week. @fmount Is there any Jira tracking the implementation of probes.OverrideSpec across controlplane? So far I have only found support in the storage operators and mariadb.

Contributor

But keep in mind that this will be a CRD change. We cannot just backport it and expect it to show up without an openstack-operator change; we need to plan for when it has to be released. Otherwise we have to do a short-term update and a longer-term transition to this.

Contributor

Hi,

> Is there any Jira tracking the implementation of probes.OverrideSpec across controlplane? So far I could find only support in storage operators and mariadb

Because the interface has grown over time, the idea is to add overrides only where we need them, and I assume we can create stories under https://redhat.atlassian.net/browse/OSPRH-2490 to track the work.
I agree that we need to coordinate to make sure it will be available in a specific maintenance release, and bump the openstack-operator (fr6) to get the CRD change as well.
For cinder we have an associated bug that will go out soon. I'm not sure whether we want to take a similar approach, or just close OSPRH-2490 with FR6 and work on dedicated items (e.g. a new Horizon bug that creates the bug epic and the associated stream).


3 participants