Skip to content

fix: propagate Accepted=False to Ready condition when GitRepo job creation fails#5318

Open
mvanhorn wants to merge 1 commit into
rancher:mainfrom
mvanhorn:fix/4865-ready-on-failure
Open

fix: propagate Accepted=False to Ready condition when GitRepo job creation fails#5318
mvanhorn wants to merge 1 commit into
rancher:mainfrom
mvanhorn:fix/4865-ready-on-failure

Conversation

@mvanhorn

Copy link
Copy Markdown

Refers to #4865

Summary

When a GitRepo fails before any bundles are created (e.g. a missing or invalid secret referenced by HelmSecretNameForPaths or ClientSecretName), the gitjob_controller sets Accepted=False via updateErrorStatus. However the StatusReconciler computes Ready independently from the bundle summary. With no bundles present the summary is 0/0, which satisfies IsReady(), so the GitRepo is reported as Ready=True even though nothing is deploying.

This PR adds propagateAcceptedFailureToReady in status_controller.go. After setReadyStatusFromBundle runs, the function checks whether the Accepted condition is False. If so it forces Ready=False, copying the Accepted message and reason onto the Ready condition. The check is additive: bundle-driven errors are still handled as before, and the propagation only fires when Accepted is explicitly False.

Why this matters

Users who hit pre-job errors (missing secret, cabundle failure, restriction violation) see a healthy green GitRepo while their workload is silently not deploying. This change makes the error visible exactly where operators look first.

Testing

Four unit tests added to gitjob_test.go (same package):

  • Test_PropagateAcceptedFailureToReady_SetsReadyFalse - overwrites existing Ready=True when Accepted=False
  • Test_PropagateAcceptedFailureToReady_AddsReadyWhenMissing - adds Ready=False when no Ready condition exists yet (the 0/0 empty-bundle case)
  • Test_PropagateAcceptedFailureToReady_NoOpWhenAcceptedTrue - verifies no change when Accepted=True
  • Test_PropagateAcceptedFailureToReady_NoOpWhenNoAccepted - verifies no change when no Accepted condition present

All 115 tests in the reconciler package pass (go test ./internal/cmd/controller/gitops/reconciler/...).

Fixes #4865

Additional Information

Checklist

  • I have updated the documentation via a pull request in the fleet-product-docs repository.

…ation fails

When a GitRepo fails before any bundles are created (e.g. missing secret,
invalid cabundle), the Accepted condition is set to False but the Ready
condition remains True because the empty 0/0 bundle summary satisfies the
IsReady check. Users see a healthy-looking GitRepo while their workload is
silently not deploying.

Add propagateAcceptedFailureToReady in the StatusReconciler to copy the
Accepted=False error message onto the Ready condition. The check runs after
setReadyStatusFromBundle, so bundle-driven errors are still handled as before
and the Accepted propagation only fires when Accepted is explicitly False.

Fixes rancher#4865

Signed-off-by: Matt Van Horn <mvanhorn@gmail.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

@0xavi0 0xavi0 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, for your contribution!
I've left a few comments.

Comment on lines +279 to +286
for _, c := range gitrepo.Status.Conditions {
if c.Type == string(fleet.Ready) {
c.Status = v1.ConditionFalse
c.Message = acceptedMsg
c.Reason = acceptedReason
found = true
}
newConditions = append(newConditions, c)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this could introduce an undesiredLastUpdateTime update.

Let's say the first iteration of the status reconciler computes the desired state and, because there are no Bundles it considers Ready should be true.
Later this new function propagates the message and reason from the Accepted condition and status is now False.
That's fine.

The problem I see (I would need to confirm testing it myself) is that for the next reconcile it re-checks again, there are no Bundles and the Ready condition is found as False.
When that happens, wrangler updates the LastUpdateTime when it sets the condition back to True.
Later the new function only preserves the message and reason, so the LastUpdateTime has been updated (when it shouldn't).

I think the tests are not exercising this particular scenario. I think it should be also covered


Also, what happens if the Ready condition was already False?
We would be overwritting unconditionally the message and reason to the Accepted values, right?

Reason: acceptedReason,
})
}
gitrepo.Status.Conditions = newConditions

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this invalidate/remove all other conditions that may be existing on this object?
I also expect the Ready condition to be a summary of other conditions, so maybe a good approach for this state is to introduce a new one?

Not that familiar with the API to judge, but I feel like it would be a more clean approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GitRepo incorrectly marked as Ready upon failure

3 participants