fix: propagate Accepted=False to Ready condition when GitRepo job creation fails#5318
fix: propagate Accepted=False to Ready condition when GitRepo job creation fails#5318mvanhorn wants to merge 1 commit into
Conversation
…ation fails When a GitRepo fails before any bundles are created (e.g. missing secret, invalid cabundle), the Accepted condition is set to False but the Ready condition remains True because the empty 0/0 bundle summary satisfies the IsReady check. Users see a healthy-looking GitRepo while their workload is silently not deploying. Add propagateAcceptedFailureToReady in the StatusReconciler to copy the Accepted=False error message onto the Ready condition. The check runs after setReadyStatusFromBundle, so bundle-driven errors are still handled as before and the Accepted propagation only fires when Accepted is explicitly False. Fixes rancher#4865 Signed-off-by: Matt Van Horn <mvanhorn@gmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
0xavi0
left a comment
There was a problem hiding this comment.
Thanks, for your contribution!
I've left a few comments.
| for _, c := range gitrepo.Status.Conditions { | ||
| if c.Type == string(fleet.Ready) { | ||
| c.Status = v1.ConditionFalse | ||
| c.Message = acceptedMsg | ||
| c.Reason = acceptedReason | ||
| found = true | ||
| } | ||
| newConditions = append(newConditions, c) |
There was a problem hiding this comment.
I'm not sure this could introduce an undesiredLastUpdateTime update.
Let's say the first iteration of the status reconciler computes the desired state and, because there are no Bundles it considers Ready should be true.
Later this new function propagates the message and reason from the Accepted condition and status is now False.
That's fine.
The problem I see (I would need to confirm testing it myself) is that for the next reconcile it re-checks again, there are no Bundles and the Ready condition is found as False.
When that happens, wrangler updates the LastUpdateTime when it sets the condition back to True.
Later the new function only preserves the message and reason, so the LastUpdateTime has been updated (when it shouldn't).
I think the tests are not exercising this particular scenario. I think it should be also covered
Also, what happens if the Ready condition was already False?
We would be overwritting unconditionally the message and reason to the Accepted values, right?
| Reason: acceptedReason, | ||
| }) | ||
| } | ||
| gitrepo.Status.Conditions = newConditions |
There was a problem hiding this comment.
Doesn't this invalidate/remove all other conditions that may be existing on this object?
I also expect the Ready condition to be a summary of other conditions, so maybe a good approach for this state is to introduce a new one?
Not that familiar with the API to judge, but I feel like it would be a more clean approach.
Refers to #4865
Summary
When a GitRepo fails before any bundles are created (e.g. a missing or invalid secret referenced by
HelmSecretNameForPathsorClientSecretName), thegitjob_controllersetsAccepted=FalseviaupdateErrorStatus. However theStatusReconcilercomputesReadyindependently from the bundle summary. With no bundles present the summary is 0/0, which satisfiesIsReady(), so the GitRepo is reported asReady=Trueeven though nothing is deploying.This PR adds
propagateAcceptedFailureToReadyinstatus_controller.go. AftersetReadyStatusFromBundleruns, the function checks whether theAcceptedcondition isFalse. If so it forcesReady=False, copying theAcceptedmessage and reason onto theReadycondition. The check is additive: bundle-driven errors are still handled as before, and the propagation only fires whenAcceptedis explicitlyFalse.Why this matters
Users who hit pre-job errors (missing secret, cabundle failure, restriction violation) see a healthy green
GitRepowhile their workload is silently not deploying. This change makes the error visible exactly where operators look first.Testing
Four unit tests added to
gitjob_test.go(same package):Test_PropagateAcceptedFailureToReady_SetsReadyFalse- overwrites existingReady=TruewhenAccepted=FalseTest_PropagateAcceptedFailureToReady_AddsReadyWhenMissing- addsReady=Falsewhen no Ready condition exists yet (the 0/0 empty-bundle case)Test_PropagateAcceptedFailureToReady_NoOpWhenAcceptedTrue- verifies no change whenAccepted=TrueTest_PropagateAcceptedFailureToReady_NoOpWhenNoAccepted- verifies no change when no Accepted condition presentAll 115 tests in the reconciler package pass (
go test ./internal/cmd/controller/gitops/reconciler/...).Fixes #4865
Additional Information
Checklist