-
Notifications
You must be signed in to change notification settings - Fork 216
Queue immediate reconciliation on kustomization dependency #1412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much @fogninid! This contribution will be a really good one!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! We should think about writing a test for this somehow. After fixing these comments I will run some manual tests myself 👍
Can you give me some pointers how this could be tested better? I would really like to also have automated checks for the error cases, especially for the pathological one of cyclic dependencies, but I see them realistic only as e2e tests with quite complex setup and long run-time of the test |
Would this cause multiple reconciliations of the same object given that we add the object to the queue here:
Then, in this PR, if the dependency resolves faster, we add the object to the queue for a 2nd time. |
you are right, that re-queuing is not necessary anymore: I removed it |
This disables the the controller flag that everyone is using now, we need to deprecate it and edit its description saying that is no longer in use. |
I see that the same flag is used also for retrying error conditions, including those related to retrieving artifacts from the source. Watching for objects updates can not really cover those cases, so at least some of those "requeues" should be left anyway. If you want, it should be possible to split those cases between "transient errors" (that should be retried with a requeueAfter delay, or even a non-nil err) and "source/dependency has a not-ready status" (that could just return the reconciliation loop and wait for the watcher to queue again as soon as that status changes). For now I have pushed again the version that queues an additional reconciliation, that might not be necessary for the normal code-path. |
@fogninid I propose we make this feature optional at first. Let's add a feature gate called |
Signed-off-by: Daniele Fognini <[email protected]>
@stefanprodan I added the optional feature-gate as you suggested. As far as I see, all tests are currently running with either true or false for the option, but it is not clear to me which one is preferable to set (or if it would even be feasible to run both variants for some of the tests) |
// EnableDependencyQueueing | ||
EnableDependencyQueueing: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// EnableDependencyQueueing | |
EnableDependencyQueueing: false, | |
// EnableDependencyQueueing | |
// opt-in from v1.6 | |
EnableDependencyQueueing: false, |
|
||
// EnableDependencyQueueing controls whether reconciliation of a kustomization | ||
// should be queued once one of its dependencies becomes ready, or if only | ||
// time-based retries with reque-dependecy delays should be attempted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// time-based retries with reque-dependecy delays should be attempted | |
// time-based retries with requeue-dependency delays should be attempted |
DependencyRequeueInterval: 2 * time.Second, | ||
EnableDependencyQueueing: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DependencyRequeueInterval: 2 * time.Second, | |
EnableDependencyQueueing: true, | |
DependencyRequeueInterval: time.Minute, | |
EnableDependencyQueueing: true, |
Let's set here the requeue interval to 1m, this should cause the test to fail if the watcher doesn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I was trying to write above (#1412 (comment)), the requeue interval is currently used also for retries that are not related to readiness.
For example the test TestKustomizationReconciler_ArtifactDownload/recovers_after_not_found_errors
fails with that, because it is explicitly setting some "invalid" statuses on the resources, that cannot be covered by the predicates that are used to filter watchers.
It is possible to change the test to simulate the conditions that would match the readiness predicates...
... but I suppose it is better to change the controller logic to avoid mixing retries due to unexpected errors together with expected (watchable) non-ready states.
In my opinion those kind of retries should be handled either with delay of obj.GetRetryInterval()
, or left as return ..{}, err
for the runtime framework to handle.
Dependents of a kustomization, that are in "wait dependency", status should be reconciled immediately after the dependency becomes ready or is reconciled with a new revision.
This should not make any functional change compared to the current logic, only improve latency compared to current polling of
requeue-dependency
.