Conversation

fogninid

Dependents of a Kustomization that are in the "wait dependency" status should be reconciled immediately after the dependency becomes ready or is reconciled with a new revision.

This should not change behavior compared to the current logic; it only reduces latency compared to the current polling at the requeue-dependency interval.
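
As a rough illustration of the idea in the description, the sketch below maps a changed dependency to the dependents that should be queued. The `Key` and `Kustomization` types and the `dependentsOf` helper are hypothetical simplifications introduced here, not code from this PR, and `Ready == false` is only an assumed stand-in for the "wait dependency" status.

```go
package main

import (
	"fmt"
	"sort"
)

// Key is a hypothetical stand-in for a namespaced object name.
type Key struct {
	Namespace, Name string
}

// Kustomization is a simplified, hypothetical model holding only the
// fields needed to illustrate dependency-triggered queueing.
type Kustomization struct {
	Key       Key
	DependsOn []Key
	Ready     bool // assumption: false models the "wait dependency" status
}

// dependentsOf returns the keys of all not-yet-ready objects that list dep
// in DependsOn. These are the objects a watcher would enqueue when dep
// becomes ready or is reconciled with a new revision.
func dependentsOf(dep Key, all []Kustomization) []Key {
	var out []Key
	for _, k := range all {
		if k.Ready {
			continue
		}
		for _, d := range k.DependsOn {
			if d == dep {
				out = append(out, k.Key)
				break
			}
		}
	}
	// Sort for a deterministic order.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Namespace != out[j].Namespace {
			return out[i].Namespace < out[j].Namespace
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	infra := Key{Namespace: "flux-system", Name: "infra"}
	all := []Kustomization{
		{Key: infra, Ready: true},
		{Key: Key{Namespace: "flux-system", Name: "apps"}, DependsOn: []Key{infra}},
		{Key: Key{Namespace: "flux-system", Name: "monitoring"}, DependsOn: []Key{infra}, Ready: true},
	}
	fmt.Println(dependentsOf(infra, all))
}
```

In a real controller this lookup would more likely be backed by an index than by a linear scan; the scan is used here only to keep the sketch short.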

Member

@matheuscscp matheuscscp left a comment

Thanks very much @fogninid! This contribution will be a really good one!!

Member

@matheuscscp matheuscscp left a comment

Great! We should think about writing a test for this somehow. After fixing these comments I will run some manual tests myself 👍

@fogninid
Author

fogninid commented Apr 8, 2025

Great! We should think about writing a test for this somehow. [...]

Can you give me some pointers on how this could be tested better?
So far I have been relying only on this existing e2e test, which matches my expectations by completing about 30s quicker after the change.

I would really like to also have automated checks for the error cases, especially the pathological one of cyclic dependencies, but I see them as realistic only as e2e tests with a fairly complex setup and a long run time.

@stefanprodan stefanprodan added the enhancement New feature or request label Apr 13, 2025
@stefanprodan stefanprodan changed the title queue immediate reconciliation on kustomization dependency Queue immediate reconciliation on kustomization dependency Apr 13, 2025
@stefanprodan
Member

stefanprodan commented Apr 14, 2025

Would this cause multiple reconciliations of the same object given that we add the object to the queue here:

return ctrl.Result{RequeueAfter: r.requeueDependency}, nil

Then, in this PR, if the dependency resolves faster, we add the object to the queue for a 2nd time.
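
The concern above can be illustrated with a toy model of a deduplicating queue with delayed adds. This is not the controller-runtime or client-go implementation; the `queue` type, its methods, and `countReconciles` are hypothetical, and time is modelled as integer ticks. The only behavior assumed is that an item already pending is not added twice, while a delayed add scheduled earlier still fires later.

```go
package main

import "fmt"

// queue is a toy model: pending items are deduplicated by key, and
// delayed items become pending once their logical time is reached.
type queue struct {
	pending map[string]bool
	delayed map[string]int // key -> logical time at which it is added
}

func newQueue() *queue {
	return &queue{pending: map[string]bool{}, delayed: map[string]int{}}
}

// Add enqueues key immediately (as a watch event would).
func (q *queue) Add(key string) { q.pending[key] = true }

// AddAfter schedules key for a later time (as RequeueAfter would),
// keeping the earliest time if the key is already scheduled.
func (q *queue) AddAfter(key string, at int) {
	if cur, ok := q.delayed[key]; !ok || at < cur {
		q.delayed[key] = at
	}
}

// drain promotes due delayed items, processes everything pending, and
// returns how many items were processed at this tick.
func (q *queue) drain(now int) int {
	for k, at := range q.delayed {
		if at <= now {
			q.pending[k] = true
			delete(q.delayed, k)
		}
	}
	n := len(q.pending)
	q.pending = map[string]bool{}
	return n
}

// countReconciles counts how often the dependent is processed when a
// delayed requeue is scheduled for requeueAt and a watch event arrives
// at watchAt.
func countReconciles(watchAt, requeueAt int) int {
	q := newQueue()
	q.AddAfter("apps", requeueAt)
	end := watchAt
	if requeueAt > end {
		end = requeueAt
	}
	n := 0
	for now := 0; now <= end; now++ {
		if now == watchAt {
			q.Add("apps")
		}
		n += q.drain(now)
	}
	return n
}

func main() {
	fmt.Println("watch at 2, requeue at 30:", countReconciles(2, 30))
	fmt.Println("watch at 30, requeue at 30:", countReconciles(30, 30))
}
```

In this model the object is processed twice whenever the watch event arrives before the delayed requeue is due, and once only when both land on the same tick.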

@fogninid
Author

Would this cause multiple reconciliations of the same object given that we add the object to the queue here:

return ctrl.Result{RequeueAfter: r.requeueDependency}, nil

Then, in this PR, if the dependency resolves faster, we add the object to the queue for a 2nd time.

you are right, that re-queuing is not necessary anymore: I removed it

@stefanprodan
Member

stefanprodan commented Apr 14, 2025

you are right, that re-queuing is not necessary anymore: I removed it

This disables the controller flag that everyone is using now; we need to deprecate it and edit its description to say that it is no longer in use.

@fogninid
Author

you are right, that re-queuing is not necessary anymore: I removed it

This disables the controller flag that everyone is using now; we need to deprecate it and edit its description to say that it is no longer in use.

I see that the same flag is also used for retrying error conditions, including those related to retrieving artifacts from the source.

Watching for object updates cannot really cover those cases, so at least some of those "requeues" should be kept anyway.

If you want, it should be possible to split those cases into "transient errors" (which should be retried with a requeueAfter delay, or even a non-nil err) and "source/dependency has a not-ready status" (which could just return from the reconciliation loop and wait for the watcher to queue the object again as soon as that status changes).

For now I have pushed the version that queues an additional reconciliation again, even though it might not be necessary for the normal code path.
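
The split proposed above could look roughly like the following sketch. `Result`, `errNotReady`, `errDownload`, and `decide` are hypothetical names introduced for illustration; they are not the controller's types, and the actual error classification in the controller may differ.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result is a hypothetical stand-in for a reconcile result.
type Result struct {
	RequeueAfter time.Duration
}

// errNotReady marks a watchable state: the source or dependency exists
// but is not ready yet (hypothetical sentinel).
var errNotReady = errors.New("dependency not ready")

// errDownload is an example of a transient error that no watch event
// will resolve (hypothetical sentinel).
var errDownload = errors.New("artifact download failed")

// decide picks how to leave the reconcile loop:
//   - no error: nothing to requeue,
//   - not-ready: return without requeueing and rely on the watcher,
//   - anything else: treat as transient and retry after the given delay.
func decide(err error, retry time.Duration) Result {
	switch {
	case err == nil:
		return Result{}
	case errors.Is(err, errNotReady):
		return Result{}
	default:
		return Result{RequeueAfter: retry}
	}
}

func main() {
	wrapped := fmt.Errorf("apps: %w", errNotReady)
	fmt.Println(decide(wrapped, 30*time.Second))     // no timed requeue
	fmt.Println(decide(errDownload, 30*time.Second)) // timed retry
}
```

Using `errors.Is` keeps the classification working when the not-ready error is wrapped with additional context.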

@stefanprodan
Member

@fogninid I propose we make this feature optional at first. Let's add a feature gate called EnableDependencyQueueing and based on its value we add the watcher to the controller manager.
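
A minimal sketch of how such a gate could control watcher registration is shown below. Only the gate name `EnableDependencyQueueing` comes from this thread; `setupWatches` and the watch names are hypothetical and do not reflect the controller manager's real API.

```go
package main

import "fmt"

// EnableDependencyQueueing is the feature gate name proposed in this thread.
const EnableDependencyQueueing = "EnableDependencyQueueing"

// setupWatches returns the names of the watches that would be registered.
// The watch names are placeholders; a gate missing from the map is
// treated as disabled, which matches an opt-in default.
func setupWatches(gates map[string]bool) []string {
	watches := []string{"sources"}
	if gates[EnableDependencyQueueing] {
		watches = append(watches, "kustomization-dependencies")
	}
	return watches
}

func main() {
	fmt.Println(setupWatches(map[string]bool{}))
	fmt.Println(setupWatches(map[string]bool{EnableDependencyQueueing: true}))
}
```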

@fogninid
Author

@stefanprodan I added the optional feature-gate as you suggested.

As far as I can see, all tests currently run with either true or false for the option, but it is not clear to me which one is preferable to set (or whether it would even be feasible to run both variants for some of the tests).

Comment on lines +74 to +75
// EnableDependencyQueueing
EnableDependencyQueueing: false,
Member

Suggested change
- // EnableDependencyQueueing
- EnableDependencyQueueing: false,
+ // EnableDependencyQueueing
+ // opt-in from v1.6
+ EnableDependencyQueueing: false,


// EnableDependencyQueueing controls whether reconciliation of a kustomization
// should be queued once one of its dependencies becomes ready, or if only
// time-based retries with reque-dependecy delays should be attempted
Member

Suggested change
- // time-based retries with reque-dependecy delays should be attempted
+ // time-based retries with requeue-dependency delays should be attempted

Comment on lines 187 to +188
DependencyRequeueInterval: 2 * time.Second,
EnableDependencyQueueing: true,
Member

Suggested change
- DependencyRequeueInterval: 2 * time.Second,
- EnableDependencyQueueing: true,
+ DependencyRequeueInterval: time.Minute,
+ EnableDependencyQueueing: true,

Let's set the requeue interval to 1m here; this should cause the test to fail if the watcher doesn't work.
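
The reasoning behind the longer interval can be sketched with a small timing model. The 30s test timeout and the 3s readiness delay are assumptions made for illustration, and `reconciledAt` and `testPasses` are hypothetical helpers; only the 2s and 1m intervals come from the suggestion above.

```go
package main

import (
	"fmt"
	"time"
)

const (
	shortInterval = 2 * time.Second  // interval before the suggestion
	longInterval  = time.Minute      // interval after the suggestion
	depReadyAt    = 3 * time.Second  // assumption: dependency becomes ready after 3s
	testTimeout   = 30 * time.Second // assumption: how long the test waits
)

// reconciledAt models when the dependent is reconciled after its
// dependency becomes ready: immediately with a working watcher, or at
// the next polling tick otherwise.
func reconciledAt(readyAt, interval time.Duration, watcherWorks bool) time.Duration {
	if watcherWorks {
		return readyAt
	}
	ticks := (readyAt + interval - 1) / interval // round up to the next tick
	if ticks < 1 {
		ticks = 1
	}
	return ticks * interval
}

// testPasses reports whether the dependent is reconciled before the
// assumed test timeout.
func testPasses(interval time.Duration, watcherWorks bool) bool {
	return reconciledAt(depReadyAt, interval, watcherWorks) < testTimeout
}

func main() {
	fmt.Println("2s interval, broken watcher:", testPasses(shortInterval, false))
	fmt.Println("1m interval, broken watcher:", testPasses(longInterval, false))
	fmt.Println("1m interval, working watcher:", testPasses(longInterval, true))
}
```

Under these assumptions, a short polling interval hides a broken watcher because polling alone finishes within the timeout, while a one-minute interval makes the test depend on the watcher.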

Author

As I was trying to write above (#1412 (comment)), the requeue interval is currently also used for retries that are not related to readiness.

For example, the test TestKustomizationReconciler_ArtifactDownload/recovers_after_not_found_errors fails with that change, because it explicitly sets some "invalid" statuses on the resources that cannot be covered by the predicates used to filter the watchers.
It is possible to change the test to simulate the conditions that would match the readiness predicates...
... but I suppose it is better to change the controller logic to avoid mixing retries due to unexpected errors with expected (watchable) non-ready states.

In my opinion, those kinds of retries should be handled either with a delay of obj.GetRetryInterval(), or left as return ..{}, err for the runtime framework to handle.
