GracefulShutdown upgrade may get stuck in between versions #2230
Comments
@ryanemerson here is the complete output of the failed test: TestOperandUpgrades.zip. This one is quite large because it failed quite late, but the failure can happen between pretty much any two versions. What seems to be common is that it seemingly stops reconciling when the reconciler gets to the
I believe I finally managed to fully understand what's happening and why it seems that the reconciler hits the max wait limit out of nowhere. The first part that needs to be understood is how the k8s controller runtime prioritizes reconciliation requests and how it handles the exponential backoff:
The second part that needs to be understood is the Infinispan controller configuration and the events it listens for, mainly:
So what happens during reconciliation is that the pipeline updates the Infinispan status and the StatefulSet, which triggers new events that cause reconciliation to be executed immediately. At the same time it requeues the request, and every requeue with a 0s delay increases the backoff. Because the reconciler explicitly reacts to all the events, the quickly increasing backoff isn't visible. It is even often reset by an explicit delay. So all the reconciliation runs triggered by the changes to Infinispan and the StatefulSet are executed immediately but also double the backoff, and once the requeued request gets requeued again it hits the max time limit.
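To get a feel for how quickly the doubling adds up, here is a minimal standalone sketch of a per-item exponential backoff. It is not the controller-runtime code: the `backoff` type, `requeuesUntilMax`, and the constants are hypothetical names for illustration, and the 5ms base delay and 1000s cap are assumptions based on my recollection of the default rate limiter, so check them against the version the operator actually uses.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed values, not taken from this issue: base delay and cap of the
// default exponential rate limiter as I remember them.
const (
	assumedBase = 5 * time.Millisecond
	assumedMax  = 1000 * time.Second
)

// backoff is a hypothetical stand-in for a per-item exponential rate limiter.
type backoff struct {
	base, max time.Duration
	failures  int // number of rate-limited requeues seen so far
}

// next returns the delay for the next requeue and records one more requeue.
// The delay is base * 2^failures, capped at max.
func (b *backoff) next() time.Duration {
	d := b.base
	for i := 0; i < b.failures; i++ {
		d *= 2
		if d >= b.max {
			d = b.max
			break
		}
	}
	b.failures++
	return d
}

// requeuesUntilMax counts how many rate-limited requeues it takes
// until the returned delay reaches the cap.
func requeuesUntilMax(base, max time.Duration) int {
	b := &backoff{base: base, max: max}
	n := 0
	for {
		n++
		if b.next() >= max {
			return n
		}
	}
}

func main() {
	b := &backoff{base: assumedBase, max: assumedMax}
	for i := 1; i <= 5; i++ {
		fmt.Printf("requeue %d: delay %v\n", i, b.next())
	}
	fmt.Println("requeues until cap:", requeuesUntilMax(assumedBase, assumedMax))
}
```

Under these assumed values the cap is reached on the 19th rate-limited requeue. A single reconciliation pass that requeues with a 0s delay on every event-triggered run could plausibly get there within one upgrade step, which would match the observation that the limit appears to be hit out of nowhere.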
There's a slight chance for 'TestOperandUpgrades' to fail between individual version upgrades because the new StatefulSet version is not created.
Operator log:
infinispan-operator-controller-manager-5776c64f76-l889k.log
Test log: