rotate the openshift-config pull-secret on acr token rotation #3187
We need to merge the existing data in the global pull secret with the new token.
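A minimal sketch of what such a merge could look like, assuming the global pull secret carries the usual `.dockerconfigjson` payload with an `auths` map keyed by registry. The names `mergePullSecret`, `authFor`, `dockerConfig`, and `registryAuth` are hypothetical and are not taken from this PR; the registry name in `main` is a placeholder.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// registryAuth is the part of an "auths" entry this sketch writes (hypothetical type).
type registryAuth struct {
	Auth string `json:"auth"`
}

// dockerConfig models only the "auths" map of a .dockerconfigjson payload.
// Entries are kept as raw JSON so untouched registries are written back as-is.
// Assumption: any other top-level keys are not needed and are dropped.
type dockerConfig struct {
	Auths map[string]json.RawMessage `json:"auths"`
}

// mergePullSecret keeps every existing registry entry and adds or replaces
// only the entry for the given registry with the rotated credentials.
func mergePullSecret(existing []byte, registry, username, password string) ([]byte, error) {
	cfg := dockerConfig{}
	if len(existing) > 0 {
		if err := json.Unmarshal(existing, &cfg); err != nil {
			return nil, fmt.Errorf("parsing existing pull secret: %w", err)
		}
	}
	if cfg.Auths == nil {
		cfg.Auths = map[string]json.RawMessage{}
	}
	token := base64.StdEncoding.EncodeToString([]byte(username + ":" + password))
	entry, err := json.Marshal(registryAuth{Auth: token})
	if err != nil {
		return nil, err
	}
	cfg.Auths[registry] = entry
	return json.Marshal(cfg)
}

// authFor returns the "auth" value stored for one registry.
func authFor(raw []byte, registry string) (string, error) {
	var cfg dockerConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", err
	}
	entry, ok := cfg.Auths[registry]
	if !ok {
		return "", fmt.Errorf("no entry for %s", registry)
	}
	var a registryAuth
	if err := json.Unmarshal(entry, &a); err != nil {
		return "", err
	}
	return a.Auth, nil
}

func main() {
	existing := []byte(`{"auths":{"quay.io":{"auth":"cXVheQ=="}}}`)
	merged, err := mergePullSecret(existing, "example.azurecr.io", "user", "pass")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(merged))
}
```

The point of the sketch is that entries for other registries survive the rotation; only the entry for the rotated registry changes.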
fc75287 to d4ed0ab
changes made to merge existing and new data
We already have a bug in this controller (and others) involving multiple write operations on the same object in a single reconcile loop, and I wonder if we are introducing more of that here on this Secret resource. Generally, every time we modify a resource, we need to end reconciliation to prevent data from going stale in our reconciliation loop. Ideally we write one object at a time to move towards the desired state, then requeue so that reconciliation happens again immediately. For example, a StatefulSet reconciliation for scaling would only create or delete 1 Pod at a time, even if the scale was off by 4 to begin with; that just means reconciliation is executed 4 times. I see at least 3 write operations on the same Secret in this PR, and I wonder if this will lead to inconsistent states and lots of errors in logs.
This bug does not apply here: we are not in a reconciliation loop on the cluster object, since this runs during admin update. The logic in this PR calls update on the openshift-config secret only once (if successful). For the update path we call delete and then create on the secret, for 2 write calls.
I think proper controller structure would then be something like this, in pseudocode:
This loop means we only ever perform a write operation once per reconcile, and we always move closer to our desired state, even if we don't get there immediately.
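The reviewer's original pseudocode is not reproduced here. As an illustration of the idea only, here is a small sketch using the StatefulSet scaling example from above. The `cluster` type and the `reconcile` and `runToConvergence` functions are hypothetical in-memory stand-ins, not ARO operator or controller-runtime code.

```go
package main

import "fmt"

// cluster is a hypothetical in-memory stand-in for state held by the API server.
type cluster struct {
	pods   int // current number of Pods
	writes int // total write operations performed
}

// reconcile performs at most one write per call and reports whether the
// caller should requeue. It stops as soon as it has written once, so it never
// acts on data that its own write has made stale.
func reconcile(c *cluster, desired int) (requeue bool) {
	switch {
	case c.pods < desired:
		c.pods++ // create one Pod
		c.writes++
		return true
	case c.pods > desired:
		c.pods-- // delete one Pod
		c.writes++
		return true
	default:
		return false // desired state reached, nothing to write
	}
}

// runToConvergence requeues until reconcile reports no more work and returns
// the number of passes that performed a write.
func runToConvergence(c *cluster, desired int) int {
	passes := 0
	for reconcile(c, desired) {
		passes++
	}
	return passes
}

func main() {
	c := &cluster{pods: 1}
	passes := runToConvergence(c, 5)
	fmt.Printf("pods=%d writing passes=%d\n", c.pods, passes) // pods=5 writing passes=4
}
```

A scale that is off by 4 takes four writing passes plus one final pass that finds nothing to do.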
@SudoBrendan
I am blind. ty
Regarding the controller reconciliation failures: we should probably use a PATCH instead of an Update on the ARO cluster CR, because an Update is essentially a PUT of the whole resource, whereas a PATCH only changes the given field. I think that would cause less contention, at least from an error log perspective, in the ARO operator. Ideally we would use PATCH everywhere unless we want a specific configuration laid down.
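To make the difference concrete, here is a toy in-memory model, assuming a full Update is rejected when the caller's copy is stale while a field-level patch is not version-checked. The `store` and `object` types are hypothetical; this is not the Kubernetes client API, and the real API server has more nuances than this model.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict: object has been modified")

// object is a hypothetical, simplified stand-in for a custom resource.
type object struct {
	resourceVersion int
	fields          map[string]string
}

// store is a toy model of optimistic concurrency on a single object.
type store struct{ obj object }

func copyFields(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// get returns a private copy, like a client reading the resource.
func (s *store) get() object {
	return object{resourceVersion: s.obj.resourceVersion, fields: copyFields(s.obj.fields)}
}

// update replaces the whole object (PUT semantics) and is rejected when the
// caller's copy is older than what the store holds.
func (s *store) update(o object) error {
	if o.resourceVersion != s.obj.resourceVersion {
		return errConflict
	}
	s.obj = object{resourceVersion: o.resourceVersion + 1, fields: copyFields(o.fields)}
	return nil
}

// patch changes only the named fields and does not check the version.
func (s *store) patch(changes map[string]string) {
	for k, v := range changes {
		s.obj.fields[k] = v
	}
	s.obj.resourceVersion++
}

func main() {
	s := &store{obj: object{fields: map[string]string{"a": "1", "b": "1"}}}

	stale := s.get()                      // writer B reads the object
	s.patch(map[string]string{"a": "2"}) // writer A changes field "a" meanwhile

	stale.fields["b"] = "2"
	fmt.Println("update:", s.update(stale)) // writer B's full update is rejected

	s.patch(map[string]string{"b": "2"}) // a patch of just "b" goes through
	fmt.Println("fields:", s.get().fields)
}
```

In this model the stale Update produces a conflict that the writer has to retry, which is the kind of error-log noise described above, while the patch touches only its own field.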
445dc3d to 33de1ab
LGTM!
Some things @lranjbar and I noticed during review that might help future reviewers:
- The logic in this PR duplicates logic in the pullsecret controller in the ARO Operator. There are some differences in behavior (e.g. how a notfound error is handled when getting the existing pull secret). IMO this is fine to merge as is, with a future effort to deduplicate the two by moving the common logic to a shared third location.
- The rotateACRTokenPassword function appears to recreate the pull secret if it needs to, using the contents of an available registryProfile within the cluster document. The code does not perform any nil checks and we were initially concerned that this function would cause a nil pointer dereference when attempting to read the username off this profile. After digging through the code further, we believe that all production clusters should have at least one registryProfile present in the clusterdoc, as the above ensureACRToken function, which is called during bootstrap, will ensure that a registryProfile is created if not already present.
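If a defensive check were wanted for the second point, it could look roughly like the sketch below. The `registryProfile` type, the `firstRegistryUsername` helper, and the error value are hypothetical and simplified; they are not the types or functions used in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// registryProfile is a hypothetical, trimmed-down stand-in for the registry
// profile stored in the cluster document.
type registryProfile struct {
	Name     string
	Username string
}

var errNoRegistryProfile = errors.New("cluster document has no usable registry profile")

// firstRegistryUsername returns the username of the first non-nil profile that
// has one, and an error instead of dereferencing a nil or missing profile.
func firstRegistryUsername(profiles []*registryProfile) (string, error) {
	for _, p := range profiles {
		if p != nil && p.Username != "" {
			return p.Username, nil
		}
	}
	return "", errNoRegistryProfile
}

func main() {
	// An empty list yields an error rather than a panic.
	if _, err := firstRegistryUsername(nil); err != nil {
		fmt.Println("error:", err)
	}

	profiles := []*registryProfile{nil, {Name: "example.azurecr.io", Username: "token-user"}}
	user, err := firstRegistryUsername(profiles)
	fmt.Println(user, err)
}
```

Returning an error keeps the failure visible to the caller even if the assumption that every production cluster has at least one registryProfile turns out not to hold.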
I think the outstanding feedback has been addressed. No new concerns from me.
Which issue this PR addresses:
Fixes https://issues.redhat.com/browse/ARO-4304
What this PR does / why we need it:
Test plan for issue:
Due to the nature of acr token passwords, this will be tested manually
Is there any documentation that needs to be updated for this PR?