Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rotate the openshift-config pull-secret on acr token rotation #3187

Merged
merged 7 commits into from
Sep 21, 2023

Conversation

s-amann
Copy link
Contributor

@s-amann s-amann commented Sep 21, 2023

Which issue this PR addresses:

Fixes https://issues.redhat.com/browse/ARO-4304

What this PR does / why we need it:

Test plan for issue:

Due to the nature of acr token passwords, this will be tested manually

Is there any documentation that needs to be updated for this PR?

Copy link
Collaborator

@dem4gus dem4gus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to merge the existing data in the global pull secret with the new token.

pkg/cluster/acrtoken.go Outdated Show resolved Hide resolved
@s-amann s-amann force-pushed the rotate-openshift-config-pull-secret branch from fc75287 to d4ed0ab Compare September 21, 2023 17:28
@s-amann s-amann dismissed dem4gus’s stale review September 21, 2023 17:29

changes made to merge existing and new data

@s-amann s-amann marked this pull request as ready for review September 21, 2023 17:48
@s-amann s-amann added bug Something isn't working help wanted Extra attention is needed next-up next-release To be included in the next RP release rollout chainsaw Pull requests or issues owned by Team Chainsaw ready-for-review labels Sep 21, 2023
Copy link
Collaborator

@SudoBrendan SudoBrendan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a bug in this controller (and others) surrounding multiple write operations of the same object in a single reconcile loop, and I wonder if we are introducing more of that here on this Secret resource. Generally, every time we modify any resources, we need to end reconciliation to prevent data from going stale/out of date in our reconciliation loop. Ideally we write 1 object at a time to move towards a desired state, then requeue reconciliation to happen again immediately. For example, a StatefulSet reconciliation regarding scaling would only create/delete 1 Pod at a time, even if the scale was off by 4 to begin with; that just means reconciliation is executed 4 times. I see at least 3 write operations on the same Secret in this PR, and I wonder if this will lead to inconsistent states and lots of errors in logs.

@s-amann s-amann dismissed SudoBrendan’s stale review September 21, 2023 19:42

This bug does not apply as we are not in a reconciliation loop on the cluster object, this is done during admin update. The logic in this PR will only call update on the openshift-config secret 1 time (if successful). For the update path we call delete and then create on the secret for 2 write calls.

@SudoBrendan
Copy link
Collaborator

SudoBrendan commented Sep 21, 2023

This bug does not apply as we are not in a reconciliation loop on the cluster object, this is done during admin update. The logic in this PR will only call update on the openshift-config secret 1 time (if successful). For the update path we call delete and then create on the secret for 2 write calls.

I think then, proper controller structure would be something pseudo like:

func Reconcile() {
    if secretDoesNotExist() {
        createSecret()
        return nil, requeue(time.now())
    }
    if secretIsOutOfDate() {
        deleteSecret()
        return nil, requeue(time.now())
    }
}

This loop means we only ever perform a write operation once per reconcile, and we always move closer to our desired state, even if we don't get there immediately.

@bennerv
Copy link
Collaborator

bennerv commented Sep 21, 2023

@SudoBrendan
The logic change isn't a controller. It's included as part of a step in PUCM.

@SudoBrendan
Copy link
Collaborator

@SudoBrendan The logic change isn't a controller. It's included as part of a step in PUCM.

I am blind. ty

@bennerv
Copy link
Collaborator

bennerv commented Sep 21, 2023

@SudoBrendan

In regards to the controller reconciliation failures. We should probably be using a PATCH instead of an Update to the aro cluster CR as during an Update, it's essentially a PUT operation to the resource. A PATCH instead of Update will only update that given field.

I think that would cause less contention from an error log perspective in the ARO operator. Ideally we would use PATCH everywhere unless we want a specific configuration laid down.

@s-amann s-amann force-pushed the rotate-openshift-config-pull-secret branch from 445dc3d to 33de1ab Compare September 21, 2023 20:16
Copy link
Collaborator

@tsatam tsatam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Some things @lranjbar and I noticed during review that might help future reviewers:

  • The logic contained here is duplicated across here and the pullsecret controller in the ARO Operator. There are some differences in behavior (e.g. how a notfound error is handled when getting the existing pull secret). IMO this is fine to merge as is, with a future effort to try and deduplicate these with common logic in a shared third location.
  • The rotateACRTokenPassword function appears to recreate the pull secret if it needs to, using the contents of an available registryProfile within the cluster document. The code does not perform any nil checks and we were initially concerned that this function would cause a nil pointer dereference when attempting to read the username off this profile. After digging through the code further, we believe that all production clusters should have at least one registryProfile present in the clusterdoc, as the above ensureACRToken function, which is called during bootstrap, will ensure that a registryProfile is created if not already present.

Copy link
Contributor

@hlipsig hlipsig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the oustanding feedback has been addressed. No new concerns from me.

@hlipsig hlipsig merged commit 1a35d4e into Azure:master Sep 21, 2023
18 checks passed
@s-amann s-amann deleted the rotate-openshift-config-pull-secret branch September 22, 2023 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working chainsaw Pull requests or issues owned by Team Chainsaw help wanted Extra attention is needed next-release To be included in the next RP release rollout next-up ready-for-review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants