
oauth-openshift "invalid resource name" error due to not unescaping client_id for service account #136

Closed
bergner opened this issue Oct 13, 2023 · 6 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


bergner commented Oct 13, 2023

When configuring ArgoCD to use OpenShift OAuth for authentication, the login works but the final code exchange fails like this inside oauth-openshift:

I1013 10:29:35.684794       1 httplog.go:131] "HTTP" verb="GET" URI="/oauth/authorize?client_id=system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server&redirect_uri=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fapi%2Fdex%2Fcallback&response_type=code&scope=user%3Ainfo&state=qtdg2lscsyrq7qlkgutypygxi" latency="90.415571ms" userAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.5845.103 Safari/537.36" audit-ID="cac191fd-8164-452d-b1ca-0040fc0a621b" srcIP="10.217.0.1:54130" resp=302
I1013 10:29:35.701586       1 request.go:833] Error in request: invalid resource name "system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server": [may not contain '%']
E1013 10:29:35.701646       1 access.go:177] osin: error=server_error, internal_error=&errors.errorString{s:"invalid resource name \"system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server\": [may not contain '%']"} get_client=error finding client
E1013 10:29:35.701664       1 osinserver.go:111] internal error: invalid resource name "system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server": [may not contain '%']

After much digging I figured out that verbosity level 10 dumps HTTP traffic in the logs, and how to raise the log level of oauth-openshift from --v=2 to --v=10: I scaled the authentication-operator Deployment in the openshift-authentication-operator namespace down to 0 replicas, then manually edited the oauth-openshift Deployment in the openshift-authentication namespace to use --v=10.

I have not figured out what exactly changed here, but ArgoCD login via OpenShift OAuth worked fine for me about six months ago. After looking quite a lot at ArgoCD and the ArgoCD Operator, the problem seems in the end to boil down to OpenShift OAuth itself.

Expected result

The final code exchange completes successfully.

Actual result

As the log snippet above shows, the GET request has its query parameters URI-escaped with a single level of escaping (which is expected and appropriate). The error, however, makes it look like oauth-openshift does not unescape the client_id parameter at all.
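
To make the distinction concrete, here is a minimal standalone Go sketch (not the actual oauth-server or osin code, just an illustration of the two behaviors): reading client_id straight out of the raw query string keeps the %3A sequences, which is exactly the value the "may not contain '%'" error complains about, while Go's standard query parsing decodes it exactly once back into a valid service account name.

package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rawClientID mimics pulling client_id straight out of the raw, still-escaped
// query string: the %3A sequences survive, matching the string in the error above.
func rawClientID(rawQuery string) string {
	for _, kv := range strings.Split(rawQuery, "&") {
		if strings.HasPrefix(kv, "client_id=") {
			return strings.TrimPrefix(kv, "client_id=")
		}
	}
	return ""
}

// decodedClientID uses url.ParseQuery, which percent-decodes each value once,
// turning the parameter back into "system:serviceaccount:argocd:argocd-argocd-dex-server".
func decodedClientID(rawQuery string) string {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return ""
	}
	return values.Get("client_id")
}

func main() {
	rawQuery := "client_id=system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server" +
		"&response_type=code&scope=user%3Ainfo"

	fmt.Println(rawClientID(rawQuery))     // still escaped
	fmt.Println(decodedClientID(rawQuery)) // decoded once
}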

Questions

  • Can anyone versed in OpenShift OAuth confirm that, when a service account is used as the OAuth client, the client_id should be the fully qualified name, e.g. "system:serviceaccount:NAMESPACE:ACCOUNT"?
  • If not, what is the correct approach here, so the ArgoCD developers can be made aware of what they are doing wrong? Keep in mind that this all seemingly worked a few months ago (though I can't swear the request from ArgoCD was identical to the one above).
  • Is there some reason why the client_id specifically should not be URI-escaped by the client making the request? I.e. do you deem it correct behavior that OpenShift OAuth should NOT do one level of URI unescaping on the client_id? See the sketch below.
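
Regarding the last question, here is a small standalone Go sketch (again just an illustration, not ArgoCD or dex code) of what single versus double escaping of the fully qualified name looks like. A single level of escaping is what any standard client library produces when building the query string, and it matches the request in the log above; the doubly escaped %253A form only shows up, legitimately, when the whole authorize URI is itself embedded as a parameter value (as in a /login?then=... redirect).

package main

import (
	"fmt"
	"net/url"
)

func main() {
	clientID := "system:serviceaccount:argocd:argocd-argocd-dex-server"

	// One level of escaping, as produced when building the query string:
	// system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server
	once := url.QueryEscape(clientID)
	fmt.Println(once)

	// Escaping the already-escaped value again yields %253A sequences, the form
	// seen when the authorize URI is embedded as the value of another parameter.
	fmt.Println(url.QueryEscape(once))
}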


bergner commented Oct 13, 2023

I tried reverting commit c2d7de5 from May this year, since it seemed to fit timewise and touched closely related lines of code. I then rebuilt my own image, after also patching images/Dockerfile.rhel to use public images rather than Red Hat private ones:

diff --git images/Dockerfile.rhel images/Dockerfile.rhel
index 77f64be0..748d54d0 100644
--- images/Dockerfile.rhel
+++ images/Dockerfile.rhel
@@ -1,10 +1,12 @@
-FROM registry.ci.openshift.org/ocp/builder:rhel-8-golang-1.19-openshift-4.14 AS builder
+FROM registry.ci.openshift.org/openshift/release:golang-1.19 AS builder
+#FROM registry.ci.openshift.org/ocp/builder:rhel-8-golang-1.19-openshift-4.14 AS builder
 WORKDIR /go/src/github.com/openshift/oauth-server
 COPY . .
 ENV GO_PACKAGE github.com/openshift/oauth-server
 RUN make build --warn-undefined-variables
 
-FROM registry.ci.openshift.org/ocp/4.14:base
+FROM registry.ci.openshift.org/origin/4.12:base
+#FROM registry.ci.openshift.org/ocp/4.14:base
 COPY --from=builder /go/src/github.com/openshift/oauth-server/oauth-server /usr/bin/
 ENTRYPOINT ["/usr/bin/oauth-server"]
 LABEL io.k8s.display-name="OpenShift OAuth Server" \

Finally, I tagged and pushed that image into my OpenShift CRC registry and referenced it in the oauth-openshift Deployment. Again I had to resort to a bit of hackery to test before the authentication-operator reset things back to the original state: scale the authentication operator down to 0 replicas, edit the oauth-openshift Deployment, then quickly test.

But I was able to log in successfully to ArgoCD with that oauth-server image deployed, and the request logged by oauth-server looks the same as the one above that caused an error.

I1013 15:28:19.821365       1 httplog.go:132] "HTTP" verb="GET" URI="/oauth/authorize?client_id=system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server&redirect_uri=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fapi%2Fdex%2Fcallback&response_type=code&scope=user%3Ainfo&state=iquexr4kxxdalcxt4kkd2c3tq" latency="37.712037ms" userAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.5845.103 Safari/537.36" audit-ID="ad01e5c2-8d4c-492a-a1ba-cf91bd551622" srcIP="10.217.0.1:60668" resp=302
I1013 15:28:19.846193       1 httplog.go:132] "HTTP" verb="GET" URI="/login?then=%2Foauth%2Fauthorize%3Fclient_id%3Dsystem%253Aserviceaccount%253Aargocd%253Aargocd-argocd-dex-server%26redirect_uri%3Dhttps%253A%252F%252Fargocd-server-argocd.apps-crc.testing%252Fapi%252Fdex%252Fcallback%26response_type%3Dcode%26scope%3Duser%253Ainfo%26state%3Diquexr4kxxdalcxt4kkd2c3tq" latency="2.27283ms" userAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.5845.103 Safari/537.36" audit-ID="de4ff334-635a-4e11-8156-ff4887287b37" srcIP="10.217.0.1:60668" resp=200

So something definitely changed behavior, either directly because of that commit or because of commits in the https://github.com/openshift/osin library, which was updated as part of that commit.


bergner commented Oct 23, 2023

I'm not sure how, but some auto-update of the OpenShift authentication operator and oauth-openshift seems to have made this problem disappear for me. I haven't found anything specific in the commit logs I've looked at that would explain it, but I no longer get the aforementioned errors.

If someone happens to know what the underlying problem and fix were, it would be good to know.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Jan 22, 2024
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 21, 2024
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci bot closed this as completed Mar 23, 2024

openshift-ci bot commented Mar 23, 2024

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
