Mutating webhook is called but does not install cloud-sql-proxy container #276

Closed
rialg opened this issue Mar 20, 2023 · 14 comments
Labels: priority: p2, type: bug

Comments

rialg commented Mar 20, 2023

Expected Behavior

After an AuthProxyWorkload is created, the operator recognizes the matching workload and the mutating admission webhook adds the missing cloud-sql-proxy container to the pod so that it can connect to the Cloud SQL instance.

Actual Behavior

The sidecar container is never added.

Steps to Reproduce the Problem

  1. Deploy cloud-sql-proxy-operator 0.3.0
  2. Create the following AuthProxyWorkload
apiVersion: cloudsql.cloud.google.com/v1alpha1
kind: AuthProxyWorkload
metadata:
  name: authproxyworkload-myworkload
  namespace: my-workload-namespace
spec:
  workloadSelector:
    kind: "Deployment"
    name: "my-workload"
  instances:
    - connectionString: "<connection string>"
      unixSocketPath: "/var/run/cloud-sql-proxy/path/to/socket"
  3. Check operator logs
2023-03-20T09:41:25Z	DEBUG	controller-runtime.webhook.webhooks	received request	{"webhook": "/mutate-pods", "UID": "917f9cba-4591-44fb-ba80-18875ee4d00c", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
2023-03-20T09:41:25Z	DEBUG	controller-runtime.webhook.webhooks	wrote response	{"webhook": "/mutate-pods", "code": 200, "reason": "", "UID": "917f9cba-4591-44fb-ba80-18875ee4d00c", "allowed": true}
  4. Check AuthProxyWorkload status
  status:
    WorkloadStatus:
    - conditions:
      - lastTransitionTime: "2023-03-20T09:40:21Z"
        message: No update needed for this workload
        observedGeneration: 1
        reason: UpToDate
        status: "True"
        type: WorkloadUpToDate
      kind: Deployment
      name: my-workload
      namespace: my-workload-namespace
      version: apps/v1
    conditions:
    - lastTransitionTime: "2023-03-20T09:40:21Z"
      message: Reconciled 1 matching workloads complete
      observedGeneration: 1
      reason: FinishedReconcile
      status: "True"
      type: UpToDate
  5. Check the pod for the cloud-sql-proxy container
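
The last step can be scripted. Below is a minimal Python sketch, assuming the pod has been exported as JSON (for example with `kubectl get pod <pod-name> -n my-workload-namespace -o json`). The helper name `proxy_containers` is hypothetical, and matching on the image name is an assumption, because the thread does not say what name the operator gives the injected container.

```python
def proxy_containers(pod, image_marker="cloud-sql-proxy"):
    """Return the names of containers in a Pod object whose image mentions the proxy.

    `pod` is the dict obtained by parsing `kubectl get pod ... -o json`.
    Matching on the image string is an assumption made for this sketch.
    """
    containers = pod.get("spec", {}).get("containers", [])
    return [
        c.get("name", "")
        for c in containers
        if image_marker in c.get("image", "")
    ]
```

An empty list means that no container in the pod spec uses a cloud-sql-proxy image, which is the symptom reported here.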

Specifications

  • Version: 0.3.0
  • Platform: GKE (version 1.21.14-gke.15800)

iazunna commented Mar 27, 2023

I'm facing the same issue. I can see the /mutate-pods endpoint being called, yet the container isn't added to the pod.

2023-03-27T16:12:29Z	DEBUG	controller-runtime.webhook.webhooks	received request	{"webhook": "/mutate-pods", "UID": "2faff181-f81e-4038-b5ee-cbcc3bb25aa5", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
2023-03-27T16:12:29Z	DEBUG	controller-runtime.webhook.webhooks	wrote response	{"webhook": "/mutate-pods", "code": 200, "reason": "", "UID": "2faff181-f81e-4038-b5ee-cbcc3bb25aa5", "allowed": true}

Logs from the kube-rbac-proxy

I0327 16:11:09.295588       1 main.go:186] Valid token audiences: 
I0327 16:11:09.295671       1 main.go:316] Generating self signed cert as no cert is provided
I0327 16:11:09.892069       1 main.go:366] Starting TCP socket on 0.0.0.0:8443
I0327 16:11:09.892553       1 main.go:373] Listening securely on 0.0.0.0:8443

hessjcg (Collaborator) commented Mar 27, 2023

Were the workload pods created by another operator? It may be a duplicate of #244. We are going to release this fix in the next week.

hessjcg (Collaborator) commented Mar 29, 2023

Hello @iazunna, we have released preview version v0.4.0. Please give this another try and let me know how it goes. Note that this version has some breaking changes, so be sure to check the Release Notes.

iazunna commented Mar 31, 2023

> Were the workload pods created by another operator? It may be a duplicate of #244. We are going to release this fix in the next week.

Yes, I'm using ArgoCD to manage the resources.

rialg (Author) commented Apr 2, 2023

Hi @hessjcg, thanks for releasing a new version of the operator, which I have yet to test. In the meantime, as a workaround, I am running the upstream cloud-sql-proxy container image in a standalone Deployment. For each instance connection, I created a ClusterIP Service, which the mysql client uses to connect to the databases (sample YAML definition below).

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: cloud-sql-proxy
    name: cloud-sql-proxy
    namespace: cloud-sql-proxy
  spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app: cloud-sql-proxy
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 25%
      type: RollingUpdate
    template:
      metadata:
        creationTimestamp: null
        labels:
          app: cloud-sql-proxy
      spec:
        containers:
        - args:
          - my-project-id:somewhere:mysql-db1?address=0.0.0.0&port=30016
          - my-project-id:somewhere:mysql-db2?address=0.0.0.0&port=30017
          - --credentials-file=/secrets/cloudsql/credentials.json
          command:
          - /cloud-sql-proxy
          image: eu.gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.2
          imagePullPolicy: IfNotPresent
          name: cloud-sql-proxy
          ports:
          - containerPort: 30016
            name: db1-port
            protocol: TCP
          - containerPort: 30017
            name: db2-port
            protocol: TCP
          resources:
            limits:
              cpu: 150m
              memory: 150Mi
            requests:
              cpu: 100m
              memory: 100Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /secrets/cloudsql
            name: cloudsql-instance-credentials
            readOnly: true
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
        volumes:
        - name: cloudsql-instance-credentials
          secret:
            defaultMode: 420
            secretName: cloudsql-instance-credentials
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: mysql-db1
    namespace: cloud-sql-proxy
  spec:
    clusterIP: <svc-ip>
    clusterIPs:
    - <svc-ip>
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - port: 30016
      protocol: TCP
      targetPort: 30016
    selector:
      app: cloud-sql-proxy
    sessionAffinity: None
    type: ClusterIP
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: mysql-db2
    namespace: cloud-sql-proxy
  spec:
    clusterIP: <svc-ip>
    clusterIPs:
    - <svc-ip>
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - port: 30017
      protocol: TCP
      targetPort: 30017
    selector:
      app: cloud-sql-proxy
    sessionAffinity: None
    type: ClusterIP
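
With these manifests, a client elsewhere in the cluster would normally reach each proxy port through the Service's DNS name rather than the raw cluster IP. A small Python sketch of how those endpoints are formed follows. The helper `service_endpoint` is hypothetical, and `cluster.local` is the Kubernetes default cluster domain, assumed here because the thread does not state it.

```python
def service_endpoint(service, namespace, port, cluster_domain="cluster.local"):
    """Return (host, port) for a ClusterIP Service using in-cluster DNS naming.

    Kubernetes resolves Services as <service>.<namespace>.svc.<cluster-domain>.
    The default cluster domain is an assumption; adjust it if your cluster differs.
    """
    host = f"{service}.{namespace}.svc.{cluster_domain}"
    return host, port


# Service names, namespace, and ports are taken from the manifests above.
ENDPOINTS = {
    "db1": service_endpoint("mysql-db1", "cloud-sql-proxy", 30016),
    "db2": service_endpoint("mysql-db2", "cloud-sql-proxy", 30017),
}
```

The mysql client is then pointed at the host and port for the database it needs, for example the `db1` entry for the first instance.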

iazunna commented Apr 3, 2023

> Hello @iazunna, we have released preview version v0.4.0. Please give this another try and let me know how it goes. Note this version has some breaking changes. Be sure to check the Release Notes.

I tried the new version but am still getting the same results. I'm testing on a GKE cluster.

iazunna commented Apr 3, 2023

Here are some logs that may help:

2023-04-03T16:22:56Z	INFO	authproxyworkload-resource	default	{"name": "authproxyworkload-sample"}
2023-04-03T16:22:56Z	DEBUG	controller-runtime.webhook.webhooks	wrote response	{"webhook": "/mutate-cloudsql-cloud-google-com-v1-authproxyworkload", "code": 200, "reason": "", "UID": "bc3d167d-9186-4336-8f02-2442ceb8bcaa", "allowed": true}
2023-04-03T16:22:56Z	DEBUG	controller-runtime.webhook.webhooks	received request	{"webhook": "/validate-cloudsql-cloud-google-com-v1-authproxyworkload", "UID": "d34daa4f-4544-4446-a74a-2e4cf0be31a9", "kind": "cloudsql.cloud.google.com/v1, Kind=AuthProxyWorkload", "resource": {"group":"cloudsql.cloud.google.com","version":"v1","resource":"authproxyworkloads"}}
2023-04-03T16:22:56Z	DEBUG	controller-runtime.webhook.webhooks	wrote response	{"webhook": "/validate-cloudsql-cloud-google-com-v1-authproxyworkload", "code": 200, "reason": "", "UID": "d34daa4f-4544-4446-a74a-2e4cf0be31a9", "allowed": true}
2023-04-03T16:22:56Z	INFO	Added finalizer. Will requeue quickly for reconcile	{"controller": "authproxyworkload", "controllerGroup": "cloudsql.cloud.google.com", "controllerKind": "AuthProxyWorkload", "AuthProxyWorkload": {"name":"authproxyworkload-sample","namespace":"default"}, "namespace": "default", "name": "authproxyworkload-sample", "reconcileID": "d24ded5f-d8a3-451d-9375-577c8d28f50f", "err": null}
2023-04-03T16:22:56Z	INFO	Reconcile loop started AuthProxyWorkload	{"controller": "authproxyworkload", "controllerGroup": "cloudsql.cloud.google.com", "controllerKind": "AuthProxyWorkload", "AuthProxyWorkload": {"name":"authproxyworkload-sample","namespace":"default"}, "namespace": "default", "name": "authproxyworkload-sample", "reconcileID": "ba5e105e-39be-4b8b-8f89-8ea09c738c5e", "name": {"namespace": "default", "name": "authproxyworkload-sample"}}
2023-04-03T16:22:56Z	INFO	Reconcile add/update for AuthProxyWorkload	{"controller": "authproxyworkload", "controllerGroup": "cloudsql.cloud.google.com", "controllerKind": "AuthProxyWorkload", "AuthProxyWorkload": {"name":"authproxyworkload-sample","namespace":"default"}, "namespace": "default", "name": "authproxyworkload-sample", "reconcileID": "ba5e105e-39be-4b8b-8f89-8ea09c738c5e", "name": "authproxyworkload-sample", "namespace": "default", "gen": 1}
2023-04-03T16:22:56Z	INFO	Reconcile loop started AuthProxyWorkload	{"controller": "authproxyworkload", "controllerGroup": "cloudsql.cloud.google.com", "controllerKind": "AuthProxyWorkload", "AuthProxyWorkload": {"name":"authproxyworkload-sample","namespace":"default"}, "namespace": "default", "name": "authproxyworkload-sample", "reconcileID": "e23b0396-38a9-48f3-9dd7-d6253792d2e6", "name": {"namespace": "default", "name": "authproxyworkload-sample"}}
2023-04-03T16:22:56Z	INFO	Reconcile add/update for AuthProxyWorkload	{"controller": "authproxyworkload", "controllerGroup": "cloudsql.cloud.google.com", "controllerKind": "AuthProxyWorkload", "AuthProxyWorkload": {"name":"authproxyworkload-sample","namespace":"default"}, "namespace": "default", "name": "authproxyworkload-sample", "reconcileID": "e23b0396-38a9-48f3-9dd7-d6253792d2e6", "name": "authproxyworkload-sample", "namespace": "default", "gen": 1}
2023-04-03T16:22:56Z	INFO	Reconcile loop started AuthProxyWorkload	{"controller": "authproxyworkload", "controllerGroup": "cloudsql.cloud.google.com", "controllerKind": "AuthProxyWorkload", "AuthProxyWorkload": {"name":"authproxyworkload-sample","namespace":"default"}, "namespace": "default", "name": "authproxyworkload-sample", "reconcileID": "4bc79d0c-bc95-49f1-9f92-1b4fc35de772", "name": {"namespace": "default", "name": "authproxyworkload-sample"}}
2023-04-03T16:22:56Z	INFO	Reconcile add/update for AuthProxyWorkload	{"controller": "authproxyworkload", "controllerGroup": "cloudsql.cloud.google.com", "controllerKind": "AuthProxyWorkload", "AuthProxyWorkload": {"name":"authproxyworkload-sample","namespace":"default"}, "namespace": "default", "name": "authproxyworkload-sample", "reconcileID": "4bc79d0c-bc95-49f1-9f92-1b4fc35de772", "name": "authproxyworkload-sample", "namespace": "default", "gen": 1}
2023-04-03T16:23:06Z	DEBUG	controller-runtime.webhook.webhooks	received request	{"webhook": "/mutate-pods", "UID": "5cabadd7-ae95-433f-8900-612f5f4b1c8a", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
2023-04-03T16:23:06Z	DEBUG	controller-runtime.webhook.webhooks	wrote response	{"webhook": "/mutate-pods", "code": 200, "reason": "no changes to pod", "UID": "5cabadd7-ae95-433f-8900-612f5f4b1c8a", "allowed": true}
2023-04-03T16:24:03Z	DEBUG	controller-runtime.webhook.webhooks	received request	{"webhook": "/mutate-pods", "UID": "2a77e64c-6b98-43f0-881a-14c696aef510", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
2023-04-03T16:24:03Z	DEBUG	controller-runtime.webhook.webhooks	wrote response	{"webhook": "/mutate-pods", "code": 200, "reason": "no changes to pod", "UID": "2a77e64c-6b98-43f0-881a-14c696aef510", "allowed": true}
2023-04-03T16:24:07Z	DEBUG	controller-runtime.webhook.webhooks	received request	{"webhook": "/mutate-pods", "UID": "54cd9b52-f332-4127-80e6-5a70501ddd10", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
2023-04-03T16:24:07Z	DEBUG	controller-runtime.webhook.webhooks	wrote response	{"webhook": "/mutate-pods", "code": 200, "reason": "no changes to pod", "UID": "54cd9b52-f332-4127-80e6-5a70501ddd10", "allowed": true}

I believe the "reason": "no changes to pod" in the response explains why the container isn't being added. Could it be from here?
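
For anyone comparing their own operator logs, here is a minimal Python sketch that pulls the /mutate-pods responses and their reasons out of log text like the output above. It assumes the layout seen in these pasted logs: tab-separated fields with a JSON payload as the last field. The function name `mutate_pod_responses` is hypothetical.

```python
import json


def mutate_pod_responses(log_text):
    """Yield (timestamp, uid, reason) for each /mutate-pods "wrote response" line.

    Assumes tab-separated fields ending in a JSON payload, as in the logs
    pasted in this thread. Lines that do not fit that shape are skipped.
    """
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4 or fields[-2] != "wrote response":
            continue
        try:
            payload = json.loads(fields[-1])
        except json.JSONDecodeError:
            continue
        if payload.get("webhook") == "/mutate-pods":
            yield fields[0], payload.get("UID"), payload.get("reason", "")
```

Responses with an empty reason (as in the 0.3.0 logs) and responses with "no changes to pod" (as in the 0.4.0 logs) both show up, which makes it easy to see whether any pod admission resulted in a change.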

ianonavy commented Apr 4, 2023

I am also seeing this on GKE v1.23.16-gke.200 with cloud-sql-proxy-operator v0.4.0. I am using ArgoCD as well, and for now I have decided to create the cloud-sql-proxy sidecar container manually.

hessjcg (Collaborator) commented Apr 13, 2023

Hi, I'm going to look into this further over the next week. Are all of you experiencing this issue using ArgoCD?

hessjcg added the type: bug, priority: p1, and priority: p2 labels and removed the priority: p1 label Apr 13, 2023

hessjcg (Collaborator) commented Apr 18, 2023

I'm downgrading this to a P2 because it seems to be a problem with just the ArgoCD operator.

rialg (Author) commented Apr 19, 2023

@hessjcg I am closing the issue because the workaround with a standalone pod for the cloud-sql-proxy fits our use case better. Thanks!

rialg closed this as completed Apr 19, 2023
enocom reopened this Apr 19, 2023

enocom (Member) commented Apr 19, 2023

Glad to hear you have a workaround. Nonetheless, the Proxy Operator should work with other operators, so I think it's still valuable to figure out what's not working here.

hessjcg (Collaborator) commented May 1, 2023

I was unable to reproduce this using operator 0.5.0 with the simple example configuration. I tried these three ways of deploying the example:

  • Writing a helm chart and running helm from my laptop to deploy the simple example AuthProxyWorkload and deployment.
  • Deploying the simple example through plain k8s YAML files in a git repo using ArgoCD.
  • Deploying the helm chart using ArgoCD.

In all three cases, the operator was able to add the sidecar proxy container to the deployment and connect to the database.

Thus, we haven't yet found the root cause of this issue. I will work on improving status reporting (#50) in coming versions of the operator so that hopefully we can narrow down the problem if this happens again.

I'm going to close this for now. If you have more information about your workloads, please comment on this issue.

hessjcg closed this as completed May 1, 2023

hessjcg (Collaborator) commented May 5, 2023

I was able to reproduce the part of this issue where pods are created without proxy containers. I am tracking it in #337.
