Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standard Label for Excluding Pod from Identity Mounting #215

Closed
DerekTBrown opened this issue Mar 11, 2024 · 6 comments
Closed

Standard Label for Excluding Pod from Identity Mounting #215

DerekTBrown opened this issue Mar 11, 2024 · 6 comments

Comments

@DerekTBrown
Copy link
Contributor

DerekTBrown commented Mar 11, 2024

What would you like to be added?

It would be useful to have a standard label that excludes pods from being processed by the identity webhook.

Proposed Change: #216

Why is this needed?

  1. Removing Cyclical Dependencies - When installing a CNI and the Pod Identity Webhook onto our clusters, we identified a cyclical dependency; the identity webhook needs the network/CNI to be successfully operating to be able to communicate with the backend service, and the CNI needs the identity webhook to be functioning in order to schedule pods. The CNI doesn't rely on Amazon identities, so having a mechanism to skip the identity webhook for some pods (including the CNI) would remove this cyclical dependency.

  2. Performance/Reliability - Having the ability to exclude pods that don't need Amazon identities can reduce load on the webhook and improve pod launching performance. This also removes a critical dependency for services that need to be resilient to failure.

  3. Standardization is key to compatibility- Having a canonical tag for skipping the identity webhook will make it easier for downstream projects to exclude critical components (such as CNIs) from the webhook, leading to a better out-of-the-box experience.

@jeevanions
Copy link

Can't we simply avoid this by not using the Service Account that is enabled for Pod Identity webhook?

@DerekTBrown
Copy link
Contributor Author

Can't we simply avoid this by not using the Service Account that is enabled for Pod Identity webhook?

I am not sure I understand your suggestion. How do Service Accounts fit into the mutating webhook flow?

@fungusakafungus
Copy link

What exactly prevents CNI from coming up? This webhook sets failurePolicy: Ignore
https://github.com/aws/amazon-eks-pod-identity-webhook/blob/master/deploy/mutatingwebhook.yaml#L10
So I'd expect that it will not prevent other pods form starting if it is itself down.

@DerekTBrown
Copy link
Contributor Author

What exactly prevents CNI from coming up? This webhook sets failurePolicy: Ignore

At least in our case, we rely on kops to install the webhook, which sets this to Fail:

https://github.com/kubernetes/kops/blob/5befa3fb96f35692411772152e2d1c22850dbc90/upup/models/cloudup/resources/addons/eks-pod-identity-webhook.addons.k8s.io/k8s-1.16.yaml.template#L157

Isn't this a kops issue then?

IMO, eks-pod-identity-webhook should expect downstream consumers to change the webhook to use a Fail policy, and support features to enable this usecase.

Having a webhook with the failurePolicy: Ignore shifts the failure mode from an infra-level failure (where infrastructure/platform teams can detect and remediate the actual issue) to a service-level failure (where now individual services are responsible for detecting that they don't have adequate information, and then escalating this failure to the infrastructure/platform team actually responsible for the webhook). This is an undesirable behavior for many (if not most) users.

@DerekTBrown
Copy link
Contributor Author

DerekTBrown commented Jun 20, 2024

What exactly prevents CNI from coming up? This webhook sets failurePolicy: Ignore

One other thing I realized I should add: I could see a case as well where a customer wants to disable the Pod Identity Webhook as a performance optimization, independent of the failure mode. For example, if leveraging DoEKS, they may not need IAM roles on Spark worker pods.

@modulitos
Copy link
Contributor

Looks like this issue was resolved in the proposed PR linked above (#216)

I appreciate the feedback on failurePolicy: Ignore - looks like it was changed from the default value of Fail to Ignore here: https://github.com/aws/amazon-eks-pod-identity-webhook/pull/3/files
We can look into the tradeoffs of changing it back, but it's a separate issue.

Marking this issue as closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants