Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Check known required permissions for install before installing with the helm applier #1716

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

trgeiger
Copy link
Contributor

@trgeiger trgeiger commented Feb 5, 2025

Adds a check that makes sure the installer service account has the necessary get permissions to get all the resources it will need to inspect for a server-connected dry-run and returns errors detailing all the missing get permissions. This prevents a hung server-connected dry-run getting caught on individual missing get permissions.

Description

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 5, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 5, 2025
Copy link

netlify bot commented Feb 5, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 7c9a05e
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/67b786b79ea6cf00085e6d2d
😎 Deploy Preview https://deploy-preview-1716--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@trgeiger trgeiger force-pushed the perms-preflight branch 2 times, most recently from 65bc862 to 27e8d4f Compare February 7, 2025 21:08
@trgeiger trgeiger force-pushed the perms-preflight branch 2 times, most recently from d767e20 to 678687c Compare February 13, 2025 23:35
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2025
@trgeiger trgeiger force-pushed the perms-preflight branch 4 times, most recently from c114d4c to 74c93c7 Compare February 14, 2025 20:41
@trgeiger trgeiger changed the title ✨ Add client-only helm dry-run to helm applier ✨ Check known required permissions for install before installing with the helm applier Feb 17, 2025
Copy link

codecov bot commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 41.59292% with 66 lines in your changes missing coverage. Please review.

Project coverage is 67.73%. Comparing base (7040ee2) to head (ade3961).

Files with missing lines Patch % Lines
...operator-controller/authorization/authorization.go 33.80% 42 Missing and 5 partials ⚠️
internal/operator-controller/applier/helm.go 55.88% 10 Missing and 5 partials ⚠️
cmd/operator-controller/main.go 50.00% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1716      +/-   ##
==========================================
- Coverage   68.34%   67.73%   -0.61%     
==========================================
  Files          63       64       +1     
  Lines        5117     5228     +111     
==========================================
+ Hits         3497     3541      +44     
- Misses       1390     1446      +56     
- Partials      230      241      +11     
Flag Coverage Δ
e2e 50.40% <9.73%> (-1.23%) ⬇️
unit 55.52% <38.05%> (-0.39%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

NewForConfig NewForConfigFunc
}

func (acm *AuthClientMapper) GetAuthenticationClient(ctx context.Context, ext *ocv1.ClusterExtension) (authorizationv1client.AuthorizationV1Interface, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (acm *AuthClientMapper) GetAuthenticationClient(ctx context.Context, ext *ocv1.ClusterExtension) (authorizationv1client.AuthorizationV1Interface, error) {
func (acm *AuthClientMapper) GetAuthorizationClient(ctx context.Context, ext *ocv1.ClusterExtension) (authorizationv1client.AuthorizationV1Interface, error) {


type NewForConfigFunc func(*rest.Config) (authorizationv1client.AuthorizationV1Interface, error)

type AuthClientMapper struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type AuthClientMapper struct {
type AuthorizationClientMapper struct {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, yeah I have that in a local change I need to push up, got myself mixed up

@trgeiger trgeiger marked this pull request as ready for review February 19, 2025 20:01
@trgeiger trgeiger requested a review from a team as a code owner February 19, 2025 20:02
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 19, 2025
clusterScopedErrs := []error{}
requiredVerbs := []string{"get", "create", "update", "list", "watch", "delete", "patch"}

for _, o := range objects {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the object is a ClusterRole or Role, we need to check that we have its rules at the appropriate scope (or the escalate verb on clusterroles or roles)

If the object is a ClusterRoleBinding or a RoleBinding, we need to check that we have the rules present in the referenced ClusterRole/Role (or the bind verb on clusterrolesbindings or rolebindings)

Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does sound right, yes. We can try do to that work here or have that be a future iteration/improvement in another PR. I'm fine with either. This PR started as just implementing the quick "get" check but along the way realized we can and should just go ahead and check all the verbs above while we check for get so that's why nothing more complicated is happening yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll have to have this basically immediately because every single bundle we support is going to have RBAC objects in the manifest by virtue of all of this stuff having CRDs and controllers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like using the RBACAuthorizer I linked to in another thread would handle all this for you, fwiw.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I've seen libraries in k/k for this. Looking now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@trgeiger trgeiger Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah I actually was using the RulesAllow in an earlier version of this but it was causing some import issue @bentito was noticing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix that up and use those functions again

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking around, I also see:

Ultimately though, it seems like the RBACAuthorizer encapsulates everything we'd want to do and is the actual authorizer the apiserver uses. The catch is that we'd have to give ourselves permission to list/watch all RBAC objects in the cluster. Is that a dealbreaker?

Maybe something we should go ask the sig-auth folks: what's better for best effort authz checks? use an in-process RBAC authorizer with cached RBAC from informers, or a flood of SSAR requests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By give ourselves, do you mean operator-controller's manager would need list/watch on all RBAC?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelanford @trgeiger was getting at that the point that the import adding RBACAuthorizer brings in seems to normally be k8s.io/kubernetes which is undesirable b/c that depends on several staging-only deps at v0.0.0. I am trying to figure out a more granular import for it to avoid the kitchen sink import?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In considering RBACAuthorizer, I think we should probably get more team input on the addition of all the list/watch on all rbac objects in the cluster since that would be a pretty significant expansion of what OLMv1 sees, right? I'll link this thread to the olm-dev chat as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I dive into the RBACAuthorizer, the more I understand the complexity of doing this check ourselves. There are a lot of edge cases. I think we may need to at least start out with using SelfSubjectAccessReview so that we don't have to take on the complexity of handling all of the nuance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it looks like SSAR is not sufficient for checking bindings and role escalations... So for RBAC objects we probably need to do a dry-run create.

trgeiger and others added 3 commits February 19, 2025 14:10
Adds a check that makes sure the installer service account has the
necessary permissions to manage all the resources in the
ClusterExtension payload and returns errors detailing all
the missing permissions. This prevents a hung server-connected
dry-run getting caught on individual missing get permissions as well
returns many other missing required permissions needed for the actual
installation or upgrade.

Signed-off-by: Tayler Geiger <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
clusterScopedErrs := []error{}
requiredVerbs := []string{"get", "create", "update", "list", "watch", "delete", "patch"}

for _, o := range objects {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll have to have this basically immediately because every single bundle we support is going to have RBAC objects in the manifest by virtue of all of this stuff having CRDs and controllers.


// SelfSubjectRulesReview formats the resource names as lowercase and plural
func sanitizeResourceName(resourceName string) string {
return strings.ToLower(resourceName) + "s"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a safe way to handle this because pluralization is complicated, and CRDs get to define their resource name however they want. Seems like we would need to use a rest mapper to get the actual resource name for the object.


type NewForConfigFunc func(*rest.Config) (authorizationv1client.AuthorizationV1Interface, error)

type AuthorizationClientMapper struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense for AuthorizationClientMapper and its related types and methods to be part of authorization package?
To me they don't seem helm specific and we'll be encapsulating authorization into a separate package.

Also, to avoid situation where we the caller needs to know about authorizationv1client.AuthorizationV1Interface and first obtain the authorization client and then pass it to CheckObjectPermissions I was thinking we could have CheckContentPermissions be a public method of the custom authorization client that we return, which would internally call more generic checkObjectPermissions.

For that, the client in authorization package would have to wrap authorizationv1client.AuthorizationV1Interface and then in this package, we'd just define an interface of the custom authorization client with whatever we need to use here similar to this:

type authorizationClient interface {
    CheckContentPermissions(...)
}

With that we would be encapsulating internals of authorization within a dedicated single package and exposing a clean interface to the callers with the added benefit of having a simple interface to mock on the callers side.

WDYT?

Obviously those are just suggestions, the second one especially.


err = h.checkContentPermissions(ctx, contentFS, authclient, ext)
if err != nil {
return nil, "", err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be a good idea to wrap with context, ie. `fmt.Errorf(err, "failed checking content permissions") for better debuggability. Here and other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make this change in the bits I introduce in this PR and we can maybe edit the other error returns in helm.go in a separate enhancement PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, that's actually what I meant - other places introduced by this PR

})
}

resAttrs := []authorizationv1.ResourceAttributes{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could probably initialize it with specific length or at least capacity equal to len(requiredVerbs) * len(objects)

for _, resAttr := range resAttrs {
if !canI(resAttr, rules) {
if resAttr.Namespace != "" {
namespacedErrs = append(namespacedErrs, fmt.Errorf("cannot %s %s %s in namespace %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe it would make sense to put the name of the resource and namespace or even all of those string values in quotes by using %q?

}
errs := append(namespacedErrs, clusterScopedErrs...)
if len(errs) > 0 {
errs = append([]error{fmt.Errorf("installer service account %s is missing required permissions", ext.Spec.ServiceAccount.Name)}, errs...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: that's not ideal from performance perspective, maybe just do return fmt.Errorf("installer service..., %s, errors.Join(errs...))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, can you re-explain this suggestion? I thought I understood at first glance. Are you suggesting I sub in the errs into the "installer service account %s..." error message with Errorf? Like nested fmt.Errorf(fmt.Sprintf("installer SA %s is missing perms", ext.Spec.ServiceAccountName)) + " %s", errs...)? or something like that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I meant something like fmt.Errorf("installer SA... perms, %w", ext.Spec.ServiceAccountName, errors.Join(errs...))

Alternatively you could start with errs slice already having this "install service account %s" error at first position and then at the end when you see there were no actual errors, simply return a nil error

Move all authorization-related code to the authorization package, rename
anything using 'auth' to 'authorization' to avoid confusion with
authentication.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants