Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entity Slicing using Data Tries #74

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

oflatt
Copy link

@oflatt oflatt commented Jul 29, 2024

@oflatt oflatt force-pushed the oflatt-entity-slicing branch 2 times, most recently from 3eaa6db to f53f2d2 Compare July 29, 2024 18:12
@oflatt oflatt force-pushed the oflatt-entity-slicing branch from f53f2d2 to 0778bef Compare July 29, 2024 18:14
@khieta khieta added the pending This RFC is pending; for definitions see the README label Jul 30, 2024
@oflatt
Copy link
Author

oflatt commented Jul 30, 2024

Code is now available: cedar-policy/cedar#1102

@oflatt oflatt force-pushed the oflatt-entity-slicing branch from 5f579f4 to d33585d Compare July 30, 2024 21:05
Copy link
Contributor

@cdisselkoen cdisselkoen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, and I think this is a good idea. Some comments, particularly about the subset of Cedar that this RFC works with.

text/0073-entity-slicing.md Outdated Show resolved Hide resolved
text/0073-entity-slicing.md Outdated Show resolved Hide resolved
text/0073-entity-slicing.md Outdated Show resolved Hide resolved
text/0073-entity-slicing.md Outdated Show resolved Hide resolved
text/0073-entity-slicing.md Outdated Show resolved Hide resolved
text/0073-entity-slicing.md Outdated Show resolved Hide resolved
text/0073-entity-slicing.md Outdated Show resolved Hide resolved
text/0073-entity-slicing.md Outdated Show resolved Hide resolved
text/0073-entity-slicing.md Outdated Show resolved Hide resolved
text/0073-entity-slicing.md Outdated Show resolved Hide resolved
@oflatt oflatt force-pushed the oflatt-entity-slicing branch 2 times, most recently from 56b7f6f to ee696f1 Compare August 1, 2024 17:43
oflatt and others added 11 commits August 1, 2024 10:43
Co-authored-by: Craig Disselkoen <[email protected]>
Signed-off-by: oflatt <[email protected]>
Co-authored-by: Craig Disselkoen <[email protected]>
Signed-off-by: oflatt <[email protected]>
Co-authored-by: Craig Disselkoen <[email protected]>
Signed-off-by: oflatt <[email protected]>
Co-authored-by: Craig Disselkoen <[email protected]>
Signed-off-by: oflatt <[email protected]>
Co-authored-by: Craig Disselkoen <[email protected]>
Signed-off-by: oflatt <[email protected]>
Co-authored-by: Craig Disselkoen <[email protected]>
Signed-off-by: oflatt <[email protected]>
Co-authored-by: Craig Disselkoen <[email protected]>
Signed-off-by: oflatt <[email protected]>
Co-authored-by: Craig Disselkoen <[email protected]>
Signed-off-by: oflatt <[email protected]>
Signed-off-by: oflatt <[email protected]>
Signed-off-by: oflatt <[email protected]>
@oflatt oflatt force-pushed the oflatt-entity-slicing branch from ee696f1 to 9ba753c Compare August 1, 2024 17:43
@oflatt oflatt mentioned this pull request Aug 1, 2024
Signed-off-by: oflatt <[email protected]>
@jeffsec-aws
Copy link

jeffsec-aws commented Aug 2, 2024

What would it take to answer in Cedar schema?
Can we transform the following answers:

Request type: (User, Read, Document)
Data required:
resource.readers
resource.metadata.owner
principal.ancestors (where ancestors are the ancestors in the entity hierarchy)

On the other hand, if the User would like to Edit a Document, we only need one piece of data:

Request type: (User, Edit, Document)
Data required:
resource.metadata.owner

in the following proposal:

 Request type: (User, Read, Document)
 Data required:
    [
        {
             "resource": "entity Document = { readers: Set<User>, }"
        },
        {
             "resource": "entity Document = { metadata: Metadata, },
              "entities": [
                   "entity Metadata = { owner: User }",
                   "entity User = { }"
              ]
        },
        {
             "principal": "entity User in [UserGroup]",
              "entities": [
                   "entity UserGroup= {  }",
              ]
        },
    ]

 Request type: (User, Edit, Document)
    [
        {
             "resource": "entity Document = { metadata: Metadata, },
              "entities": [
                   "entity Metadata = { owner: User }",
                   "entity User = { }"
              ]
        }
    ]

Response should be able to handle sub element of type: principal, action, entities, and context

@jeffsec-aws
Copy link

Also remphasing that:

entity User in [User];

for :

// allow read if the user is a global admin
permit(
  principal in User::"GlobalAdmin",
  action in [Action::"Read"],
  resource
);

is a cumbersome example. one would prefer using UserGroup for a notion such as GlobalAdmin in the form of:

entity User in [UserGroup];

with policy shaped as:

// allow read if the user is a global admin
permit(
  principal in UserGroup::"GlobalAdmin",
  action in [Action::"Read"],
  resource
);

@jeffsec-aws
Copy link

jeffsec-aws commented Aug 2, 2024

I want to see an example with a when clause having multiple conditions connected with OR or AND operand, i.e.:

// allow documents to be read by anyone in the readers set
permit(
  principal,
  action == Action::"Read",
  resource
)
when
{
  resource.readers.contains(principal) &&
  resource.clearance < 7
};

or

// allow documents to be read by anyone in the readers set
permit(
  principal,
  action == Action::"Read",
  resource
)
when
{
  principal.title == "Security" ||
  principal.clearance >= resource.clearance
};

@jeffsec-aws
Copy link

I want to see an example with an unless clause

@D-McAdams
Copy link
Contributor

As currently proposed, high-scale systems would struggle to use this approach. It has a couple of drawbacks. The first is the necessity for the cache of manifests, which is difficult to maintain at scale when policies change frequently and the volume of policies could potentially be huge. The second is that the manifests are course grained, such as requiring the loading all ancestors of an entity, which can be impractical when the ancestor set is unbounded. Third, it requires extra work on the client-side to piece together the instructions in the manifest with the actual principal/resource in the request in order to generate a concrete plan for data retrieval.

For the situations that I'm personally most familiar with, I'd prefer a different approach. I'd rather give Cedar a policy slice, and the parameters of specific request (principal, action, resource), and Cedar emits a document that tells exactly which data to load for the request.

In addition, this document would be in a well-defined interoperable representation like JSON that we can use as a standard interface to a Policy Information Point (PIP). We give this document to the PIP, and it returns JSON-formatted entity data.

An example manifest may look like this:

  {
    "EntityType": "User",
    "EntityId": "12345",
    "AttributesToLoad": ["name","email","manager"]
    "MembershipsToCheck": [
      {"EntityType": "UserGroup",
       "EntityId": "987654"},
      {"EntityType": "UserGroup",
       "EntityId": "987654"}
    ]
  },
  {
    "EntityType": "File",
    "EntityId": "image.jpg",
    "AttributesToLoad": ["size","type","lastUpdated"]
    "MembershipsToCheck": [
      {"EntityType": "Folder",
       "EntityId": "2902982"},
      {"EntityType": "Folder",
       "EntityId": "6297822"}
    ]
  }
]

If policies have multiple levels of references, like principal.manager.manager.location, I'd want to iterate multiple times, having Cedar tell me everything to load for Level 1, and then everything for Level 2, and so on, until complete.

To tie it all together, I would definitely use an API that returned an enum of either (1) the concrete authorization result, or (2) the necessary data to arrive at a concrete result. I would loop on this API and gather data until a concrete result is achieved, or the max level is exceeded.

@jeffsec-aws
Copy link

That is a better proposition of what I tried to propose with #74 (comment) @D-McAdams . Going a Cedar formatted answer is the way. Now:

  1. I think we are still missing which part falls where. With your proposal the example is clear of what is the Principal and what is the Resource. But it might be less implicit in plenty of other cases, therefore we need to have hints onto that.

  2. On the Iteration, I am not sure Iteration is a good approach. Slice is extrapolated from one policy that could apply. In this case:

permit(
  principal is User,
  action == Action::"ReadFile",
  resource is File
) when {
  principal.manager.manager.location == "Minas Tirith"
};

Therefore the slice answer should ask to resolve the whole equation. Having a first response only solving the Level 1 of condition, will make the next request for a slice more complex to evaluate. And in any cases, the isAuthorized() request will have to be perform in one call to match this policy.

  1. The return slice might not have an entity id to specify and the format needs to encompass for that.

Copy link
Contributor

@khieta khieta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice write up! I think the overall idea makes sense, although I do wonder whether this should be in the Cedar library vs. in a separate tool. Will take a look at the implementation PR next.

text/0074/0074-entity-slicing.md Outdated Show resolved Hide resolved
text/0074/0074-entity-slicing.md Outdated Show resolved Hide resolved
text/0074/0074-entity-slicing.md Outdated Show resolved Hide resolved
text/0074/0074-entity-slicing.md Show resolved Hide resolved

As written, entity manifests in this RFC do not support loading only parts of a Cedar Set, or only some of the ancestors in the entity hierarchy. This is because sets are loaded on the leaves of the access trie, with no way to specify which elements are requested.

To support this feature, we recommend that we take a constraint-based approach. Constraints would be attached to nodes in the access trie, specifying which elements of sets or the parent hierarchy are needed. Constraints form a small query language, so this would increase the complexity of the entity manifest.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 76 provides some additional ideas about how to support partial loading of ancestors. Do you still think the constraint-based approach is the right one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I'm not a big fan of the @ancestors annotation proposed- I think a more automatic solution like constraint annotations would be better.
As for partial evaluation, it would still make entity manifests more precise.

Comment on lines +333 to +336
Action::"CreateDocument"
[
context.is_authenticated
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we expect entity manifests to be written and consumed by machines (right?), it might make more sense to use a structured JSON format. Our experience with writing parsers for human-readable formats that support comments has shown that they are easy to mess up 😅

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented both a JSON format and a human-readable format. The human-readable format re-uses the parser for cedar expression so it's not super error prone.

Personally, I've found the human-readable format useful for understanding what's going on. For example, a human can diff a new entity manifest with an old one to understand how a new policy changes the manifest.

@D-McAdams
Copy link
Contributor

That is a better proposition of what I tried to propose with #74 (comment) @D-McAdams . Going a Cedar formatted answer is the way. Now:

  1. I think we are still missing which part falls where. With your proposal the example is clear of what is the Principal and what is the Resource. But it might be less implicit in plenty of other cases, therefore we need to have hints onto that.
  2. On the Iteration, I am not sure Iteration is a good approach. Slice is extrapolated from one policy that could apply. In this case:
permit(
  principal is User,
  action == Action::"ReadFile",
  resource is File
) when {
  `principal.manager.manager.location` == "Minas Tirith"
};

Therefore the slice answer should ask to resolve the whole equation. Having a first response only solving the Level 1 of condition, will make the next request for a slice more complex to evaluate. And in any cases, the isAuthorized() request will have to be perform in one call to match this policy.

  1. The return slice might not have an entity id to specify and the format needs to encompass for that.

Couple reasons I think the simple approach is preferable and iteration is OK.

One is that no matter what we design here, any real-world data gatherer will be looping and iterating. For example, given a policy that refers to principal.manager.manager.location, the machinery loading the data from a database has to ping back and forth with the database 3 times. First to look up the UID value for principal.manager. Then, to look up the UID value for principal.manager.manager, then to look up the string value of principal.manager.manager.location. As currently proposed, the RFC makes the customer work harder to figure out the execution plan from paths. I don't want to work hard.

It's going to be much easier for me as a customer of Cedar to say "Here's everything I know, tell me exactly what to load", and then repeat until there's nothing left to load. It's a dirt-simple code path I can run in a loop.

Such an approach also makes it consistent on how to load entity information, regardless of whether the entities are the principal/resource, or some other entity referenced in a policy, or some other entity literal. As a data gatherer, I simply don't care. Cedar tells me which entities to load, and I just go load them. This makes it dirt-simple to make a generic data loader.

@shaobo-he-aws
Copy link
Contributor

One is that no matter what we design here, any real-world data gatherer will be looping and iterating. For example, given a policy that refers to principal.manager.manager.location, the machinery loading the data from a database has to ping back and forth with the database 3 times. First to look up the UID value for principal.manager. Then, to look up the UID value for principal.manager.manager, then to look up the string value of principal.manager.manager.location.

I agree that iterating works for most if not all DBs. That being said, this can be done via one query in a relational DB with JOINs?

@jeffsec-aws
Copy link

Also while I agree that getting data out of the PIP might need round trips, getting out of the policy store of the PDP like here does not. The information needed is stored in single dedicated objects (policies) which are extrapolated into what is missing.

@shaobo-he-aws
Copy link
Contributor

For the situations that I'm personally most familiar with, I'd prefer a different approach. I'd rather give Cedar a policy slice, and the parameters of specific request (principal, action, resource), and Cedar emits a document that tells exactly which data to load for the request.

I think such document format should merit a RFC.

An example manifest may look like this:

  {
    "EntityType": "User",
    "EntityId": "12345",
    "AttributesToLoad": ["name","email","manager"]
    "MembershipsToCheck": [
      {"EntityType": "UserGroup",
       "EntityId": "987654"},
      {"EntityType": "UserGroup",
       "EntityId": "987654"}
    ]
  },
  {
    "EntityType": "File",
    "EntityId": "image.jpg",
    "AttributesToLoad": ["size","type","lastUpdated"]
    "MembershipsToCheck": [
      {"EntityType": "Folder",
       "EntityId": "2902982"},
      {"EntityType": "Folder",
       "EntityId": "6297822"}
    ]
  }
]

IIUC, entities in the MembershipsToCheck are the RHS of in operations. If so, the executor of this document essentially evaluate these operations, which are later evaluated again by the Cedar evaluator. Moreover, we have to perform a cheap analysis on the policy slice to have this kind of information, right?

Copy link

@patjakdev patjakdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally like this idea as it provides a neat way to do a just-in-time batched retrieval of entity data relevant to a particular type of request.

I'm curious why such a similar approach for finding the entities to load given a fully formed PARC wasn't included in the proposal.

Comment on lines +154 to +158
pub struct RequestType {
pub principal: EntityType,
pub action: EntityUID,
pub resource: EntityType,
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Why not allow EntityUIDs to be passed for the principal and resource?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RequestType is used to index the entity manifest. When it is computed, the actual principal and resource are not known (but the types are). Action ids are known since there are a finite number.

text/0074/0074-entity-slicing.md Outdated Show resolved Hide resolved
text/0074/0074-entity-slicing.md Outdated Show resolved Hide resolved
@oflatt
Copy link
Author

oflatt commented Aug 26, 2024

As currently proposed, high-scale systems would struggle to use this approach. It has a couple of drawbacks....

Hi @D-McAdams, sorry for the late reply (I was out for a few weeks). You make some good points about the drawbacks of this proposal that I hope to address.

  1. I agree that maintaining an entity manifest when policies are numerous and frequently change is a problem. Luckily, I think this proposal is compatible with the solution you outlined. Given a policy slice, the Entity Manifest can be computed dynamically with a single pass over all the policies. Then, you can use it as a "document" that tells exactly what data to load for the request.

  2. Yes, as proposed in this RFC manifests are course-grained in terms of the ancestors required and elements of sets. I'll be working on this problem for the next few weeks. My guess is that it could be solved by annotating the entity manifest further, treating ancestors specially in the API.

  3. I think the concrete plan of how to load the entities from the database is solved by my SimplifiedEntityLoader API. As you said, there is a loop that tells the user the necessary data to arrive at a concrete result. When all the data is retrieved, it stops and the authorization can proceed. I also think that we could expand on this API to make it more precise, giving details on which fields are required for each entity. We've also been thinking about designing algorithms that efficiently make use of this API to batch as many entities together as possible.

As for a JSON format, entity manifests map easily to JSON objects, and they look similar to what you have written.

Let me know what you think- perhaps I've missed something. In my mind, the entity manifests as proposed in this RFC are the most general form of entity slicing, and other solutions can be implemented on top.

@oflatt
Copy link
Author

oflatt commented Aug 26, 2024

@jeffsec-aws asks:

What would it take to answer in Cedar schema?

That's a good question. Cedar schemas and entity manifest look quite similar, but entity manifests represent data paths through multiple types. While I think it's possible to represent an entity manifest as a cedar schema as you showed, it requires encoding the data somehow. In other words, any encoding using Cedar schemas would abuse the format.

oflatt and others added 2 commits August 26, 2024 10:16
Co-authored-by: Kesha Hietala <[email protected]>
Signed-off-by: oflatt <[email protected]>
Signed-off-by: oflatt <[email protected]>
@oflatt oflatt force-pushed the oflatt-entity-slicing branch from eef7b00 to d7ff80f Compare August 26, 2024 17:17
oflatt added 4 commits August 26, 2024 10:20
Signed-off-by: oflatt <[email protected]>
Signed-off-by: oflatt <[email protected]>
Signed-off-by: oflatt <[email protected]>
@oflatt
Copy link
Author

oflatt commented Aug 28, 2024

@patjakdev asks:

I'm curious why such a similar approach for finding the entities to load given a fully formed PARC wasn't included in the proposal.

That's a good question. Having a fully formed PARC and having all the policies on hand does let you be a little more precise. You can do partial evaluation of the policies, get an entity manifest based on that, then do entity loading.

@khieta khieta added pre-acceptance-experimental This RFC is pre-acceptance experimental; for definitions see the README and removed pending This RFC is pending; for definitions see the README labels Sep 11, 2024
@oflatt
Copy link
Author

oflatt commented Sep 16, 2024

@D-McAdams This PR implements an API that is similar to what you have outlined above:
cedar-policy/cedar#1208

Users can compute an entity manifest from a policy slice, then implement an EntityLoader.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pre-acceptance-experimental This RFC is pre-acceptance experimental; for definitions see the README
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants