feature/rbac-2 #405
base: master
Not yet addressed:
@Avantol13 - could you review and comment
🧾 User Story 1: Control Whether Records Are Discoverable
Title: Configurable Discovery of Indexd Records
As a platform operator,
Acceptance Criteria
🧾 User Story 2: Global Discovery Authorization Control
Title: Global Discovery Authz for Indexd Records, Support Discovery Access Independent from File Access
As a system administrator,
As a data commons architect,
Acceptance Criteria
Assuming ARE_RECORDS_DISCOVERABLE=False
📌 Configuration Summary
# Whether any records are discoverable at all
ARE_RECORDS_DISCOVERABLE = True # default: True
# Override per-record authz for GET/read
# Only applies to record discovery (not file access)
# If None, use per-record `authz`
GLOBAL_DISCOVERY_AUTHZ = ["/indexd/discovery"]
In general the comments above look good, thanks for all the detail. This part:
I think needs to actually behave similarly to the READ filtering based on config. In other words, if you request a did and you do have access to its authz, this should return 200. If you request a did and you do have access to the global authz that's configured, this should also return 200. Basically this should only 403 in situations where the record itself would've been filtered out.
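The rule in the comment above can be sketched as a small pure function. This is only an illustration under stated assumptions: the function names are hypothetical, resources are compared by exact string match (Arborist resource paths are hierarchical, which this ignores), access to any one of a record's authz entries is treated as sufficient, and `ARE_RECORDS_DISCOVERABLE = True` is read as "no discovery restriction applies".

```python
def can_discover(record_authz, user_resources,
                 global_discovery_authz=None, are_records_discoverable=True):
    """Return True when a lookup of this record should succeed.

    Hypothetical helper; names and matching rules are assumptions.
    """
    if are_records_discoverable:
        # Assumed reading of the default: discovery is unrestricted.
        return True
    user_resources = set(user_resources)
    # Access to the record's own authz is sufficient.
    if any(resource in user_resources for resource in record_authz):
        return True
    # Access to the configured global discovery authz is also sufficient.
    if global_discovery_authz:
        return any(resource in user_resources
                   for resource in global_discovery_authz)
    return False


def status_for_lookup(record_authz, user_resources,
                      global_discovery_authz=None,
                      are_records_discoverable=True):
    """Map the decision to the HTTP status discussed in the review."""
    allowed = can_discover(record_authz, user_resources,
                           global_discovery_authz, are_records_discoverable)
    return 200 if allowed else 403
```

With this shape, a single-record lookup returns 403 only for a record that the same rule would have dropped from a filtered listing.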
indexd/auth/drivers/alchemy.py
Outdated
resources = self.arborist.auth_mapping()
return resources

@timed_cache(1800) # Cache for 30 minutes (typical JWT expiration time)
We can't hard-code this because we absolutely cannot have a response cached beyond the expiration. This has to be dynamic based on the expiration of the token. Our security is heavily reliant on the guarantee that the expiration ensures no access beyond that point.
Re. the requirement that this "has to be dynamic based on the expiration of the token":
Understood. At the same time, previous feedback stated: "Arborist allows no token to be sent on purpose, it allows assignment of anonymous access."
Additionally, AFAIK, no validation of the token occurs now in indexd, i.e. there are no calls to authutils.token.validate_jwt()
So, if there is a token:
- 🆕 we can check to ensure it has not expired, use expiry time as ttl
- the token is already being used as a cache key
If there is no token:
- 🆕 use maximum_ttl_seconds as ttl
- 🆕 add authentication header to cache key (for basic and no auth)
Other:
- 🆕 clean up any unused cache entries
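As a rough sketch of the plan above, the TTL and cache key could be derived as follows. Everything here is an assumption for illustration: `jwt_expiry` and `cache_key` are hypothetical helpers, the default cap of 1800 seconds mirrors the value in the diff, and the `exp` claim is read without verifying the signature (a real deployment would validate the token first).

```python
import base64
import json


def jwt_expiry(token):
    """Return the `exp` claim of a JWT as an int, or None if unreadable.

    The signature is NOT verified here; this only reads the claim.
    """
    try:
        payload_b64 = token.split(".")[1]
        # JWT segments are base64url without padding; restore the padding.
        payload_b64 += "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
        exp = claims.get("exp")
        return int(exp) if exp is not None else None
    except (IndexError, ValueError, TypeError, AttributeError):
        return None


def calculate_ttl(now, token, maximum_ttl_seconds=1800):
    """Seconds a cached auth mapping may live; None means do not cache."""
    if not token:
        # Basic auth or anonymous: no token expiry, fall back to the cap.
        return maximum_ttl_seconds
    exp = jwt_expiry(token)
    if exp is None or exp <= now:
        # Unreadable or already expired tokens are never cached.
        return None
    return min(int(exp - now), maximum_ttl_seconds)


def cache_key(token, authorization_header):
    """Key on the token when present, else on the Authorization header."""
    if token:
        return ("token", token)
    if authorization_header:
        return ("header", authorization_header)
    return ("anonymous",)
```

Capping the TTL at the token's remaining lifetime is what keeps a cached response from outliving the token.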
done
@blueprint.errorhandler(Exception)
def handle_uncaught_exception(err):
I see this duplicated across routes. Can you implement it once in a utils module and import it to reduce code duplication?
done
Thanks. I edited the comment above
@Avantol13 I've addressed all PR items. Please see #405 (comment) for a followup question.
re: db driver and how to centrally organize things. Here's my current thinking:
We should, theoretically, be able to move all the new code we need with stateful decisions out of the db driver b/c nothing really needs the db.
Here's my idea:
Put the authz check for discovery in a decorator similar to this one:
indexd/indexd/auth/__init__.py
Line 9 in b5b198a
def authorize(*p):
Call it authorize_discovery and add that decorator everywhere you need it. The logic in there should look like this:
- if is_discovery_enabled (check config only)
  - Get user's authz (perhaps cache, could even use flask's per-request cache flask.g if that somehow simplifies - I know that won't save beyond the request)
  - if config GLOBAL_AUTHZ set
    - Check if the GLOBAL_AUTHZ is in the user's authz
  - if config GLOBAL_AUTHZ not set
    - Check if user's authz contains the record's authz (this will require making a db call based on the request's ID)
Done. With this decorator, access is now denied appropriately before any blueprint logic runs.
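A framework-free sketch of the decorator described above might look like this. It is not the project's implementation: `Forbidden`, the callable parameters, and the dict-based config are hypothetical stand-ins, the config key `GLOBAL_DISCOVERY_AUTHZ` is taken from the configuration summary, `is_discovery_enabled` is assumed to be the flag that turns the check on, and resources are matched by exact string.

```python
import functools


class Forbidden(Exception):
    """Stand-in for the HTTP 403 error the real service would raise."""


def authorize_discovery(config, get_user_authz, get_record_authz):
    """Build a decorator that denies access before the handler runs.

    config:           dict-like settings (assumed shape)
    get_user_authz:   callable returning the caller's resources
    get_record_authz: callable mapping a record ID to its authz list
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(record_id, *args, **kwargs):
            if config.get("is_discovery_enabled"):
                user_authz = set(get_user_authz())
                global_authz = config.get("GLOBAL_DISCOVERY_AUTHZ")
                if global_authz:
                    # Global authz configured: only that resource is checked.
                    allowed = any(r in user_authz for r in global_authz)
                else:
                    # Otherwise compare against the record's own authz,
                    # which needs a lookup keyed on the requested ID.
                    record_authz = get_record_authz(record_id)
                    allowed = any(r in user_authz for r in record_authz)
                if not allowed:
                    raise Forbidden(record_id)
            return func(record_id, *args, **kwargs)
        return wrapper
    return decorator
```

Passing the lookups in as callables keeps the decision in request handling and leaves the db driver unaware of it, which is the separation the comment argues for.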
Within the blueprints that need the logic for filtering, now we can implement a shared set of util functions.
Before making a db query:
* if is_discovery_enabled (check config only)
* get user authz (perhaps from cache)
* don't worry about GLOBAL AUTHZ at all b/c we already handled that in the new decorator of this endpoint, so we only get here if they are authorized
* Add filter for records to db query based on user authz
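The shared filtering util could be sketched as below. This is an in-memory stand-in for what would really be a filter added to the db query; the function name, the record shape (dicts with an `authz` list), and exact-string matching are all assumptions for illustration.

```python
def discoverable_records(records, user_authz, is_discovery_enabled=True):
    """Keep only records whose authz overlaps the caller's resources.

    Hypothetical helper: in the service this condition would be applied
    to the db query rather than to a list already in memory.
    """
    if not is_discovery_enabled:
        # Check disabled by config: nothing is filtered out.
        return list(records)
    user_authz = set(user_authz)
    return [
        record for record in records
        if user_authz.intersection(record.get("authz", []))
    ]
```

Note that under this sketch a record with an empty `authz` list is filtered out whenever the check is enabled; whether that is the desired behavior is a design decision the thread does not settle.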
This way, the stateful logic stays out of the db driver and in the request handling (which is where it should be) and we have minimized duplication of code as much as possible.
What do you think?
return result

def calculate_ttl(now, token) -> int | None:
I feel we might be overcomplicating this cache. Can't we simply keep the exp in the cache itself and instead of doing ttl math, just invalidate entries before now?
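The simpler design suggested here, storing the absolute expiry next to each value and dropping anything at or before the current time, could look roughly like this. The class and method names are hypothetical, and the injectable clock is only there to make the behavior easy to test.

```python
import time


class ExpiringCache:
    """Per-process cache whose entries carry their own absolute expiry."""

    def __init__(self, clock=time.time):
        self._entries = {}  # key -> (expires_at, value)
        self._clock = clock

    def set(self, key, value, expires_at):
        """Store a value together with its expiry, e.g. a token's `exp`."""
        self._entries[key] = (expires_at, value)

    def get(self, key, default=None):
        """Return the cached value, after purging everything expired."""
        self.purge()
        entry = self._entries.get(key)
        return entry[1] if entry is not None else default

    def purge(self):
        """Drop every entry whose expiry is at or before the current time."""
        now = self._clock()
        expired = [key for key, (expires_at, _) in self._entries.items()
                   if expires_at <= now]
        for key in expired:
            del self._entries[key]
```

Because the comparison is always against the stored expiry, no TTL arithmetic is needed at write time, and purging on every read also covers the cleanup of unused entries.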
""" | ||

def decorator(func):
    cache = {}
This only caches on a per-instance basis, so the benefit is lessened in multi-instance deployments, where automatic routing may not send a request to the same instance (and having multiple instances is typical for indexd due to the load it can receive). I would prefer to see a more robust caching solution shared across instances, but that would require a shared resource and I'm not sure the actual time saved here would outweigh interacting with that shared resource (perhaps it would b/c it requires an arborist query).
Fence has an example of this shared resource -> in memory cache setup but it's a bit more involved... I suppose we can revisit if this per-instance approach isn't performant enough
New Features
- Adds RBAC to db operations
- Adds cached call to Arborist
- Adds RBAC config
Breaking Changes
Bug Fixes
- Improve error handling
Improvements
- Developer documentation
- Test RBAC
Dependency updates
Deployment changes
- RBAC. When set to True, RBAC enforcement is active for protected records.