Skip to content

Commit 0a4bffe

Browse files
committed
Adds RBAC - ARE_RECORDS_DISCOVERABLE, GLOBAL_DISCOVERY_AUTHZ
1 parent 3db932e commit 0a4bffe

24 files changed

+1979
-76
lines changed

docs/rbac.md

Lines changed: 95 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,95 @@
1+
2+
# RBAC in Indexd
3+
4+
## **Problem:**
5+
As an indexd or DRS user, when I list objects, I only expect to see items that belong to projects I have access to.
6+
7+
## **Solution:**
8+
9+
Assuming a Bearer token is included on the request, I expect indexd to query arborist, extract the projects I have access to and add those as an "authz" filter when querying the database. If a token is not present, I expect that index will still query Arborist to retrieve default permissions. These calls should be cached for performance.
10+
11+
A feature flag should control this query injection, the flag should default to FALSE, as this will improve chances of getting a PR approved. All current unit tests should pass. Additional unit tests should confirm behavior.
12+
13+
## **Alternatives:**
14+
We could have a RBAC aware proxy front end indexd - however will add complexity and processing overhead
15+
16+
## **Context:**
17+
18+
The indexd service is used to manage and serve metadata about data objects, such as files in a data repository. It currently does not enforce any access control on the objects it serves, which means that any user can see all objects regardless of their permissions.
19+
20+
Main [auth code](https://github.com/uc-cdis/indexd/blob/fb21317f2bc72ad9b0ea143fe9388122f59d10f4/indexd/auth/drivers/alchemy.py#L37-L36) has two methods `auth` and `authz`. The [indexd.authorize](https://github.com/uc-cdis/indexd/blob/0859c639f99a7cbce0a0cd15564ed9847814a5ff/indexd/auth/__init__.py#L10) method checks if Basic auth header is present auth is called otherwise authz is called. The revproxy gateway injects this header [here](https://github.com/uc-cdis/gen3-helm/blob/9ccd25c3e4c40f87f750883802ece5866cdfbc24/helm/revproxy/gen3.nginx.conf/indexd-service.conf#L41-L53) This reliance on Basic auth is concerning and it's rationale is undocumented. It appears that it is not used for either create or read based on [client API](https://github.com/uc-cdis/indexclient/blob/master/indexclient/client.py)
21+
22+
**Approach:**
23+
Add code to [get_index](https://github.com/uc-cdis/indexd/blob/b6ec68f15a8bb61e99c0daf3f6af729691f213c7/indexd/index/blueprint.py#L60) to call [auth_mapping](https://github.com/uc-cdis/gen3authz/blob/master/src/gen3authz/client/arborist/base.py#L286)
24+
and inject resources (projects) into query.
25+
- [x] skip if feature flag not enabled
26+
- [x] call arborist with token to get resources for user, or without token to get default resources
27+
- [x] Cache arborist results for 30 minutes (typical JWT token lifetime)
28+
- [x] update dependency gen3authz as latest version includes token as parameter (as an alternative to username)
29+
- [x] use [mock_arborist_requests](https://github.com/uc-cdis/indexd/blob/8ff50b9c829920907181d5c186c907e06f5c4a5d/tests/conftest.py#L230) pytest fixture
30+
- [x] ensure all existing tests pass
31+
- [x] add new tests specific to RBAC
32+
- [x] Add feature flag to [default_settings](https://github.com/uc-cdis/indexd/blob/8ff50b9c829920907181d5c186c907e06f5c4a5d/indexd/default_settings.py)
33+
- [x] Remove extraneous logging and debugging code
34+
- [x] Ensure ARE_RECORDS_DISCOVERABLE, GLOBAL_DISCOVERY_AUTHZ See [discussion](https://github.com/uc-cdis/indexd/pull/400#discussion_r2243579240)
35+
- [ ] Add a corresponding feature flag to helm chart
36+
37+
---
38+
39+
## Implementation Overview
40+
41+
* Main changes were made to:
42+
* indexd/auth
43+
* indexd/index/drivers/alchemy.py
44+
45+
* All the changes above:
46+
* should be transparent to the user, and they should not notice any difference in behavior.
47+
* should be non-breaking, as it only changes the behavior when the `authz` parameter is empty.
48+
* However, it will throw a 401/403 is the user does not have access to the requested resource,or does not have and Authorization header which is a change from the previous behavior where it would return all the records regardless of the user's access.
49+
50+
* "Breaking" Changes:
51+
* In order to enforce authorization, we need to ensure that all records have an `authz` field.
52+
* (This is not a change in behavior to OHSU/ACED/Calypr, but it is a change in behavior to the Indexd API in that effectively authz is mandatory on write)
53+
54+
* Misc:
55+
* Added stack traces to log for unhandled exceptions see changes to blueprint.py for various endpoints
56+
57+
## **Testing `tests/rbac`**
58+
59+
The `tests/rbac` suite is designed to validate RBAC-aware behavior in the indexd service, while ensuring the stability and integrity of legacy functionality. The following principles guide its architecture:
60+
61+
- **Preservation of Existing Tests:**
62+
All pre-existing tests are retained without modification to guarantee backward compatibility and to ensure that legacy functionality remains unaffected by the introduction of RBAC features.
63+
64+
- **Comprehensive Endpoint Coverage:**
65+
New tests are introduced to exercise RBAC logic across a wide set of API endpoints (e.g., list, read, write, update, delete). This ensures that authorization checks are consistently enforced and that the feature flag, token handling, and resource filtering behave as intended in every context.
66+
67+
- **Parameterized Test Design:**
68+
Parameterized tests are used to efficiently cover combinations of public, controlled, and private `authz` resources, as well as users with and without tokens. This approach ensures all relevant access patterns are validated, including edge cases, without duplicating test logic.
69+
70+
- **Mocked Authorization Backend:**
71+
Arborist responses are mocked to provide deterministic and isolated test scenarios, enabling reliable validation of access control logic without external dependencies.
72+
73+
## **Configuration:**
74+
Tests verify both enabled and disabled states of the RBAC feature flag, confirming that the system defaults to legacy behavior unless explicitly configured otherwise.
75+
76+
* `ARE_RECORDS_DISCOVERABLE`
77+
78+
- **Type:** `bool`
79+
- **Default:** `True`
80+
- **Description:**
81+
Controls whether any records in IndexD are discoverable via search or listing endpoints.
82+
If set to `False`, all records are hidden from discovery, regardless of their individual authorization settings.
83+
Note: Role-Based Access Control (RBAC) is not enabled by default.
84+
85+
* `GLOBAL_DISCOVERY_AUTHZ`
86+
87+
- **Type:** `list` or `None`
88+
- **Default:** `[]`
89+
- **Description:**
90+
Overrides per-record authorization for GET/read operations during record discovery.
91+
If set to a list of authorization requirements, these are applied globally to all records for discovery purposes.
92+
If set to `None`, the system uses each record's individual `authz` field for authorization checks.
93+
This setting does not affect file access permissions, only record discovery.
94+
95+
This approach ensures robust coverage of the new RBAC functionality while maintaining the integrity and reliability of the existing test suite.

indexd/alias/blueprint.py

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,8 @@
11
import re
2+
import sys
3+
import traceback
4+
5+
import cdislogging
26
import flask
37
import jsonschema
48

@@ -12,7 +16,9 @@
1216
from .errors import NoRecordFound
1317
from .errors import MultipleRecordsFound
1418
from .errors import RevisionMismatch
19+
from .. import utils
1520

21+
logger = cdislogging.get_logger(__name__)
1622

1723
blueprint = flask.Blueprint("alias", __name__)
1824

@@ -159,11 +165,13 @@ def handle_multiple_records_error(err):
159165

160166
@blueprint.errorhandler(UserError)
161167
def handle_user_error(err):
168+
logger.error(err, exc_info=True)
162169
return flask.jsonify(error=str(err)), 400
163170

164171

165172
@blueprint.errorhandler(AuthError)
166173
def handle_auth_error(err):
174+
logger.error(err, exc_info=True)
167175
return flask.jsonify(error=str(err)), 403
168176

169177

@@ -172,6 +180,15 @@ def handle_revision_mismatch(err):
172180
return flask.jsonify(error=str(err)), 409
173181

174182

183+
@blueprint.errorhandler(Exception)
184+
def handle_uncaught_exception(err):
185+
"""
186+
Handle uncaught exceptions.
187+
Delegate to utils.handle_uncaught_exception
188+
"""
189+
return utils.handle_uncaught_exception(err)
190+
191+
175192
@blueprint.record
176193
def get_config(setup_state):
177194
config = setup_state.app.config["ALIAS"]

indexd/auth/__init__.py

Lines changed: 15 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,8 @@
33
from flask import current_app
44
from flask import request
55

6-
from .errors import AuthError
6+
from indexd.auth.errors import AuthError, AuthzError
7+
from indexd.errors import UserError
78

89

910
def authorize(*p):
@@ -20,8 +21,8 @@ def authorize(*p):
2021

2122
@wraps(f)
2223
def check_auth(*args, **kwargs):
23-
if not request.authorization:
24-
raise AuthError("Username / password required.")
24+
if not request.authorization.parameters.get("username"):
25+
raise AuthError(f"Basic auth Username / password required. {request.authorization}")
2526
current_app.auth.auth(
2627
request.authorization.parameters.get("username"),
2728
request.authorization.parameters.get("password"),
@@ -31,13 +32,20 @@ def check_auth(*args, **kwargs):
3132

3233
return check_auth
3334
else:
34-
method, resources = p
35+
method, resources_ = p
3536
if request.authorization and request.authorization.type == "basic":
3637
current_app.auth.auth(
3738
request.authorization.parameters.get("username"),
3839
request.authorization.parameters.get("password"),
3940
)
4041
else:
41-
if not isinstance(resources, list):
42-
raise UserError(f"'authz' must be a list, received '{resources}'.")
43-
current_app.auth.authz(method, list(set(resources)))
42+
if not isinstance(resources_, list):
43+
raise UserError(f"'authz' must be a list, received '{resources_}'.")
44+
return current_app.auth.authz(method, list(set(resources_)))
45+
46+
47+
def resources():
48+
"""
49+
Returns a list of resources the user has access to. Uses Arborist if available.
50+
"""
51+
return current_app.auth.resources()

indexd/auth/driver.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -43,3 +43,11 @@ def delete(self, username):
4343
Raises AuthError if user doesn't exist.
4444
"""
4545
raise NotImplementedError("TODO")
46+
47+
@abc.abstractmethod
48+
def resources(self):
49+
"""
50+
Returns a list of resources the user has access to. Uses Arborist if available.
51+
Raises AuthError if the user doesn't exist.
52+
"""
53+
raise NotImplementedError("TODO")

indexd/auth/drivers/__init__.py

Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,75 @@
1+
import functools
2+
import time
3+
4+
import flask
5+
import jwt
6+
7+
8+
def request_auth_cache(maximum_ttl_seconds=1800):
9+
"""
10+
Decorator to cache the result of a function for a specified maximum TTL in seconds.
11+
The actual cache duration is determined by the 'token' parameter's expiration.
12+
If no token is provided, the maximum TTL is used and the Authorization header is included in the cache key.
13+
"""
14+
15+
def decorator(func):
16+
cache = {}
17+
18+
@functools.wraps(func)
19+
def wrapper(*args, **kwargs):
20+
key = functools._make_key(args, kwargs, typed=False)
21+
now = time.time()
22+
23+
# Extract token from args or kwargs
24+
token = kwargs.get("token")
25+
if token is None:
26+
# print("No token provided in kwargs")
27+
if type(args[0]) is str:
28+
# If the first argument is a string, assume it's the token
29+
token = args[0]
30+
else:
31+
token = args[1] if len(args) > 1 else None
32+
33+
# Calculate token expiration duration
34+
if token:
35+
# Decode the JWT token without verifying the signature to get the 'exp' claim
36+
# If the token is a string, decode it
37+
token = token.encode('utf-8') if isinstance(token, str) else token
38+
39+
# we could check for jwt.exceptions.DecodeError here, but we assume the token is valid
40+
# and just decode it to get the expiration time
41+
payload = jwt.decode(token, options={"verify_signature": False})
42+
43+
exp = payload.get("exp", now + maximum_ttl_seconds)
44+
token_ttl = max(0, exp - now)
45+
else:
46+
# If no token is provided, use the maximum TTL and add the Authorization header to the key.
47+
# This is useful for cases where the function does not require a token,
48+
# but still needs to cache based on the Authorization header.
49+
auth_header = flask.request.headers.get('Authorization', '')
50+
# Add the Authorization header to the key
51+
key = functools._make_key(args + (auth_header,), kwargs, typed=False)
52+
token_ttl = maximum_ttl_seconds
53+
54+
ttl = min(token_ttl, maximum_ttl_seconds)
55+
56+
# Check if the result is already cached and still valid
57+
if key in cache:
58+
result, timestamp = cache[key]
59+
if now - timestamp < ttl:
60+
return result
61+
62+
# If not cached or expired, call the function and cache the result
63+
result = func(*args, **kwargs)
64+
cache[key] = (result, now)
65+
66+
# Clean up any old cache entries
67+
keys_to_delete = [k for k, (v, t) in cache.items() if now - t >= ttl]
68+
for k in keys_to_delete:
69+
del cache[k]
70+
71+
return result
72+
73+
return wrapper
74+
75+
return decorator

indexd/auth/drivers/alchemy.py

Lines changed: 50 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@
1111
from sqlalchemy.ext.declarative import declarative_base
1212

1313
from indexd.auth.driver import AuthDriverABC
14+
from indexd.auth.drivers import request_auth_cache
1415

1516
from indexd.auth.errors import AuthError, AuthzError
1617

@@ -133,14 +134,13 @@ def authz(self, method, resource):
133134
try:
134135
# A successful call from arborist returns a bool, else returns ArboristError
135136
try:
136-
authorized = self.arborist.auth_request(
137-
get_jwt_token(), "indexd", method, resource
138-
)
137+
authorized = self.cached_auth_request(get_jwt_token(), "indexd", method, resource)
139138
except Exception as e:
140139
logger.error(
141140
f"Request to Arborist failed; now checking admin access. Details:\n{e}"
142141
)
143142
authorized = False
143+
144144
if not authorized:
145145
# admins can perform all operations
146146
is_admin = self.arborist.auth_request(
@@ -157,7 +157,53 @@ def authz(self, method, resource):
157157
"The indexd admin '/programs' logic is deprecated. Please update your policy to '/services/indexd/admin'"
158158
)
159159
if not is_admin:
160-
raise AuthError("Permission denied.")
160+
raise AuthError("Permission denied. (not is_admin)")
161161
except Exception as err:
162162
logger.error(err)
163163
raise AuthzError(err)
164+
165+
def resources(self):
166+
"""
167+
Returns a list of resources for the given user.
168+
"""
169+
if not self.arborist:
170+
raise AuthError(
171+
"Arborist is not configured; cannot perform authorization check"
172+
)
173+
token = get_jwt_token()
174+
try:
175+
resources = self.caching_auth_mapping(token)
176+
return resources
177+
except Exception as err:
178+
raise AuthError(
179+
"Failed to get resources from Arborist. Please check your Arborist configuration."
180+
)
181+
182+
@request_auth_cache() # cache the result of the auth request
183+
def caching_auth_mapping(self, token):
184+
"""
185+
Returns a list of resources the user has access to.
186+
Uses Arborist if available.
187+
If a token is provided, it will use that token to get the auth mapping.
188+
If no token is provided, it will use the default auth mapping.
189+
"""
190+
if token:
191+
resources = self.arborist.auth_mapping(
192+
jwt=token
193+
)
194+
else:
195+
resources = self.arborist.auth_mapping()
196+
return resources
197+
198+
@request_auth_cache() # cache the result of the auth request
199+
def cached_auth_request(self, token, service, method, resource):
200+
"""
201+
Makes an authenticated request to Arborist and caches the result.
202+
This method is used to check if the user has access to a specific resource
203+
with a specific method.
204+
"""
205+
return self.arborist.auth_request(
206+
token, service, method, resource
207+
)
208+
209+

0 commit comments

Comments
 (0)