95 changes: 95 additions & 0 deletions docs/rbac.md
@@ -0,0 +1,95 @@

# RBAC in Indexd

## **Problem:**
As an indexd or DRS user, when I list objects, I only expect to see items that belong to projects I have access to.

## **Solution:**

Assuming a Bearer token is included in the request, I expect indexd to query Arborist, extract the projects I have access to, and add those as an `authz` filter when querying the database. If a token is not present, I expect that indexd will still query Arborist to retrieve default permissions. These calls should be cached for performance.

A feature flag should control this query injection. The flag should default to FALSE, as this will improve the chances of getting a PR approved. All current unit tests should pass, and additional unit tests should confirm the new behavior.
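As a rough illustration of the idea above, the sketch below filters a list of records by the resources a user can read, and does nothing when the flag is off. The names `RBAC_ENABLED`, `readable_resources`, and `filter_records` are hypothetical and do not come from indexd. The mapping shape (resource path to a list of `service`/`method` entries) is an assumption about what Arborist returns, and matching is by exact path only.

```python
# Hypothetical flag name, used only for this sketch; it defaults to off.
RBAC_ENABLED = False


def readable_resources(auth_mapping, method="read", service="indexd"):
    """Collect the resource paths on which the mapping grants `method`.

    Assumes a mapping shaped like:
    {"/programs/a/projects/p1": [{"service": "indexd", "method": "read"}]}
    """
    allowed = set()
    for resource, permissions in auth_mapping.items():
        for perm in permissions:
            if perm.get("method") in (method, "*") and perm.get("service") in (service, "*"):
                allowed.add(resource)
                break
    return allowed


def filter_records(records, auth_mapping, rbac_enabled=RBAC_ENABLED):
    """Return only the records whose `authz` paths are all readable.

    With the flag off, every record is returned (legacy behavior).
    Records without an `authz` value are hidden when the flag is on.
    """
    if not rbac_enabled:
        return list(records)
    allowed = readable_resources(auth_mapping)
    return [r for r in records if r.get("authz") and set(r["authz"]) <= allowed]
```

In a real service the filter would more likely be pushed into the database query rather than applied in Python; the in-memory version is only meant to make the intended semantics concrete.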

## **Alternatives:**
We could put an RBAC-aware proxy in front of indexd; however, this would add complexity and processing overhead.

## **Context:**

The indexd service is used to manage and serve metadata about data objects, such as files in a data repository. It currently does not enforce any access control on the objects it serves, which means that any user can see all objects regardless of their permissions.

The main [auth code](https://github.com/uc-cdis/indexd/blob/fb21317f2bc72ad9b0ea143fe9388122f59d10f4/indexd/auth/drivers/alchemy.py#L37-L36) has two methods, `auth` and `authz`. The [indexd.authorize](https://github.com/uc-cdis/indexd/blob/0859c639f99a7cbce0a0cd15564ed9847814a5ff/indexd/auth/__init__.py#L10) method checks whether a Basic auth header is present: if it is, `auth` is called; otherwise `authz` is called. The revproxy gateway injects this header [here](https://github.com/uc-cdis/gen3-helm/blob/9ccd25c3e4c40f87f750883802ece5866cdfbc24/helm/revproxy/gen3.nginx.conf/indexd-service.conf#L41-L53). This reliance on Basic auth is concerning, and its rationale is undocumented. Based on the [client API](https://github.com/uc-cdis/indexclient/blob/master/indexclient/client.py), it appears that it is not used for either create or read.
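The dispatch described above can be pictured with a small, framework-free sketch. The function `choose_auth_path` is hypothetical and only mirrors the decision (Basic credentials go to `auth`, everything else to `authz`); the real code reads the parsed header from Flask's `request.authorization`.

```python
def choose_auth_path(authorization_header):
    """Return "auth" for Basic credentials and "authz" for anything else.

    `authorization_header` is the raw header value, or None when absent.
    """
    scheme = (authorization_header or "").split(" ", 1)[0].lower()
    return "auth" if scheme == "basic" else "authz"
```

A request carrying a Bearer token, or no Authorization header at all, therefore falls through to the Arborist-backed `authz` path.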

## **Approach:**
Add code to [get_index](https://github.com/uc-cdis/indexd/blob/b6ec68f15a8bb61e99c0daf3f6af729691f213c7/indexd/index/blueprint.py#L60) to call [auth_mapping](https://github.com/uc-cdis/gen3authz/blob/master/src/gen3authz/client/arborist/base.py#L286) and inject the resulting resources (projects) into the query.
- [x] skip if feature flag not enabled
- [x] call arborist with token to get resources for user, or without token to get default resources
- [x] Cache arborist results for 30 minutes (typical JWT token lifetime)
- [x] Update the gen3authz dependency, as the latest version accepts a token as a parameter (as an alternative to username)
- [x] use [mock_arborist_requests](https://github.com/uc-cdis/indexd/blob/8ff50b9c829920907181d5c186c907e06f5c4a5d/tests/conftest.py#L230) pytest fixture
- [x] ensure all existing tests pass
- [x] add new tests specific to RBAC
- [x] Add feature flag to [default_settings](https://github.com/uc-cdis/indexd/blob/8ff50b9c829920907181d5c186c907e06f5c4a5d/indexd/default_settings.py)
- [x] Remove extraneous logging and debugging code
- [x] Ensure `ARE_RECORDS_DISCOVERABLE` and `GLOBAL_DISCOVERY_AUTHZ` are honored; see [discussion](https://github.com/uc-cdis/indexd/pull/400#discussion_r2243579240)
- [ ] Add a corresponding feature flag to helm chart
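
The caching item in the checklist can be sketched as a small cache whose entries live no longer than 30 minutes and no longer than the token's own `exp` claim. This is a minimal, standard-library-only illustration, not the decorator used in the PR: `TokenTTLCache`, `jwt_expiry`, and `get_or_call` are hypothetical names. The `exp` claim is read without verifying the signature, so it is only suitable for choosing a cache lifetime. List arguments are converted to tuples because dictionary keys must be hashable.

```python
import base64
import json
import time

MAX_TTL_SECONDS = 1800  # 30 minutes, matching the checklist above


def jwt_expiry(token):
    """Read the `exp` claim without verifying the signature.

    Returns None when the token is malformed or has no `exp` claim.
    """
    try:
        payload_b64 = token.split(".")[1]
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)
        return json.loads(base64.urlsafe_b64decode(padded)).get("exp")
    except (IndexError, ValueError, AttributeError):
        return None


class TokenTTLCache:
    """Cache results per (token, args), expiring with the token or the max TTL."""

    def __init__(self, max_ttl=MAX_TTL_SECONDS, clock=time.time):
        self._entries = {}  # key -> (value, expires_at)
        self._max_ttl = max_ttl
        self._clock = clock

    def get_or_call(self, token, func, *args):
        now = self._clock()
        # Lists are unhashable, so normalize them to tuples for the key.
        key = (token, tuple(tuple(a) if isinstance(a, list) else a for a in args))
        hit = self._entries.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]
        exp = jwt_expiry(token) if token else None
        ttl = self._max_ttl if exp is None else max(0, min(exp - now, self._max_ttl))
        value = func(token, *args)
        self._entries[key] = (value, now + ttl)
        return value
```

Storing the expiry time alongside each entry means a later call with a different token cannot shorten or extend the lifetime of an existing entry.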

---

## Implementation Overview

* Main changes were made to:
  * `indexd/auth`
  * `indexd/index/drivers/alchemy.py`

* All the changes above:
  * should be transparent to the user, who should not notice any difference in behavior.
  * should be non-breaking, as they only change behavior when the `authz` parameter is empty.
  * However, the service will now return a 401/403 if the user does not have access to the requested resource or does not send an Authorization header. This is a change from the previous behavior, which returned all records regardless of the user's access.

* "Breaking" changes:
  * In order to enforce authorization, we need to ensure that all records have an `authz` field.
  * (This is not a change in behavior for OHSU/ACED/Calypr, but it is a change in behavior for the indexd API, in that `authz` is effectively mandatory on write.)

* Misc:
  * Added stack traces to the log for unhandled exceptions; see the changes to `blueprint.py` for the various endpoints.

## **Testing `tests/rbac`**

The `tests/rbac` suite is designed to validate RBAC-aware behavior in the indexd service, while ensuring the stability and integrity of legacy functionality. The following principles guide its architecture:

- **Preservation of Existing Tests:**
All pre-existing tests are retained without modification to guarantee backward compatibility and to ensure that legacy functionality remains unaffected by the introduction of RBAC features.

- **Comprehensive Endpoint Coverage:**
New tests are introduced to exercise RBAC logic across a wide set of API endpoints (e.g., list, read, write, update, delete). This ensures that authorization checks are consistently enforced and that the feature flag, token handling, and resource filtering behave as intended in every context.

- **Parameterized Test Design:**
Parameterized tests are used to efficiently cover combinations of public, controlled, and private `authz` resources, as well as users with and without tokens. This approach ensures all relevant access patterns are validated, including edge cases, without duplicating test logic.

- **Mocked Authorization Backend:**
Arborist responses are mocked to provide deterministic and isolated test scenarios, enabling reliable validation of access control logic without external dependencies.
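
A table-driven sketch of the parameterized, mocked approach described above might look like the following. It uses only `unittest.mock` so that it is self-contained; under pytest the same table would typically feed `pytest.mark.parametrize`. The helper `is_visible`, the resource paths, and the token value are all hypothetical, and the mocked `auth_mapping` only approximates the Arborist client.

```python
from unittest import mock

CASES = [
    # (description, token, record authz, expected visibility)
    ("public record, anonymous", None, ["/open"], True),
    ("controlled record, anonymous", None, ["/programs/a/projects/p1"], False),
    ("controlled record, authorized token", "user-token", ["/programs/a/projects/p1"], True),
    ("private record, authorized token", "user-token", ["/programs/b/projects/p9"], False),
]


def fake_auth_mapping(jwt=None):
    """Deterministic stand-in for Arborist: defaults plus per-token grants."""
    mapping = {"/open": [{"service": "*", "method": "read"}]}
    if jwt == "user-token":
        mapping["/programs/a/projects/p1"] = [{"service": "indexd", "method": "read"}]
    return mapping


def is_visible(arborist, token, authz):
    """Hypothetical check: every authz path must appear in the user's mapping."""
    mapping = arborist.auth_mapping(jwt=token) if token else arborist.auth_mapping()
    return all(resource in mapping for resource in authz)


def run_cases():
    """Evaluate every case against a mocked Arborist client."""
    arborist = mock.Mock()
    arborist.auth_mapping.side_effect = fake_auth_mapping
    return {desc: is_visible(arborist, token, authz) for desc, token, authz, _ in CASES}
```

Keeping the cases in one table makes it easy to add a new access pattern without duplicating test logic.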

## **Configuration:**
Tests verify both enabled and disabled states of the RBAC feature flag, confirming that the system defaults to legacy behavior unless explicitly configured otherwise.

* `ARE_RECORDS_DISCOVERABLE`

  - **Type:** `bool`
  - **Default:** `True`
  - **Description:**
    Controls whether any records in IndexD are discoverable via search or listing endpoints.
    If set to `False`, all records are hidden from discovery, regardless of their individual authorization settings.
    Note: Role-Based Access Control (RBAC) is not enabled by default.

* `GLOBAL_DISCOVERY_AUTHZ`

  - **Type:** `list` or `None`
  - **Default:** `[]`
  - **Description:**
    Overrides per-record authorization for GET/read operations during record discovery.
    If set to a list of authorization requirements, these are applied globally to all records for discovery purposes.
    If set to `None`, the system uses each record's individual `authz` field for authorization checks.
    This setting does not affect file access permissions, only record discovery.
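
One way to read the two settings together is sketched below. The function `discovery_authz` is hypothetical and is not taken from indexd. In particular, treating the default empty list as "no requirement for discovery" is an assumption that the description above does not state explicitly.

```python
def discovery_authz(record_authz, are_records_discoverable, global_discovery_authz):
    """Return the authz list to check for discovery, or None if the record is hidden.

    - ARE_RECORDS_DISCOVERABLE False: nothing is discoverable.
    - GLOBAL_DISCOVERY_AUTHZ is a list: it overrides the record's own authz.
    - GLOBAL_DISCOVERY_AUTHZ is None: the record's own authz is used.
    """
    if not are_records_discoverable:
        return None
    if global_discovery_authz is not None:
        # Assumption: an empty list means no requirement is imposed.
        return list(global_discovery_authz)
    return list(record_authz)
```

For example, `discovery_authz(["/programs/a"], True, None)` falls back to the record's own `authz`, while passing a non-empty list as the third argument applies that list to every record.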

This approach ensures robust coverage of the new RBAC functionality while maintaining the integrity and reliability of the existing test suite.
17 changes: 17 additions & 0 deletions indexd/alias/blueprint.py
@@ -1,4 +1,8 @@
 import re
+import sys
+import traceback
+
+import cdislogging
 import flask
 import jsonschema

@@ -12,7 +16,9 @@
 from .errors import NoRecordFound
 from .errors import MultipleRecordsFound
 from .errors import RevisionMismatch
+from .. import utils

+logger = cdislogging.get_logger(__name__)

 blueprint = flask.Blueprint("alias", __name__)

@@ -159,11 +165,13 @@ def handle_multiple_records_error(err):

 @blueprint.errorhandler(UserError)
 def handle_user_error(err):
+    logger.error(err, exc_info=True)
     return flask.jsonify(error=str(err)), 400


 @blueprint.errorhandler(AuthError)
 def handle_auth_error(err):
+    logger.error(err, exc_info=True)
     return flask.jsonify(error=str(err)), 403


@@ -172,6 +180,15 @@ def handle_revision_mismatch(err):
     return flask.jsonify(error=str(err)), 409


+@blueprint.errorhandler(Exception)
+def handle_uncaught_exception(err):
+    """
+    Handle uncaught exceptions.
+    Delegate to utils.handle_uncaught_exception
+    """
+    return utils.handle_uncaught_exception(err)
+
+
 @blueprint.record
 def get_config(setup_state):
     config = setup_state.app.config["ALIAS"]
22 changes: 15 additions & 7 deletions indexd/auth/__init__.py
@@ -3,7 +3,8 @@
 from flask import current_app
 from flask import request

-from .errors import AuthError
+from indexd.auth.errors import AuthError, AuthzError
+from indexd.errors import UserError


 def authorize(*p):
@@ -20,8 +21,8 @@ def authorize(*p):

         @wraps(f)
         def check_auth(*args, **kwargs):
-            if not request.authorization:
-                raise AuthError("Username / password required.")
+            if not request.authorization.parameters.get("username"):
+                raise AuthError(f"Basic auth Username / password required. {request.authorization}")
             current_app.auth.auth(
                 request.authorization.parameters.get("username"),
                 request.authorization.parameters.get("password"),
@@ -31,13 +32,20 @@ def check_auth(*args, **kwargs):

         return check_auth
     else:
-        method, resources = p
+        method, resources_ = p
         if request.authorization and request.authorization.type == "basic":
             current_app.auth.auth(
                 request.authorization.parameters.get("username"),
                 request.authorization.parameters.get("password"),
             )
         else:
-            if not isinstance(resources, list):
-                raise UserError(f"'authz' must be a list, received '{resources}'.")
-            current_app.auth.authz(method, list(set(resources)))
+            if not isinstance(resources_, list):
+                raise UserError(f"'authz' must be a list, received '{resources_}'.")
+            return current_app.auth.authz(method, list(set(resources_)))
+
+
+def resources():
+    """
+    Returns a list of resources the user has access to. Uses Arborist if available.
+    """
+    return current_app.auth.resources()
8 changes: 8 additions & 0 deletions indexd/auth/driver.py
@@ -43,3 +43,11 @@ def delete(self, username):
         Raises AuthError if user doesn't exist.
         """
         raise NotImplementedError("TODO")
+
+    @abc.abstractmethod
+    def resources(self):
+        """
+        Returns a list of resources the user has access to. Uses Arborist if available.
+        Raises AuthError if the user doesn't exist.
+        """
+        raise NotImplementedError("TODO")
75 changes: 75 additions & 0 deletions indexd/auth/drivers/__init__.py
@@ -0,0 +1,75 @@
+import functools
+import time
+
+import flask
+import jwt
+
+
+def request_auth_cache(maximum_ttl_seconds=1800):
+    """
+    Decorator to cache the result of a function for a specified maximum TTL in seconds.
+    The actual cache duration is determined by the 'token' parameter's expiration.
+    If no token is provided, the maximum TTL is used and the Authorization header is included in the cache key.
+    """
+
+    def decorator(func):
+        cache = {}
+
+        @functools.wraps(func)
+        def wrapper(*args, **kwargs):
+            key = functools._make_key(args, kwargs, typed=False)
+            now = time.time()
+
+            # Extract token from args or kwargs
+            token = kwargs.get("token")
+            if token is None:
+                # print("No token provided in kwargs")
+                if type(args[0]) is str:
+                    # If the first argument is a string, assume it's the token
+                    token = args[0]
+                else:
+                    token = args[1] if len(args) > 1 else None
+
+            # Calculate token expiration duration
+            if token:
+                # Decode the JWT token without verifying the signature to get the 'exp' claim
+                # If the token is a string, decode it
+                token = token.encode('utf-8') if isinstance(token, str) else token
+
+                # we could check for jwt.exceptions.DecodeError here, but we assume the token is valid
+                # and just decode it to get the expiration time
+                payload = jwt.decode(token, options={"verify_signature": False})
+
+                exp = payload.get("exp", now + maximum_ttl_seconds)
+                token_ttl = max(0, exp - now)
+            else:
+                # If no token is provided, use the maximum TTL and add the Authorization header to the key.
+                # This is useful for cases where the function does not require a token,
+                # but still needs to cache based on the Authorization header.
+                auth_header = flask.request.headers.get('Authorization', '')
+                # Add the Authorization header to the key
+                key = functools._make_key(args + (auth_header,), kwargs, typed=False)
+                token_ttl = maximum_ttl_seconds
+
+            ttl = min(token_ttl, maximum_ttl_seconds)
+
+            # Check if the result is already cached and still valid
+            if key in cache:
+                result, timestamp = cache[key]
+                if now - timestamp < ttl:
+                    return result
+
+            # If not cached or expired, call the function and cache the result
+            result = func(*args, **kwargs)
+            cache[key] = (result, now)
+
+            # Clean up any old cache entries
+            keys_to_delete = [k for k, (v, t) in cache.items() if now - t >= ttl]
+            for k in keys_to_delete:
+                del cache[k]
+
+            return result
+
+        return wrapper
+
+    return decorator
54 changes: 50 additions & 4 deletions indexd/auth/drivers/alchemy.py
@@ -11,6 +11,7 @@
 from sqlalchemy.ext.declarative import declarative_base

 from indexd.auth.driver import AuthDriverABC
+from indexd.auth.drivers import request_auth_cache

 from indexd.auth.errors import AuthError, AuthzError

@@ -133,14 +134,13 @@ def authz(self, method, resource):
         try:
             # A successful call from arborist returns a bool, else returns ArboristError
             try:
-                authorized = self.arborist.auth_request(
-                    get_jwt_token(), "indexd", method, resource
-                )
+                authorized = self.cached_auth_request(get_jwt_token(), "indexd", method, resource)
             except Exception as e:
                 logger.error(
                     f"Request to Arborist failed; now checking admin access. Details:\n{e}"
                 )
                 authorized = False
+
             if not authorized:
                 # admins can perform all operations
                 is_admin = self.arborist.auth_request(
@@ -157,7 +157,53 @@ def authz(self, method, resource):
                             "The indexd admin '/programs' logic is deprecated. Please update your policy to '/services/indexd/admin'"
                         )
                 if not is_admin:
-                    raise AuthError("Permission denied.")
+                    raise AuthError("Permission denied. (not is_admin)")
         except Exception as err:
             logger.error(err)
             raise AuthzError(err)
+
+    def resources(self):
+        """
+        Returns a list of resources for the given user.
+        """
+        if not self.arborist:
+            raise AuthError(
+                "Arborist is not configured; cannot perform authorization check"
+            )
+        token = get_jwt_token()
+        try:
+            resources = self.caching_auth_mapping(token)
+            return resources
+        except Exception as err:
+            raise AuthError(
+                "Failed to get resources from Arborist. Please check your Arborist configuration."
+            )
+
+    @request_auth_cache()  # cache the result of the auth request
+    def caching_auth_mapping(self, token):
+        """
+        Returns a list of resources the user has access to.
+        Uses Arborist if available.
+        If a token is provided, it will use that token to get the auth mapping.
+        If no token is provided, it will use the default auth mapping.
+        """
+        if token:
+            resources = self.arborist.auth_mapping(
+                jwt=token
+            )
+        else:
+            resources = self.arborist.auth_mapping()
+        return resources
+
+    @request_auth_cache()  # cache the result of the auth request
+    def cached_auth_request(self, token, service, method, resource):
+        """
+        Makes an authenticated request to Arborist and caches the result.
+        This method is used to check if the user has access to a specific resource
+        with a specific method.
+        """
+        return self.arborist.auth_request(
+            token, service, method, resource
+        )