Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 95 additions & 0 deletions docs/rbac.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@

# RBAC in Indexd

## **Problem:**
As an indexd or DRS user, when I list objects, I only expect to see items that belong to projects I have access to.

## **Solution:**

Assuming a Bearer token is included on the request, I expect indexd to query arborist, extract the projects I have access to and add those as an "authz" filter when querying the database. If a token is not present, I expect that index will still query Arborist to retrieve default permissions. These calls should be cached for performance.

A feature flag should control this query injection, the flag should default to FALSE, as this will improve chances of getting a PR approved. All current unit tests should pass. Additional unit tests should confirm behavior.

## **Alternatives:**
We could have a RBAC aware proxy front end indexd - however will add complexity and processing overhead

## **Context:**

The indexd service is used to manage and serve metadata about data objects, such as files in a data repository. It currently does not enforce any access control on the objects it serves, which means that any user can see all objects regardless of their permissions.

Main [auth code](https://github.com/uc-cdis/indexd/blob/fb21317f2bc72ad9b0ea143fe9388122f59d10f4/indexd/auth/drivers/alchemy.py#L37-L36) has two methods `auth` and `authz`. The [indexd.authorize](https://github.com/uc-cdis/indexd/blob/0859c639f99a7cbce0a0cd15564ed9847814a5ff/indexd/auth/__init__.py#L10) method checks if Basic auth header is present auth is called otherwise authz is called. The revproxy gateway injects this header [here](https://github.com/uc-cdis/gen3-helm/blob/9ccd25c3e4c40f87f750883802ece5866cdfbc24/helm/revproxy/gen3.nginx.conf/indexd-service.conf#L41-L53) This reliance on Basic auth is concerning and it's rationale is undocumented. It appears that it is not used for either create or read based on [client API](https://github.com/uc-cdis/indexclient/blob/master/indexclient/client.py)

**Approach:**
Add code to [get_index](https://github.com/uc-cdis/indexd/blob/b6ec68f15a8bb61e99c0daf3f6af729691f213c7/indexd/index/blueprint.py#L60) to call [auth_mapping](https://github.com/uc-cdis/gen3authz/blob/master/src/gen3authz/client/arborist/base.py#L286)
and inject resources (projects) into query.
- [x] skip if feature flag not enabled
- [x] call arborist with token to get resources for user, or without token to get default resources
- [x] Cache arborist results for 30 minutes (typical JWT token lifetime)
- [x] update dependency gen3authz as latest version includes token as parameter (as an alternative to username)
- [x] use [mock_arborist_requests](https://github.com/uc-cdis/indexd/blob/8ff50b9c829920907181d5c186c907e06f5c4a5d/tests/conftest.py#L230) pytest fixture
- [x] ensure all existing tests pass
- [x] add new tests specific to RBAC
- [x] Add feature flag to [default_settings](https://github.com/uc-cdis/indexd/blob/8ff50b9c829920907181d5c186c907e06f5c4a5d/indexd/default_settings.py)
- [x] Remove extraneous logging and debugging code
- [x] Ensure ARE_RECORDS_DISCOVERABLE, GLOBAL_DISCOVERY_AUTHZ See [discussion](https://github.com/uc-cdis/indexd/pull/400#discussion_r2243579240)
- [ ] Add a corresponding feature flag to helm chart

---

## Implementation Overview

* Main changes were made to:
* indexd/auth
* indexd/index/drivers/alchemy.py

* All the changes above:
* should be transparent to the user, and they should not notice any difference in behavior.
* should be non-breaking, as it only changes the behavior when the `authz` parameter is empty.
* However, it will throw a 401/403 is the user does not have access to the requested resource,or does not have and Authorization header which is a change from the previous behavior where it would return all the records regardless of the user's access.

* "Breaking" Changes:
* In order to enforce authorization, we need to ensure that all records have an `authz` field.
* (This is not a change in behavior to OHSU/ACED/Calypr, but it is a change in behavior to the Indexd API in that effectively authz is mandatory on write)

* Misc:
* Added stack traces to log for unhandled exceptions see changes to blueprint.py for various endpoints

## **Testing `tests/rbac`**

The `tests/rbac` suite is designed to validate RBAC-aware behavior in the indexd service, while ensuring the stability and integrity of legacy functionality. The following principles guide its architecture:

- **Preservation of Existing Tests:**
All pre-existing tests are retained without modification to guarantee backward compatibility and to ensure that legacy functionality remains unaffected by the introduction of RBAC features.

- **Comprehensive Endpoint Coverage:**
New tests are introduced to exercise RBAC logic across a wide set of API endpoints (e.g., list, read, write, update, delete). This ensures that authorization checks are consistently enforced and that the feature flag, token handling, and resource filtering behave as intended in every context.

- **Parameterized Test Design:**
Parameterized tests are used to efficiently cover combinations of public, controlled, and private `authz` resources, as well as users with and without tokens. This approach ensures all relevant access patterns are validated, including edge cases, without duplicating test logic.

- **Mocked Authorization Backend:**
Arborist responses are mocked to provide deterministic and isolated test scenarios, enabling reliable validation of access control logic without external dependencies.

## **Configuration:**
Tests verify both enabled and disabled states of the RBAC feature flag, confirming that the system defaults to legacy behavior unless explicitly configured otherwise.

* `ARE_RECORDS_DISCOVERABLE`

- **Type:** `bool`
- **Default:** `True`
- **Description:**
Controls whether any records in IndexD are discoverable via search or listing endpoints.
If set to `False`, all records are hidden from discovery, regardless of their individual authorization settings.
Note: Role-Based Access Control (RBAC) is not enabled by default.

* `GLOBAL_DISCOVERY_AUTHZ`

- **Type:** `list` or `None`
- **Default:** `[]`
- **Description:**
Overrides per-record authorization for GET/read operations during record discovery.
If set to a list of authorization requirements, these are applied globally to all records for discovery purposes.
If set to `None`, the system uses each record's individual `authz` field for authorization checks.
This setting does not affect file access permissions, only record discovery.

This approach ensures robust coverage of the new RBAC functionality while maintaining the integrity and reliability of the existing test suite.
17 changes: 17 additions & 0 deletions indexd/alias/blueprint.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,8 @@
import re
import sys
import traceback

import cdislogging
import flask
import jsonschema

Expand All @@ -12,7 +16,9 @@
from .errors import NoRecordFound
from .errors import MultipleRecordsFound
from .errors import RevisionMismatch
from .. import utils

logger = cdislogging.get_logger(__name__)

blueprint = flask.Blueprint("alias", __name__)

Expand Down Expand Up @@ -159,11 +165,13 @@ def handle_multiple_records_error(err):

@blueprint.errorhandler(UserError)
def handle_user_error(err):
logger.error(err, exc_info=True)
return flask.jsonify(error=str(err)), 400


@blueprint.errorhandler(AuthError)
def handle_auth_error(err):
logger.error(err, exc_info=True)
return flask.jsonify(error=str(err)), 403


Expand All @@ -172,6 +180,15 @@ def handle_revision_mismatch(err):
return flask.jsonify(error=str(err)), 409


@blueprint.errorhandler(Exception)
def handle_uncaught_exception(err):
"""
Handle uncaught exceptions.
Delegate to utils.handle_uncaught_exception
"""
return utils.handle_uncaught_exception(err)


@blueprint.record
def get_config(setup_state):
config = setup_state.app.config["ALIAS"]
Expand Down
5 changes: 5 additions & 0 deletions indexd/app.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
from indexd.index.drivers.alchemy import Base as IndexBase
from indexd.alias.drivers.alchemy import Base as AliasBase
from indexd.auth.drivers.alchemy import Base as AuthBase
from .auth.discovery_context import ensure_auth_context
from .bulk.blueprint import blueprint as indexd_bulk_blueprint
from .index.blueprint import blueprint as indexd_index_blueprint
from .alias.blueprint import blueprint as indexd_alias_blueprint
Expand Down Expand Up @@ -42,6 +43,10 @@ def app_init(app, settings=None):

app.auth = settings["auth"]
app.hostname = os.environ.get("HOSTNAME") or "http://example.io"

# write flask app and request information to a ContextVar
app.before_request(ensure_auth_context)

app.register_blueprint(indexd_bulk_blueprint)
app.register_blueprint(indexd_index_blueprint)
app.register_blueprint(indexd_alias_blueprint)
Expand Down
20 changes: 14 additions & 6 deletions indexd/auth/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,8 @@
from flask import current_app
from flask import request

from .errors import AuthError
from indexd.auth.errors import AuthError, AuthzError
from indexd.errors import UserError


def authorize(*p):
Expand All @@ -20,7 +21,7 @@ def authorize(*p):

@wraps(f)
def check_auth(*args, **kwargs):
if not request.authorization:
if not request.authorization.parameters.get("username"):
raise AuthError("Username / password required.")
current_app.auth.auth(
request.authorization.parameters.get("username"),
Expand All @@ -31,13 +32,20 @@ def check_auth(*args, **kwargs):

return check_auth
else:
method, resources = p
method, authz_resources = p
if request.authorization and request.authorization.type == "basic":
current_app.auth.auth(
request.authorization.parameters.get("username"),
request.authorization.parameters.get("password"),
)
else:
if not isinstance(resources, list):
raise UserError(f"'authz' must be a list, received '{resources}'.")
current_app.auth.authz(method, list(set(resources)))
if not isinstance(authz_resources, list):
raise UserError(f"'authz' must be a list, received '{authz_resources}'.")
return current_app.auth.authz(method, list(set(authz_resources)))


def get_authorized_resources():
"""
Returns a list of resources the user has access to. Uses Arborist if available.
"""
return current_app.auth.resources()
127 changes: 127 additions & 0 deletions indexd/auth/discovery_context.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
from contextvars import ContextVar
import inspect
from functools import wraps
from typing import Any, Mapping


import flask

from indexd import auth

discovery_context = ContextVar("discovery_context", default={})


def set_auth_context(**kwargs) -> None:
"""
Set key-value pairs in the request context.
You should not need to call this function directly, as
ensure_auth_context() will set the context for you.
However, this function is exposed for testing purposes.
"""
discovery_context.set(kwargs)


def get_auth_context() -> dict:
"""
Get the current request context.
Returns a dict with variables:
are_records_discoverable: application wide setting, ARE_RECORDS_DISCOVERABLE
can_user_discover: for the current user, does this user have access to the GLOBAL_DISCOVERY_AUTHZ
authorized_resources: what resources the current user has read access to (from Arborist)
"""
return discovery_context.get()


def reset_auth_context() -> None:
"""
Reset the request context to an empty dict.
Primarily useful for testing.
"""
discovery_context.set({})


def _get_authorized_resources() -> list[str]:
"""Retrieve the user's authorized resources from Arborist."""
# Arborist allows no token to be sent on purpose, it allows assignment of anonymous access
# See https://github.com/uc-cdis/indexd/pull/400#discussion_r2243298256
authorized_resources = []
for resource, permissions in auth.get_authorized_resources().items():
for permission in permissions:
method = permission['method'].lower().strip()
service = permission['service'].lower().strip()
if method == "read" and service in ["indexd", "*"]:
authorized_resources.append(resource)
return authorized_resources


def ensure_auth_context() -> None:
"""
Ensure that the request context has the authorized resources set.
If not, retrieve them from Arborist and set them in the context.
Sets the following variables:
are_records_discoverable: application wide setting, ARE_RECORDS_DISCOVERABLE
can_user_discover: for the current user, does this user have access to the GLOBAL_DISCOVERY_AUTHZ
authorized_resources: what resources the current user has read access to (from Arborist)
"""
are_records_discoverable = flask.current_app.config.get('ARE_RECORDS_DISCOVERABLE', True)

# Does user have access to "GLOBAL_DISCOVERY_AUTHZ" resource?
global_discovery_authz: list = flask.current_app.config.get('GLOBAL_DISCOVERY_AUTHZ', [])
authorized_resources: list = []
if not are_records_discoverable:
authorized_resources = _get_authorized_resources()
# if any of the global discovery authz resources are in the authorized resources, RBAC is not enabled
can_user_discover = False
if any(resource in authorized_resources for resource in global_discovery_authz):
can_user_discover = True

# Remove global discovery authz resources from authorized_resources to avoid confusion
authorized_resources = [_ for _ in authorized_resources if _ not in global_discovery_authz]

# if are_records_discoverable is True, then can_user_discover is True for everyone
if are_records_discoverable:
can_user_discover = True

set_auth_context(are_records_discoverable=are_records_discoverable,
can_user_discover=can_user_discover,
authorized_resources=authorized_resources)


def authorize_discovery(func):
"""
Injects `can_user_discover` and `authorized_resources` into the wrapped callable
from the ContextVar-backed auth context (set by ensure_auth_context).

Injection rules:
- Only inject for parameters actually present in the target signature.
- Only fill when the argument is missing or None (does not override explicit values).
"""
sig = inspect.signature(func)
inject_params = ("can_user_discover", "authorized_resources")
is_async = inspect.iscoroutinefunction(func)

def _apply(bound: inspect.BoundArguments) -> None:
ctx: Mapping[str, Any] = get_auth_context() or {}
for name in inject_params:
if name not in sig.parameters:
continue
current = bound.arguments.get(name, None)
if current is None:
if name in ctx:
bound.arguments[name] = ctx[name]

if is_async:
@wraps(func)
async def async_wrapper(*args, **kwargs):
bound = sig.bind_partial(*args, **kwargs)
_apply(bound)
# Use keyword-style call to avoid "multiple values" issues
return await func(**bound.arguments)
return async_wrapper

@wraps(func)
def wrapper(*args, **kwargs):
bound = sig.bind_partial(*args, **kwargs)
_apply(bound)
return func(**bound.arguments)
return wrapper
8 changes: 8 additions & 0 deletions indexd/auth/driver.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,3 +43,11 @@ def delete(self, username):
Raises AuthError if user doesn't exist.
"""
raise NotImplementedError("TODO")

@abc.abstractmethod
def resources(self):
"""
Returns a list of resources the user has access to. Uses Arborist if available.
Raises AuthError if the user doesn't exist.
"""
raise NotImplementedError("TODO")
Loading