feature/rbac-2 #405

bwalsh · 2025-08-04T21:06:51Z

New Features

Introduced Role-Based Access Control (RBAC) to indexd.
- Added support for enforcing authorization on database operations via Arborist (adds RBAC to db operations).
- Integrated a cached call to Arborist to reduce authorization lookup overhead (add cached call to Arborist).
- Added configuration flag to enable or disable RBAC enforcement (Adds RBAC config).

Breaking Changes

None. The RBAC feature is gated by a configuration flag and maintains backward compatibility when disabled.

Bug Fixes

Improved error messaging for unauthorized access and token validation (Improve error handling).

Improvements

Added developer documentation for RBAC configuration and usage (developer documentation).
Added test coverage for RBAC behavior (test RBAC).

Dependency updates

None.

Deployment changes

New optional configuration setting: RBAC. When set to True, RBAC enforcement is active for protected records.
Arborist must be reachable by the indexd service for RBAC to function properly.

bwalsh · 2025-08-04T21:35:17Z

Not yet addressed:

- Ensure ARE_RECORDS_DISCOVERABLE and GLOBAL_DISCOVERY_AUTHZ are implemented (see discussion)
- Add a corresponding feature flag to the helm chart

@Avantol13, could you review and comment?

🧾 User Story 1: Control Whether Records Are Discoverable

Title: Configurable Discovery of Indexd Records

As a platform operator,
I want to control whether indexd records are discoverable at all via a config flag,
So that I can prevent users from listing or retrieving records unless explicitly permitted.

Acceptance Criteria

Given ARE_RECORDS_DISCOVERABLE=False, when a client sends a request (with a valid token):
- indexd returns 403 Forbidden for reads by ID, e.g. GET /index/<did>
  - it should only return 403 in situations where the record itself would have been filtered out of a listing.
- indexd filters out all records the user has no read permission for, e.g. GET /index
Given ARE_RECORDS_DISCOVERABLE=True, the RBAC rules are ignored.
This behavior is documented in indexd_settings.py and the README, with a description of its impact on runtime behavior.
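A minimal sketch of these criteria, assuming a record is readable only when the user holds every resource in its authz list (exact path match; Arborist's resource hierarchy is not modeled). Record, can_read, list_records, and get_record_status are hypothetical names, not indexd APIs. The single-record read reuses the list filter, so GET /index/<did> returns 403 exactly when the record would have been filtered out of GET /index; returning 404 for an unknown did is an assumption the criteria don't cover.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    did: str
    authz: list[str] = field(default_factory=list)  # resource paths, e.g. "/programs/a"


def can_read(record: Record, user_resources: set[str]) -> bool:
    # Assumption: readable only if the record has authz and the user holds
    # every resource in it (exact match; Arborist hierarchy not modeled).
    return bool(record.authz) and set(record.authz) <= user_resources


def list_records(records, user_resources, are_records_discoverable):
    """GET /index: RBAC rules are ignored when records are discoverable."""
    if are_records_discoverable:
        return list(records)
    return [r for r in records if can_read(r, user_resources)]


def get_record_status(records, did, user_resources, are_records_discoverable):
    """GET /index/<did>: 403 only if the record would be filtered out of GET /index."""
    record = next((r for r in records if r.did == did), None)
    if record is None:
        return 404  # assumption: unknown dids keep the normal not-found path
    visible = list_records([record], user_resources, are_records_discoverable)
    return 200 if visible else 403
```

Deriving the single-record answer from the list filter keeps the two endpoints from drifting apart.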

🧾 User Story 2: Global Discovery Authorization Control

Title: Global Discovery Authz for Indexd Records, Support Discovery Access Independent from File Access

As a system administrator,
I want to configure a global authorization group for reading/discovering indexd records,
So that discovery can be gated separately from file access and we can support user registration workflows.

As a data commons architect,
I want to decouple discovery access (e.g. listing/searching records) from access to the underlying files,
So that I can implement workflows like "register to see what’s available", then "apply for access to download".

Acceptance Criteria

Assuming ARE_RECORDS_DISCOVERABLE=False

If GLOBAL_DISCOVERY_AUTHZ=None, RBAC uses record-level authz fields to authorize GET requests to records, i.e. record-level authz continues to govern access to records.
If GLOBAL_DISCOVERY_AUTHZ is set and the user has permission on the resource(s) it lists, RBAC ignores record-level authz filters and returns all records.
Behavior is clearly documented, including the override effect of GLOBAL_DISCOVERY_AUTHZ.
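One way to express the precedence between the two flags, as a hedged sketch. DiscoveryMode and discovery_mode are hypothetical; user_resources stands in for whatever Arborist reports the user can read. The criteria don't say what happens when GLOBAL_DISCOVERY_AUTHZ is set but the user lacks it; this sketch denies, which is an assumption (falling back to per-record filtering would be another reading). An empty list is treated like None so a misconfiguration can't open discovery to everyone.

```python
from enum import Enum
from typing import Optional, Sequence


class DiscoveryMode(Enum):
    ALL = "all"                # return every record, no per-record filter
    PER_RECORD = "per_record"  # filter by each record's own authz field
    DENY = "deny"              # user lacks GLOBAL_DISCOVERY_AUTHZ (assumption)


def discovery_mode(
    are_records_discoverable: bool,
    global_discovery_authz: Optional[Sequence[str]],
    user_resources: set[str],
) -> DiscoveryMode:
    if are_records_discoverable:
        return DiscoveryMode.ALL
    if not global_discovery_authz:
        # None (or an empty list) means record-level authz keeps governing access.
        return DiscoveryMode.PER_RECORD
    if set(global_discovery_authz) <= user_resources:
        # The global resource overrides record-level authz for discovery only.
        return DiscoveryMode.ALL
    return DiscoveryMode.DENY
```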

📌 Configuration Summary

```python
# Whether any records are discoverable at all
ARE_RECORDS_DISCOVERABLE = True  # default: True

# Override per-record authz for GET/read
# Only applies to record discovery (not file access)
# If None, use per-record `authz`
GLOBAL_DISCOVERY_AUTHZ = ["/indexd/discovery"]
```

Avantol13 · 2025-08-12T17:50:12Z

In general the comments above look good, thanks for all the detail.

This part:

indexd returns a 403 Forbidden for all reads with id e.g. GET /index/

I think this actually needs to behave like READ filtering, based on config. In other words, if you request a did and you have access to the record's authz, this should return 200. If you request a did and you have access to the configured global authz, this should also return 200. Basically, this should only return 403 in situations where the record itself would've been filtered out.

Avantol13 · 2025-08-05T15:02:45Z

indexd/auth/drivers/alchemy.py

+            resources = self.arborist.auth_mapping()
+        return resources
+
+    @timed_cache(1800)  # Cache for 30 minutes (typical JWT expiration time)


We can't hard-code this, because we absolutely cannot have a response cached beyond the token's expiration. This has to be dynamic, based on the expiration of the token. Our security relies heavily on the guarantee that expiration ensures no access beyond that point.

@Avantol13

Re. requirements

has to be dynamic based on the expiration of the token

Understood. At the same time, previous feedback stated:

Arborist allows no token to be sent on purpose, it allows assignment of anonymous access.

Additionally, AFAIK, no validation of the token currently occurs in indexd, i.e. there are no calls to authutils.token.validate_jwt().

So, if there is a token:

- 🆕 check that it has not expired, and use its expiry time as the TTL
- the token is already being used as a cache key

If there is no token:

- 🆕 use maximum_ttl_seconds as the TTL
- 🆕 add the authentication header to the cache key (for basic auth and no auth)

Other:

- 🆕 clean up any unused cache entries
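A sketch of the expiry selection in this plan, assuming the cache key is the raw Authorization header and that exp is read from the JWT payload without verifying the signature (consistent with indexd not calling validate_jwt today; the value only bounds cache lifetime and is never used to grant access). cache_key_and_expiry and _jwt_exp are hypothetical helpers; capping the token case at maximum_ttl_seconds is an extra safeguard not in the plan above.

```python
import base64
import json
from typing import Optional, Tuple


def _jwt_exp(token: str) -> Optional[int]:
    """Read the JWT `exp` claim WITHOUT verifying the signature.

    Only bounds cache lifetime; must never be used to grant access.
    """
    try:
        payload = token.split(".")[1]
        padded = payload + "=" * (-len(payload) % 4)  # restore base64url padding
        claims = json.loads(base64.urlsafe_b64decode(padded))
        return int(claims["exp"])
    except (IndexError, KeyError, ValueError, TypeError):
        return None


def cache_key_and_expiry(
    auth_header: Optional[str], now: float, maximum_ttl_seconds: float
) -> Optional[Tuple[str, float]]:
    """Return (cache_key, expires_at), or None if the result must not be cached."""
    # Keying on the whole header covers bearer, basic, and anonymous requests.
    key = auth_header or ""
    if auth_header and auth_header.lower().startswith("bearer "):
        exp = _jwt_exp(auth_header.split(" ", 1)[1].strip())
        if exp is None or exp <= now:
            return None  # unreadable or expired token: don't cache
        # Never outlive the token; the maximum_ttl_seconds cap is extra caution.
        return key, min(exp, now + maximum_ttl_seconds)
    return key, now + maximum_ttl_seconds
```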

Avantol13 · 2025-08-12T17:51:53Z

indexd/drs/blueprint.py



+@blueprint.errorhandler(Exception)
+def handle_uncaught_exception(err):


I see this duplicated across routes; can you implement it once in a utils module and import it, to reduce code duplication?
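A minimal sketch of a shared handler in a utils module, assuming Flask blueprints (Blueprint.errorhandler is Flask API). handle_uncaught_exception mirrors the name in the diff, but its body here is a guess, and register_error_handlers is hypothetical. Flask also routes HTTPException subclasses to a handler registered for Exception, so the sketch returns those unchanged, as Flask's docs suggest; the import fallback exists only so the snippet runs without Flask installed.

```python
import logging

try:
    from werkzeug.exceptions import HTTPException
except ImportError:  # fallback only so this sketch runs without Flask installed
    class HTTPException(Exception):
        pass

logger = logging.getLogger(__name__)


def handle_uncaught_exception(err):
    """Shared handler: log the traceback and return a generic JSON 500."""
    if isinstance(err, HTTPException):
        return err  # let Flask render 404/405/etc. as usual
    logger.error("Uncaught exception", exc_info=err)
    return {"error": "internal server error"}, 500


def register_error_handlers(blueprint):
    """Attach the shared handler once per Flask Blueprint."""
    blueprint.errorhandler(Exception)(handle_uncaught_exception)
    return blueprint
```

Each blueprint module would then call register_error_handlers(blueprint) once after creating its blueprint.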

bwalsh · 2025-08-13T00:01:55Z


Thanks. I edited the comment above

bwalsh · 2025-08-14T03:48:46Z

squash commits


bwalsh · 2025-09-23T00:23:15Z

@Avantol13 I've addressed all PR items. Please see #405 (comment) for a follow-up question.

Avantol13

re: db driver and how to centrally organize things. Here's my current thinking:

We should, theoretically, be able to move all the new code that makes stateful decisions out of the db driver, because none of it really needs the db.

Here's my idea:

Put the authz check for discovery in a decorator similar to the existing authorize decorator in indexd/indexd/auth/__init__.py (line 9 in b5b198a):

def authorize(*p):

Maybe call it authorize_discovery, and add that decorator everywhere you need it. The logic in there should look like this:

if is_discovery_enabled (check config only)
- Get the user's authz (perhaps cached; could even use Flask's per-request storage flask.g if that simplifies things, though I know that won't persist beyond the request)
- if config GLOBAL_DISCOVERY_AUTHZ is set
  - Check whether the GLOBAL_DISCOVERY_AUTHZ resource is in the user's authz
- if config GLOBAL_DISCOVERY_AUTHZ is not set
  - Check whether the user's authz contains the record's authz (this will require a db call based on the request's ID)

Done. With this decorator, access is appropriately denied before any blueprint logic runs.
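A hedged sketch of the proposed authorize_discovery decorator. The config keys come from this PR; the injected get_user_authz and get_record_authz callables, the DiscoveryForbidden exception, and the assumption that the route passes the did as a record keyword argument are all hypothetical. A user is assumed to need every resource in a record's authz (exact match). List endpoints (no record argument) fall through to the filtering step.

```python
import functools
from typing import Callable, Iterable, Mapping, Optional


class DiscoveryForbidden(Exception):
    """Hypothetical error; a real app would map it to HTTP 403."""


def _covers(user_authz: set, required: Optional[Iterable[str]]) -> bool:
    required = set(required or ())
    return bool(required) and required <= user_authz


def authorize_discovery(
    config: Mapping,
    get_user_authz: Callable[[], Iterable[str]],
    get_record_authz: Callable[[str], Optional[Iterable[str]]],
):
    """Deny discovery before any blueprint logic runs."""

    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            # is_discovery_enabled: config-only check, no db or Arborist call.
            if config.get("ARE_RECORDS_DISCOVERABLE", True):
                return view(*args, **kwargs)
            user_authz = set(get_user_authz())  # could be cached, e.g. on flask.g
            global_authz = config.get("GLOBAL_DISCOVERY_AUTHZ")
            if global_authz:
                if not _covers(user_authz, global_authz):
                    raise DiscoveryForbidden("missing GLOBAL_DISCOVERY_AUTHZ")
            else:
                did = kwargs.get("record")  # Flask passes URL variables as kwargs
                if did is not None:
                    record_authz = get_record_authz(did)  # the db call
                    # None means "no such record": let the view return its 404.
                    if record_authz is not None and not _covers(user_authz, record_authz):
                        raise DiscoveryForbidden(f"no read access to {did}")
            return view(*args, **kwargs)

        return wrapper

    return decorator
```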

Within the blueprints that need filtering logic, we can then implement a shared set of util functions.

Before making a db query:

 * if is_discovery_enabled (check config only)
      * get the user's authz (perhaps from cache)
      * don't worry about GLOBAL_DISCOVERY_AUTHZ at all, because the endpoint's new decorator already handled it; we only get here if the user is authorized
      * add a filter for records to the db query based on the user's authz
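The filtering step can be pushed into the query itself. This sketch uses sqlite3 so it runs standalone; a real version would build the same NOT EXISTS subquery with SQLAlchemy in the blueprint util. The index_record / index_record_authz table and column names are assumptions modeled loosely on indexd's schema, the "user must hold every authz resource" rule is an assumption, and records with no authz rows are hidden, which is also an assumption.

```python
import sqlite3
from typing import Iterable, Mapping


def discoverable_dids(conn: sqlite3.Connection, user_authz: Iterable[str]) -> list[str]:
    """dids whose authz resources are all held by the user (exact match)."""
    resources = sorted(set(user_authz))
    if not resources:
        return []  # no resources: nothing is discoverable
    placeholders = ", ".join("?" for _ in resources)  # values stay bound parameters
    query = f"""
        SELECT r.did FROM index_record AS r
        WHERE EXISTS (SELECT 1 FROM index_record_authz AS a WHERE a.did = r.did)
          AND NOT EXISTS (
              SELECT 1 FROM index_record_authz AS a
              WHERE a.did = r.did AND a.resource NOT IN ({placeholders})
          )
        ORDER BY r.did
    """
    return [row[0] for row in conn.execute(query, resources)]


def list_dids(conn: sqlite3.Connection, config: Mapping, user_authz: Iterable[str]) -> list[str]:
    # is_discovery_enabled: config-only check. GLOBAL_DISCOVERY_AUTHZ was already
    # enforced by the endpoint's decorator, so it is not consulted here.
    if config.get("ARE_RECORDS_DISCOVERABLE", True):
        return [row[0] for row in conn.execute("SELECT did FROM index_record ORDER BY did")]
    return discoverable_dids(conn, user_authz)
```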

This way, the stateful logic stays out of the db driver and in the request handling (which is where it should be) and we have minimized duplication of code as much as possible.

What do you think?


Avantol13 · 2025-10-03T16:30:45Z

indexd/auth/drivers/__init__.py

+
+            return result
+
+        def calculate_ttl(now, token) -> int | None:


I feel we might be overcomplicating this cache. Can't we simply store the exp in the cache itself and, instead of doing TTL math, just invalidate entries whose exp is before now?
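A sketch of that simpler design: each entry stores its absolute exp, and entries whose exp is at or before now are dropped on access, so no TTL is ever computed. ExpiringCache is hypothetical and per process; it purges on every read, which is O(n) but keeps the logic obvious, and a lock guards against concurrent worker threads. Returning None for a miss means a cached None value is indistinguishable from a miss, which is fine for authz mappings.

```python
import threading
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class ExpiringCache:
    """Per-process cache storing each entry's absolute expiry time (exp)."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any, exp: float) -> None:
        with self._lock:
            if exp > self._clock():  # never store something already expired
                self._entries[key] = (exp, value)

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            self._purge_locked()
            entry = self._entries.get(key)
            return entry[1] if entry is not None else None

    def _purge_locked(self) -> None:
        # No TTL math: drop every entry whose exp is at or before now.
        now = self._clock()
        for key in [k for k, (exp, _) in self._entries.items() if exp <= now]:
            del self._entries[key]
```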


Avantol13 · 2025-10-03T16:38:44Z

indexd/auth/drivers/__init__.py

+    """
+
+    def decorator(func):
+        cache = {}


This only caches per instance, so the benefit is reduced in multi-instance deployments, where requests may not be routed to the same instance (and running multiple instances is typical for indexd because of the load it can receive). I would prefer a more robust cache shared across instances, but that requires a shared resource, and I'm not sure the time saved here would outweigh the cost of talking to that shared resource (perhaps it would, since a cache miss requires an Arborist query).

Fence has an example of this shared-resource-plus-in-memory cache setup, but it's a bit more involved. I suppose we can revisit this if the per-instance approach isn't performant enough.


Avantol13 requested changes Aug 12, 2025


bwalsh force-pushed the feature/rbac-2 branch from f64a1ac to 8a566db on August 12, 2025 23:16

bwalsh force-pushed the feature/rbac-2 branch from 0035d68 to c941fc0 on August 13, 2025 17:45

bwalsh requested a review from Avantol13 August 14, 2025 03:47

Adds RBAC - ARE_RECORDS_DISCOVERABLE, GLOBAL_DISCOVERY_AUTHZ

0a4bffe

bwalsh force-pushed the feature/rbac-2 branch from 76952a1 to 0a4bffe on August 20, 2025 20:19

Avantol13 requested changes Sep 3, 2025


bwalsh added 2 commits September 22, 2025 10:46

rely on existing WRITE auth

6c55b7f

PR feedback

d84e660

Avantol13 requested changes Oct 3, 2025
