Introduce arborist RBAC #400

bwalsh · 2025-06-11T00:57:02Z

Status

test progress: all passed (skipped single table tests)

Problem:
As an indexd or DRS user, when I list objects, I only expect to see items that belong to projects I have access to.

Solution:
Assuming a Bearer token is included on the request, I expect indexd to query arborist, extract the projects I have access to, and add those as an "authz" filter when querying the database. A feature flag should control this query injection; it should default to FALSE, as this will improve the chances of getting the PR approved. All current unit tests should pass. Additional unit tests should confirm the behavior.

Alternatives:
We could put an RBAC-aware proxy in front of indexd; however, that would add complexity and processing overhead.

Context:
The main auth code has two methods, auth and authz. The indexd.authorize method checks whether a Basic auth header is present: if so, auth is called; otherwise authz is called. The revproxy gateway injects this header. This reliance on Basic auth is concerning, and its rationale is undocumented. Based on the client API, it does not appear to be used for either create or read.

Approach:
Add code to get_index to call auth_mapping
and inject resources (projects) into query.

skip if feature flag not enabled
401 if Bearer token not available
update dependency gen3authz as latest version includes token as parameter (as an alternative to username)
use mock_arborist_requests pytest fixture
update tests
add new tests specific to RBAC
Add feature flag to default_settings
Add corresponding feature flag to helm chart
Remove extraneous logging and debugging code
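
The approach above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `resolve_authz_filter`, `MissingTokenError`, and `FakeArborist` are hypothetical names, and the only assumption about the Arborist client is that it offers an `auth_mapping` call returning a mapping keyed by resource path (the real gen3authz signature may differ).

```python
from typing import Dict, List, Optional


class MissingTokenError(Exception):
    """Hypothetical error that the web layer would map to HTTP 401."""


class FakeArborist:
    """Stand-in for an Arborist client; the real gen3authz signature may differ."""

    def __init__(self, mapping: Dict[str, List[dict]]):
        self._mapping = mapping

    def auth_mapping(self, token: str) -> Dict[str, List[dict]]:
        return self._mapping


def resolve_authz_filter(config, auth_header, arborist) -> Optional[List[str]]:
    """Return resource paths to inject into the query, or None to skip filtering."""
    if not config.get("RBAC", False):
        return None  # feature flag off: behave exactly as before
    if not auth_header or not auth_header.startswith("Bearer "):
        raise MissingTokenError("Authorization header is required for RBAC")
    token = auth_header[len("Bearer "):]
    # The keys of the mapping are the resource paths the user can access.
    return sorted(arborist.auth_mapping(token))
```

With the flag off the function returns `None` and the query is left untouched, which is what keeps the change opt-in.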

bwalsh · 2025-06-11T18:05:29Z

reviewers guide

Setup Instructions

See docs/local_dev_environment.md for details on setting up a local development environment.

Testing

See pytest
- Suggest doing this on the master branch first to ensure all tests pass

Code Review

main changes were made to indexd.index.blueprint.py::get_index
- The goal of the changes is to check the authz parameter: if it is empty, substitute all the Arborist resources the user has access to.
- In order to do this we needed to fetch the current user's authorized resources from Arborist.
  - To do that, we updated the gen3authz dependency to its latest version, which includes methods to look up resources by token as opposed to by username.
- Now that we have the user's resources, we can check if the requested resource is in the list of resources they have access to.
  - We did not redesign the query logic:
    - The existing API and logic is: If the authz parameter is set, the records returned must be in ALL "projects" specified in the authz parameter
    - We want to keep this behavior, so it shouldn't be breaking
    - However, if the authz parameter is empty, we will return all records that the user has access to, regardless of the "project" they are in.
    - This means that if a user has access to multiple projects, they will see records from all those projects when authz is empty.
    - See alchemy driver
    - We check the blueprint record for an rbac attribute and, if it is true, we retrieve the user's resources, enforce access, and add an ANY filter to the query to limit unconstrained queries
- All of the changes above:
  - should be transparent to the user, and they should not notice any difference in behavior.
  - should be non-breaking, as it only changes the behavior when the authz parameter is empty.
  - However, it will throw a 401/403 if the user does not have access to the requested resource, or does not have an Authorization header, which is a change from the previous behavior, where it would return all the records regardless of the user's access.
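
The ALL versus ANY rules described above can be illustrated with a small in-memory sketch. The PR applies the filter in the alchemy driver's SQL query; `visible_records` below is a hypothetical helper that only mirrors the matching rules on plain dictionaries, and it uses `PermissionError` as a stand-in for the 401/403 response.

```python
from typing import Iterable, List, Optional


def visible_records(records: List[dict],
                    requested_authz: Optional[Iterable[str]],
                    user_resources: Iterable[str]) -> List[dict]:
    """Illustrate the ALL (explicit authz) vs ANY (empty authz) matching rules."""
    user = set(user_resources)
    if requested_authz:
        wanted = set(requested_authz)
        if not wanted <= user:
            raise PermissionError("user lacks access to a requested resource")
        # Existing behavior: a record must carry ALL requested resources.
        return [r for r in records if wanted <= set(r["authz"])]
    # New behavior: a record is visible if it shares ANY resource with the user.
    return [r for r in records if user & set(r["authz"])]
```

For example, a user with access to two projects who sends an empty authz parameter sees records from both, while the same user asking for both projects explicitly sees only records tagged with both.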
"Breaking" Changes:
- In order to enforce authorization, we need to ensure that all records have an authz field.
- (This is not a change in behavior to OHSU/ACED/Calypr, but it is a change in behavior to the Indexd API in that effectively authz is mandatory on write)
Misc:
- Added stack traces to the log for unhandled exceptions
Tests:
- Introducing this change required us to update the tests to account for the new behavior.
  - There are a lot of existing tests :-)
  - There are a lot of deprecated features that are still in the codebase :-(
  - All of the reads now pass the user's headers; see client.get(...):
```
res = client.get(alias_endpoint) # old
res = client.get(alias_endpoint, headers=user) # new
```
  - There are legacy fields in the indexd records that are no longer used; see authz vs acl
  - Since the authz field is now required AND is tied to the user's Arborist resources, we need to ensure that all records have an authz field with proper values.
    - In most cases, the authorized resources are: ["/programs/bpa/projects/UChicago", "/programs/other/projects/project"]
    - In most cases, the un-authorized resources are: ["/programs/forbidden/projects/project"]
    - All of these need to be updated in the tests to ensure that the authz field is set correctly.
  - Test fixtures:
    - There is no "Arborist" server, that is mocked out in the tests.
    - See conftest
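
A mocked Arborist in the spirit of the fixtures described above might look like the following sketch. It uses only `unittest.mock` from the standard library; `make_mock_arborist` and `can_read` are hypothetical helpers, the shape of the mapping is an assumption, and the actual `mock_arborist_requests` fixture in conftest may work quite differently (for example, by patching HTTP calls).

```python
from unittest.mock import MagicMock

AUTHORIZED = ["/programs/bpa/projects/UChicago", "/programs/other/projects/project"]
FORBIDDEN = ["/programs/forbidden/projects/project"]


def make_mock_arborist(resources=AUTHORIZED):
    """Build a fake Arborist client whose auth_mapping returns canned resources."""
    arborist = MagicMock(name="arborist")
    arborist.auth_mapping.return_value = {
        path: [{"service": "indexd", "method": "read"}] for path in resources
    }
    return arborist


def can_read(arborist, token, record):
    """Hypothetical check: the record shares at least one resource with the user."""
    allowed = set(arborist.auth_mapping(token))
    return bool(allowed & set(record["authz"]))
```

A test can then assert that a record tagged with one of the authorized resources is readable and that one tagged only with the forbidden resource is not, without any Arborist server running.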

lbeckman314 · 2025-06-12T19:24:30Z

Testing Steps 🌀

1. Start Postgres ✔️

➜ brew services start postgresql
==> Successfully started `postgresql@14` (label: homebrew.mxcl.postgresql@14)

➜ brew services list
Name              Status  User     File
postgresql@14     started beckmanl ~/Library/LaunchAgents/homebrew.mxcl.postgresql@14.plist

➜ psql postgres
psql (14.13 (Homebrew))
Type "help" for help.
postgres=# \c
You are now connected to database "postgres" as user "beckmanl".
postgres=# exit

2. Run Indexd Tests ✔️

➜ gh pr checkout 400
Switched to branch 'feature/rbac'

➜ poetry install
Installing the current project: indexd (5.1.2)

➜ poetry run pytest -vv --cov=indexd --cov-report xml tests
537 passed, 303 skipped, 5320 warnings in 1513.87s (0:25:13)

lbeckman314 · 2025-06-16T22:16:19Z

Deployment Steps 🚀

1. Deploy Gen3 ✔️

Tip

Required deployment updates:

values.yaml

# Indexd configuration
indexd:
  image:
    repository: quay.io/ohsu-comp-bio/indexd
    tag: feature_rbac  # <---- point to this fork of Indexd (#400)

user.yaml

authz:
  resources:
  - name: services
    subresources:
    - name: indexd
      subresources:
      - name: admin   # <---- Defining the /services/indexd/admin resource

  policies:
  - id: indexd_admin
    description: full access to indexd API
    role_ids:
      - administrator
    resource_paths:
      - /programs
      - /data_file
      - /services/indexd/admin   # <---- Adding the resource to the indexd_admin policy

➜ git clone https://github.com/ACED-IDP/gen3-helm.git -b ohsu-develop

➜ cd gen3-helm

➜ helm dependency build ./helm/gen3

➜ helm upgrade --install local ./helm/gen3  -f values.yaml -f user.yaml

➜ kubectl get deployments/indexd-deployment
NAME                READY   UP-TO-DATE   AVAILABLE
indexd-deployment   1/1     1            1

2. Add Data File ✔️

➜ g3t init $(basename $PWD)

➜ echo 'Example Data' > example.txt

➜ g3t add example.txt --patient example

➜ g3t meta init

➜ g3t commit -m "test: add example file"

➜ g3t push

3. Current Behavior (RBAC Filtering disabled by default) ✔️

➜ curl -s https://calypr.ohsu.edu/index/index
{
  "records": [
    {
      "authz": [
        "/programs/cbds/projects/example"
      ],
      "did": "540b7f64-8e85-5ba1-9d8d-50ebe54e0632",
      "file_name": "example.txt",
    },
  ],
}

3. Enable RBAC Filtering

Tip

helm/indexd/indexd-settings/local_settings.py

CONFIG["RBAC"] = True

➜ helm dependency update ./helm/gen3 

➜ helm upgrade --install local ./helm/gen3 -f values.yaml -f user.yaml

➜ kubectl rollout restart deployment/indexd-deployment

4. New Behavior (RBAC Filtering enabled) ✔️

1. No Bearer Token ❌

`/index`

➜ curl -s https://calypr.ohsu.edu/index/index
{
  "error": "Authorization header is required for RBAC"
}

`/ga4gh`

➜ curl -s https://calypr.ohsu.edu/ga4gh/drs/v1/objects
{
  "msg": "Authorization header is required for RBAC",
  "status_code": 403
}

2. Invalid Bearer Token ⚠️

`/index`

➜ curl -s -H "Authorization: Bearer BAD_EXAMPLE" https://calypr.ohsu.edu/index/index
{
  "error": "Failed to get resources from Arborist. Please check your Arborist configuration."
}

`/ga4gh`

➜ curl -s -H "Authorization: Bearer BAD_EXAMPLE" https://calypr.ohsu.edu/ga4gh/drs/v1/objects
{
  "msg": "Failed to get resources from Arborist. Please check your Arborist configuration.",
  "status_code": 403
}

3. Valid Bearer Token ✅

Tip

Access Token retrieved after logging in to Frontend-Framework:

Chrome: Developer Tools > Application > Storage > Cookies > https://calypr.ohsu.edu > access_token
Firefox: Developer Tools > Storage > Cookies > https://calypr.ohsu.edu > access_token

➜ export TOKEN=<access_token>

`/index`

➜ curl -s -H "Authorization: Bearer $TOKEN" https://calypr.ohsu.edu/index/index
{
  "records": [
    {
      "authz": [
        "/programs/cbds/projects/example"
      ],
      "did": "540b7f64-8e85-5ba1-9d8d-50ebe54e0632",
      "file_name": "example.txt",
    },
  ],
}

`/ga4gh`

➜ curl -s -H "Authorization: Bearer $TOKEN" https://calypr.ohsu.edu/ga4gh/drs/v1/objects
{
  "drs_objects": [
    {
      "access_methods": [
        {
          "access_id": "s3",
          "access_url": {
            "url": "s3://cbds/540b7f64-8e85-5ba1-9d8d-50ebe54e0632/example.txt"
          },
        }
      ],
      "name": "example.txt",
      "self_uri": "drs://PREFIX:540b7f64-8e85-5ba1-9d8d-50ebe54e0632",
    },
}

Environment ⚙️

Cluster: Kind (Setup Steps)
Gen3-Helm:
- Repo: ACED-IDP/gen3-helm
- Branch: ohsu-develop
- Commit: 442fecc
Indexd:
- Image: quay.io/ohsu-comp-bio/indexd:feature_rbac
- Commit: 46606d6
gen3-client: 2023.11 (ACED-IDP Fork)
g3t: 0.0.7rc12

Open Questions + Next Steps 🌀

How can we best test another user's access (e.g. mock/service user)?
Is the feature flag for RBAC filtering ENABLE_RBAC_FILTERING or RBAC (tests/default_test_settings.py#41)?
Update Indexd Config to read from an environment variable so this can be enabled/disabled via Helm values files
Check access to the /ga4gh/ and /index endpoints: do both comply with RBAC filtering, or does one provide access while the other doesn't?
- We can ignore /index-admin/ as that endpoint has been removed in OHSU's Gen3-Helm Charts
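
Reading the flag from an environment variable, as proposed in the next steps above, could be as small as the sketch below. `INDEXD_RBAC` is a hypothetical variable name (nothing in indexd or gen3-helm defines it), and the set of accepted truthy strings is an arbitrary choice.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False, environ=os.environ) -> bool:
    """Read a boolean feature flag from the environment."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


CONFIG = {}
# INDEXD_RBAC is a hypothetical variable name used only for illustration.
CONFIG["RBAC"] = env_flag("INDEXD_RBAC", default=False)
```

A Helm values file could then set the variable on the indexd deployment instead of editing local_settings.py.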

Additional Resources 📚

Indexd endpoints in Gen3-Helm
- /ga4gh/: GA4GH endpoint for DOS resolver and DRS server
- /index: Primary Indexd endpoint
- /index-admin/: Indexd Admin endpoint
Indexd API (Swagger)
- global: Search for an alias or index, potentially even a distributed search.
- index: Associate a file (object) with a unique id, and store some basic metadata.
- bulk: bulk endpoints
- query: query endpoints
- DOS: Data Object Service Retrieval Endpoints
- DRS: Data Repository Service Retrieval Endpoints
- bundle: Bundle endpoints.
- GUID: Endpoints for generation of Gen3 GUIDs
- system: System endpoints

Happy 30th Birthday APOD! 🥳

bwalsh · 2025-06-18T23:28:00Z

Feature Request ⚙️

@lbeckman314 Can you remove the comment above re. the use case document? The request is here:
See updated https://docs.google.com/document/d/1tHFyI-s8N8DccJYnbfxo-hBgOrULms7DMOAmXtqhNeA/edit?usp=drive_web&ouid=110793006573203727769

bwalsh · 2025-06-18T23:37:24Z

Update 💥

removed extraneous DEBUG logging
moved rbac enforcement to alchemy driver

improve arborist check

adds additional checks improve test_multiple_endpoints

Avantol13

The detail in the PR description is great, but it must follow our PR template to be parsed correctly. Please move any relevant documentation to a markdown file in the docs folder if you think it's widely useful going forward, otherwise, ensure the PR description follows our template.

You cannot include other markdown headings due to the automated parsing for our release notes, but you can include text above the templated headings with any additional information about the PR. PR template

This initial review is a cursory, high-level single read-through of the code itself and I have not done any setup or testing (which we will need to do eventually).

Avantol13 · 2025-07-30T16:40:08Z

indexd/alias/blueprint.py


 @blueprint.errorhandler(UserError)
 def handle_user_error(err):
+    print(f"Uncaught Exception: {err}", file=sys.stderr)


Please use cdislogging, not direct prints
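
What this comment asks for might look like the sketch below. To stay runnable without extra dependencies it uses the standard-library `logging` module as a stand-in; in indexd the logger would presumably be obtained from cdislogging instead. The `UserError` class and the handler's return value are simplified placeholders, not the PR's code.

```python
import logging

# Stand-in logger; in indexd this would presumably come from cdislogging.
logger = logging.getLogger("indexd.alias.blueprint")


class UserError(Exception):
    """Placeholder for indexd's UserError."""


def handle_user_error(err):
    """Log the error through the logger instead of printing to stderr."""
    logger.error("Uncaught Exception: %s", err)
    return {"error": str(err)}, 400
```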

Avantol13 · 2025-07-30T16:40:53Z

docs/local_dev_environment.md

 You can install Poetry.  Make sure the virtual environment is activated.

 ```console
+# Note: this method is deprecated, returns a 404.


can you update the install markdown with their recommended method and remove this deprecated one

defer for now

Avantol13 · 2025-07-30T16:43:36Z

indexd/auth/drivers/alchemy.py

            try:
+                token = get_jwt_token()
+                if not token:
+                    raise AuthzError("No JWT token found for authorization check")


Arborist allows no token to be sent on purpose, it allows assignment of anonymous access. So we don't want to raise this error here

Avantol13 · 2025-07-30T16:43:49Z

indexd/auth/drivers/alchemy.py

            if not authorized:
+                token = get_jwt_token()
+                if not token:
+                    raise AuthError("No JWT token found for authorization check")


see above comment

Avantol13 · 2025-07-30T16:44:51Z

indexd/auth/drivers/alchemy.py

+            )
+        token = get_jwt_token()
+        try:
+            _ = self.arborist.auth_mapping(


don't use _ as a variable name unless it's a return that is unused. Here we're returning it, so we need a name

Avantol13 · 2025-07-30T17:33:05Z

tests/test_rbac.py

+    assert data_all_by_md.status_code == 403, f"Expected status code 403, got {data_all_by_md.status_code}"
+
+
+def test_multiple_endpoints(client, user, mock_arborist_requests, is_rbac_configured):


see previous test comment, we need to break this up into smaller, more focused tests

Avantol13 · 2025-07-30T17:33:39Z

.python-version

@@ -0,0 +1 @@
+3.12


everything we run must be Python 3.9

can you remove this and reinstall and relock on 3.9?

Avantol13 · 2025-07-30T17:35:11Z

indexd/dos/blueprint.py

+
+
+@blueprint.errorhandler(AuthError)
+def handle_requester_auth_error(err):


any new public method needs a Google-style docstring

Avantol13 · 2025-07-30T17:37:35Z

indexd/drs/blueprint.py

 from indexd.index.errors import NoRecordFound as IndexNoRecordFound
 from indexd.errors import IndexdUnexpectedError
 from indexd.utils import reverse_url
+import traceback


please ensure isort-style imports.

3 import sections:

python built-ins

third-party

within this code

and each one is alphabetically ordered

traceback is a built-in so it should be in the first block of imports

Avantol13 · 2025-07-30T18:40:46Z

indexd/default_settings.py

    },
 }

+CONFIG["RBAC"] = False  # RBAC is not enabled by default


We should consider a different name for this and a little more description here about how this configuration affects the runtime of the service (some instructions to an operator to understand what True/False really does)

I would maybe recommend ARE_RECORDS_DISCOVERABLE and default to True. Note in a comment that the records themselves contain only file metadata which includes required authorization for underlying files.

I also suspect there is a use case for authorizing the discovery of the records separately from the authorization required for the underlying files. Important to remember that the authz in indexd records currently is intended to represent the authorization required for the underlying files - not the record itself. And I can foresee a potential use of this feature being: no data is discoverable until you "register", then all data is discoverable but you have to apply to specific studies to get access to underlying data.

This solution as it stands is not flexible enough to support the above b/c it couples the authz for the underlying files with the authz to view the indexd record itself. I'm not convinced this is a super future-proof approach.

What we could consider are 2 configs, 1 to turn discovery off and one to toggle whether or not there's a separate authz for all records

ARE_RECORDS_DISCOVERABLE: False
# None below means that each record will be authorized based
# on the authz specified for the underlying files.
# If you set a global discovery authz, this OVERRIDES
# individual record authz for the purpose of discovery
# (e.g. reading the records). Importantly, it DOES NOT
# change any behavior with regards to the authz on the
# record controlling access to underlying data.
GLOBAL_DISCOVERY_AUTHZ: ["/indexd/discovery"]  # or None

I'd like to make sure we support something like this to keep things future proof. So if GLOBAL_DISCOVERY_AUTHZ is set, you ignore the authz on the record and use it instead (for ONLY GET/read records).
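
One way to read the two-setting proposal above is sketched below. The function name `discovery_authz` and its return convention (`None` meaning "no authorization needed to read") are assumptions made for illustration; only the two setting names come from the comment.

```python
from typing import List, Optional


def discovery_authz(record_authz: List[str],
                    are_records_discoverable: bool,
                    global_discovery_authz: Optional[List[str]]) -> Optional[List[str]]:
    """Return the resources a caller needs in order to READ a record, or None if open."""
    if are_records_discoverable:
        return None  # discovery is open: no authorization needed to read records
    if global_discovery_authz:
        # Overrides per-record authz, for reads only.
        return list(global_discovery_authz)
    # Fall back to the authz required for the underlying files.
    return list(record_authz)
```

Access to the underlying data would still be governed by the record's own authz in every case; only the read path consults this function.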

bwalsh mentioned this pull request Jun 11, 2025

Enable RBAC on reads ohsu-comp-bio/indexd#1

Open

7 tasks

bwalsh added 8 commits June 18, 2025 16:13

introduce arborist RBAC

8de2867

ensure Authorization header

f2a4bb1

adds resources

0f92bd5

adds rbac enforcement

1e1cfb3

adds handle_uncaught_exception

cfce1d4

adds handle_uncaught_exception, blueprint.rbac

192c095

alias None check

dbb260e

enforce rbac

442fecc

bwalsh force-pushed the feature/rbac branch from e3d2db4 to 442fecc Compare June 18, 2025 23:21

improve arborist check

d42214c

improve arborist check

bwalsh force-pushed the feature/rbac branch from 2671872 to d42214c Compare June 25, 2025 20:51

improve test_multiple_endpoints

1b7b6e3

adds additional checks improve test_multiple_endpoints

bwalsh force-pushed the feature/rbac branch from 0b81495 to 1b7b6e3 Compare July 3, 2025 13:31

add rbac to dos

9500f84

bwalsh mentioned this pull request Jul 8, 2025

adds auth header to index uc-cdis/fence#1279

Open

adds rbac to /{GUID}

6a62030

This was referenced Jul 9, 2025

feature/rbac ACED-IDP/gen3_util#130

Merged

adds auth to indexclient ACED-IDP/aced_etl_pod#49

Closed

feature/rbac ACED-IDP/aced_etl_pod#50

Closed

adds auth to all gets uc-cdis/indexclient#82

Open

enforce on additional endpoints

ad3ca66

quinnwai mentioned this pull request Jul 11, 2025

Pass auth token to all indexd calls calypr/git-drs#21

Merged

8 tasks

improves tests for ../q ../versions ../latest

2da3cfd

bwalsh added 2 commits July 14, 2025 16:35

add rbac to /index/bundle

eecf086

fix: sql warnings subquery, cartesian product

9e9e114

Avantol13 requested changes Jul 30, 2025

bwalsh mentioned this pull request Aug 4, 2025

feature/rbac-2 #405

Open
