-
Notifications
You must be signed in to change notification settings - Fork 21
Introduce arborist RBAC #400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
reviewers guideSetup Instructions
Testing
Code Review
|
Testing Steps 🌀1. Start Postgres ✔️2. Run Indexd Tests ✔️➜ gh pr checkout 400
Switched to branch 'feature/rbac'
➜ poetry install
Installing the current project: indexd (5.1.2)
➜ poetry run pytest -vv --cov=indexd --cov-report xml tests
537 passed, 303 skipped, 5320 warnings in 1513.87s (0:25:13) |
This comment was marked as duplicate.
This comment was marked as duplicate.
Deployment Steps 🚀1. Deploy Gen3 ✔️Tip Required deployment updates:
|
@lbeckman314 Can you remove the comment above re. the use case document? The request is here: |
|
Update 💥
|
improve arborist check
adds additional checks improve test_multiple_endpoints
Avantol13
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The detail in the PR description is great, but it must follow our PR template to be parsed correctly. Please move any relevant documentation to a markdown file in the docs folder if you think it's widely useful going forward, otherwise, ensure the PR description follows our template.
You cannot include other markdown headings due to the automated parsing for our release notes, but you can include text above the templated headings with any additional information about the PR. PR template
This initial review is a cursory, high-level single read-through of the code itself and I have not done any setup or testing (which we will need to do eventually).
|
|
||
| @blueprint.errorhandler(UserError) | ||
| def handle_user_error(err): | ||
| print(f"Uncaught Exception: {err}", file=sys.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use cdislogging, not direct prints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| You can install Poetry. Make sure the virtual environment is activated. | ||
|
|
||
| ```console | ||
| # Note: this method is deprecated, returns a 404. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update the install markdown with their recommended method and remove this deprecated one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer for now
| try: | ||
| token = get_jwt_token() | ||
| if not token: | ||
| raise AuthzError("No JWT token found for authorization check") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arborist allows no token to be sent on purpose, it allows assignment of anonymous access. So we don't want to raise this error here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| if not authorized: | ||
| token = get_jwt_token() | ||
| if not token: | ||
| raise AuthError("No JWT token found for authorization check") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| ) | ||
| token = get_jwt_token() | ||
| try: | ||
| _ = self.arborist.auth_mapping( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use _ as a variable name unless it's a return that is unused. Here we're returning it, so we need a name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| assert data_all_by_md.status_code == 403, f"Expected status code 403, got {data_all_by_md.status_code}" | ||
|
|
||
|
|
||
| def test_multiple_endpoints(client, user, mock_arborist_requests, is_rbac_configured): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous test comment, we need to break this up into smaller, more focused tests
| @@ -0,0 +1 @@ | |||
| 3.12 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything we run must be Python 3.9
can you remove this and reinstall and relock on 3.9?
|
|
||
|
|
||
| @blueprint.errorhandler(AuthError) | ||
| def handle_requester_auth_error(err): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any new public method needs a Google-style docstring
| from indexd.index.errors import NoRecordFound as IndexNoRecordFound | ||
| from indexd.errors import IndexdUnexpectedError | ||
| from indexd.utils import reverse_url | ||
| import traceback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please ensure isort-style imports.
3 import sections:
- python built-ins
- third-party
- within this code
and each one is alphabetically ordered
traceback is a built-in so it should be in the first block of imports
| }, | ||
| } | ||
|
|
||
| CONFIG["RBAC"] = False # RBAC is not enabled by default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider a different name for this and a little more description here about how this configuration affects the runtime of the service (some instructions to an operator to understand what True/False really does)
I would maybe recommend ARE_RECORDS_DISCOVERABLE and default to True. Note in a comment that the records themselves contain only file metadata which includes required authorization for underlying files.
I also suspect there is a use case for authorizing the discovery of the records separately from the authorization required for the underlying files. Important to remember that the authz in indexd records currently is intended to represent the authorization required for the underlying files - not the record itself. And I can forsee a potential use of this feature being: no data is discoverable until you "register", then all data is discoverable but you have to apply to specific studies to get access to underlying data.
This solution as it stands is not flexible enough to support the above b/c it couples the authz for the underlying files with the authz to view the indexd record itself. I'm not convinced this is a super future-proof approach.
What we could consider are 2 configs, 1 to turn discovery off and one to toggle whether or not there's a separate authz for all records
ARE_RECORDS_DISCOVERABLE: False
# None below means that each record will be authorized based
# on the authz specified for the underlying files.
# If you set a global discovery authz, this OVERRIDES
# individual record authz for the purpose of discovery
# (e.g. reading the records). Importantly, it DOES NOT
# change any behavior with regards to the authz on the
# record controlling access to underlying data.
GLOBAL_DISCOVERY_AUTHZ: ["/indexd/discovery"]. # or NoneI'd like to make sure we support something like this to keep things future proof. So if GLOBAL_DISCOVERY_AUTHZ is set, you ignore the authz on the record and use it instead (for ONLY GET/read records).
Status
Problem:
As an indexd or DRS user, when I list objects, I only expect to see items that belong to projects I have access to.
Solution:
Assuming a Bearer token is included on the request, I expect indexd to query arborist, extract the projects I have access to and add those as an "authz" filter when querying the database. A feature flag should control this query injection, the flag should default to FALSE, as this will improve chances of getting a PR approved. All current unit tests should pass. Additional unit tests should confirm behavior.
Alternatives:
We could have a RBAC aware proxy front end indexd - however will add complexity and processing overhead
Context:
Main auth code has two methods
authandauthz. The indexd.authorize method checks if Basic auth header is present auth is called otherwise authz is called. The revproxy gateway injects this header here This reliance on Basic auth is concerning and it's rationale is undocumented. It appears that it is not used for either create or read based on client APIApproach:
Add code to get_index to call auth_mapping
and inject resources (projects) into query.