-
Notifications
You must be signed in to change notification settings - Fork 21
Introduce arborist RBAC #400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
8de2867
f2a4bb1
0f92bd5
1e1cfb3
cfce1d4
192c095
dbb260e
442fecc
d42214c
1b7b6e3
9500f84
6a62030
ad3ca66
2da3cfd
eecf086
9e9e114
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| 3.12 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -35,9 +35,11 @@ python3 -m venv py38-venv | |
| You can install Poetry. Make sure the virtual environment is activated. | ||
|
|
||
| ```console | ||
| # Note: this method is deprecated, returns a 404. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you update the install markdown with their recommended method and remove this deprecated one
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. defer for now |
||
| curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python | ||
| source $HOME/.poetry/env | ||
| ``` | ||
| See [poetry docs](https://python-poetry.org/docs/#installing-with-pipx) | ||
|
|
||
| You can install python dependencies using Poetry: | ||
|
|
||
|
|
@@ -72,7 +74,22 @@ sudo pg_ctlcluster 12 main start | |
| If you're on mac, you can install postgres using brew: | ||
|
|
||
| ```console | ||
| brew install postgres | ||
| brew install postgresql | ||
| ``` | ||
| Start the PostgreSQL service: Use Homebrew to start the PostgreSQL service: | ||
|
|
||
| ```console | ||
| # start | ||
| brew services start postgresql | ||
|
|
||
| # verify | ||
| brew services list | ||
|
|
||
| # connect | ||
| psql postgres | ||
|
|
||
| # stop | ||
| brew services stop postgresql | ||
| ``` | ||
|
|
||
| You may also need to create a user. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,7 @@ | ||
| import re | ||
| import sys | ||
| import traceback | ||
|
|
||
| import flask | ||
| import jsonschema | ||
|
|
||
|
|
@@ -159,11 +162,15 @@ def handle_multiple_records_error(err): | |
|
|
||
| @blueprint.errorhandler(UserError) | ||
| def handle_user_error(err): | ||
| print(f"Uncaught Exception: {err}", file=sys.stderr) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please use cdislogging, not direct prints
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
| traceback.print_exc(file=sys.stderr) | ||
| return flask.jsonify(error=str(err)), 400 | ||
|
|
||
|
|
||
| @blueprint.errorhandler(AuthError) | ||
| def handle_auth_error(err): | ||
| print(f"AuthError: {err}", file=sys.stderr) | ||
| traceback.print_exc(file=sys.stderr) | ||
| return flask.jsonify(error=str(err)), 403 | ||
|
|
||
|
|
||
|
|
@@ -172,6 +179,13 @@ def handle_revision_mismatch(err): | |
| return flask.jsonify(error=str(err)), 409 | ||
|
|
||
|
|
||
| @blueprint.errorhandler(Exception) | ||
| def handle_uncaught_exception(err): | ||
| print(f"Uncaught Exception: {err}", file=sys.stderr) | ||
| traceback.print_exc(file=sys.stderr) | ||
| return flask.jsonify(error=f"Internal server error {type(err)} {err}"), 500 | ||
|
|
||
|
|
||
| @blueprint.record | ||
| def get_config(setup_state): | ||
| config = setup_state.app.config["ALIAS"] | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,11 @@ | ||
| import sys | ||
| from functools import wraps | ||
|
|
||
| from flask import current_app | ||
| from flask import request | ||
|
|
||
| from .errors import AuthError | ||
| from .errors import AuthError, AuthzError | ||
| from ..errors import UserError | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know it was like this before, but can we switch to absolute imports instead of relative? Those are preferred
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
|
|
||
|
|
||
| def authorize(*p): | ||
|
|
@@ -20,8 +22,8 @@ def authorize(*p): | |
|
|
||
| @wraps(f) | ||
| def check_auth(*args, **kwargs): | ||
| if not request.authorization: | ||
| raise AuthError("Username / password required.") | ||
| if not request.authorization.parameters.get("username"): | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why was this necessary to update?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. removed |
||
| raise AuthError(f"Basic auth Username / password required. {request.authorization}") | ||
| current_app.auth.auth( | ||
| request.authorization.parameters.get("username"), | ||
| request.authorization.parameters.get("password"), | ||
|
|
@@ -31,13 +33,20 @@ def check_auth(*args, **kwargs): | |
|
|
||
| return check_auth | ||
| else: | ||
| method, resources = p | ||
| method, resources_ = p | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why the suffix _? is that a reserved keyword?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shadows name 'resources' from outer scope (the new method) |
||
| if request.authorization and request.authorization.type == "basic": | ||
| current_app.auth.auth( | ||
| request.authorization.parameters.get("username"), | ||
| request.authorization.parameters.get("password"), | ||
| ) | ||
| else: | ||
| if not isinstance(resources, list): | ||
| raise UserError(f"'authz' must be a list, received '{resources}'.") | ||
| current_app.auth.authz(method, list(set(resources))) | ||
| if not isinstance(resources_, list): | ||
| raise UserError(f"'authz' must be a list, received '{resources_}'.") | ||
| return current_app.auth.authz(method, list(set(resources_))) | ||
|
|
||
|
|
||
| def resources(): | ||
| """ | ||
| Returns a list of resources the user has access to. Uses Arborist if available. | ||
| """ | ||
| return current_app.auth.resources() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we likely need to consider caching - and this will be tricky b/c it's authz information. I expect this change is going to clog the network if indexd is making the same requests to arborist for 10k files when a user is requesting signed URLs. Can you do some performance evaluation of previous state and future state (where every record is controlled)? We have some load testing scripts, but I'm not certain they're all up to date to work with the Helm setup. But if you want inspiration: https://github.com/uc-cdis/gen3-code-vigil/blob/master/gen3-load-tests/load_testing_scripts/ga4gh-drs-performance.js Ultimately, I don't want to push this as-is if we can't scale this to work for the same scenario as above (10k files over 3 studies). We may need to consider how to improve efficiency |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| import hashlib | ||
| import sys | ||
|
|
||
| from contextlib import contextmanager | ||
|
|
||
|
|
@@ -133,18 +134,25 @@ def authz(self, method, resource): | |
| try: | ||
| # A successful call from arborist returns a bool, else returns ArboristError | ||
| try: | ||
| token = get_jwt_token() | ||
| if not token: | ||
| raise AuthzError("No JWT token found for authorization check") | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Arborist allows no token to be sent on purpose, it allows assignment of anonymous access. So we don't want to raise this error here
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
| authorized = self.arborist.auth_request( | ||
| get_jwt_token(), "indexd", method, resource | ||
| token, "indexd", method, resource | ||
| ) | ||
| except Exception as e: | ||
| logger.error( | ||
| f"Request to Arborist failed; now checking admin access. Details:\n{e}" | ||
| ) | ||
| authorized = False | ||
|
|
||
| if not authorized: | ||
| token = get_jwt_token() | ||
| if not token: | ||
| raise AuthError("No JWT token found for authorization check") | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. see above comment
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
| # admins can perform all operations | ||
| is_admin = self.arborist.auth_request( | ||
| get_jwt_token(), "indexd", method, ["/services/indexd/admin"] | ||
| token, "indexd", method, ["/services/indexd/admin"] | ||
| ) | ||
| if not is_admin and not resource: | ||
| # if `authz` is empty (no `resource`), admin == access to | ||
|
|
@@ -157,7 +165,33 @@ def authz(self, method, resource): | |
| "The indexd admin '/programs' logic is deprecated. Please update your policy to '/services/indexd/admin'" | ||
| ) | ||
| if not is_admin: | ||
| raise AuthError("Permission denied.") | ||
| raise AuthError("Permission denied. (not is_admin)") | ||
| except AuthError as err: | ||
| logger.error(err) | ||
| raise err | ||
| except AuthzError as err: | ||
| logger.error(err) | ||
| raise err | ||
| except Exception as err: | ||
| logger.error(err) | ||
| raise AuthzError(err) | ||
|
|
||
| def resources(self): | ||
| """ | ||
| Returns a list of resources for the given user. | ||
| """ | ||
| if not self.arborist: | ||
| raise AuthError( | ||
| "Arborist is not configured; cannot perform authorization check" | ||
| ) | ||
| token = get_jwt_token() | ||
| try: | ||
| _ = self.arborist.auth_mapping( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. don't use _ as a variable name unless it's a return that is unused. Here we're returning it, so we need a name
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
| jwt=token | ||
| ) | ||
| return _ | ||
| except Exception as err: | ||
| raise AuthError( | ||
| "Failed to get resources from Arborist. Please check your Arborist configuration." | ||
| ) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,13 @@ | ||
| import sys | ||
|
|
||
| import flask | ||
|
|
||
| from indexclient.client import IndexClient | ||
| from doiclient.client import DOIClient | ||
| from dosclient.client import DOSClient | ||
| from hsclient.client import HSClient | ||
|
|
||
| from indexd.auth import AuthzError | ||
| from indexd.utils import hint_match | ||
|
|
||
| from indexd.errors import AuthError | ||
|
|
@@ -20,6 +23,15 @@ | |
| blueprint.dist = [] | ||
|
|
||
|
|
||
| @blueprint.errorhandler(Exception) | ||
| def handle_uncaught_exception(err): | ||
| import traceback | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. see previous comment about in-function imports |
||
| import sys | ||
| print("Uncaught Exception:", file=sys.stderr) | ||
| traceback.print_exc(file=sys.stderr) | ||
| return flask.jsonify(error="Internal server error"), 500 | ||
|
|
||
|
|
||
| @blueprint.route("/alias/<path:alias>", methods=["GET"]) | ||
| def get_alias(alias): | ||
| """ | ||
|
|
@@ -60,7 +72,6 @@ def get_record(record): | |
| if not blueprint.dist or "no_dist" in flask.request.args: | ||
| raise | ||
| ret = dist_get_record(record) | ||
|
|
||
| return flask.jsonify(ret), 200 | ||
|
|
||
|
|
||
|
|
@@ -111,6 +122,11 @@ def handle_auth_error(err): | |
| return flask.jsonify(error=str(err)), 403 | ||
|
|
||
|
|
||
| @blueprint.errorhandler(AuthzError) | ||
| def handle_authz_error(err): | ||
| return flask.jsonify(error=str(err)), 401 | ||
|
|
||
|
|
||
| @blueprint.errorhandler(AliasNoRecordFound) | ||
| def handle_no_record_error(err): | ||
| return flask.jsonify(error=str(err)), 404 | ||
|
|
@@ -129,3 +145,6 @@ def get_config(setup_state): | |
| blueprint.alias_driver = alias_config["driver"] | ||
| if "DIST" in setup_state.app.config: | ||
| blueprint.dist = setup_state.app.config["DIST"] | ||
| blueprint.rbac = False | ||
| if "RBAC" in setup_state.app.config: | ||
| blueprint.rbac = setup_state.app.config["RBAC"] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -83,6 +83,8 @@ | |
| }, | ||
| } | ||
|
|
||
| CONFIG["RBAC"] = False # RBAC is not enabled by default | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should consider a different name for this and a little more description here about how this configuration affects the runtime of the service (some instructions to an operator to understand what True/False really does) I would maybe recommend I also suspect there is a use case for authorizing the discovery of the records separately from the authorization required for the underlying files. Important to remember that the authz in indexd records currently is intended to represent the authorization required for the underlying files - not the record itself. And I can forsee a potential use of this feature being: no data is discoverable until you "register", then all data is discoverable but you have to apply to specific studies to get access to underlying data. This solution as it stands is not flexible enough to support the above b/c it couples the authz for the underlying files with the authz to view the indexd record itself. I'm not convinced this is a super future-proof approach. What we could consider are 2 configs, 1 to turn discovery off and one to toggle whether or not there's a separate authz for all records ARE_RECORDS_DISCOVERABLE: False
# None below means that each record will be authorized based
# on the authz specified for the underlying files.
# If you set a global discovery authz, this OVERRIDES
# individual record authz for the purpose of discovery
# (e.g. reading the records). Importantly, it DOES NOT
# change any behavior with regards to the authz on the
# record controlling access to underlying data.
GLOBAL_DISCOVERY_AUTHZ: ["/indexd/discovery"]. # or NoneI'd like to make sure we support something like this to keep things future proof. So if GLOBAL_DISCOVERY_AUTHZ is set, you ignore the authz on the record and use it instead (for ONLY GET/read records). |
||
|
|
||
| AUTH = SQLAlchemyAuthDriver( | ||
| "postgresql://postgres:postgres@localhost:5432/indexd_tests" # pragma: allowlist secret | ||
| ) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,9 @@ | ||
| import sys | ||
| import traceback | ||
|
|
||
| import flask | ||
|
|
||
| from indexd.auth import AuthzError | ||
| from indexd.blueprint import dist_get_record | ||
|
|
||
| from indexd.errors import AuthError | ||
|
|
@@ -15,12 +19,32 @@ | |
| blueprint.dist = [] | ||
|
|
||
|
|
||
| @blueprint.errorhandler(Exception) | ||
| def handle_uncaught_exception(err): | ||
| print(f"Uncaught Exception: {err}", file=sys.stderr) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. see previous comments regarding logging. we prefer not to use print statements directly. |
||
| traceback.print_exc(file=sys.stderr) | ||
| return flask.jsonify(error=f"Internal server error"), 500 | ||
|
|
||
|
|
||
| @blueprint.errorhandler(AuthzError) | ||
| def handle_authz_error(err): | ||
| ret = {"msg": str(err), "status_code": 401} | ||
| return flask.jsonify(ret), 401 | ||
|
|
||
|
|
||
| @blueprint.errorhandler(AuthError) | ||
| def handle_requester_auth_error(err): | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. any new public method needs a Google-style docstring |
||
| ret = {"msg": str(err), "status_code": 403} | ||
| return flask.jsonify(ret), 403 | ||
|
|
||
|
|
||
| @blueprint.route("/ga4gh/dos/v1/dataobjects/<path:record>", methods=["GET"]) | ||
| def get_dos_record(record): | ||
| """ | ||
| Returns a record from the local ids, alias, or global resolvers. | ||
| Returns DOS Schema | ||
| """ | ||
|
|
||
| try: | ||
| ret = blueprint.index_driver.get(record) | ||
| # record may be a baseID or a DID / GUID. If record is a baseID, ret["did"] is the latest GUID for that record. | ||
|
|
@@ -157,3 +181,6 @@ def get_config(setup_state): | |
| blueprint.alias_driver = alias_config["driver"] | ||
| if "DIST" in setup_state.app.config: | ||
| blueprint.dist = setup_state.app.config["DIST"] | ||
| blueprint.rbac = False | ||
| if "RBAC" in setup_state.app.config: | ||
| blueprint.rbac = setup_state.app.config["RBAC"] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,12 +1,15 @@ | ||
| import os | ||
| import re | ||
| import sys | ||
|
|
||
| import flask | ||
| import json | ||
| from indexd.errors import AuthError, AuthzError | ||
| from indexd.errors import UserError | ||
| from indexd.index.errors import NoRecordFound as IndexNoRecordFound | ||
| from indexd.errors import IndexdUnexpectedError | ||
| from indexd.utils import reverse_url | ||
| import traceback | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please ensure isort-style imports. 3 import sections:
and each one is alphabetically ordered traceback is a built-in so it should be in the first block of imports |
||
|
|
||
| blueprint = flask.Blueprint("drs", __name__) | ||
|
|
||
|
|
@@ -369,6 +372,13 @@ def handle_unexpected_error(err): | |
| return flask.jsonify(ret), err.code | ||
|
|
||
|
|
||
| @blueprint.errorhandler(Exception) | ||
| def handle_uncaught_exception(err): | ||
| print(f"Uncaught Exception: {err}", file=sys.stderr) | ||
| traceback.print_exc(file=sys.stderr) | ||
| return flask.jsonify(error=f"Internal server error"), 500 | ||
|
|
||
|
|
||
| @blueprint.record | ||
| def get_config(setup_state): | ||
| index_config = setup_state.app.config["INDEX"] | ||
|
|
@@ -379,3 +389,6 @@ def get_config(setup_state): | |
| def get_config(setup_state): | ||
| if "DRS_SERVICE_INFO" in setup_state.app.config: | ||
| blueprint.service_info = setup_state.app.config["DRS_SERVICE_INFO"] | ||
| blueprint.rbac = False | ||
| if "RBAC" in setup_state.app.config: | ||
| blueprint.rbac = setup_state.app.config["RBAC"] | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything we run must be Python 3.9
can you remove this and reinstall and relock on 3.9?