-
Notifications
You must be signed in to change notification settings - Fork 60
Add Redshift Iam Idc token authentication method with an eye towards future supported Idps #970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-redshift contributing guide. |
We expect users of this method to provide a YAML-structured set of params including a uri, an authentication string, and whatever paramters might be needed to construct the correct payload equivalent to data in a curl request. There is an all-important under the hood POST which needs a set of params unique to each identity provider to generate access tokens for use with TokenAuthIdpPlugin.
829426f
to
afd9d13
Compare
dbt/adapters/redshift/connections.py
Outdated
normal request failures. | ||
""" | ||
# Handle the 429 rate-limiting case first | ||
if response.status_code == 429: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're hitting the rate limit could we add an exponential backoff/retry strategy here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha, I kind of knew you were going to prompt me for that. We're actively determining that customer experience. Let me filter this up to the wider team to see if it makes sense for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the scenarios where we're hitting the rate limit? I don't see a retry on this, so is this using some generalized retry? Does this get run on each model? Each run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we actually do a retry strategy -- the necessity of which is undetermined and trending towards 'not needed' based on @jenniferjsmmiller 's current research -- it'd be used on every connection. The 429 is specifically a rate limiting error status from okta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're reusing connections where possible, this feels like a low occurrence event. In other words, if we hit the rate limit, my guess is that something else is actually going wrong or the user is simply abusing the application (e.g. running a lot of threads of dbt-redshift in parallel via something like airflow, which isn't officially supported).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't reuse redshift connections do we? And also there was concern about 100 threads being set and this occuring at startup but even that was something Jenn didn't seem to trigger in their research 🤞
dbt/adapters/redshift/connections.py
Outdated
normal request failures. | ||
""" | ||
# Handle the 429 rate-limiting case first | ||
if response.status_code == 429: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the scenarios where we're hitting the rate limit? I don't see a retry on this, so is this using some generalized retry? Does this get run on each model? Each run?
Okay team, I've generalized the framework out to include an Entra option which may work (may need some slight adjustment but I've tested this pretty well considering our infra options imo). Moreover, the okta framework is better:tm: now. A profile that wants to use this method will thus specify at least:
I've added a bunch more unit tests and re-tested this using my refresh token end to end model build test :D |
resolves #898
Problem
Add
refresh_token
-based authentication method.refresh_token
allows for use of Iam Idc tokens that we generate adhoc for theredshift_connector.connect
call. We had originally sought to provide a single access token and reuse it until TTL reached, but this is impossible -- tokens are only good for one use in this integration. Hence we must adhoc generate one each time and refresh tokens are an expedient and industry compliant method for doing soSolution
Provide a refresh token endpoint and necessary information to enable this.
Additional testing
End to end test using the following profile which points to a test redshift cluster on an AWS account with an integrated Okta <> Iam idc <> Redshift token authentication suite.
where
...
is some credential I've used for testing.Checklist