-
Notifications
You must be signed in to change notification settings - Fork 51
Option to disable getting secrets from env vars/global scope #182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
CC @samansmink @carlopi (for some reason I cannot request you as reviewers) |
src/s3fs.cpp
Outdated
| return option_scope; | ||
| } | ||
|
|
||
| SettingLookupResult S3AuthParams::SetSecretOption(KeyValueSecretReader &secret_reader, string secret_option, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these functions are a little bit hard to understand right now. I think we should make it a bit more clear what order of checking for settings is when using this.
Perhaps this can be rewritten to match the API used by the KeyValueSecretReader to read settings in a cascading way? I would propose to create a new class CustomKeyValueReader which wraps the base KeyValueReader and injects the env variables .
// Custom KeyValue reader that will look for settings in a cascading way:
// 1. secret
// 2. setting
// 3. env variable
class CustomKeyValueReader {
public:
HttpfsKeyValueReader(FileOpener &opener_p, optional_ptr<FileOpenerInfo> info, const char **secret_types,
idx_t secret_types_len);
SettingLookupResult TryGetSecretKeyOrSettingOrEnv(const string &secret_key, const string &setting_name, const string &env_var_name, Value &result);
protected:
KeyValueSecretReader base_reader;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR does not look up a separate env_var_name for secret variables. We can still add the option, but I believe that will make secret initialization more confusing, and unnecessary logic to the PR. This PR is to just check a setting if env vars (Or GLOBAL scope settings) should be used when creating secrets.
The environment variables for AWS_ACCESS_KEY_ID, AWS_SECRET_KEY_ID are set in AWSEnvironmentCredentialsProvider::SetExtensionOptionValue on extension load, and then they are placed in db_config.options.set_options. We cannot disable loading the AWS_ACCESS_KEY, or AWS_SECRET_KEY_ID variables, because it is only run once on extension load before the auto_fetch_secret_info_from_env setting can be set/unset.
Also, I wonder if wrapping the KeyValueSecretReader is necessary. Happy to do it for the bug fix PR because it touches less code. I think a cleaner idea for v1.6 would be to add a max_scope parameter that defaults to SettingScope::GLOBAL. For Secret initialization you can pass max_scope = SettingScope::Local or SettingScope::SECRET depending on how strict secret initialization should be, and for other settings you can leave it as the default. If more cases like this come along, it will be easier to modify if the global or local setting should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh maybe i'm misunderstanding things here, but i'm confused now. The issue states that:
DuckDB automatically uses the environment AWS credentials even when no S3 secret has been created.
I feel like this does not align with the statement you make in this PR:
DuckDB silently uses env variables for secrets
Are we sure this issue is solving the right correct problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DuckDB automatically uses the environment AWS credentials even when no S3 secret has been created.
-> environment variables like AWS_REGION, AWS_ACCESS_KEY_ID etc.
DuckDB silently uses env variables for secrets.
-> Environment variables like AWS_REGION, AWS_ACCESS_KEY_ID etc. These variablees are used when no secret has been created/matches scope for the requested url
I think the confusion comes from when the environment variables are loaded and when they get used. I can expand
On httpfs extension load, the environment variables AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY_ID etc. are loaded into the httpfs options s3_region, s3_access_key_id, s3_secret_access_key. See here for the whole list of loaded environment variables. Here is where they actually loaded and stored in the extension configuration options
Environment variables are never checked or read after extension load. After extension load only the extension configuration options are used
After httpfs is loaded, when a user attempts to read a remote url and the S3KeyValueReader cannot find a secret that matches scope etc., the extension config options "s3_access_key_id", "s3_secret_access_key" etc. are used. If these settings are not set by the user after the httpfs extension loads, the environment variable values that were loaded into the extension options are used.
This is how DuckDB ends up "silently" using env variables for secrets, or automatically uses AWS environment credentials when no secret has explicitly been created.
The issue our client is experiencing suffers from this
- The env vars AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY_ID are set, but they are not valid (how they get in this state I don't know)
- They initialize a read from a public bucket.
-> here the invalid AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY_ID values are used when creating the auth params - A 403 (unauthorized) is returned
This PR will allow a user to set the auto_fetch_secret_info_from_env option to false, so that when a secret options are initialized in ReadFrom(S3KeyValueReader &secret_reader, const std::string &file_path), the string settings that are returned with GLOBAL_SCOPE are not used. GLOBAL_SCOPE is returned for default values, or env values. Default values for all secret config options are empty, and I don't see that ever changing, so I think this way is relatively safe. Otherwise we have to still check global scope, and also check the current env variable value
|
My proposal would be:
|
carlopi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Fixes https://github.com/duckdblabs/duckdb-internal/issues/6620
Problem solved
DuckDB silently uses env variables for secrets. This can be undesireable for the following reasons
For v1.4.4, we are adding the ability to turn off the silent use of env variables for secret generation with the
auto_fetch_secret_info_from_envsetting. By default it will stay on in v1.4.X releases, but in v1.5.0 we will turn it off.This setting should not affect secrets created in the aws extension using the
envcredential chain, as those secrets will be generated with the aws sdk in the aws extension, which will not look at this setting.