Skip to content

Conversation

@Tmonster
Copy link
Contributor

@Tmonster Tmonster commented Jan 3, 2026

Fixes https://github.com/duckdblabs/duckdb-internal/issues/6620

Problem solved

DuckDB silently uses env variables for secrets. This can be undesireable for the following reasons

  • workflows/scripts using env based secrets are moved to other machines that do not have these environment variables.
  • You are accessing public s3 data that does not require a secret, and for some reason the env secrets are no longer valid, resulting in a 403 for public data

For v1.4.4, we are adding the ability to turn off the silent use of env variables for secret generation with the auto_fetch_secret_info_from_env setting. By default it will stay on in v1.4.X releases, but in v1.5.0 we will turn it off.

This setting should not affect secrets created in the aws extension using the env credential chain, as those secrets will be generated with the aws sdk in the aws extension, which will not look at this setting.

@Tmonster Tmonster changed the title introduce option to disable getting secrets from env vars/global scope Option to disable getting secrets from env vars/global scope Jan 3, 2026
@Tmonster
Copy link
Contributor Author

Tmonster commented Jan 3, 2026

CC @samansmink @carlopi (for some reason I cannot request you as reviewers)

@carlopi carlopi requested review from carlopi and samansmink January 3, 2026 06:16
src/s3fs.cpp Outdated
return option_scope;
}

SettingLookupResult S3AuthParams::SetSecretOption(KeyValueSecretReader &secret_reader, string secret_option,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these functions are a little bit hard to understand right now. I think we should make it a bit more clear what order of checking for settings is when using this.

Perhaps this can be rewritten to match the API used by the KeyValueSecretReader to read settings in a cascading way? I would propose to create a new class CustomKeyValueReader which wraps the base KeyValueReader and injects the env variables .

// Custom KeyValue reader that will look for settings in a cascading way:
//     1. secret
//     2. setting
//     3. env variable
class CustomKeyValueReader {
public:
    HttpfsKeyValueReader(FileOpener &opener_p, optional_ptr<FileOpenerInfo> info, const char **secret_types,
	                     idx_t secret_types_len);
	                     
	SettingLookupResult TryGetSecretKeyOrSettingOrEnv(const string &secret_key, const string &setting_name,  const string &env_var_name, Value &result);
	
protected:
    KeyValueSecretReader base_reader;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does not look up a separate env_var_name for secret variables. We can still add the option, but I believe that will make secret initialization more confusing, and unnecessary logic to the PR. This PR is to just check a setting if env vars (Or GLOBAL scope settings) should be used when creating secrets.

The environment variables for AWS_ACCESS_KEY_ID, AWS_SECRET_KEY_ID are set in AWSEnvironmentCredentialsProvider::SetExtensionOptionValue on extension load, and then they are placed in db_config.options.set_options. We cannot disable loading the AWS_ACCESS_KEY, or AWS_SECRET_KEY_ID variables, because it is only run once on extension load before the auto_fetch_secret_info_from_env setting can be set/unset.

Also, I wonder if wrapping the KeyValueSecretReader is necessary. Happy to do it for the bug fix PR because it touches less code. I think a cleaner idea for v1.6 would be to add a max_scope parameter that defaults to SettingScope::GLOBAL. For Secret initialization you can pass max_scope = SettingScope::Local or SettingScope::SECRET depending on how strict secret initialization should be, and for other settings you can leave it as the default. If more cases like this come along, it will be easier to modify if the global or local setting should be used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh maybe i'm misunderstanding things here, but i'm confused now. The issue states that:

DuckDB automatically uses the environment AWS credentials even when no S3 secret has been created.

I feel like this does not align with the statement you make in this PR:

DuckDB silently uses env variables for secrets

Are we sure this issue is solving the right correct problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DuckDB automatically uses the environment AWS credentials even when no S3 secret has been created.

-> environment variables like AWS_REGION, AWS_ACCESS_KEY_ID etc.

DuckDB silently uses env variables for secrets.

-> Environment variables like AWS_REGION, AWS_ACCESS_KEY_ID etc. These variablees are used when no secret has been created/matches scope for the requested url

I think the confusion comes from when the environment variables are loaded and when they get used. I can expand

On httpfs extension load, the environment variables AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY_ID etc. are loaded into the httpfs options s3_region, s3_access_key_id, s3_secret_access_key. See here for the whole list of loaded environment variables. Here is where they actually loaded and stored in the extension configuration options

Environment variables are never checked or read after extension load. After extension load only the extension configuration options are used

After httpfs is loaded, when a user attempts to read a remote url and the S3KeyValueReader cannot find a secret that matches scope etc., the extension config options "s3_access_key_id", "s3_secret_access_key" etc. are used. If these settings are not set by the user after the httpfs extension loads, the environment variable values that were loaded into the extension options are used.

This is how DuckDB ends up "silently" using env variables for secrets, or automatically uses AWS environment credentials when no secret has explicitly been created.

The issue our client is experiencing suffers from this

  • The env vars AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY_ID are set, but they are not valid (how they get in this state I don't know)
  • They initialize a read from a public bucket.
    -> here the invalid AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY_ID values are used when creating the auth params
  • A 403 (unauthorized) is returned

This PR will allow a user to set the auto_fetch_secret_info_from_env option to false, so that when a secret options are initialized in ReadFrom(S3KeyValueReader &secret_reader, const std::string &file_path), the string settings that are returned with GLOBAL_SCOPE are not used. GLOBAL_SCOPE is returned for default values, or env values. Default values for all secret config options are empty, and I don't see that ever changing, so I think this way is relatively safe. Otherwise we have to still check global scope, and also check the current env variable value

@carlopi
Copy link
Collaborator

carlopi commented Jan 12, 2026

My proposal would be:

  • for v1.4.4: change name from auto_fetch_secret_info_from_env to disable_global_s3_configuration, that once set will block GLOBAL scope
  • for v1.5.0: introduce a CALL inherit_aws_config_from_environment(); function, that once called will set the config from ENV variable. It can be called multiple times, and can return a table with the settings that have just being set. Then on 1.5.0 if ones whats to opt-in to the ENV variables, it can do CALL inherit_aws_config_from_environment(). disable_global_s3_configuration default will kept to false

Copy link
Collaborator

@carlopi carlopi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@carlopi carlopi merged commit 2f8e5a4 into duckdb:v1.4-andium Jan 13, 2026
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants