Skip to content

Conversation

lym953
Copy link
Contributor

@lym953 lym953 commented Oct 10, 2025

What does this PR do?

Support periodic reload of api key, i.e. when api key is read, if it has been longer than a threshold since the last load, then reload the api key.

Motivation

Some customers regularly rotate their api key in a secret. We need to provide a way for them to update our cached key.
DataDog/datadog-lambda-extension#834

Additional Notes

To use this in Lambda extension.

Jira: https://datadoghq.atlassian.net/browse/SVLS-7572

Describe how to test/QA your changes

Added unit tests.
Will also test with Lambda extension.

@lym953 lym953 force-pushed the yiming.luo/api-key-force-reload branch from bab0903 to e0e68a8 Compare October 10, 2025 03:51
@lym953 lym953 changed the title feat: [Trace Stats] Support periodic reload for api key feat: Support periodic reload for api key Oct 10, 2025
None => false,
// Reload only if it has been longer than reload interval since last reload
Some(reload_interval) => {
Instant::now() > last_reload_time + reload_interval
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to memory order semantics, this could still return true twice for two different threads (rare! But possible).

You may want acquire/release for the last_reload_time variable.

My very favorite post in this is here, which I'm sure you'll enjoy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice post! Glad to learn about this.

I added a separate lock loading_api_key, so we don't need to lock api_key or last_reload_time when calling secret manager. Then I added acquire/release semantics for this lock. Could you take a look?

// Check if reload is needed without acquiring write lock for api_key.
// If no, return the api key directly. If yes, acquire the write lock and reload the api key.
if self.should_load_api_key().await {
let mut api_key_write = api_key.write().await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be simpler to model the API key as a combined struct so we write-lock only once?

     struct ApiKeyState {
         api_key: Option<String>,
         last_reload_time: Option<Instant>,
     }
     api_key_state: Arc<RwLock<ApiKeyState>>

Then you could use try_lock and if it fails, infer that another thread is already refreshing the key?

I'm not sure what you'd actually DO in the case where Thread 1 is refreshing the key and Thread 2 hits the refresh, which will be very likely as it's time based.

Presumably you either assume that the old key is still valid for some time, so thread 2 can simply keep using the old key.

If that's invalid, you could wait until the lock drops and read the new key, but that will block all flushing.

Today it looks like this will block flushing until we refresh the key anyway, is that expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be simpler to model the API key as a combined struct so we write-lock only once?

Done

Presumably you either assume that the old key is still valid for some time, so thread 2 can simply keep using the old key.

I think we can assume this and require customers to keep a grace period for the old key.

Today it looks like this will block flushing until we refresh the key anyway, is that expected?

I didn't figure out a good way to avoid this before, but it should be fixed in the latest commit.

@lym953 lym953 marked this pull request as draft October 10, 2025 21:08
// This prevents duplicate loads from multiple threads
if self.should_load_api_key().await {
let api_key_value = (resolver_fn)().await;
*api_key_state.write().await = ApiKeyState {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acquire the RwLock for api_key after resolver_fn() is called, so the AWS secret manager call won't block reads from other threads.

@lym953 lym953 marked this pull request as ready for review October 10, 2025 21:28
Base automatically changed from yiming.luo/fix-clippy-dogstatsd to main October 14, 2025 15:27
@lym953 lym953 force-pushed the yiming.luo/api-key-force-reload branch from ebeeedf to bb7aa47 Compare October 14, 2025 15:28
@lym953 lym953 merged commit 936b344 into main Oct 14, 2025
26 checks passed
@lym953 lym953 deleted the yiming.luo/api-key-force-reload branch October 14, 2025 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants