Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial approach for tracking new feature ids #3216

Draft
wants to merge 23 commits into
base: version-3
Choose a base branch
from

Conversation

richardwang1124
Copy link
Contributor

@richardwang1124 richardwang1124 commented Mar 21, 2025

Draft approach for tracking credentials related feature ids.

The proposed approach is to add credentials metrics in the signing plugin. When the signing handler is called, we retrieve the credentials class type, determine the relevant metrics for the credentials, and pass them to the UserAgent through the with_metric method.

For this to work a few things were changed:

  • Currently, the UserAgent plugin has a higher priority than the signing plugin. In order for the credentials metrics to be included in the metrics header, the priority of the UserAgent plugin was lowered to be below the signing plugin.
  • To access the credentials class within the signing handler, a signer attribute reader was added to the SignatureV4 class.
  • Each credentials class type has been modified to include a metrics attribute which is set by the credential_provider_chain. The metrics are passed into each class's initializer as a new option.

More details in review comments.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

  1. To make sure we include your contribution in the release notes, please make sure to add description entry for your changes in the "unreleased changes" section of the CHANGELOG.md file (at corresponding gem). For the description entry, please make sure it lives in one line and starts with Feature or Issue in the correct format.

  2. For generated code changes, please checkout below instructions first:
    https://github.com/aws/aws-sdk-ruby/blob/version-3/CONTRIBUTING.md

Thank you for your contribution!

Copy link

You have made a change to core without a corresponding change to the CHANGELOG.md. This change will not result in a new version and will not published unless an entry is added to CHANGELOG.md.

Copy link

You have made a change to core without a corresponding change to the CHANGELOG.md. This change will not result in a new version and will not published unless an entry is added to CHANGELOG.md.

Copy link

github-actions bot commented Mar 21, 2025

Detected 1 possible performance regressions:

  • aws-sdk-core.gem_size_kb - z-score regression: 381.0 -> 382.5. Z-score: Infinity

Copy link

Detected 30 possible performance regressions:

  • aws-sdk-dynamodb.require_mem_retained - z-score regression: 2.54 -> 2.54. Z-score: 63.76
  • aws-sdk-dynamodb.require_mem_allocated - z-score regression: 10.75 -> 10.76. Z-score: 132.35
  • aws-sdk-dynamodb.put_item_small_allocated_kb - z-score regression: 77.83 -> 100.64. Z-score: 21.42
  • aws-sdk-dynamodb.put_item_large_allocated_kb - z-score regression: 11736.62 -> 11759.42. Z-score: 21.57
  • aws-sdk-kinesis.client_mem_allocated - z-score regression: 0.62 -> 0.62. Z-score: 86.85
  • aws-sdk-kinesis.put_record_small_allocated_kb - z-score regression: 48.73 -> 70.08. Z-score: 597.72
  • aws-sdk-kinesis.put_record_small_ms - z-score regression: 0.45 -> 0.81. Z-score: 19.39
  • aws-sdk-kinesis.put_record_large_allocated_kb - z-score regression: 196.84 -> 218.19. Z-score: 597.72
  • aws-sdk-s3.client_mem_retained - z-score regression: 0.12 -> 0.12. Z-score: Infinity
  • aws-sdk-s3.client_mem_allocated - z-score regression: 0.9 -> 0.9. Z-score: Infinity
  • aws-sdk-s3.get_object_small_allocated_kb - z-score regression: 63.1 -> 85.49. Z-score: 20.45
  • aws-sdk-s3.get_object_large_allocated_kb - z-score regression: 21568.38 -> 21590.71. Z-score: 19.37
  • aws-sdk-s3.put_object_small_allocated_kb - z-score regression: 82.14 -> 102.61. Z-score: 419.84
  • aws-sdk-s3.put_object_small_ms - z-score regression: 0.71 -> 1.02. Z-score: 18.28
  • aws-sdk-s3.put_object_large_allocated_kb - z-score regression: 9298.43 -> 9318.97. Z-score: 248.92
  • aws-sdk-s3.put_object_multipart_large_allocated_kb - z-score regression: 32122.47 -> 32207.35. Z-score: 408.5
  • aws-sdk-s3.head_object_allocated_kb - z-score regression: 111.34 -> 132.76. Z-score: 268.56
  • aws-sdk-sns.client_mem_retained - z-score regression: 0.1 -> 0.1. Z-score: Infinity
  • aws-sdk-sns.client_mem_allocated - z-score regression: 0.58 -> 0.59. Z-score: Infinity
  • aws-sdk-sns.publish_small_allocated_kb - z-score regression: 59.51 -> 81.1. Z-score: 387.39
  • aws-sdk-sns.publish_small_ms - z-score regression: 0.53 -> 0.84. Z-score: 18.18
  • aws-sdk-sns.publish_large_allocated_kb - z-score regression: 440.77 -> 462.1. Z-score: 303.71
  • aws-sdk-cloudwatchlogs.client_mem_allocated - z-score regression: 0.59 -> 0.6. Z-score: 21.94
  • aws-sdk-cloudwatchlogs.put_log_events_small_allocated_kb - z-score regression: 58.49 -> 79.83. Z-score: 424.24
  • aws-sdk-cloudwatchlogs.put_log_events_small_ms - z-score regression: 0.54 -> 0.84. Z-score: 17.23
  • aws-sdk-cloudwatchlogs.put_log_events_large_allocated_kb - z-score regression: 5992.49 -> 6013.79. Z-score: 391.7
  • aws-sdk-core.gem_size_kb - z-score regression: 381.0 -> 383.0. Z-score: Infinity
  • aws-sdk-cloudwatch.client_mem_allocated - z-score regression: 0.58 -> 0.59. Z-score: 39.15
  • aws-sdk-cloudwatch.put_metric_data_small_allocated_kb - z-score regression: 59.14 -> 80.47. Z-score: 326.29
  • aws-sdk-cloudwatch.put_metric_data_large_allocated_kb - z-score regression: 1535.47 -> 1556.81. Z-score: 424.24

Copy link

You have made a change to core without a corresponding change to the CHANGELOG.md. This change will not result in a new version and will not published unless an entry is added to CHANGELOG.md.

Copy link

You have made a change to core without a corresponding change to the CHANGELOG.md. This change will not result in a new version and will not published unless an entry is added to CHANGELOG.md.

Copy link

You have made a change to core without a corresponding change to the CHANGELOG.md. This change will not result in a new version and will not published unless an entry is added to CHANGELOG.md.

unless v2_signing?(context.config)
signer = Sign.signer_for(
context[:auth_scheme],
context.config,
context[:sigv4_region],
context[:sigv4_credentials]
)
# TODO: temp added this, double check
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, only the sigv4 signer type has credentials, so we'll only check for metrics for this signer type.

end

private

def with_metric(credentials, &block)
puts "in with metric"
unless credentials && credentials.respond_to?(:metrics)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that there are other credential types that aren't included in the credential provider chain (e.g. Sigv4::StaticCredentialsProvider) but I'm assuming we would only emit metics for the credential types in found in the chain?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky because our credentials interface can be ducktyped. StaticCredentialsProvider in Sigv4 should not really be used here. I think checking respond_to? or explicit classes that we care about (preferred?) might be better.


metrics = []
# Add check if flag isn't set, then emit Credentials CODE
if !credentials.metrics
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The credential_provider_chain sets the metrics attribute. If it isn't set, then the credentials must have been provided through code/manually.

@@ -196,7 +228,7 @@ def metric_metadata
end
end

handler(Handler, step: :sign, priority: 97)
handler(Handler, step: :sign, priority: 49)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lowered the priority so that the handler is called after signing is complete.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will want this to be on sign with priority 5 (basically one of the last things we do). I think user agent shows up in client side metrics so we can "disable" that feature or try to refactor accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on what you mean by user agent showing up in client side metrics? What/where is this feature?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -51,6 +52,7 @@ def initialize(options = {})
@client = client_opts[:client] || STS::Client.new(client_opts)
@async_refresh = true
super
@metrics << 'CREDENTIALS_STS_ASSUME_ROLE' if @metrics
Copy link
Contributor Author

@richardwang1124 richardwang1124 Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern shows up a few times for some other credential types as well. The service call (SSO, STS, etc) executed as part of the credential provider resolution process needs to contain all metrics up until that point. Only after the call succeeds should we include the metric for the call (CREDENTIALS_SSO, CREDENTIALS_STS_ASSUME_ROLE, etc). This is why I've included this line after super is executed, which executes refresh where the service call happens.

def static_credentials(options)
if options[:config]
Credentials.new(
options[:config].access_key_id,
options[:config].secret_access_key,
options[:config].session_token,
account_id: options[:config].account_id
account_id: options[:config].account_id,
metrics: ['CREDENTIALS_PROFILE']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For most of the credential types, I tried to pass the metrics information as part of the options instead of creating new parameters in order to preserve some level of backwards compatibility, but let me know if there's a better way to do this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A different approach might be in the Sign plugin, you'd have a giant switch case that checks if is_a?(Credentials) then use API private credentials.source method to get either :profile or :code. In the credentials provider chain here, you can do

credentials = Credentials.new(...)
credentials.source = :profile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally started off with a giant switch statement in Sign with custom logic in each branch for the different credential providers, but then I realized I could make it more generic by just implementing :metrics for each class and calling credentials.metrics. Is the switch case necessary if each class has a source method?

ProcessCredentials.new([process_provider]) if process_provider
if process_provider
credentials = ProcessCredentials.new([process_provider])
credentials.metrics = ['CREDENTIALS_PROFILE_PROCESS', 'CREDENTIALS_PROCESS']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProcessCredentials are the only type of credentials which I've allowed access to modifying the metrics attribute. This is because I couldn't find a way to pass metrics as an optional argument to the initialize method since currently it only takes the process itself as input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all credential types should have an accessor like this - as API private. I think you should call it "source" and abstract away the metrics. The user agent plugin would check the source and apply the correct metric.

@@ -132,10 +132,19 @@ def assume_role_web_identity_credentials_from_config(opts = {})
if @config_enabled && @parsed_config
entry = @parsed_config.fetch(p, {})
if entry['web_identity_token_file'] && entry['role_arn']
# TODO: CREDENTIALS_PROFILE_STS_WEB_ID_TOKEN (q)

# Needed because of AssumeRole flow
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed this here because of the AssumeRole source_profile chaining. If this method was called to create an AssumeRoleWebIDCredentials, then the metrics would just be ['CREDENTIALS_PROFILE_STS_WEB_ID_TOKEN']. However, if this method was called as part of the AssumeRole source_profile chain resolution, then it would need to add CREDENTIALS_PROFILE_STS_WEB_ID_TOKEN to the existing metrics list which contains CREDENTIALS_PROFILE_SOURCE_PROFILE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think if you wrap resolution in blocks with UserAgent.metric do then it becomes simpler? We can try this.

@@ -148,9 +157,10 @@ def assume_role_web_identity_credentials_from_config(opts = {})
# file, if present.
def sso_credentials_from_config(opts = {})
p = opts[:profile] || @profile_name
credentials = sso_credentials_from_profile(@parsed_credentials, p)
metrics = opts.delete(:metrics)
credentials = sso_credentials_from_profile(@parsed_credentials, p, metrics)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar idea to my comment above. If this method is called as part of the AssumeRole source_profile resolution, then it would need to include the existing metrics.

opts[:visited_profiles] ||= Set.new
opts[:credentials] = resolve_source_profile(opts[:source_profile], opts)
opts[:credentials], metrics = resolve_source_profile(opts[:source_profile], opts)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was kind of a hacky way for me to get the right metrics depending on what kind of credential was returned after resolving source_profile. If there's a better way for me to get this information please let me know.

@@ -64,7 +90,10 @@ def self.feature(_feature, &block)

def self.metric(*metrics, &block)
Thread.current[:aws_sdk_core_user_agent_metric] ||= []
metrics = metrics.map { |metric| METRICS[metric] }.compact
metrics = metrics.map do |metric|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change the chaining?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll revert this change. I was adding some extra put statements in the block temporarily.

@@ -196,7 +228,7 @@ def metric_metadata
end
end

handler(Handler, step: :sign, priority: 97)
handler(Handler, step: :sign, priority: 49)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will want this to be on sign with priority 5 (basically one of the last things we do). I think user agent shows up in client side metrics so we can "disable" that feature or try to refactor accordingly.

@@ -39,6 +39,7 @@ class AssumeRoleCredentials
# end
#
def initialize(options = {})
@metrics = options.delete(:metrics)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the pattern we should do. Since refresh is called on initialization (with super) to fetch credentials, we should just wrap the refresh logic with Aws::Plugins::UserAgent.metric(WHATEVER) do .. refresh .. end. I assume you added this because of nested calls? Wrapping multiple UserAgent.metric do..end blocks will take care of that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this more as a precautionary measure in case the :metrics option interfered with later code.

If we do wrap around refresh instead, should I wrap the entire super or would I have to modify the RefreshingCredentials code? I'm assuming we'd only want to emit metrics for the credential provider types we're tracking and not just any class which includes RefreshingCredentials.

def static_credentials(options)
if options[:config]
Credentials.new(
options[:config].access_key_id,
options[:config].secret_access_key,
options[:config].session_token,
account_id: options[:config].account_id
account_id: options[:config].account_id,
metrics: ['CREDENTIALS_PROFILE']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A different approach might be in the Sign plugin, you'd have a giant switch case that checks if is_a?(Credentials) then use API private credentials.source method to get either :profile or :code. In the credentials provider chain here, you can do

credentials = Credentials.new(...)
credentials.source = :profile

ProcessCredentials.new([process_provider]) if process_provider
if process_provider
credentials = ProcessCredentials.new([process_provider])
credentials.metrics = ['CREDENTIALS_PROFILE_PROCESS', 'CREDENTIALS_PROCESS']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all credential types should have an accessor like this - as API private. I think you should call it "source" and abstract away the metrics. The user agent plugin would check the source and apply the correct metric.

end

private

def with_metric(credentials, &block)
puts "in with metric"
unless credentials && credentials.respond_to?(:metrics)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky because our credentials interface can be ducktyped. StaticCredentialsProvider in Sigv4 should not really be used here. I think checking respond_to? or explicit classes that we care about (preferred?) might be better.

@@ -132,10 +132,19 @@ def assume_role_web_identity_credentials_from_config(opts = {})
if @config_enabled && @parsed_config
entry = @parsed_config.fetch(p, {})
if entry['web_identity_token_file'] && entry['role_arn']
# TODO: CREDENTIALS_PROFILE_STS_WEB_ID_TOKEN (q)

# Needed because of AssumeRole flow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think if you wrap resolution in blocks with UserAgent.metric do then it becomes simpler? We can try this.

Copy link

You have made a change to core without a corresponding change to the CHANGELOG.md. This change will not result in a new version and will not published unless an entry is added to CHANGELOG.md.

Copy link

You have made a change to core without a corresponding change to the CHANGELOG.md. This change will not result in a new version and will not published unless an entry is added to CHANGELOG.md.

Copy link

You have made a change to core without a corresponding change to the CHANGELOG.md. This change will not result in a new version and will not published unless an entry is added to CHANGELOG.md.

Copy link

You have made a change to core without a corresponding change to the CHANGELOG.md. This change will not result in a new version and will not published unless an entry is added to CHANGELOG.md.

Copy link
Contributor

@jterapin jterapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Credentials are tricky to track down so good job so far! We should have a team discussion to finalize tracking and testing strategies to see what makes sense.

@@ -1,6 +1,8 @@
Unreleased Changes
------------------

* Feature - Add feature id tracking for credentials.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use similar wording that we had previously when we update business metrics, i.e. example 1 and example 2

@@ -196,7 +228,7 @@ def metric_metadata
end
end

handler(Handler, step: :sign, priority: 97)
handler(Handler, step: :sign, priority: 49)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -342,7 +365,11 @@ def assume_role_process_credentials_from_config(profile)
if @parsed_config
credential_process ||= @parsed_config.fetch(profile, {})['credential_process']
end
ProcessCredentials.new([credential_process]) if credential_process
if credential_process
credentials = ProcessCredentials.new([credential_process])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is possible but what about having each credential class have a reader attribute that has these hardcoded within the class than setting to the metric as an attribute here?

I'm thinking depending on some context, this may not work but trying to think of ways to make this file easier to read through.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that hardcoding the metrics into the classes would make the rest of the code easier to read. I think my biggest concern was that depending on where the source is from (e.g. profile or env vars), the metrics for that particular instance of the class would change, which is why I was setting the metrics at each location. However, thinking on it more, I can take Matt's source idea and set a write-only source attribute for each class, and then create a metrics method which, depending on the source, will return the correct list of metrics for that credential type. Let me try that.

Comment on lines +417 to +425
with_metrics('CREDENTIALS_PROFILE_SSO') do
credentials = SSOCredentials.new(
sso_account_id: prof_config['sso_account_id'],
sso_role_name: prof_config['sso_role_name'],
sso_session: prof_config['sso_session'],
sso_region: sso_region,
sso_start_url: sso_start_url,
)
credentials.metrics = %w[CREDENTIALS_PROFILE_SSO CREDENTIALS_SSO]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that CREDENTIALS_PROFILE_SSO will be logged twice? and you have to do uniq?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of them is CREDENTIALS_PROFILE_SSO for current SSO and one of them is CREDENTIALS_PROFILE_SSO_LEGACY for legacy SSO, so it should only be logged once.

If you're referring to why I'm using CREDENTIALS_PROFILE_SSO in with_metrics and also setting it in credentials.metrics, this is because the SSO service call which will resolve the credentials needs to only have CREDENTIALS_PROFILE_SSO (which corresponds to with_metrics('CREDENTIALS_PROFILE_SSO')) and if the credentials resolve successfully, then normal calls should include both, which is why I set credentials.metrics = %w[CREDENTIALS_PROFILE_SSO CREDENTIALS_SSO]. The CREDENTIALS_PROFILE_SSO metric isn't duplicated because with_metric uses UserAgent.metric which adds the metric, calls the block, and then removes the metric.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants