ISSUE 9670: Adds AWS credentials refresh to out_prometheus_remote_write #9765

Open. Wants to merge 1 commit into base: master.

@Tradunsky (Contributor) commented Dec 25, 2024

Handles the HTTP 403 error returned when AWS credentials have expired by refreshing the credentials from ~/.aws/credentials.

Fixes: #9670
Similar implementation already exists for kinesis_streams:

aws_client->provider->provider_vtable->


Steps to reproduce:

  1. Create temporary credentials with the minimum session duration of 900 seconds (a short lifetime makes the issue quick to reproduce). Put the credentials from the command output into ~/.aws/credentials under the default profile.
aws sts assume-role --role-arn arn:aws:iam::<account_number>:role/prometheus_role --role-session-name tmp --duration-seconds 900
# or
aws sts get-session-token --duration-seconds 900 --serial-number arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):mfa/role_name --token-code <MFA code>
  2. Start fluent-bit with the following configuration file:
[SERVICE]
    Flush                      1
    Log_Level                  DEBUG

[INPUT]
    Name                       node_exporter_metrics
    Tag                        metrics
    Scrape_interval            30

[OUTPUT]
    Name                       prometheus_remote_write
    Match                      metrics
    Host                       aps-workspaces.us-west-2.amazonaws.com
    Port                       443
    Uri                        /workspaces/ws-<your workspaceid>/api/v1/remote_write
    AWS_Auth                   true
    AWS_region                 us-west-2
    Tls                        On
    Tls.verify                 On
    add_label                  test test
./bin/fluent-bit -c fluent-bit.conf
  3. Wait until the credentials expire and the fluent-bit prometheus_remote_write output plugin starts to fail with HTTP 403 (expired credentials), as shown in the example:
[2024/12/24 16:31:49] [error] [output:prometheus_remote_write:prometheus_remote_write.1] aps-workspaces.us-west-2.amazonaws.com:443, HTTP status=403
{"message":"The security token included in the request is expired"}
  4. Repeat step 1 to write fresh credentials to ~/.aws/credentials (usually done by automation).
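
Steps 1 and 4 rely on the shared credentials file layout: an INI-style file whose [default] section holds aws_access_key_id, aws_secret_access_key and, for temporary credentials, aws_session_token. As a rough illustration of what re-reading that file involves on each refresh, here is a minimal sketch. The names parse_profile and struct creds are hypothetical, and this is not Fluent Bit's actual parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three values a temporary session needs. */
struct creds {
    char access_key_id[128];
    char secret_access_key[128];
    char session_token[2048];
};

/* Copy len bytes of src into dst (capacity n), dropping surrounding blanks. */
static void copy_trimmed(char *dst, size_t n, const char *src, size_t len)
{
    while (len > 0 && (*src == ' ' || *src == '\t')) { src++; len--; }
    while (len > 0 && (src[len - 1] == ' ' || src[len - 1] == '\t' ||
                       src[len - 1] == '\r')) { len--; }
    if (len >= n) { len = n - 1; }
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Parse INI text and fill out with the keys found under [profile].
 * Returns 0 when both the access key and the secret key were found. */
static int parse_profile(const char *text, const char *profile,
                         struct creds *out)
{
    char header[160];
    char buf[2304];
    char key[64];
    int in_profile = 0;
    const char *line = text;

    assert(text != NULL && profile != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    snprintf(header, sizeof(header), "[%s]", profile);

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        copy_trimmed(buf, sizeof(buf), line, len);

        if (buf[0] == '[') {
            /* A section header switches the active profile. */
            in_profile = (strcmp(buf, header) == 0);
        } else if (in_profile && buf[0] != '#' && buf[0] != ';') {
            /* Split on the first '=' only: tokens may contain '=' padding. */
            char *eq = strchr(buf, '=');
            if (eq != NULL) {
                const char *val = eq + 1;
                copy_trimmed(key, sizeof(key), buf, (size_t)(eq - buf));
                if (strcmp(key, "aws_access_key_id") == 0) {
                    copy_trimmed(out->access_key_id,
                                 sizeof(out->access_key_id), val, strlen(val));
                } else if (strcmp(key, "aws_secret_access_key") == 0) {
                    copy_trimmed(out->secret_access_key,
                                 sizeof(out->secret_access_key), val, strlen(val));
                } else if (strcmp(key, "aws_session_token") == 0) {
                    copy_trimmed(out->session_token,
                                 sizeof(out->session_token), val, strlen(val));
                }
            }
        }
        line = end ? end + 1 : line + len;
    }
    return (out->access_key_id[0] && out->secret_access_key[0]) ? 0 : -1;
}
```

Calling parse_profile(file_contents, "default", &c) after the file has been rewritten yields the new values, which is why a refresh that re-reads the file is enough to recover from expired in-memory credentials.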

Before the PR fix: Fluent Bit keeps failing with 403 because it keeps using the expired credentials cached in memory.

[error] [output:prometheus_remote_write:prometheus_remote_write.1] aps-workspaces.us-west-2.amazonaws.com:443, HTTP status=403
{"message":"The security token included in the request is expired"}

After the PR fix: Fluent-bit picks up fresh credentials without downtime.

[2024/12/24 18:45:21] [ info] [output:prometheus_remote_write:prometheus_remote_write.0] auth error, refreshing creds
[2024/12/24 18:45:21] [debug] [aws_credentials] Refresh called on the env provider
[2024/12/24 18:45:21] [debug] [aws_credentials] Refresh called on the profile provider
[2024/12/24 18:45:21] [debug] [aws_credentials] Reading shared config file.
[2024/12/24 18:45:21] [debug] [aws_credentials] Reading shared credentials file.
[2024/12/24 18:45:21] [debug] [upstream] KA connection #89 to aps-workspaces.us-west-2.amazonaws.com:443 is now available
[2024/12/24 18:45:21] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] http_post result FLB_RETRY
...

[2024/12/24 18:45:43] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] signing request with AWS Sigv4
[2024/12/24 18:45:43] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] aps-workspaces.us-west-2.amazonaws.com:443, HTTP status=200
[2024/12/24 18:45:43] [debug] [upstream] KA connection #88 to aps-workspaces.us-west-2.amazonaws.com:443 is now available
[2024/12/24 18:45:43] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] http_post result FLB_OK
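
The log sequence above (auth error, refresh, FLB_RETRY, then a 200 on the next attempt) corresponds to a refresh-then-retry pattern. The following is a minimal, self-contained sketch of that pattern, not the code in this PR: struct provider, its vtable, fake_post and handle_status are hypothetical stand-ins, the result codes only mirror the FLB_OK / FLB_RETRY names from the logs, and treating every other status as a plain error is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes named after FLB_OK / FLB_RETRY in the logs; values are
 * placeholders for this sketch. */
enum result { RES_OK, RES_RETRY, RES_ERROR };

struct provider;

/* Hypothetical provider interface exposing a refresh hook. */
struct provider_vtable {
    int (*refresh)(struct provider *p);
};

struct provider {
    const struct provider_vtable *vtable;
    int generation;     /* bumped each time credentials are re-read */
    int refresh_calls;  /* how many times refresh was invoked */
};

/* Stand-in for re-reading ~/.aws/credentials. */
static int file_refresh(struct provider *p)
{
    p->generation++;
    p->refresh_calls++;
    return 0;
}

static const struct provider_vtable file_vtable = { file_refresh };

/* Fake endpoint: rejects credentials older than min_valid_generation. */
static int fake_post(const struct provider *p, int min_valid_generation)
{
    return (p->generation >= min_valid_generation) ? 200 : 403;
}

/* Map an HTTP status to a result. On 403 with AWS auth enabled, refresh the
 * credentials and ask the caller to retry the same payload later. */
static enum result handle_status(struct provider *p, int has_aws_auth,
                                 int status)
{
    assert(p != NULL && p->vtable != NULL && p->vtable->refresh != NULL);

    if (status >= 200 && status < 300) {
        return RES_OK;
    }
    if (status == 403 && has_aws_auth) {
        p->vtable->refresh(p);
        return RES_RETRY;
    }
    return RES_ERROR;
}
```

In this sketch the first attempt with stale credentials yields 403 and RES_RETRY after one refresh call, and the following attempt succeeds. Returning a retry instead of re-sending immediately leaves scheduling to the engine, which matches the roughly 20-second gap between the FLB_RETRY and FLB_OK lines in the log.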

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • [N/A] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@Tradunsky (Contributor Author) commented:

Hi team,
@edsiper , @leonardo-albertovich , @fujimotos , @koleini

Hope you had a great holiday season! 🤗

Please let me know if I can do anything else.

Successfully merging this pull request may close these issues.

AWS rolling credentials from file support for Prometheus