-
Notifications
You must be signed in to change notification settings - Fork 24
fix(benchmark): align metrics config with new avalanchego task format #1580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
benchmark/bootstrap/aws-launch.sh
Outdated
| - > | ||
| sudo -u ubuntu -D /mnt/nvme/ubuntu/avalanchego --login | ||
| time task reexecute-cchain-range CURRENT_STATE_DIR=/mnt/nvme/ubuntu/exec-data/current-state BLOCK_DIR=/mnt/nvme/ubuntu/exec-data/blocks START_BLOCK=1 END_BLOCK=__END_BLOCK__ CONFIG=__CONFIG__ METRICS_ENABLED=false | ||
| METRICS_SERVER_ENABLED=false METRICS_COLLECTOR_ENABLED=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question for reviewers: Should metrics be configurable via a CLI flag (e.g., --enable-metrics)?
Currently I kept metrics disabled (METRICS_SERVER_ENABLED=false) to match the previous behavior. To test with metrics enabled, you would need to:
-
Manually edit the script to change
falsetotrue:METRICS_SERVER_ENABLED=true METRICS_COLLECTOR_ENABLED=true
-
Or we could add a
--enable-metricsflag to the script that sets these environment variables accordingly.
Let me know if you'd like me to add this as a configurable option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need two different ways to pass parameters to the task. This is why I was saying the parameter needed to be added to the taskfile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right we don't need both. I initially included METRICS_COLLECTOR_ENABLED to show it's possible to pass multiple env vars.
The two options serve different purposes:
METRICS_SERVER_ENABLED- Starts a metrics HTTP server that exposes a/metricsendpointMETRICS_COLLECTOR_ENABLED- Controls whether a Prometheus agent is started to collect and forward metrics to a remote Prometheus instance (used by tmpnet and CI workflows)
Since this scripts runs Grafana locally and scrape the metrics endpoint, you only need METRICS_SERVER_ENABLED.
As for why these are environment variables instead of task variables, that's just how we designed it ava-labs/avalanchego#4443. Task variables are for the benchmark itself (which blocks, which config), environment variables are for the runtime context (metrics, Prometheus creds, runner type). Either approach would work, this is just convention.
rkuris
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer the default on, not off. Why would someone not want metrics?
Metrics Change Detection
|
Default was false, updated it to be true. |
93f01e1 to
3cdd82f
Compare
3cdd82f to
fa8676d
Compare
|
This just needs a little more TLC. As Brandon mentioned, configuration variables for the script should be handled the same way. This means setting METRICS_SERVER_ENABLED should be just another parameter to the task file. We could use some way to shut it off via an option when the script is run. How are invalid values handled? Is the error sane? If I set it to "tru" will it give me a good error message? This must be tested. Did it work? Please update the PR to show how you tested it. Ideally you tested it with firewood bootstrap. |
Summary of offline discussion:
|
Add configurable METRICS_SERVER_ENABLED environment variable to align with AvalancheGo's updated Taskfile which reads metrics configuration from environment variables instead of task variables. - Add --metrics-server flag (true/false, default: true) - Validate input with clear error message for invalid values - Normalize input to lowercase for case-insensitive matching - Display metrics server setting in configuration summary
Validation is done here because the underlying AvalancheGo task only recognizes exact true/false - anything else silently disables metrics. This makes the behavior explicit on the consumer (Firewood) rather than letting typos fail silently. |
| # Normalize to lowercase | ||
| METRICS_SERVER="${2,,}" | ||
| # Validate boolean value | ||
| if [[ "$METRICS_SERVER" != "true" && "$METRICS_SERVER" != "false" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the team agrees we can also map with common boolean aliases: true|t|1|yes|y|on and false|f|0|no|n|off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine. Fancier handling can be implemented in fwdctl (#1318).
| # Normalize to lowercase | ||
| METRICS_SERVER="${2,,}" | ||
| # Validate boolean value | ||
| if [[ "$METRICS_SERVER" != "true" && "$METRICS_SERVER" != "false" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine. Fancier handling can be implemented in fwdctl (#1318).
Why this should be merged
Patch is related to #1578
The
aws-launch.shscript is currently not collecting metrics because it uses the old avalanchego task interface. The new version of avalanchego's Taskfile.yml changed how metrics configuration is passed:METRICS_ENABLEDpassed as a task variableMETRICS_SERVER_ENABLEDand / orMETRICS_COLLECTOR_ENABLEDpassed as environment variablesThis mismatch causes metrics to silently not be configured when running benchmarks on EC2 instances.
Related PRs:
How this works
Changed the bootstrap command from:
To:
This aligns with how avalanchego's
benchmark_cchain_range.shnow reads these settings from the environment rather than from task variables.How this was tested
1. AvalancheGo environment variable behavior
Navigate to or clone AvalancheGo:
Default (metrics disabled):
Expected:
"metrics-server-enabled": "false"With metrics enabled:
Expected:
"metrics-server-enabled": "true"2. Firewood
aws-launch.shflag behaviorDefault (metrics enabled):
Expected:
Metrics Server: trueDisable metrics:
Expected:
Metrics Server: falseCase insensitivity:
Expected:
Metrics Server: true(normalized to lowercase)Invalid value (fail fast):
./benchmark/bootstrap/aws-launch.sh --metrics-server tru 2>&1Expected: