
Conversation

@RodrigoVillar (Contributor) commented Oct 14, 2025

Why this should be merged

As referenced in #4362, this PR allows clients of the reexecution test to specify the port on which the metrics server listens.

How this works

Adds a metrics-server-port flag to the reexecution test.
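
For illustration, the flag wiring could look roughly like the Go sketch below; the variable name, package name, and usage string are assumptions rather than the PR's exact code.

```go
// Illustrative only: the flag name matches the PR, everything else is assumed.
package reexecute

import "flag"

var metricsServerPort uint64

func init() {
	// A default of 0 preserves the previous behavior: the OS assigns a free port.
	flag.Uint64Var(&metricsServerPort, "metrics-server-port", 0,
		"port for the metrics server to listen on (0 = dynamic allocation)")
}
```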

How this was tested

CI + queried metrics server locally.

Need to be documented in RELEASES.md?

No

@RodrigoVillar self-assigned this Oct 14, 2025
Base automatically changed from rodrigo/decouple-reexecution-metrics to master October 20, 2025 13:46
@RodrigoVillar (Contributor, Author) commented Oct 20, 2025

Here's an example of setting the metrics port locally with the command: task reexecute-cchain-range START_BLOCK=1 END_BLOCK=50_000 METRICS_SERVER_PORT=5000

[Screenshot: 2025-10-20 at 15:42:55]

@joshua-kim moved this to Backlog 🧊 in avalanchego Oct 27, 2025
@RodrigoVillar marked this pull request as ready for review October 28, 2025 12:13
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds the ability to specify a custom port for the metrics server in the reexecution test, addressing issue #4362. Previously, the metrics server would only listen on a dynamic port (127.0.0.1:0), but now users can explicitly configure the port via a command-line flag.

Key changes:

  • Added a metrics-server-port flag to the reexecution test with a default value of 0 (dynamic port allocation)
  • Updated the Prometheus server implementation to accept a port parameter
  • Propagated the port configuration through the test infrastructure and build scripts

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

  • tests/reexecute/c/vm_reexecute_test.go: Added the metricsServerPort variable and flag, threaded through to the benchmark function and server startup
  • tests/reexecute/c/README.md: Documented the new METRICS_SERVER_PORT parameter
  • tests/prometheus_server.go: Refactored to support explicit port configuration via NewPrometheusServerWithPort (a hedged sketch follows this list)
  • scripts/benchmark_cchain_range.sh: Added METRICS_SERVER_PORT environment variable support
  • Taskfile.yml: Propagated the METRICS_SERVER_PORT variable through task definitions
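
To make the prometheus_server.go change concrete, here is a minimal sketch of how a port-aware constructor could be layered under the existing one. Only the NewPrometheusServerWithPort name comes from the summary above; the PrometheusServer type, its fields, the /metrics path, and the exact signature are assumptions.

```go
// Hedged sketch only: field names, the handler path, and the signature are
// assumptions; just the NewPrometheusServerWithPort name appears in the PR summary.
package tests

import (
	"net"
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

type PrometheusServer struct {
	server  *http.Server
	address string
}

// Address returns the host:port the server is actually listening on, which is
// what callers need when port 0 (dynamic allocation) is used.
func (s *PrometheusServer) Address() string { return s.address }

// NewPrometheusServer preserves the previous behavior: listen on 127.0.0.1:0
// and let the OS assign a free port.
func NewPrometheusServer(registry *prometheus.Registry) (*PrometheusServer, error) {
	return NewPrometheusServerWithPort(registry, 0)
}

// NewPrometheusServerWithPort listens on the explicitly requested port
// (0 still means dynamic allocation).
func NewPrometheusServerWithPort(registry *prometheus.Registry, port uint64) (*PrometheusServer, error) {
	listener, err := net.Listen("tcp", net.JoinHostPort("127.0.0.1", strconv.FormatUint(port, 10)))
	if err != nil {
		return nil, err
	}
	mux := http.NewServeMux()
	mux.Handle("/metrics", promhttp.HandlerFor(registry, promhttp.HandlerOpts{}))
	s := &PrometheusServer{
		server:  &http.Server{Handler: mux, ReadHeaderTimeout: 10 * time.Second},
		address: listener.Addr().String(),
	}
	go func() {
		_ = s.server.Serve(listener)
	}()
	return s, nil
}
```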


Comment on lines 32 to 33
${METRICS_COLLECTOR_ENABLED:+--metrics-collector-enabled=\"${METRICS_COLLECTOR_ENABLED}\"} \
${METRICS_SERVER_PORT:+--metrics-server-port=\"${METRICS_SERVER_PORT}\"}"

Collaborator commented:

Could we change the ordering here and do a pass throughout to ensure that the metrics server port param is next to the metrics server enabled param, so that these are properly clustered together by the resource that they change?

Is it possible to use a single parameter rather than a metrics server port and metrics server enabled param? For example, we could use only the port param, treat an unset param as disabled, and then treat 0 or another port param as enabling it with the specified port.
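
For clarity, one way the proposed single-flag semantics could be modeled in Go (the -1 sentinel and the helper name are assumptions, not code from the PR): leaving the flag unset disables the metrics server, 0 enables it on a dynamically assigned port, and any other value enables it on that port.

```go
// Sketch of the single-flag proposal; -1 stands in for "unset" since a Go flag
// always has some default value.
package reexecute

import "flag"

var metricsServerPort = flag.Int("metrics-server-port", -1,
	"metrics server port; omit to disable, 0 for a dynamically assigned port")

// metricsServerEnabled reports whether the metrics server should run at all.
func metricsServerEnabled() bool {
	return *metricsServerPort >= 0
}
```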

Contributor Author commented:

> Is it possible to use a single parameter rather than a metrics server port and metrics server enabled param? For example, we could use only the port param, treat an unset param as disabled, and then treat 0 or another port param as enabling it with the specified port.

Yes, I've updated the PR to use a single parameter, metrics-server-port, as described above.

> Could we change the ordering here and do a pass throughout to ensure that the metrics server port param is next to the metrics server enabled param, so that these are properly clustered together by the resource that they change?

Now that we have just one metrics server parameter, this comment is no longer applicable.

Contributor Author commented:

Update: as requested in #4418 (comment), I've reverted the unification of the metrics server flags: 3651f47

@maru-ava (Contributor) commented:

I would like to see the following comment addressed: #4415 (comment)

In keeping with that previous comment, I remain unconvinced of the wisdom of requiring that someone set a port to enable collection. Implicitly setting port 0 when collection is enabled would be consistent with how tmpnet collectors are configured by default, and I don't think there's a good reason to deviate here.

@RodrigoVillar (Contributor, Author) commented:

@maru-ava I've updated this PR so that if METRICS_COLLECTOR_ENABLED=true, then the metrics server port is implicitly set to 0 here: 527b652.
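
Roughly, the effect could be modeled as below; the function and parameter names are assumptions, and this is not the code from 527b652.

```go
// Hedged sketch of the implicit-port behavior described above.
package reexecute

// defaultMetricsServerPort returns the effective metrics server port and
// whether a server should be started: an explicitly requested port is used
// as-is, and enabling the collector implies a server on a dynamic port (0).
func defaultMetricsServerPort(collectorEnabled bool, explicitPort *uint64) (uint64, bool) {
	switch {
	case explicitPort != nil:
		return *explicitPort, true
	case collectorEnabled:
		return 0, true // the collector needs a local server to scrape
	default:
		return 0, false
	}
}
```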

@github-project-automation bot moved this from Backlog 🧊 to In Progress 🏗️ in avalanchego Oct 28, 2025
# LABELS (optional): Comma-separated key=value pairs for metric labels.
# BENCHMARK_OUTPUT_FILE (optional): If set, benchmark output is also written to this file.
# METRICS_SERVER_ENABLED (optional): If set, enables the metrics server.
# METRICS_SERVER_PORT (optional): If set, determines the port the metrics server will listen to.

Contributor commented:

I would like to see METRICS_SERVER_PORT be in addition to METRICS_SERVER_ENABLED rather than its replacement. The default port should remain zero (dynamic), and that detail shouldn't be required knowledge for those who don't want to set a specific port.

Contributor Author commented:

Done: 3651f47

Collaborator commented:

Why the preference to keep this in addition to the existing flag rather than as a single flag, @maru-ava?

Made an opposite comment here: #4418 (comment)

Contributor commented:

The rule of thumb here is 'Separate the Decision from the Details/Decouple Orthogonal Concerns': If two aspects of configuration answer different questions, they should be separate controls. Whether metrics export is enabled is a separable concern from configuring the port used for export. While conflating them seems like a reasonable simplification, best practice is to keep them separate for clarity and composability.

@aaronbuchwald (Collaborator) left a comment:

LGTM. Fine with either collapsing the flags into one or leaving it as it currently is (ref: https://github.com/ava-labs/avalanchego/pull/4418/files#r2475491023).

@aaronbuchwald added this pull request to the merge queue Oct 30, 2025
Merged via the queue into master with commit 776ed1e Oct 30, 2025
35 checks passed
@aaronbuchwald deleted the rodrigo/support-explicit-metrics-port branch October 30, 2025 17:21
@github-project-automation bot moved this from In Progress 🏗️ to Done 🎉 in avalanchego Oct 30, 2025


Development

Successfully merging this pull request may close these issues.

Expose metrics server in tests without starting Prometheus agent / remote write
