-
Notifications
You must be signed in to change notification settings - Fork 818
feat(reexecute/c): decouple metrics server and collector #4415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The When running locally with When running locally with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR decouples the metrics server and collector functionality by replacing the boolean METRICS_ENABLED parameter with a more granular METRICS_MODE parameter. This allows users to run only the Prometheus server without the collector, enabling access to VM metrics while using their own monitoring stack.
- Introduces a new
metricsModetype with three values:disabled,server-only, andfull - Replaces
METRICS_ENABLEDwithMETRICS_MODEacross all configuration files and scripts - Updates the
collectRegistryfunction to conditionally start the collector based on the metrics mode
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/reexecute/c/vm_reexecute_test.go | Implements the new metricsMode type with validation and updates function signatures |
| tests/reexecute/c/README.md | Updates documentation to explain the three metrics mode options |
| scripts/benchmark_cchain_range.sh | Changes flag from metrics-enabled to metrics-mode |
| Taskfile.yml | Updates default value from "false" to "disabled" for the new metrics mode |
| .github/actions/c-chain-reexecution-benchmark/action.yml | Sets CI to use "full" metrics mode instead of boolean true |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| func (m *metricsMode) Set(s string) error { | ||
| s = strings.ToLower(strings.TrimSpace(s)) | ||
|
|
||
| switch s { | ||
| case "disabled": | ||
| *m = MetricsDisabled | ||
| case "server-only": | ||
| *m = MetricsServerOnly | ||
| case "full": | ||
| *m = MetricsFull | ||
| default: | ||
| return fmt.Errorf("invalid metrics mode: %s (valid options: disabled, server-only, full)", s) | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just use a string here and perhaps an alias type rather than implementing Set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified the metricsMode type here: a7cb056
| if metricsMode.shouldStartServer() { | ||
| collectRegistry(b, log, "c-chain-reexecution", prefixGatherer, labels, metricsMode.shouldStartCollector()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clean this up a little bit? It seems odd that we decompose metricsMode into two separate booleans and use one to handle the if condition here and the other half of it as an argument that gets passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned up the logic for starting the metrics server vs starting the metrics server and collector here: 9e319d0
tests/reexecute/c/README.md
Outdated
| - `METRICS_MODE=disabled`: no metrics are available. | ||
| - `METRICS_MODE=server-only`: starts a Prometheus server exporting VM metrics. A | ||
| link to the metrics endpoint is logged during execution. | ||
| - `METRICS_MODE=full`: starts both a Prometheus server exporting VM metrics and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given these names are not very self-explanatory (it's not clear what server-only and full refer to without the descriptions), I think it would be better to simply configure the two separately and require that if grafana is enabled, then the prometheus server must be enabled as well.
It's fine imo for the metrics server to be enabled and grafana disabled by default since that won't require setting any extra credentials.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 5160738
| If running locally, metrics collection can be customized via the following parameters: | ||
|
|
||
| - `METRICS_SERVER_ENABLED`: starts a Prometheus server exporting VM metrics. | ||
| - `METRICS_COLLECTOR_ENABLED`: starts a Prometheus collector (if enabled, then `METRICS_SERVER_ENABLED` must be enabled as well). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this just implicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On it's own, I'm not opposed to METRICS_COLLECTOR_ENABLED=true implicitly setting METRICS_SERVER_ENABLED=true as well. However, considering that this PR will be followed up by #4418 (which adds the ability to configure a port for the metrics server), I think this becomes confusing (i.e. it isn't clear what happens if METRICS_COLLECTOR_ENABLED=true and METRICS_PORT=X without reading the description of METRICS_COLLECTOR_ENABLED ).
This could be fixed by renaming METRICS_COLLECTOR_ENABLED to METRICS_SERVER_AND_COLLECTOR_ENABLED, but this looks similar to a previous iteration of this PR which received this review comment: #4415 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not going to block on this, but a rethink is definitely suggested. I don't think the comment you linked to suggesting seperate flags precludes improving what appears in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate? The only options regarding the flag design are as follows:
- Enabling the collector implicitly enables the server.
- Enabling the collector does not implicitly enable the server.
With this PR choosing the latter, I'm not sure what else needs to be rethought (if you think enabling the collector needs to implicitly enable the server, I'm happy to follow-up on that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue is the chosen terminology. ENABLE_METRICS_SERVER starts a PrometheusServer instance, and that name could be confused with an actual Prometheus server (one that collects and aggregate metrics). I recommend changing ENABLE_METRICS_SERVER to ENABLE_METRICS_EXPORT and PrometheusServer to MetricsExporter. That way there is a unambiguous relationship between exporting metrics and collecting those exported metrics. Once that change is made, I think it would be a no-brainer to implicitly enable metrics export when collection is enabled rather than requiring that the user explicitly configure that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this should be merged
As mentioned in #4362, splitting up the Prometheus server and collector is ideal for clients of Firewood who want access to VM metrics but would prefer to use their own monitoring stack.
Although #4362 also discusses the option of hardcoding the metrics server port, I've opted to split this out into a separate PR.
How this works
Changes the
METRICS_ENABLEDparameter toMETRICS_MODE, which has the following three options:disabled: does not start both the Prometheus server and collectorserver-only: starts only the Prometheus serverfull: starts both the Prometheus server and collectorI've opted for keeping a single environment variable rather than splitting
METRICS_ENABLEDsince there should never be a case for starting the Prometheus collector but not the Prometheus server. In either theserver-onlyorfullcases, a metrics endpoint or Grafana URL is printed out, respectively.How this was tested
CI
Need to be documented in RELEASES.md?
No