
pauldowman
Contributor

This PR adds the following Prometheus metrics to dispute-mon:

  • Gauge of the total number of endpoints that returned an error (other than "not found") during the last update cycle. This counts RPC endpoints with at least one failure, so if the same endpoint returns an error for 10 games, it still counts as 1.
  • Gauge of the total number of failures (other than "not found") in the last update cycle. This counts each error as a separate event, even when it comes from the same endpoint (so one endpoint returning 10 errors sets this to 10).
  • Gauge of the total number of games where, in the last update cycle, some nodes reported the block as not found and others had it.
  • Gauge of the total number of games where, in the last update cycle, some nodes reported the output root as safe and others as unsafe.
  • Gauge of the total number of games where, in the last update cycle, nodes returned different expected output roots.
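The difference between the first two gauges is easiest to see in code. A minimal sketch, assuming a hypothetical per-game map from endpoint URL to error count (`gameErrors` and `endpointErrorGauges` are illustrative names, not the PR's actual types or functions):

```go
package main

import "fmt"

// gameErrors is a hypothetical stand-in for the per-game enrichment data: it
// maps a rollup node endpoint to the number of non-"not found" errors it
// returned for one game during the current update cycle.
type gameErrors map[string]int

// endpointErrorGauges returns the two gauge values: distinctEndpoints counts
// each failing endpoint once, while totalErrors counts every failure.
func endpointErrorGauges(games []gameErrors) (distinctEndpoints int, totalErrors int) {
	seen := make(map[string]bool)
	for _, errs := range games {
		for endpoint, n := range errs {
			if n == 0 {
				continue
			}
			seen[endpoint] = true
			totalErrors += n
		}
	}
	return len(seen), totalErrors
}

func main() {
	// One endpoint failing on 10 games: 1 for the endpoint gauge, 10 for the
	// failure gauge.
	games := make([]gameErrors, 10)
	for i := range games {
		games[i] = gameErrors{"http://node-a": 1}
	}
	distinct, total := endpointErrorGauges(games)
	fmt.Println(distinct, total) // 1 10
}
```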

Reviewers may want to look at each commit separately; there's one per metric.

Closes #11020

pauldowman requested review from mbaxter and Inphi on September 25, 2025 22:15
pauldowman requested review from a team as code owners on September 25, 2025 22:15
pauldowman changed the title from "Pd/op dispute mon node endpoint error metrics" to "op-dispute-mon: node endpoint error metrics" on Sep 25, 2025
Contributor

Inphi left a comment

Looks good overall. There is some unused code that needs to be fixed.

An op-dispute-mon/main file was accidentally checked in and should be removed.


// HasMixedSafety returns true if some rollup endpoints reported the root as safe and others as unsafe
// for this game. This indicates inconsistent safety assessment across the rollup node network.
func (g EnrichedGameData) HasMixedSafety() bool {
Contributor

Was this supposed to be recorded in a metric? I only see this function used by tests.
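For illustration only, a minimal sketch of how the per-game check could feed a gauge value instead of being test-only. It assumes a hypothetical `RollupEndpointSafeCount` field alongside the `RollupEndpointUnsafeCount` shown in the diff; `enrichedGame` and `countMixedSafety` are also illustrative names:

```go
package main

import "fmt"

// enrichedGame is a trimmed, hypothetical version of EnrichedGameData. Only
// RollupEndpointUnsafeCount appears in the diff; the safe counter is assumed.
type enrichedGame struct {
	RollupEndpointSafeCount   int
	RollupEndpointUnsafeCount int
}

// HasMixedSafety reports whether at least one endpoint said safe and at
// least one said unsafe for this game.
func (g enrichedGame) HasMixedSafety() bool {
	return g.RollupEndpointSafeCount > 0 && g.RollupEndpointUnsafeCount > 0
}

// countMixedSafety produces the value that would be passed to a gauge setter
// (the real setter is not shown in the PR, so it is left out here).
func countMixedSafety(games []enrichedGame) int {
	count := 0
	for _, g := range games {
		if g.HasMixedSafety() {
			count++
		}
	}
	return count
}

func main() {
	games := []enrichedGame{
		{RollupEndpointSafeCount: 2, RollupEndpointUnsafeCount: 1}, // mixed
		{RollupEndpointSafeCount: 3},                               // all safe
		{RollupEndpointUnsafeCount: 2},                             // all unsafe
	}
	fmt.Println(countMixedSafety(games)) // 1
}
```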

RollupEndpointUnsafeCount int

// RollupEndpointDifferentOutputRoots tracks whether rollup endpoints returned different output roots for this game.
RollupEndpointDifferentOutputRoots bool
Contributor

There's no way to observe the value of this field: I don't see a log line or a metric being set when it's updated.
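One way to make the flag observable is to aggregate it per update cycle and publish the count. A sketch under stated assumptions: `gameData`, the `differentRootsMetrics` interface, and its `RecordDifferentOutputRootGames` method are hypothetical names, not op-dispute-mon's actual API:

```go
package main

import "fmt"

// gameData is a hypothetical, trimmed stand-in for EnrichedGameData.
type gameData struct {
	RollupEndpointDifferentOutputRoots bool
}

// differentRootsMetrics is a hypothetical metrics interface; the real method
// name in op-dispute-mon may differ.
type differentRootsMetrics interface {
	RecordDifferentOutputRootGames(count int)
}

// recordDifferentOutputRoots counts the flagged games and publishes the
// total, so the per-game field becomes observable.
func recordDifferentOutputRoots(m differentRootsMetrics, games []gameData) int {
	count := 0
	for _, g := range games {
		if g.RollupEndpointDifferentOutputRoots {
			count++
		}
	}
	// Set the gauge every cycle, including to 0, so a stale non-zero value
	// does not linger after the disagreement clears.
	m.RecordDifferentOutputRootGames(count)
	return count
}

// stubMetrics remembers the last value it was given, for illustration.
type stubMetrics struct{ last int }

func (s *stubMetrics) RecordDifferentOutputRootGames(count int) { s.last = count }

func main() {
	m := &stubMetrics{last: -1}
	games := []gameData{{true}, {false}, {true}}
	recordDifferentOutputRoots(m, games)
	fmt.Println(m.last) // 2
}
```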

uniqueEndpointErrors := make(map[string]bool)

for _, game := range games {
if game.RollupEndpointErrors != nil {
Contributor

Suggested change:
-if game.RollupEndpointErrors != nil {
+if len(game.RollupEndpointErrors) != 0 {

The map is always initialized in enrichGame, so the nil check is always true. It's also idiomatic to check emptiness with len, because that works when the map is nil too.
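A small sketch of why the nil check doesn't detect "no errors" for an initialized map, while len does. The map type here is an assumption (the real `RollupEndpointErrors` type isn't shown in the diff), and `hasEndpointErrors` is a hypothetical helper:

```go
package main

import "fmt"

// hasEndpointErrors mirrors the suggested condition. The map type is an
// assumption; the real RollupEndpointErrors type is not shown in the diff.
func hasEndpointErrors(errs map[string]bool) bool {
	return len(errs) != 0
}

func main() {
	var nilMap map[string]bool        // never initialized
	emptyMap := make(map[string]bool) // initialized but empty, as in enrichGame

	// The nil check is true for any initialized map, even an empty one.
	fmt.Println(nilMap != nil, emptyMap != nil) // false true

	// len-based emptiness treats nil and empty maps the same way.
	fmt.Println(hasEndpointErrors(nilMap), hasEndpointErrors(emptyMap)) // false false
}
```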

"unique_endpoint_count", errorCount,
"endpoints", getEndpointList(uniqueEndpointErrors))
} else {
m.logger.Debug("No rollup node endpoint errors found")
Contributor

nit: this debug log could be noisy. Consider using Trace instead.

"total_error_count", totalErrors,
"games_with_errors", countGamesWithErrors(games))
} else {
m.logger.Debug("No rollup node endpoint errors found")
Contributor

nit: noisy debug log. Consider using Trace.

m.metrics.RecordMixedAvailabilityGames(count)

if count > 0 {
m.logger.Info("Mixed availability summary", "gamesWithMixedAvailability", count, "totalGames", len(games))
Contributor

Worth logging this at Warn level. It's far from ideal when this occurs.

Development

Successfully merging this pull request may close these issues.

dispute-mon: Multiple rollup node metrics
2 participants