fix(infra): improve monitor + rebalancer deployment logic #7874

Merged
paulbalaji merged 13 commits into main from pb/fix-rebalancer-warp-route-id-extraction
Jan 23, 2026

@paulbalaji
Collaborator

@paulbalaji paulbalaji commented Jan 23, 2026

Summary

Improves warp monitor and rebalancer deploy scripts with better defaults and less manual input.

Changes

Selection defaults to deployed services:

  • No --warp-route-id → lists only currently deployed services (not all registry warp routes)
  • Orphaned services (warp route removed from registry) are skipped with a warning

--yes / -y flag:

  • Skips per-service registry commit prompts (uses existing value or defaults to main)
  • Skips ATA payer balance warning prompts for warp monitors

Rebalancer helm chart:

  • Added WARP_ROUTE_ID env var (fixes RPC rotation script detection)
  • Configmap fallback for existing deployments

Refactored shared helpers:

  • filterOrphanedWarpRouteIds() - filters out services for warp routes no longer in registry
  • getDeployedRegistryCommit() - extracts registry commit from helm values
  • warpRouteExistsInRegistry() - checks if warp route exists in registry
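
As a rough illustration of how the orphan filter described above might partition IDs, here is a minimal sketch. The function name `filterOrphanedWarpRouteIdsSketch`, the injected `existsInRegistry` predicate, and the example route IDs are assumptions made for illustration; the actual helper in `scripts/agent-utils.ts` may have a different signature.

```typescript
// Hypothetical sketch, not the code from this PR.
type OrphanFilterResult = {
  validIds: string[]; // warp routes still present in the registry
  orphanedIds: string[]; // deployed services whose warp route was removed
};

// `existsInRegistry` stands in for warpRouteExistsInRegistry(); it is injected
// here so the sketch can run without a real registry.
function filterOrphanedWarpRouteIdsSketch(
  warpRouteIds: string[],
  existsInRegistry: (warpRouteId: string) => boolean,
): OrphanFilterResult {
  const validIds: string[] = [];
  const orphanedIds: string[] = [];
  for (const id of warpRouteIds) {
    (existsInRegistry(id) ? validIds : orphanedIds).push(id);
  }
  return { validIds, orphanedIds };
}

// Example with a fake registry that contains a single (made-up) route.
const fakeRegistry = new Set(['USDC/ethereum-base']);
const orphanResult = filterOrphanedWarpRouteIdsSketch(
  ['USDC/ethereum-base', 'OLD/removed-route'],
  (id) => fakeRegistry.has(id),
);
console.log(orphanResult.validIds, orphanResult.orphanedIds);
```

Returning both lists lets the caller warn about orphaned services while continuing with the valid ones.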

Usage

# Redeploy monitors, prompting for each registry commit
pnpm tsx scripts/warp-routes/deploy-warp-monitor.ts -e mainnet3

# Redeploy monitors, using existing registry commits (no prompts after selection)
pnpm tsx scripts/warp-routes/deploy-warp-monitor.ts -e mainnet3 --yes

# Same for rebalancers
pnpm tsx scripts/rebalancer/deploy-rebalancer.ts -e mainnet3 --yes

Summary by CodeRabbit

  • New Features

    • Interactive selection of deployed rebalancers and warp monitors for deployment.
    • Detection and warnings for orphaned routes/monitors, with uninstall guidance.
    • Per-route registry-commit discovery from deployed releases (with validation caching).
    • Optional skip-confirmation mode to bypass interactive prompts and per-route preflight/balance checks.
    • Ability to read deployed route IDs from release config when available.
  • Chores

    • Updated Docker image tags for warpMonitor and rebalancer.
    • Helm values/env support to specify a warp route identifier.

…efaults

- Add WARP_ROUTE_ID env var to rebalancer helm chart for consistency with warp-monitor
- Add configmap fallback for existing deployments without env var
- Add getDeployedRegistryCommit() to both rebalancer and warp-monitor helm managers
- Pre-fill registry commit prompt with existing value when redeploying
@changeset-bot

changeset-bot bot commented Jan 23, 2026

⚠️ No Changeset found

Latest commit: 9236f24

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@paulbalaji paulbalaji changed the title from "fix(infra): rebalancer warp route ID extraction and registry commit defaults" to "fix(infra): improve monitor + rebalancer deployment logic" Jan 23, 2026
@paulbalaji paulbalaji marked this pull request as ready for review January 23, 2026 14:16
@paulbalaji paulbalaji requested a review from xeno097 January 23, 2026 14:19
@coderabbitai
Contributor

coderabbitai bot commented Jan 23, 2026

📝 Walkthrough

Walkthrough

This PR adds deployed-route discovery and orphan filtering to deploy scripts, introduces registry-commit extraction utilities and Helm helpers to read deployed commits, adds skip-confirmation flows, and updates rebalancer/warpMonitor Helm image tags and warpRouteId config/env.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Docker Image Tags: `typescript/infra/config/docker.ts` | Updated mainnetDockerTags for warpMonitor and rebalancer to newer 20260123-1214xx timestamps. |
| Registry Utilities: `typescript/infra/config/registry.ts` | Added exported warpRouteExistsInRegistry(warpRouteId: string): boolean to check presence of a warp route in the registry. |
| Rebalancer Helm Config: `typescript/infra/helm/rebalancer/templates/_helpers.tpl`, `typescript/infra/helm/rebalancer/values.yaml` | Added conditional WARP_ROUTE_ID env var and warpRouteId values key to rebalancer Helm chart. |
| Shared CLI Utilities: `typescript/infra/scripts/agent-utils.ts` | Added withYes CLI option to skip confirmations and filterOrphanedWarpRouteIds to split valid vs orphaned route IDs using registry checks. |
| Rebalancer Deploy Script: `typescript/infra/scripts/rebalancer/deploy-rebalancer.ts` | Switched to checkbox selection of deployed releases, filters orphaned IDs, caches validated registry commits, prompts (or skips) per-route registry commit, and early-exits when no valid IDs. |
| Warp Monitor Deploy Script: `typescript/infra/scripts/warp-routes/deploy-warp-monitor.ts` | Adds checkbox selection of deployed monitors, orphaned-ID handling, per-route registry-commit handling with skip-confirmation, validation cache, per-chain registry loading, and adjusted preflight invocation. |
| Rebalancer Helm Manager: `typescript/infra/src/rebalancer/helm.ts` | Added warpRouteId into helm values image section; added getDeployedRegistryCommit public static method; improved deployed-ID discovery with configmap fallback. |
| Helm Utilities: `typescript/infra/src/utils/helm.ts` | Added extractRegistryCommitFromHelmValues(values) and getDeployedRegistryCommit(warpRouteId, namespace, helmReleasePrefix) to derive registry commit from deployed helm values/releases. |
| Warp Monitor Helm Manager: `typescript/infra/src/warp-monitor/helm.ts` | Added optional skipConfirmation parameter to preflight/balance checks and static getDeployedRegistryCommit wrapper to read deployed registry commit. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as Deploy Script
    participant HelmMgr as Helm Manager
    participant Registry as Registry
    participant K8s as Deployed Config

    User->>CLI: start deployment
    CLI->>HelmMgr: getDeployed*WarpRouteIds()
    HelmMgr->>K8s: query releases / configmaps
    K8s-->>HelmMgr: deployed IDs (may miss warpRouteId)
    HelmMgr-->>CLI: deployed IDs list
    CLI->>User: show checkbox selection
    User-->>CLI: select IDs

    CLI->>Registry: filterOrphanedWarpRouteIds(selectedIds)
    Registry->>Registry: warpRouteExistsInRegistry(id) checks
    Registry-->>CLI: validIds + orphanedIds
    Note over CLI: warn about orphaned IDs

    loop for each validId
        CLI->>HelmMgr: getDeployedRegistryCommit(warpRouteId)
        HelmMgr->>K8s: helm get values / release
        K8s-->>HelmMgr: helm values
        HelmMgr-->>CLI: extracted registry commit or undefined
        alt skipConfirmation
            CLI->>CLI: use default commit
        else
            CLI->>User: prompt with default commit
            User-->>CLI: confirm/provide commit
        end
        CLI->>CLI: cache validated commit
        CLI->>HelmMgr: proceed with deployment for id
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • Mo-Hussain
  • ltyu
  • antigremlin
  • xeno097

Poem

In the bog where services creep,
I round up routes that fell asleep.
Commits pulled, confirmations spared,
Helm sings soft — no mess to be repaired. 🐸

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix(infra): improve monitor + rebalancer deployment logic' clearly summarizes the main change: improving deployment script logic for two infrastructure tools with better defaults and reduced manual input. |
| Description check | ✅ Passed | The PR description covers all required sections: a clear summary of changes, the new --yes flag functionality, rebalancer Helm updates, refactored helpers, and practical usage examples. It aligns well with the template structure. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@typescript/infra/helm/rebalancer/templates/_helpers.tpl`:
- Around line 77-78: The WARP_ROUTE_ID env var is rendered unconditionally while
REGISTRY_URI uses a conditional, so update the Helm template in _helpers.tpl to
mirror REGISTRY_URI's pattern: wrap the WARP_ROUTE_ID env var block in a
conditional check for .Values.warpRouteId (similar to the existing
.Values.hyperlane.registryUri check) so the env var is only emitted when
.Values.warpRouteId is truthy, keeping template logic consistent.
🧹 Nitpick comments (3)
typescript/infra/scripts/warp-routes/deploy-warp-monitor.ts (1)

106-108: Unnecessary Promise.all wrapper for single promise.

Look, I'm not one to judge how folks do things in their own swamp, but wrapping a single promise in Promise.all and destructuring is a bit... onion-y when it doesn't need to be.

🧅 Simpler approach
-  const [registry] = await timedAsync('getRegistry', () =>
-    Promise.all([envConfig.getRegistry(true, chainsNeeded)]),
+  const registry = await timedAsync('getRegistry', () =>
+    envConfig.getRegistry(true, chainsNeeded),
   );
typescript/infra/scripts/rebalancer/deploy-rebalancer.ts (2)

152-152: TODO comment for stale release cleanup.

Noticed this TODO about uninstalling stale rebalancer releases. The warp-monitor has uninstallUnknownWarpMonitorReleases - might want a similar implementation here eventually. Just flagging it since it's already noted.

Would you like me to help draft an implementation for uninstallUnknownRebalancerReleases similar to the warp-monitor approach, or open an issue to track this?


48-93: Consider extracting shared deployment selection logic.

Now, I'm not saying you gotta do this right now - get out of me swamp first - but there's a fair bit of duplicated logic between this file and deploy-warp-monitor.ts:

  1. Interactive selection flow (checkbox prompt)
  2. Orphan filtering and warning messages
  3. Registry commit handling with validation cache

If these scripts evolve together, a shared helper could reduce maintenance burden. But for now, the duplication is manageable.

Also applies to: 99-126

@codecov

codecov bot commented Jan 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.02%. Comparing base (6b6fd0b) to head (9236f24).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7874   +/-   ##
=======================================
  Coverage   77.02%   77.02%           
=======================================
  Files         117      117           
  Lines        2651     2651           
  Branches      244      244           
=======================================
  Hits         2042     2042           
  Misses        593      593           
  Partials       16       16           
| Components | Coverage Δ |
| --- | --- |
| core | 87.80% <ø> (ø) |
| hooks | 71.86% <ø> (ø) |
| isms | 81.10% <ø> (ø) |
| token | 86.67% <ø> (ø) |
| middlewares | 84.98% <ø> (ø) |

@Mo-Hussain
Collaborator

--yes feels like too much of a catch-all. It may be more obvious to the user if we have specific args for

  • Skips per-service registry commit prompts (uses existing value or defaults to main)
  • Skips ATA payer balance warning prompts for warp monitors

--registry main, --skipAtaPayerCheck?

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@typescript/infra/scripts/warp-routes/deploy-warp-monitor.ts`:
- Around line 59-64: The current exit message when deployedIds.length === 0 is
misleading for the case where a user supplied --warp-route-id; update the logic
around the deployedIds check to distinguish two cases: (1) user supplied
warpRouteId and that warp route exists in the registry but simply has no
deployed monitor yet — emit a clear message like "No deployed warp monitors
found for the specified warp route(s); running this script will attempt to
deploy one" or proceed to deploy; (2) user supplied warpRouteId but it was
filtered out as an orphaned route (the filtering around the "orphaned" check at
lines ~78-79) or no warpRouteId provided — keep an explanatory error advising
how to create or register the warp route. Use the existing symbols warpRouteId
and deployedIds (and the orphaned filter) to implement the conditional
messaging/flow so the error is no longer ambiguous.
🧹 Nitpick comments (1)
typescript/infra/scripts/warp-routes/deploy-warp-monitor.ts (1)

164-166: Consider explicit process exit on failure.

When main() throws, .catch(rootLogger.error) logs the error but doesn't explicitly exit with a non-zero code. Depending on Node.js behavior and how rootLogger.error is implemented, the process might still exit with code 0, which could mask failures in CI/CD pipelines.

🧅 Suggested improvement
 main()
   .then(() => rootLogger.info('Deploy successful!'))
-  .catch(rootLogger.error);
+  .catch((e) => {
+    rootLogger.error(e);
+    process.exit(1);
+  });

@paulbalaji
Collaborator Author

paulbalaji commented Jan 23, 2026

> --yes feels like too much of a catch-all. It may be more obvious to the user if we have specific args for
>
> • Skips per-service registry commit prompts (uses existing value or defaults to main)
> • Skips ATA payer balance warning prompts for warp monitors
>
> --registry main, --skipAtaPayerCheck?

i disagree because we're still doing the check but just auto-yesing the prompt to continue - i think it doesn't make sense to have multiple --yes-like args

having more args also means it's additional hidden knowledge people have to know rather than a single --yes that typically does the full prompt skipping to script completion

furthermore --registryCommit XYZ is an arg we already have, but that's for setting that commit to all services deployed. --registry main is duplicate of that, and also doesn't solve the case of setting a default fallback. imo the existing logic of reusing the already configured commit or falling back to main is sufficient
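
The fallback order argued for above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's code: `resolveRegistryCommit`, the `CommitInputs` shape, and the synchronous `prompt` callback are hypothetical, and the assumption that an explicit --registryCommit bypasses the prompt is inferred from the thread rather than confirmed.

```typescript
// Hypothetical inputs; only --registryCommit and --yes are named in the thread.
interface CommitInputs {
  registryCommitArg?: string; // value of --registryCommit, applied to all services
  deployedCommit?: string; // commit read from the existing helm release, if any
  skipConfirmation: boolean; // true when --yes is passed
}

// The real prompt is presumably async; a synchronous callback keeps the sketch small.
function resolveRegistryCommit(
  inputs: CommitInputs,
  prompt: (defaultCommit: string) => string,
): string {
  // Assumption: an explicit --registryCommit wins and needs no prompt.
  if (inputs.registryCommitArg) return inputs.registryCommitArg;
  // Otherwise reuse the deployed commit, falling back to 'main'.
  const defaultCommit = inputs.deployedCommit ?? 'main';
  // --yes accepts the default; without it the prompt is pre-filled with it.
  return inputs.skipConfirmation ? defaultCommit : prompt(defaultCommit);
}

const neverPrompt = (): string => {
  throw new Error('prompt should not be called');
};
const freshWithYes = resolveRegistryCommit({ skipConfirmation: true }, neverPrompt);
const redeployWithYes = resolveRegistryCommit(
  { deployedCommit: 'abc123', skipConfirmation: true },
  neverPrompt,
);
const explicitArg = resolveRegistryCommit(
  { registryCommitArg: 'def456', deployedCommit: 'abc123', skipConfirmation: false },
  neverPrompt,
);
const prompted = resolveRegistryCommit(
  { deployedCommit: 'abc123', skipConfirmation: false },
  (defaultCommit) => defaultCommit, // user accepts the pre-filled value
);
console.log(freshWithYes, redeployWithYes, explicitArg, prompted);
```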

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@typescript/infra/scripts/rebalancer/deploy-rebalancer.ts`:
- Around line 52-60: The list of deployed warpRouteId values built from
deployedPods via getDeployedRebalancerWarpRouteIds may contain duplicates
(replicas), causing duplicate checkboxes and repeated deployments; update the
deployedIds construction to remove duplicates before sorting (e.g., map to
p.warpRouteId, filter truthy, create a Set or use Array.from(new Set(...)) to
dedupe, then sort) so each warpRouteId appears only once; adjust any downstream
code that assumes deployedIds order if necessary.
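
A minimal sketch of the suggested dedupe step, assuming a pod record that exposes an optional `warpRouteId` field. The `DeployedPod` shape, the helper name `uniqueSortedWarpRouteIds`, and the sample IDs are hypothetical.

```typescript
// Hypothetical pod shape: only the warpRouteId field matters here.
interface DeployedPod {
  warpRouteId?: string;
}

function uniqueSortedWarpRouteIds(pods: DeployedPod[]): string[] {
  const ids = pods
    .map((p) => p.warpRouteId)
    // Drop pods whose warp route ID could not be determined.
    .filter((id): id is string => Boolean(id));
  // A Set removes duplicates from replicas before sorting.
  return Array.from(new Set(ids)).sort();
}

const dedupedIds = uniqueSortedWarpRouteIds([
  { warpRouteId: 'B/route' },
  { warpRouteId: 'A/route' },
  { warpRouteId: 'B/route' }, // second replica of the same release
  {}, // pod with no detectable ID
]);
console.log(dedupedIds);
```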
🧹 Nitpick comments (5)
typescript/infra/scripts/warp-routes/deploy-warp-monitor.ts (2)

38-39: Consider the scope of --yes flag.

Listen, the --yes flag does a couple different things here - skips registry commit prompts and ATA payer balance warnings in preflight checks. That's not necessarily wrong, but some folks in the comments mentioned it might be clearer to have separate flags like --registry main and --skip-ata-payer-check.

If ya decide to keep it as-is (which is fair for a deployment tool), maybe just make sure the --help text spells out everything it skips so nobody's surprised, ya know?


122-148: Nice optimization with the commit validation cache.

This is a clever bit - keepin' track of what commits you've already validated so you're not fetchin' the same thing over and over when deployin' multiple monitors. Like not checkin' if there's onions in the cupboard every time you make a meal.

One small thing: when skipConfirmation is true and there's no deployed commit to fall back on (line 136), defaultin' to 'main' is reasonable, but it might be worth a debug log sayin' "No deployed commit found, using 'main'" so operators know what happened.

🔧 Optional: Add debug log for fallback
       if (skipConfirmation) {
-        registryCommit = defaultRegistryCommit ?? 'main';
+        if (defaultRegistryCommit) {
+          registryCommit = defaultRegistryCommit;
+        } else {
+          rootLogger.debug(`[${warpRouteId}] No deployed registry commit found, defaulting to 'main'`);
+          registryCommit = 'main';
+        }
       } else {
typescript/infra/scripts/rebalancer/deploy-rebalancer.ts (3)

41-44: Consider splitting --yes into explicit flags.

Right now --yes skips multiple prompts, which can feel a bit too broad for operators. A more explicit flag (e.g., --registry-commit defaulting, or a --skip-registry-prompt) keeps intent clear and avoids surprises.


119-126: Make the --yes fallback to main more explicit.

When no deployed commit exists, --yes silently uses main. That’s convenient but can surprise folks in automation. Consider a warning (or require --registry-commit when no deployed commit is found).

💡 Suggested tweak
       if (skipConfirmation) {
+        if (!defaultRegistryCommit) {
+          rootLogger.warn(
+            `[${warpRouteId}] No deployed registry commit found; defaulting to "main". Use --registry-commit to override.`,
+          );
+        }
         registryCommit = defaultRegistryCommit ?? 'main';
       } else {

158-158: Track the stale-release cleanup TODO.

Nice catch leaving the TODO, but it’ll linger unless tracked. Want me to open an issue or draft an implementation plan for stale release cleanup?

@Mo-Hussain
Collaborator

> > --yes feels like too much of a catch-all. It may be more obvious to the user if we have specific args for
> >
> > • Skips per-service registry commit prompts (uses existing value or defaults to main)
> > • Skips ATA payer balance warning prompts for warp monitors
> >
> > --registry main, --skipAtaPayerCheck?
>
> i disagree because we're still doing the check but just auto-yesing the prompt to continue - i think it doesn't make sense to have multiple --yes-like args
>
> having more args also means it's additional hidden knowledge people have to know rather than a single --yes that typically does the full prompt skipping to script completion
>
> furthermore --registryCommit XYZ is an arg we already have, but that's for setting that commit to all services deployed. --registry main is duplicate of that, and also doesn't solve the case of setting a default fallback. imo the existing logic of reusing the already configured commit or falling back to main is sufficient

Don't feel too strongly on this

@paulbalaji
Collaborator Author

@claude review

@claude
Contributor

claude bot commented Jan 23, 2026

Claude finished @paulbalaji's task in 56s


Code Review: PR #7874

Overall: Solid PR that improves UX by defaulting to deployed services and adding skip-confirmation flows. The code is well-structured and addresses the issues raised in earlier review rounds.


Architecture & Logic ✓

The changes are well-organized:

  • Shared utilities (filterOrphanedWarpRouteIds, getDeployedRegistryCommit, extractRegistryCommitFromHelmValues) properly extracted to reusable locations
  • Both deploy scripts follow consistent patterns for deployed-ID discovery, orphan filtering, and registry commit handling
  • Commit validation caching is a good optimization
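
To make the caching idea concrete, here is a small sketch of one way a validation cache could work. It is not taken from the PR: `makeCachedValidator` and the synchronous `CommitValidator` type are hypothetical, and the real check is presumably asynchronous.

```typescript
// Hypothetical validator: stands in for whatever check confirms a commit exists.
type CommitValidator = (commit: string) => boolean;

function makeCachedValidator(validate: CommitValidator): CommitValidator {
  const validated = new Set<string>();
  return (commit) => {
    // Commits that already passed are not re-checked.
    if (validated.has(commit)) return true;
    const ok = validate(commit);
    // Only successes are cached, so a failed commit is re-checked next time.
    if (ok) validated.add(commit);
    return ok;
  };
}

let validationCalls = 0;
const cachedValidate = makeCachedValidator((commit) => {
  validationCalls += 1;
  return commit === 'main';
});
const firstMain = cachedValidate('main'); // runs the underlying check
const secondMain = cachedValidate('main'); // served from the cache
const badCommit = cachedValidate('bad'); // runs the underlying check, not cached
console.log(firstMain, secondMain, badCommit, validationCalls);
```

When several services share a commit, which is likely when many of them fall back to main, the underlying check runs once per distinct commit rather than once per service.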

Minor Issues

1. Missing process exit code on error (deploy-warp-monitor.ts:173-175, deploy-rebalancer.ts:169-171)

Both scripts log errors but don't exit with non-zero code on failure:

main()
  .then(() => rootLogger.info('Deploy successful!'))
  .catch(rootLogger.error);

In CI pipelines, this could mask failures. Consider:

.catch((e) => {
  rootLogger.error(e);
  process.exit(1);
});

2. Inconsistent error handling in getDeployedRebalancerWarpRouteIds (helm.ts:255-270)

The configmap fallback silently swallows all errors. While this is intentional (older deployments may not have the configmap), it might be worth logging at debug level for troubleshooting:

} catch (e) {
  rootLogger.debug(`Failed to read configmap for ${helmReleaseName}: ${e}`);
}

Observations (No Action Needed)

  1. Parallel deploy scripts: The two deploy scripts (deploy-rebalancer.ts and deploy-warp-monitor.ts) share similar structure. The TODO at deploy-rebalancer.ts:161 about uninstalling stale releases mirrors uninstallUnknownWarpMonitorReleases in the warp monitor. Future consolidation could reduce maintenance burden.

  2. The --yes flag scope: As discussed in PR comments, the flag skips multiple prompts (registry commit + ATA payer balance). The decision to keep it as a single flag rather than separate flags is reasonable for deployment scripts.

  3. Helm conditional for WARP_ROUTE_ID: Now properly wrapped in {{- if .Values.warpRouteId }} (fixed per earlier review feedback in _helpers.tpl:77-80).


Verdict: PR looks good. The minor issues noted are low priority and don't block merging.

@paulbalaji paulbalaji enabled auto-merge January 23, 2026 16:08
@hyper-gonk
Contributor

hyper-gonk bot commented Jan 23, 2026

🐳 Monorepo Docker Image Built Successfully

Image Tags:

gcr.io/abacus-labs-dev/hyperlane-monorepo:pr-7874
gcr.io/abacus-labs-dev/hyperlane-monorepo:9236f24-20260123-160836

@paulbalaji paulbalaji added this pull request to the merge queue Jan 23, 2026
Merged via the queue into main with commit 308e7d4 Jan 23, 2026
110 checks passed
@paulbalaji paulbalaji deleted the pb/fix-rebalancer-warp-route-id-extraction branch January 23, 2026 16:26
@github-project-automation github-project-automation bot moved this from In Review to Done in Hyperlane Tasks Jan 23, 2026