fix(infra): improve monitor + rebalancer deployment logic#7874
fix(infra): improve monitor + rebalancer deployment logic#7874paulbalaji merged 13 commits intomainfrom
Conversation
…efaults - Add WARP_ROUTE_ID env var to rebalancer helm chart for consistency with warp-monitor - Add configmap fallback for existing deployments without env var - Add getDeployedRegistryCommit() to both rebalancer and warp-monitor helm managers - Pre-fill registry commit prompt with existing value when redeploying
|
📝 WalkthroughWalkthroughThis PR adds deployed-route discovery and orphan filtering to deploy scripts, introduces registry-commit extraction utilities and Helm helpers to read deployed commits, adds skip-confirmation flows, and updates rebalancer/warpMonitor Helm image tags and warpRouteId config/env. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant CLI as Deploy Script
participant HelmMgr as Helm Manager
participant Registry as Registry
participant K8s as Deployed Config
User->>CLI: start deployment
CLI->>HelmMgr: getDeployed*WarpRouteIds()
HelmMgr->>K8s: query releases / configmaps
K8s-->>HelmMgr: deployed IDs (may miss warpRouteId)
HelmMgr-->>CLI: deployed IDs list
CLI->>User: show checkbox selection
User-->>CLI: select IDs
CLI->>Registry: filterOrphanedWarpRouteIds(selectedIds)
Registry->>Registry: warpRouteExistsInRegistry(id) checks
Registry-->>CLI: validIds + orphanedIds
Note over CLI: warn about orphaned IDs
loop for each validId
CLI->>HelmMgr: getDeployedRegistryCommit(warpRouteId)
HelmMgr->>K8s: helm get values / release
K8s-->>HelmMgr: helm values
HelmMgr-->>CLI: extracted registry commit or undefined
alt skipConfirmation
CLI->>CLI: use default commit
else
CLI->>User: prompt with default commit
User-->>CLI: confirm/provide commit
end
CLI->>CLI: cache validated commit
CLI->>HelmMgr: proceed with deployment for id
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@typescript/infra/helm/rebalancer/templates/_helpers.tpl`:
- Around line 77-78: The WARP_ROUTE_ID env var is rendered unconditionally while
REGISTRY_URI uses a conditional, so update the Helm template in _helpers.tpl to
mirror REGISTRY_URI's pattern: wrap the WARP_ROUTE_ID env var block in a
conditional check for .Values.warpRouteId (similar to the existing
.Values.hyperlane.registryUri check) so the env var is only emitted when
.Values.warpRouteId is truthy, keeping template logic consistent.
🧹 Nitpick comments (3)
typescript/infra/scripts/warp-routes/deploy-warp-monitor.ts (1)
106-108: UnnecessaryPromise.allwrapper for single promise.Look, I'm not one to judge how folks do things in their own swamp, but wrapping a single promise in
Promise.alland destructuring is a bit... onion-y when it doesn't need to be.🧅 Simpler approach
- const [registry] = await timedAsync('getRegistry', () => - Promise.all([envConfig.getRegistry(true, chainsNeeded)]), + const registry = await timedAsync('getRegistry', () => + envConfig.getRegistry(true, chainsNeeded), );typescript/infra/scripts/rebalancer/deploy-rebalancer.ts (2)
152-152: TODO comment for stale release cleanup.Noticed this TODO about uninstalling stale rebalancer releases. The warp-monitor has
uninstallUnknownWarpMonitorReleases- might want a similar implementation here eventually. Just flagging it since it's already noted.Would you like me to help draft an implementation for
uninstallUnknownRebalancerReleasessimilar to the warp-monitor approach, or open an issue to track this?
48-93: Consider extracting shared deployment selection logic.Now, I'm not saying you gotta do this right now - get out of me swamp first - but there's a fair bit of duplicated logic between this file and
deploy-warp-monitor.ts:
- Interactive selection flow (checkbox prompt)
- Orphan filtering and warning messages
- Registry commit handling with validation cache
If these scripts evolve together, a shared helper could reduce maintenance burden. But for now, the duplication is manageable.
Also applies to: 99-126
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #7874 +/- ##
=======================================
Coverage 77.02% 77.02%
=======================================
Files 117 117
Lines 2651 2651
Branches 244 244
=======================================
Hits 2042 2042
Misses 593 593
Partials 16 16
🚀 New features to boost your workflow:
|
|
--yes feel too much of a catch all. I may be more obvious to the user if we have specific args for
--regisry main, --skipAtaPayerCheck ? |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@typescript/infra/scripts/warp-routes/deploy-warp-monitor.ts`:
- Around line 59-64: The current exit message when deployedIds.length === 0 is
misleading for the case where a user supplied --warp-route-id; update the logic
around the deployedIds check to distinguish two cases: (1) user supplied
warpRouteId and that warp route exists in the registry but simply has no
deployed monitor yet — emit a clear message like "No deployed warp monitors
found for the specified warp route(s); running this script will attempt to
deploy one" or proceed to deploy; (2) user supplied warpRouteId but it was
filtered out as an orphaned route (the filtering around the "orphaned" check at
lines ~78-79) or no warpRouteId provided — keep an explanatory error advising
how to create or register the warp route. Use the existing symbols warpRouteId
and deployedIds (and the orphaned filter) to implement the conditional
messaging/flow so the error is no longer ambiguous.
🧹 Nitpick comments (1)
typescript/infra/scripts/warp-routes/deploy-warp-monitor.ts (1)
164-166: Consider explicit process exit on failure.When
main()throws,.catch(rootLogger.error)logs the error but doesn't explicitly exit with a non-zero code. Depending on Node.js behavior and howrootLogger.erroris implemented, the process might still exit with code 0, which could mask failures in CI/CD pipelines.🧅 Suggested improvement
main() .then(() => rootLogger.info('Deploy successful!')) - .catch(rootLogger.error); + .catch((e) => { + rootLogger.error(e); + process.exit(1); + });
i disagree because we're still doing the check but just auto-yesing the prompt to continue - i think it doesn't make sense to have multiple having more args also means it's additional hidden knowledge people have to know rather than a single furthermore |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@typescript/infra/scripts/rebalancer/deploy-rebalancer.ts`:
- Around line 52-60: The list of deployed warpRouteId values built from
deployedPods via getDeployedRebalancerWarpRouteIds may contain duplicates
(replicas), causing duplicate checkboxes and repeated deployments; update the
deployedIds construction to remove duplicates before sorting (e.g., map to
p.warpRouteId, filter truthy, create a Set or use Array.from(new Set(...)) to
dedupe, then sort) so each warpRouteId appears only once; adjust any downstream
code that assumes deployedIds order if necessary.
🧹 Nitpick comments (5)
typescript/infra/scripts/warp-routes/deploy-warp-monitor.ts (2)
38-39: Consider the scope of--yesflag.Listen, the
--yesflag does a couple different things here - skips registry commit prompts and ATA payer balance warnings in preflight checks. That's not necessarily wrong, but some folks in the comments mentioned it might be clearer to have separate flags like--registry mainand--skip-ata-payer-check.If ya decide to keep it as-is (which is fair for a deployment tool), maybe just make sure the
--helptext spells out everything it skips so nobody's surprised, ya know?
122-148: Nice optimization with the commit validation cache.This is a clever bit - keepin' track of what commits you've already validated so you're not fetchin' the same thing over and over when deployin' multiple monitors. Like not checkin' if there's onions in the cupboard every time you make a meal.
One small thing: when
skipConfirmationis true and there's no deployed commit to fall back on (line 136), defaultin' to'main'is reasonable, but it might be worth a debug log sayin' "No deployed commit found, using 'main'" so operators know what happened.🔧 Optional: Add debug log for fallback
if (skipConfirmation) { - registryCommit = defaultRegistryCommit ?? 'main'; + if (defaultRegistryCommit) { + registryCommit = defaultRegistryCommit; + } else { + rootLogger.debug(`[${warpRouteId}] No deployed registry commit found, defaulting to 'main'`); + registryCommit = 'main'; + } } else {typescript/infra/scripts/rebalancer/deploy-rebalancer.ts (3)
41-44: Consider splitting--yesinto explicit flags.Right now
--yesskips multiple prompts, which can feel a bit too broad for operators. A more explicit flag (e.g.,--registry-commitdefaulting, or a--skip-registry-prompt) keeps intent clear and avoids surprises.
119-126: Make the--yesfallback tomainmore explicit.When no deployed commit exists,
--yessilently usesmain. That’s convenient but can surprise folks in automation. Consider a warning (or require--registry-commitwhen no deployed commit is found).💡 Suggested tweak
if (skipConfirmation) { + if (!defaultRegistryCommit) { + rootLogger.warn( + `[${warpRouteId}] No deployed registry commit found; defaulting to "main". Use --registry-commit to override.`, + ); + } registryCommit = defaultRegistryCommit ?? 'main'; } else {
158-158: Track the stale-release cleanup TODO.Nice catch leaving the TODO, but it’ll linger unless tracked. Want me to open an issue or draft an implementation plan for stale release cleanup?
Don't feel too strongly on this |
|
@claude review |
|
Claude finished @paulbalaji's task in 56s —— View job Code Review: PR #7874Overall: Solid PR that improves UX by defaulting to deployed services and adding skip-confirmation flows. The code is well-structured and addresses the issues raised in earlier review rounds. Architecture & Logic ✓The changes are well-organized:
Minor Issues1. Missing process exit code on error ( Both scripts log errors but don't exit with non-zero code on failure: main()
.then(() => rootLogger.info('Deploy successful!'))
.catch(rootLogger.error);In CI pipelines, this could mask failures. Consider: .catch((e) => {
rootLogger.error(e);
process.exit(1);
});2. Inconsistent error handling in The configmap fallback silently swallows all errors. While this is intentional (older deployments may not have the configmap), it might be worth logging at debug level for troubleshooting: } catch (e) {
rootLogger.debug(`Failed to read configmap for ${helmReleaseName}: ${e}`);
}Observations (No Action Needed)
Verdict: PR looks good. The minor issues noted are low priority and don't block merging. |
🐳 Monorepo Docker Image Built SuccessfullyImage Tags: |
Summary
Improves warp monitor and rebalancer deploy scripts with better defaults and less manual input.
Changes
Selection defaults to deployed services:
--warp-route-id→ lists only currently deployed services (not all registry warp routes)--yes/-yflag:main)Rebalancer helm chart:
WARP_ROUTE_IDenv var (fixes RPC rotation script detection)Refactored shared helpers:
filterOrphanedWarpRouteIds()- filters out services for warp routes no longer in registrygetDeployedRegistryCommit()- extracts registry commit from helm valueswarpRouteExistsInRegistry()- checks if warp route exists in registryUsage
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.