
Sync from lasso-cloud: error handling, load_balanced rename, graceful shutdown#33

Merged
jaxernst merged 3 commits into main from
sync/cloud-provider-health-and-dashboard-cleanup
Feb 23, 2026

Conversation

@jaxernst
Owner

OSS Sync from lasso-cloud

Synced from cloud commits 9f19dfc..2826269 (12 commits on cloud main).

This PR includes two sync commits:

  1. f5b5a2d — Provider health monitoring + dashboard cleanup (previous sync)
  2. 60e92dc — Error handling fixes, load_balanced strategy rename, graceful shutdown

Changes

Strategy Rename (round_robin → load_balanced)

  • Renamed round_robin strategy to load_balanced across routes, controllers, config, and UI
  • Added backward-compatible /round-robin/ route aliases (map to rpc_load_balanced)
  • Updated strategy_from/2, parse_strategy/1, default_strategy/0 with backward compat
  • Updated EndpointHelpers, SimulatorControls, EndpointSelector JS hook
  • New load_balanced.ex strategy implementation
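
The backward-compatible parsing described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the module name, the accepted spellings, and the `{:ok, atom} | :error` return shape are all assumptions made for illustration; only the function names `parse_strategy/1` and `default_strategy/0` come from the list above.

```elixir
defmodule StrategyAliasSketch do
  @moduledoc """
  Hypothetical sketch of backward-compatible strategy parsing.
  Not the actual Lasso code; names and return shapes are assumptions.
  """

  # Canonical strategy name after the rename.
  @default :load_balanced

  # Current spellings (config-style and URL-style).
  def parse_strategy(name) when name in ["load_balanced", "load-balanced"],
    do: {:ok, :load_balanced}

  # Legacy spellings resolve to the renamed strategy instead of failing.
  def parse_strategy(name) when name in ["round_robin", "round-robin"],
    do: {:ok, :load_balanced}

  # Anything else is rejected so callers can fall back to the default.
  def parse_strategy(_other), do: :error

  def default_strategy, do: @default
end
```

Mapping the legacy names at the parsing boundary keeps the rest of the code on a single canonical atom, so existing clients keep working without any `round_robin` branches surviving deeper in the stack.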

Error Handling & Provider Health

  • Fix probe classification for more accurate health status
  • Fix client_error failover — 4xx errors now reclassified correctly
  • Fix providers stuck unhealthy: graduated recovery, error exclusion, block range pre-filtering
  • New error_normalizer.ex and error_classification.ex improvements

Infrastructure

  • Add Plug.Cowboy.Drainer for graceful HTTP request draining on shutdown (SIGTERM during deploys)
  • Updated provider adapters: dRPC URL fix, merkle, llamarpc, 1rpc, generic
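
For readers unfamiliar with `Plug.Cowboy.Drainer`, a supervision-tree entry might be shaped like the sketch below. The endpoint module name `LassoWeb.Endpoint`, the `refs: :all` choice, and the 30-second timeout are assumptions for illustration; the actual values in `application.ex` are not shown in this PR description.

```elixir
defmodule DrainerChildrenSketch do
  # Returns a child list in start order. Supervisors stop children in reverse
  # order, so the drainer (listed last) is asked to shut down before the
  # endpoint and gets a chance to let in-flight requests finish.
  def children do
    [
      # Assumed endpoint module name.
      LassoWeb.Endpoint,
      # :refs selects which Cowboy listeners to drain (:all drains every one);
      # :shutdown is how long, in milliseconds, to wait for connections to drain.
      {Plug.Cowboy.Drainer, refs: :all, shutdown: 30_000}
    ]
  end
end
```

Placing the drainer after the endpoint matters: on SIGTERM it stops first, suspends the listeners, and waits for open connections before the endpoint itself is torn down.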

Docs

  • Updated: API Reference, Architecture, Configuration, Observability
  • New: Routing guide (docs/ROUTING.md)

Tests

  • New: probe_classification_test.exs, failover_strategy_test.exs, error_normalizer_test.exs
  • Updated integration tests for error handling changes

Skipped (cloud-only / diverged)

  • lib/lasso_cloud/ — all proprietary modules (billing, metering, entitlements)
  • lib/lasso_web/live/home_live.ex — pervasive current_account conditional rendering
  • lib/lasso_web/sockets/rpc_socket.ex — heavily cloud-specific (API key auth, CU metering, rate limiting)
  • config/runtime.exs, config/test.exs — CU quota enforcement config (cloud metering)
  • priv/repo/migrations/ — database migrations
  • docs/internal/ — internal planning docs
  • config/profiles/premium.yml — cloud-only tier

Verification

  • Compiles with --warnings-as-errors
  • No proprietary code leaked (grep for LassoCloud, APIKeyAuthPlug, CUPlug, etc.)
  • Formatted with mix format

🤖 Generated with Claude Code

jaxernst and others added 2 commits February 9, 2026 11:47
Core (lib/lasso/):
- Health probe batch coordinator: improved coordination logic
- Provider pool: rate limit awareness in health tiering, probe-based health reporting
- Selection: rate-limited provider tiering in strategy selection
- Circuit breaker: remove redundant error handling
- Error classification: updated rate limit detection
- JSONRPC error: improved error code handling

Dashboard (lib/lasso_web/):
- Consolidate format helpers into Formatting module (format_latency, format_rps, format_region_name, success_rate_color)
- Remove duplicated inline helpers from chain_details_panel and metrics_tab
- Refactor provider_details_panel block height resolution into helper functions
- Use cluster_block_heights for real-time consensus height in chain details
- Make strategy_display_name/strategy_description public in endpoint_helpers
- Remove unused get_strategy_description from helpers
- Remove unused chain status route and controller action
- Metrics tab: filter nil/unknown regions, use Formatting delegates

Tests:
- New: error_test, provider_pool_probe_health_test, selection_rate_limit_tiering_test
- Updated: error_classification_test, circuit_breaker_test, integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ename, graceful shutdown

Key changes:
- Rename round_robin strategy to load_balanced across routes, controllers, UI, and config
- Add backward-compatible round-robin route aliases
- Add Plug.Cowboy.Drainer for graceful HTTP request draining on shutdown
- Fix error handling: probe classification, client_error failover, 4xx reclassification
- Fix providers stuck unhealthy: graduated recovery, error exclusion, block range pre-filtering
- Update provider adapters (dRPC URL fix, merkle, llamarpc, 1rpc, generic)
- Add load_balanced strategy implementation
- Add new test coverage for error normalizer, failover strategy, probe classification
- Update docs: API reference, architecture, configuration, observability, routing

Synced from cloud commits 9f19dfc..2826269.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jaxernst
Owner Author

Validation Check

Checked out this branch and ran basic validation:

  • Compile: mix compile --warnings-as-errors — clean, no warnings
  • Tests: mix test reported 773 tests, 0 failures, 110 excluded (by tag: :skip, :integration, :real_providers, :slow)

This PR is from Feb 10 and predates the larger 50-commit cloud sync (Feb 21). The larger sync likely subsumes most of these changes (error handling, load_balanced rename, provider health improvements).

Recommendation: This can be closed — the subsequent cloud sync on cloud-sync-2026-02-21 covers the same ground with a more complete set of changes. Merging this first would create unnecessary conflict resolution work.

@claude — can you review the diff on this PR and confirm whether all changes here are superseded by the Feb 21 sync, or if there are any unique changes that should be cherry-picked?

@claude

claude bot commented Feb 22, 2026

Claude finished @jaxernst's task in 3m 16s


PR Diff Analysis

  • Gather context: compare this PR's diff against cloud-sync-2026-02-21
  • Identify files/changes unique to this PR not in the Feb 21 sync
  • Identify files/changes in both PRs (superseded)
  • Summarize findings and recommendation

Findings

The Feb 21 sync branch isn't available locally, so this analysis is based on direct inspection of the two commits in this PR (f5b5a2d, 60e92dc) and reasoning about whether a 50-commit cloud sync starting from the same source would subsume them.

Short answer: Yes, close this PR. The Feb 21 sync should subsume all of these changes since it covers more commits from the same cloud main lineage. However, there are three areas worth spot-checking in the Feb 21 sync before merging it.


Changes in this PR and their supersession status

Commit 1 (f5b5a2d) — Provider health + dashboard cleanup

All of these are straightforward cloud-to-OSS syncs with no OSS-specific adaptations:

  • ProviderPool.update_probe_health/5 new API (replaces deprecated report_probe_results)
  • BatchCoordinator calling ProviderPool.update_probe_health instead of CircuitBreaker.record_failure directly for probe results
  • Rate-limit tiering in Selection.get_candidates/3 — deprioritizes rate-limited providers rather than excluding them
  • CircuitBreaker: removed double-reporting for rate limit failures
  • JSONRPC.Error: improved error code handling
  • Dashboard Formatting module consolidation, provider_details_panel refactor
  • New tests: error_test, provider_pool_probe_health_test, selection_rate_limit_tiering_test

All superseded by the Feb 21 sync if it covers the same cloud commits.
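
The rate-limit tiering item above (deprioritize rather than exclude) can be illustrated with a small sketch. It assumes providers are plain maps with a hypothetical `:rate_limited?` flag; the real `Selection.get_candidates/3` signature and data structures are not shown in this PR and may differ.

```elixir
defmodule TieringSketch do
  # Providers are plain maps here; :rate_limited? is a hypothetical field.
  # Rate-limited providers move to the back of the candidate list instead of
  # being dropped, so they remain available as a last resort.
  def order_candidates(providers) do
    {limited, available} = Enum.split_with(providers, & &1.rate_limited?)
    available ++ limited
  end
end
```

Because `Enum.split_with/2` preserves relative order within each group, any ordering the strategy applied earlier survives inside both tiers.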

Commit 2 (60e92dc) — Error handling + load_balanced rename

The cloud→OSS logic here is largely mechanical, but two areas were manually adapted for OSS and need verification in the Feb 21 sync:

  1. Backward-compatible /round-robin/ route aliases (lib/lasso_web/router.ex):

    • In cloud, the rename was a clean swap. For OSS, this PR added /round-robin/ as an alias mapping to rpc_load_balanced to avoid breaking existing integrations.
    • If the Feb 21 sync did a clean rename without adding the alias, backward compat is broken.
    • Verify in Feb 21 sync →
  2. config/profiles/default.yml provider-specific settings:

    • This commit added rps_limit: 100, burst_limit: 500, per-provider subscribe_new_heads flags, max_block_range for blockpi/nodies, dRPC api_key_required: false, and renamed Lasso Public → Lasso Free.
    • These are OSS config values. The cloud version of this file has cloud-tier equivalents; the OSS sync needs to ensure none of these are stomped by cloud-specific values.
  3. Plug.Cowboy.Drainer in application.ex: clean addition, no cloud specificity — will be superseded as-is.

  4. Error normalizer 4xx reclassification (error_normalizer.ex): distinguishes :json_rpc vs :raw 4xx responses for proper failover — pure logic, superseded cleanly.

  5. FailoverStrategy client_error handling: adds threshold-1 failover for :client_error before treating as terminal — pure logic, superseded cleanly.
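
One possible reading of the "threshold-1 failover" in item 5 is sketched below: a `:client_error` triggers a single failover to another provider, and a repeat is treated as terminal. This interpretation, the module name, and the `decide/2` function are assumptions; the actual FailoverStrategy code is not included in this conversation.

```elixir
defmodule ClientErrorFailoverSketch do
  # Assumed meaning of "threshold-1": allow one failover for client errors.
  @client_error_threshold 1

  # `failovers` counts failovers already spent on this request.
  def decide(:client_error, failovers) when failovers < @client_error_threshold,
    do: :failover

  # Another provider also returned a client error, so the request itself is
  # likely at fault: stop retrying and surface the error.
  def decide(:client_error, _failovers), do: :terminal
end
```

The idea is that a single 4xx may reflect a provider quirk rather than a bad request, so one retry elsewhere is cheap, while repeated 4xx responses point at the caller.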


Recommendation

Close this PR. The Feb 21 sync subsumes the cloud logic. Before merging that PR, verify:

  1. lib/lasso_web/router.ex contains both /load-balanced/:chain_id and /round-robin/:chain_id routes pointing to rpc_load_balanced
  2. config/profiles/default.yml retains OSS-appropriate values (rps_limit: 100, burst_limit: 500, name: Lasso Free, per-provider subscribe_new_heads/max_block_range flags)
  3. lib/lasso/application.ex includes the Plug.Cowboy.Drainer entry after the endpoint

If any of these were missed in the Feb 21 sync, they're the only items worth cherry-picking.
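
The three checks above could be scripted with a helper along these lines. This is a rough sketch: the helper is hypothetical, and the exact substrings to search for (for example how the routes are spelled in `router.ex`) are assumptions that would need adjusting to the real files.

```elixir
defmodule SyncCheckSketch do
  # Returns the required substrings that do NOT appear in `contents`.
  # An empty list means every expected marker was found.
  def missing(contents, required) when is_binary(contents) do
    Enum.reject(required, &String.contains?(contents, &1))
  end
end
```

In practice the contents would come from `File.read!/1`, for example `SyncCheckSketch.missing(File.read!("lib/lasso_web/router.ex"), ["/load-balanced/:chain_id", "/round-robin/:chain_id"])`, with similar calls for `config/profiles/default.yml` and `lib/lasso/application.ex`.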

@jaxernst jaxernst merged commit 3287a08 into main Feb 23, 2026