Skip to content

chore: promote staging to staging-promote/71f41dd1-23309993684 (2026-03-19 19:18 UTC)#1425

Open
ironclaw-ci[bot] wants to merge 2 commits intostaging-promote/71f41dd1-23309993684from
staging-promote/52ca9d65-23312673755
Open

chore: promote staging to staging-promote/71f41dd1-23309993684 (2026-03-19 19:18 UTC)#1425
ironclaw-ci[bot] wants to merge 2 commits intostaging-promote/71f41dd1-23309993684from
staging-promote/52ca9d65-23312673755

Conversation

@ironclaw-ci
Copy link
Contributor

@ironclaw-ci ironclaw-ci bot commented Mar 19, 2026

Auto-promotion from staging CI

Batch range: 71f41dd12363497372864bc6eb3f7c334e05fd52..52ca9d6588f31fc9b6007c56ed7cd1995d5ad0df
Promotion branch: staging-promote/52ca9d65-23312673755
Base: staging-promote/71f41dd1-23309993684
Triggered by: Staging CI batch at 2026-03-19 19:18 UTC

Commits in this batch (2):

Current commits in this promotion (2)

Current base: staging-promote/71f41dd1-23309993684
Current head: staging-promote/52ca9d65-23312673755
Current range: origin/staging-promote/71f41dd1-23309993684..origin/staging-promote/52ca9d65-23312673755

Auto-updated by staging promotion metadata workflow

Waiting for gates:

  • Tests: pending
  • E2E: pending
  • Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

nearfamiliarcow and others added 2 commits March 19, 2026 11:45
…requests (#1257)

The HTTP tool returned `ApprovalRequirement::Always` for requests with
credentials, but `Always` is hardcoded to ignore the session auto-approve
set. This meant users who clicked "always" were re-prompted on every
subsequent HTTP call — the UI offered "always" but the backend ignored it.

Two fixes:
1. HTTP credentialed requests now return `UnlessAutoApproved` instead of
   `Always`, so the session auto-approve set is respected.
2. `StatusUpdate::ApprovalNeeded` now carries `allow_always: bool`. All
   channel UIs (Telegram, Slack, Signal, REPL, Web) conditionally hide
   the "always" option when a tool truly requires per-invocation approval
   (`ApprovalRequirement::Always`, e.g. destructive shell commands).

Also boxes `PendingApproval` in `AgenticLoopResult::NeedApproval` to fix
a pre-existing clippy `large_enum_variant` warning.

Regression tests included (test_credentialed_requests_respect_auto_approve,
test_allow_always_matches_approval_requirement) but CI heuristic cannot
detect them in cross-fork PR diffs.

[skip-regression-check]

Co-authored-by: Tyler <tbond@tbond-m5.local>
* feat: receive relay events via webhook callbacks instead of SSE

Replace the SSE pull model with push-based webhook callbacks from
channel-relay. Eliminates the reconnect loop, stream token auth,
and SSE parser — events arrive via HTTP POST to /relay/events.

- Add webhook handler with HMAC signature verification
- Simplify RelayChannel to use mpsc from webhook handler
- Remove SSE connect/reconnect/parse logic from RelayClient
- Add register_callback() to RelayClient for callback URL registration
- Update activation flow to create event channel and register callback
- Wire relay webhook endpoint into web gateway

* fix: address review feedback on webhook callback PR

- Return 503 when relay event channel is full/closed (enables retry)
- Reject malformed timestamps with 400 instead of proceeding
- Allow relay activation without settings store (no-store/ephemeral mode)
- Check installed_relay_extensions set in is_relay_channel for no-db mode
- Fix staging test constructors for new RelayChannel signature

* security: adapt relay client to new channel-relay auth model

Adapts the relay integration to the hardened channel-relay security model:

- Switch from X-API-Key header to Authorization: Bearer sk-agent-*
  for all relay API calls (chat-api token verification)
- Remove register_callback() — PUT /callbacks endpoint removed
- Remove event_callback_url from initiate_oauth() — parameter removed
- Make signing_secret a required field in RelayConfig (new env var:
  CHANNEL_RELAY_SIGNING_SECRET)
- Update integration tests for Bearer auth and removed endpoints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: use server-side approval tokens, remove caller-supplied routing

- Approval flow now calls POST /approvals to register server-side
  record, then embeds only the opaque approval_token in button value
- Remove instance_id parameter from proxy_provider() — channel-relay
  no longer accepts it (uses verified identity)
- Remove instance_id and user_id from initiate_oauth() — channel-relay
  derives them from the Bearer token
- Add create_approval() to RelayClient

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pass webhook_url during OAuth so callback_url is set on connection

The channel-relay OAuth flow now accepts webhook_url to set the
callback_url during connection creation. IronClaw computes its webhook
URL from callback_base + webhook_path and passes it during initiate_oauth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: remove webhook_url from OAuth initiation

Channel-relay now derives the callback URL from chat-api's instance_url.
IronClaw no longer supplies webhook_url during OAuth — the relay is the
authority on where events get delivered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: remove all URL params from OAuth initiation

IronClaw no longer supplies any URLs to channel-relay. The relay
derives all URLs from the trusted instance_url in chat-api.
initiate_oauth() takes no parameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore CSRF nonce for OAuth callback validation

Re-add nonce generation and secret storage in auth_channel_relay.
The nonce is passed to channel-relay as state_nonce param (not a URL).
Channel-relay embeds it in the signed state and appends it to the
redirect URL so IronClaw's callback handler can validate and activate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: per-instance callback signing secrets

relay_signing_secret() now prefers OPENCLAW_GATEWAY_TOKEN (per-instance)
over the shared CHANNEL_RELAY_SIGNING_SECRET. A compromised instance
can no longer forge callbacks to other instances on the same relay.
CHANNEL_RELAY_SIGNING_SECRET is now optional in RelayConfig.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: clean per-instance callback secrets, no shared secrets, no fallbacks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pass team_id to get_signing_secret for workspace-scoped lookup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* security: remove sender_id from create_approval — relay derives it

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove stale relay sender_id validation

* fix: harden relay webhook activation lifecycle

---------

Co-authored-by: Pierre <pierre@near.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added scope: agent Agent core (agent loop, router, scheduler) scope: channel Channel infrastructure scope: channel/web Web gateway channel scope: channel/wasm WASM channel runtime scope: tool/builtin Built-in tools scope: extensions Extension management size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Mar 19, 2026
@claude
Copy link

claude bot commented Mar 19, 2026

Code review

No issues found.

This PR implements two well-integrated changes:

  1. Relay webhook refactoring (feat: receive relay events via webhook callbacks feat: receive relay events via webhook callbacks #1254): Cleanly transitions from SSE long-polling to webhook callbacks with proper HMAC-SHA256 signature verification, timestamp freshness validation (5-min window), and correct hex decoding + 32-byte length validation of signing secrets. The webhook handler at POST /relay/events is well-designed with proper error responses at each validation step.

  2. Approval system enhancement (fix: make "always" auto-approve work for credentialed HTTP requests fix(approval): make "always" auto-approve work for credentialed HTTP requests #1257): Correctly changes ApprovalRequirement::AlwaysUnlessAutoApproved for credentialed HTTP requests, allowing the session-wide "always" setting to apply. The allow_always field is properly threaded through PendingApproval, SubmissionResult, and all approval handlers (dispatcher, thread_ops, channel), with regression tests verifying the behavior.

All code follows CLAUDE.md guidelines: no unwrap()/expect() in production code, proper error context mapping, and strong type usage. Configuration simplification removes SSE polling parameters cleanly. Test cases are comprehensive and updated appropriately.

@claude
Copy link

claude bot commented Mar 19, 2026

Code review - Architecture findings

Found 7 architectural improvement opportunities (non-blocking):

  1. [MEDIUM:75] Stringly-typed approval flag logic instead of enum method

    • File: src/agent/dispatcher.rs:590, src/agent/thread_ops.rs:1090
    • The allow_always = !matches!(requirement, ApprovalRequirement::Always) pattern is duplicated across 3+ places. Should extract as ApprovalRequirement::allows_always() method to follow DRY principle and CLAUDE.md preference for strong types over computed booleans.
  2. [MEDIUM:65] Optional state accessed via match/if-let chains in relay webhook handler

    • File: src/channels/web/server.rs:765-820
    • Multiple guard clauses return HTTP errors inline. Consider extracting a RelayCallbackError type with thiserror to centralize validation logic and make it testable.
  3. [MEDIUM:60] Inconsistent error handling for mutex lock failures

    • File: src/extensions/manager.rs:597 (relay_signing_secret_cache)
    • Using lock().ok()? conflates lock failure (poison) with cache miss. Lock failure should panic/expect; cache miss is normal. Consider: self.relay_signing_secret_cache.lock().expect("lock poisoned").clone()
  4. [MEDIUM:65] API leaks internal Arc/Mutex types

    • File: src/extensions/manager.rs:578-587 (relay_event_tx getter)
    • Returns Arc<tokio::sync::Mutex<Option<...>>> directly, coupling callers to internal sync strategy. Consider offering a higher-level abstraction (e.g., get_relay_event_tx() that returns sender directly).
  5. [LOW:55] Duplicated allow_always computation

    • File: src/agent/dispatcher.rs:553-591, src/agent/thread_ops.rs:1072-1099
    • Same logic appears in both approval flows. Extract as helper to ensure consistency.
  6. [LOW:50] Stringly-typed state keys without constants

    • File: src/extensions/manager.rs (throughout)
    • Keys like "relay:{}:team_id" and "relay:{}:oauth_state" are hard-coded. Define as module constants to prevent typos and enable refactoring.
  7. [LOW:45] Error context not propagated in relay initialization

    • File: src/extensions/manager.rs:4007-4014
    • client.get_signing_secret() errors are wrapped as ExtensionError::Config without preserving the error chain. Use .map_err() to add context for better debuggability.

Non-issues: No production .unwrap()/.expect() violations. All error types use thiserror. Bearer auth strategy is consistent.

@claude
Copy link

claude bot commented Mar 19, 2026

Code review - Security & Safety findings

Completed security analysis. No critical vulnerabilities found.

[MEDIUM:65] Timestamp replay window for webhook callbacks

  • File: src/channels/web/server.rs:1854-1864
  • The webhook signature verification checks timestamp freshness (5-minute window) but does not implement replay protection. An attacker could capture a valid signed webhook and replay it multiple times within the window. The HMAC signature includes the timestamp, preventing trivial replay, but distributed replay within the window is theoretically possible. Recommend: Consider adding a nonce cache or sequence number tracking if high-value operations depend on webhook uniqueness.

[MEDIUM:50] Race condition window in relay event sender access

  • File: src/channels/web/server.rs:1886-1898
  • Between checking if relay channel is active and sending the event, a concurrent deactivate_relay_channel() could remove the sender. Mitigation: Code already handles this via try_send() error return and responds with 503. This is acceptable.

[LOW:40] Sensitive bytes in Arc could expose in debug panics

  • File: src/extensions/manager.rs:relay_signing_secret_cache
  • The 32-byte signing secret is stored in Arc<Mutex<Option<Vec>>>. While unlikely to be logged, consider using zeroize crate for sensitive byte arrays to prevent accidental memory exposure on panic.

[MEDIUM:50] Validation assumption on cached signing secret

  • File: src/extensions/manager.rs:relay_signing_secret()
  • The cached secret is assumed valid because it was validated by get_signing_secret() at fetch time. Since the cache is only populated by this method, invariant holds, but the assumption is implicit. Adding a comment explaining this invariant would clarify intent.

Verified Strengths:
✅ HMAC-SHA256 signature verification uses constant-time comparison (subtle::ConstantTimeEq)
✅ Bearer token auth properly used (not API keys in headers)
✅ No .unwrap()/.expect() in production webhook code
✅ Proper error handling on all validation steps

Conclusion: Webhook refactoring is well-implemented with no blocking security issues. Replay protection is a design consideration rather than a bug.

@claude
Copy link

claude bot commented Mar 19, 2026

Code review - Performance & Production findings

Found 6 performance/production considerations:

[HIGH:80] Synchronous mutex in async hot path

  • File: src/channels/web/server.rs:771 (relay_events_handler)
  • The webhook handler calls ext_mgr.relay_signing_secret() which uses std::sync::Mutex::lock() on every incoming event. This is a blocking synchronous lock acquired in an async context, which can block the tokio runtime under contention. Recommendation: Use Arc<tokio::sync::Mutex<_>> instead of Arc<std::sync::Mutex<_>> for relay_signing_secret_cache.

[HIGH:75] Lock contention on event sender per-request

  • File: src/channels/web/server.rs:831 (relay_events_handler)
  • Every webhook event acquires async lock on relay_event_tx via ext_mgr.relay_event_tx().lock().await. Under sustained webhook traffic, this per-request lock becomes a serialization bottleneck. Recommendation: Clone sender once at activation and store in lock-free cell (e.g., AtomicOption<Arc<Sender>>) to avoid repeated lock acquisitions.

[MEDIUM:65] Channel capacity mismatch between test and production

  • File: src/extensions/manager.rs:6435 (test setup vs production)
  • Test creates mpsc::channel(1) but production uses mpsc::channel(64). This capacity-1 test channel will cause try_send() to fail immediately under load, triggering 503 responses. Test doesn't validate the queue backpressure behavior that production encounters. Recommendation: Update test to capacity 64 to match production, or document intentional difference.

[MEDIUM:50] Unnecessary cloning of 32-byte signing secret

  • File: src/extensions/manager.rs (activation and verification paths)
  • The signing secret is cloned during cache storage (*cache = Some(signing_secret)) and again on retrieval (.clone()). While 32 bytes is small, under high webhook volume this adds allocations in the hot path. Recommendation: Use Arc<[u8; 32]> instead of Vec<u8> to eliminate allocations.

[LOW:60] Lost resilience in relay service recovery

  • File: src/channels/relay/channel.rs (removed ~200 lines of SSE reconnection logic)
  • Old SSE implementation had exponential backoff (1s→60s) and token renewal. New webhook approach has no recovery if relay service is temporarily unavailable. Webhook events fail with "event queue full" until manual restart. Recommendation: Add periodic health checks (e.g., list_connections()) to detect relay service degradation.

[LOW:40] Approval tuple repeatedly extracted for channel updates

  • File: src/agent/thread_ops.rs:507-527
  • The approval request is extracted multiple times for channel status broadcasts, causing repeated field access. Recommendation: Extract tool_name/description once and reuse.

Positive notes: No unbounded loops, missing timeouts, or resource leaks detected. Event channel sized appropriately at 64-event buffer. Bearer auth prevents token injection attacks in headers.

Summary: 2 high-priority async/concurrency improvements recommended to prevent bottlenecks under sustained webhook traffic. Remaining findings are minor optimizations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: agent Agent core (agent loop, router, scheduler) scope: channel/wasm WASM channel runtime scope: channel/web Web gateway channel scope: channel Channel infrastructure scope: extensions Extension management scope: tool/builtin Built-in tools size: XL 500+ changed lines staging-promotion

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants