chore: promote staging to staging-promote/806d4028-23330265305 (2026-03-20 06:16 UTC) by ironclaw-ci[bot] · Pull Request #1456 · nearai/ironclaw

ironclaw-ci · 2026-03-20T06:16:11Z

Auto-promotion from staging CI

Batch range: c4ab382522c86e7e19d55fee760b125fb1970518..b952d229f941298af5748d421edca6513382f7f5
Promotion branch: staging-promote/b952d229-23331469361
Base: staging-promote/806d4028-23330265305
Triggered by: Staging CI batch at 2026-03-20 06:16 UTC

Commits in this batch (10):

cac6f40 Add owner-scoped permissions for full-job routines (Add owner-scoped permissions for full-job routines #1440)
6b0f84b perf: use Arc in embedding cache to avoid clones on miss path (perf: use Arc in embedding cache to avoid clones on miss path #1438)
8920322 fix: staging CI triage — consolidate retry parsing, fix flaky tests, add docs (fix: staging CI triage — consolidate retry parsing, fix flaky tests, add docs #1427)
8526cde fix: restore libSQL vector search with dynamic dimensions (fix: restore libSQL vector search with dynamic dimensions #1393)
455f543 fix(routines): surface errors when sandbox unavailable for full_job routines (fix(routines): surface errors when sandbox unavailable for full_job routines #769)
3a52334 fix: f32→f64 precision artifact in temperature causes provider 400 errors (fix: f32→f64 precision artifact in temperature causes provider 400 errors #1450)
806d402 feat: chat onboarding and routine advisor (feat: chat onboarding and routine advisor #927)
31c3b5b feat(agent): activate stuck_threshold for time-based stuck job detection (feat(agent): activate stuck_threshold for time-based stuck job detection #1234)
ef3d769 fix(security): validate embedding base URLs to prevent SSRF (fix(security): validate embedding base URLs to prevent SSRF #1221)
b952d22 fix: prefer execution-local message routing metadata (fix: prefer execution-local message routing metadata #1449)

Current commits in this promotion (3)

Current base: staging-promote/806d4028-23330265305
Current head: staging-promote/b952d229-23331469361
Current range: origin/staging-promote/806d4028-23330265305..origin/staging-promote/b952d229-23331469361

31c3b5b feat(agent): activate stuck_threshold for time-based stuck job detection (feat(agent): activate stuck_threshold for time-based stuck job detection #1234)
ef3d769 fix(security): validate embedding base URLs to prevent SSRF (fix(security): validate embedding base URLs to prevent SSRF #1221)
b952d22 fix: prefer execution-local message routing metadata (fix: prefer execution-local message routing metadata #1449)

Auto-updated by staging promotion metadata workflow

Waiting for gates:

Tests: pending
E2E: pending
Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

…ion (#1234) * feat(agent): activate stuck_threshold for time-based stuck job detection (#1223) The stuck_threshold field on DefaultSelfRepair was defined but never used (marked #[allow(dead_code)]). Jobs that got stuck in InProgress without transitioning to Stuck state (e.g., deadlock, unhandled timeout) were never detected by self-repair. Changes: - Add find_stuck_jobs_with_threshold() to ContextManager that detects InProgress jobs running longer than the threshold - Wire stuck_threshold into detect_stuck_jobs() so it uses threshold-based detection alongside explicit Stuck state detection - Remove dead_code annotation from stuck_threshold - Accept InProgress jobs in the stuck job detection filter Configurable via AGENT_STUCK_THRESHOLD_SECS (default: 300s). Closes #1223 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(agent): address PR #1234 review feedback for stuck_threshold - Transition InProgress jobs to Stuck before returning them from detect_stuck_jobs(), so attempt_recovery() (which requires Stuck state) works correctly on threshold-detected jobs - Add detect-and-repair E2E test covering the full InProgress -> Stuck -> recovery -> InProgress cycle - Rename idle_threshold -> elapsed_threshold in find_stuck_jobs_with_threshold for clarity - Add `use std::time::Duration` import and remove fully qualified paths - Update CLAUDE.md to reflect that stuck_threshold is now actively used Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: measure stuck_duration from Stuck transition, handle InProgress→Stuck in repair - Fix stuck_duration computation to use the most recent Stuck transition timestamp instead of started_at, preventing jobs that ran for hours before becoming stuck from immediately exceeding the threshold - Fix last_activity to also use the Stuck transition timestamp - Transition InProgress jobs to Stuck before calling attempt_recovery() in repair_stuck_job(), since attempt_recovery() requires JobState::Stuck - Add regression test verifying a recently-stuck job with old started_at is not misdetected as exceeding a 5-minute threshold Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(agent): address Copilot review comments on PR #1234 - Add comment in find_stuck_jobs_with_threshold() noting that started_at is not reset on Stuck->InProgress recovery, which may cause false positives for recovered jobs. Suggests tracking in_progress_since or using the most recent StateTransition as a future improvement. - Fix misleading test comment in stuck_duration_measured_from_stuck_transition test: explicitly Stuck jobs are always returned regardless of threshold. The test verifies stuck_duration is near-zero, not that the job is excluded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: ilblackdragon@gmail.com <ilblackdragon@gmail.com>

* fix(security): validate embedding base URLs to prevent SSRF (#1103) User-configurable base URLs (OLLAMA_BASE_URL, EMBEDDING_BASE_URL) were passed directly to reqwest with no validation, allowing SSRF attacks against cloud metadata endpoints, internal services, or file:// URIs. Adds validate_base_url() that rejects: - Non-HTTP(S) schemes (file://, ftp://) - HTTP to non-localhost destinations (prevents credential leakage) - HTTPS to private/loopback/link-local/metadata IPs (169.254.169.254, 10.x, 192.168.x, 172.16-31.x, CGN 100.64/10) - IPv4-mapped IPv6 bypass attempts Validation runs at config resolution time so bad URLs fail at startup. Closes #1103 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): add DNS resolution check, ULA blocking, and NEARAI_BASE_URL validation Address review feedback: - Resolve hostnames to IPs and check all resolved addresses against the blocklist (prevents DNS-based SSRF bypass where attacker uses a domain pointing to 169.254.169.254) - Add IPv6 Unique Local Address (fc00::/7) to the blocklist - Validate NEARAI_BASE_URL in llm config (was missing — especially dangerous since bearer tokens are forwarded to the configured URL) - Allow DNS resolution failure gracefully (don't block startup when DNS is temporarily unavailable) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: fix formatting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): add SSRF validation to all base URL chokepoints - Add validate_base_url() in resolve_registry_provider() covering all LLM providers (OpenAI, Anthropic, Ollama, openai_compatible, etc.) - Add validate_base_url() for NEARAI_AUTH_URL in LlmConfig::resolve() - Add validate_base_url() for TRANSCRIPTION_BASE_URL in TranscriptionConfig - Add missing SSRF test cases: CGN range, IPv4-mapped IPv6, ULA IPv6, URLs with credentials, empty/invalid URLs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: re-trigger CI with latest changes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: trigger new run with skip-regression-check label Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(security): validate embedding base URLs to prevent SSRF (#1103) User-configurable base URLs (OLLAMA_BASE_URL, EMBEDDING_BASE_URL) were passed directly to reqwest with no validation, allowing SSRF attacks against cloud metadata endpoints, internal services, or file:// URIs. Adds validate_base_url() that rejects: - Non-HTTP(S) schemes (file://, ftp://) - HTTP to non-localhost destinations (prevents credential leakage) - HTTPS to private/loopback/link-local/metadata IPs (169.254.169.254, 10.x, 192.168.x, 172.16-31.x, CGN 100.64/10) - IPv4-mapped IPv6 bypass attempts Validation runs at config resolution time so bad URLs fail at startup. Closes #1103 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): add DNS resolution check, ULA blocking, and NEARAI_BASE_URL validation Address review feedback: - Resolve hostnames to IPs and check all resolved addresses against the blocklist (prevents DNS-based SSRF bypass where attacker uses a domain pointing to 169.254.169.254) - Add IPv6 Unique Local Address (fc00::/7) to the blocklist - Validate NEARAI_BASE_URL in llm config (was missing — especially dangerous since bearer tokens are forwarded to the configured URL) - Allow DNS resolution failure gracefully (don't block startup when DNS is temporarily unavailable) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: fix formatting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): add SSRF validation to all base URL chokepoints - Add validate_base_url() in resolve_registry_provider() covering all LLM providers (OpenAI, Anthropic, Ollama, openai_compatible, etc.) - Add validate_base_url() for NEARAI_AUTH_URL in LlmConfig::resolve() - Add validate_base_url() for TRANSCRIPTION_BASE_URL in TranscriptionConfig - Add missing SSRF test cases: CGN range, IPv4-mapped IPv6, ULA IPv6, URLs with credentials, empty/invalid URLs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: re-trigger CI with latest changes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: trigger new run with skip-regression-check label Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: ilblackdragon@gmail.com <ilblackdragon@gmail.com>

* fix: prefer execution-local message routing metadata * test: cover message routing fallback metadata * refactor: simplify message target resolution * fix: ignore stale channel defaults for notify user metadata

claude · 2026-03-20T06:18:55Z

Code review

Found 7 issues:

[HIGH:HIGH] N+1 database queries in stuck job detection loop

https://github.com/anthropics/ironclaw/blob/5c94138a0794948a601aa704664d7e8439a09c4c/src/agent/self_repair.rs#L111-L120

The detect_stuck_jobs() function calls get_context(job_id) twice per stuck job: first at line 114 to check state, then again at line 159 after potential state transition. This creates an N+1 query pattern that scales with the number of stuck jobs. Consider keeping the context from the first fetch or combining the operations.

[HIGH:HIGH] Blocking DNS resolution during config load with no timeout

https://github.com/anthropics/ironclaw/blob/5c94138a0794948a601aa704664d7e8439a09c4c/src/config/helpers.rs#L285-L292

The validate_base_url() function performs blocking DNS resolution via to_socket_addrs() without a timeout. Called during startup from multiple locations (embeddings.rs:382, llm.rs:685/712/725, transcription.rs:749), a slow or misconfigured DNS lookup can block application startup indefinitely. Add a timeout or defer validation to async runtime.

[MEDIUM:HIGH] Stringly-typed routing metadata with duplicated keys

https://github.com/anthropics/ironclaw/blob/5c94138a0794948a601aa704664d7e8439a09c4c/src/agent/agent_loop.rs#L123-L132

The chat_tool_execution_metadata() function builds JSON with hardcoded string keys ("notify_channel", "notify_user", "notify_thread_id", "notify_metadata"). Per CLAUDE.md's "prefer strong types over strings" guidance, these should be a strongly-typed struct (e.g., ExecutionRoutingMetadata) serialized to JSON, enabling compile-time verification of all callers and preventing key name divergence.

[MEDIUM:HIGH] Potential infinite stuck job re-detection loop

https://github.com/anthropics/ironclaw/blob/5c94138a0794948a601aa704664d7e8439a09c4c/src/context/manager.rs#L797-L809

When a recovered job transitions from Stuck back to InProgress, its original started_at timestamp is not reset. The next repair scan will re-detect it as stuck (elapsed time still > threshold), creating a re-detection loop. The code acknowledges this (lines 798-801) but marks it as "future improvement." Track in_progress_since separately or use the most recent StateTransition with to == InProgress to avoid false positives.

[MEDIUM:MEDIUM] Repeated string allocations in message tool helpers

https://github.com/anthropics/ironclaw/blob/5c94138a0794948a601aa704664d7e8439a09c4c/src/tools/builtin/message.rs#L872-L910

Helper functions metadata_string(), metadata_notify_user(), and routing resolution create multiple String clones and allocations on every message send. While not in a hot loop, optimize by borrowing where possible or reducing intermediate allocations.

[LOW:HIGH] Silent TOCTOU on InProgress→Stuck state transition

https://github.com/anthropics/ironclaw/blob/5c94138a0794948a601aa704664d7e8439a09c4c/src/agent/self_repair.rs#L159-L177

After transitioning a job to Stuck at line 134, the code re-fetches context at line 159 to capture the new transition timestamp. If this re-fetch fails, the job silently disappears from stuck_jobs output but the transition already occurred, making the stuck job invisible to repair. Log at error! level (not warn!) when re-fetch fails after successful transition.

[LOW:MEDIUM] Non-obvious magic string filtering in metadata lookup

https://github.com/anthropics/ironclaw/blob/5c94138a0794948a601aa704664d7e8439a09c4c/src/tools/builtin/message.rs#L79-L81

The metadata_notify_user() helper filters out the string value "default" as a special case. This filtering is not documented in the function signature and violates the CLAUDE.md principle of "comments for non-obvious logic only." Use a named constant or enum variant to make the magic value explicit.

zmanian and others added 3 commits March 19, 2026 22:36

fix: prefer execution-local message routing metadata (#1449)

b952d22

* fix: prefer execution-local message routing metadata * test: cover message routing fallback metadata * refactor: simplify message target resolution * fix: ignore stale channel defaults for notify user metadata

ironclaw-ci bot added the staging-promotion label Mar 20, 2026

github-actions bot added scope: agent Agent core (agent loop, router, scheduler) scope: tool/builtin Built-in tools scope: docs Documentation size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Mar 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: promote staging to staging-promote/806d4028-23330265305 (2026-03-20 06:16 UTC)#1456

chore: promote staging to staging-promote/806d4028-23330265305 (2026-03-20 06:16 UTC)#1456
ironclaw-ci[bot] wants to merge 3 commits intostaging-promote/806d4028-23330265305from
staging-promote/b952d229-23331469361

ironclaw-ci bot commented Mar 20, 2026 •

edited by github-actions bot

Loading

Uh oh!

claude bot commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ironclaw-ci bot commented Mar 20, 2026 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Auto-promotion from staging CI

Commits in this batch (10):

Current commits in this promotion (3)

Uh oh!

claude bot commented Mar 20, 2026

Code review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ironclaw-ci bot commented Mar 20, 2026 •

edited by github-actions bot

Loading