
chore: promote staging to staging-promote/52ca9d65-23312673755 (2026-03-19 21:10 UTC)#1428

Merged
henrypark133 merged 4 commits into staging-promote/52ca9d65-23312673755 from staging-promote/65062f3c-23317058602
Mar 20, 2026

Conversation


@ironclaw-ci ironclaw-ci bot commented Mar 19, 2026

Auto-promotion from staging CI

Batch range: 71f41dd12363497372864bc6eb3f7c334e05fd52..65062f3cc069ebbd29f6d9be874ae5eff796e43a
Promotion branch: staging-promote/65062f3c-23317058602
Base: staging-promote/52ca9d65-23312673755
Triggered by: Staging CI batch at 2026-03-19 21:10 UTC

Commits in this batch (4):

Current commits in this promotion (3)

Current base: staging-promote/52ca9d65-23312673755
Current head: staging-promote/65062f3c-23317058602
Current range: origin/staging-promote/52ca9d65-23312673755..origin/staging-promote/65062f3c-23317058602

Auto-updated by staging promotion metadata workflow

Waiting for gates:

  • Tests: pending
  • E2E: pending
  • Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

ilblackdragon and others added 2 commits March 19, 2026 13:37
* feat: LRU embedding cache for workspace search (#165)

Add CachedEmbeddingProvider that wraps any EmbeddingProvider with an
in-memory LRU cache keyed by SHA-256(model_name + text). This avoids
redundant HTTP calls when the same text is embedded multiple times
(common during reindexing and repeated searches).

- Cache uses HashMap + last_accessed tracking with manual LRU eviction
  (same pattern as llm::response_cache::CachedProvider)
- Lock is never held during HTTP calls to prevent blocking
- embed_batch() partitions into hits/misses and only fetches misses
- Default 10,000 entries (~58 MB for 1536-dim vectors)
- Configurable via EMBEDDING_CACHE_SIZE env var
- Workspace.with_embeddings() auto-wraps; with_embeddings_uncached()
  available for tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
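The caching wrapper described above can be sketched roughly as follows. This is a hypothetical, simplified std-only version: the real CachedEmbeddingProvider is async, keys on SHA-256(model_name + text), and wraps an HTTP provider; here the key is a std DefaultHasher digest and the provider is a plain closure so the example runs on its own. The hit-path clone and O(n) eviction match the pattern the commit names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;
use std::time::Instant;

struct CacheEntry {
    embedding: Vec<f32>,
    last_accessed: Instant,
}

// Hypothetical stand-in for CachedEmbeddingProvider (names assumed).
struct CachedEmbedder<F: Fn(&str) -> Vec<f32>> {
    inner: F, // stand-in for the HTTP embedding provider
    cache: Mutex<HashMap<u64, CacheEntry>>,
    max_entries: usize,
}

impl<F: Fn(&str) -> Vec<f32>> CachedEmbedder<F> {
    fn new(inner: F, max_entries: usize) -> Self {
        Self {
            inner,
            cache: Mutex::new(HashMap::new()),
            max_entries: max_entries.max(1), // clamp 0 to 1, as in the fixes below
        }
    }

    fn key(model: &str, text: &str) -> u64 {
        // The real cache uses SHA-256(model_name + text); DefaultHasher keeps
        // this example dependency-free.
        let mut h = DefaultHasher::new();
        model.hash(&mut h);
        text.hash(&mut h);
        h.finish()
    }

    fn embed(&self, model: &str, text: &str) -> Vec<f32> {
        let k = Self::key(model, text);
        // Hit path: the lock guard drops at the end of this if-let, so the
        // lock is never held during the provider call below.
        if let Some(e) = self.cache.lock().unwrap().get_mut(&k) {
            e.last_accessed = Instant::now();
            return e.embedding.clone();
        }
        let embedding = (self.inner)(text); // miss: call the inner provider
        let mut cache = self.cache.lock().unwrap();
        if cache.len() >= self.max_entries {
            // O(n) LRU eviction: scan for the oldest last_accessed entry.
            let oldest = cache.iter().min_by_key(|(_, e)| e.last_accessed).map(|(k, _)| *k);
            if let Some(k) = oldest {
                cache.remove(&k);
            }
        }
        cache.insert(k, CacheEntry { embedding: embedding.clone(), last_accessed: Instant::now() });
        embedding
    }
}
```

A second embed() with the same (model, text) pair returns from the cache without touching the inner provider, which is the redundant-HTTP-call case the commit targets.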

* fix: address review comments on embedding cache

- Validate embed_batch return count matches expected miss count
- Replace unwrap_or_default() with proper error propagation
- Fix batch eviction: run final eviction pass after insert to enforce cap
- Fix test: use different-length inputs to verify ordering correctness
- Reject EMBEDDING_CACHE_SIZE=0 in config validation (minimum is 1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace .expect() with proper error handling in embed_batch

The all-cache-hits early-return path used .expect("all cache hits") which
violates the project convention of no .unwrap()/.expect() in production
code. Replaced with the same ok_or_else pattern used in the normal path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: clarify memory sizing docs and use saturating_add for eviction

- Update memory comments in embedding_cache.rs, config/embeddings.rs,
  and workspace/mod.rs to note the ~58 MB figure is payload-only
  (actual memory is higher due to HashMap/key/allocation overhead)
- Use saturating_add(1) instead of + 1 for eviction threshold to
  prevent overflow if max_entries is usize::MAX

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review on embedding cache

- Avoid double-clone per miss in embed_batch: move embedding into
  results, clone only for the cache entry
- Evict per-insert instead of after all inserts to keep peak memory
  bounded during large batches
- Clamp max_entries to at least 1 in constructor to prevent unexpected
  eviction behavior when set to 0 via the public API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: reduce embedding_cache module visibility to private

Types are already re-exported via `pub use`, so the module itself
doesn't need to be public. Reduces unnecessary API surface.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address serrrfirat review feedback on embedding cache

- Add TODO comment for O(n) LRU eviction scalability
- Add thundering herd note at lock release in embed()
- Warn when cache max_entries exceeds 100k
- Use with_embeddings_uncached() in integration test
- Add tests: error_does_not_pollute_cache, embed_batch_empty_input
- Update README with cache-aware with_embeddings() docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent u32 wrapping in FailThenSucceedMock failure counter

fetch_sub(1) wraps to u32::MAX when called past zero, silently
breaking the mock for 3+ calls. Switch to load-then-store to avoid
the wrapping bug in both embed() and embed_batch().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
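The wrap-safe decrement the commit describes (later hardened again with fetch_update in a follow-up below) can be sketched like this. FailThenSucceedMock itself is not shown; only the counter logic is, and the function name is an assumption for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// fetch_sub(1) on an AtomicU32 that is already 0 wraps to u32::MAX.
// fetch_update with checked_sub retries a compare-and-swap and leaves 0
// untouched: Ok means a failure was consumed, Err means none remained.
fn take_failure(remaining: &AtomicU32) -> bool {
    remaining
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
        .is_ok()
}
```

Unlike the earlier load-then-store fix, fetch_update is also race-free: two concurrent callers cannot both observe 1 and store 0.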

* fix: address Copilot and serrrfirat review findings on embedding cache

- Switch tokio::sync::Mutex to std::sync::Mutex (lock never held across
  .await — cheaper synchronous lock)
- Extract DEFAULT_EMBEDDING_CACHE_SIZE constant to avoid 10_000 duplication
  between EmbeddingCacheConfig and EmbeddingsConfig

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add all-misses batch test for embedding cache

Adds embed_batch_all_misses test covering the case where every text in a
batch is a cache miss — fulfilling the commitment from serrrfirat's review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: trigger CI re-check after rebase

* fix: use raw [u8;32] cache keys and pre-allocate HashMap capacity

Address Copilot review findings:
- cache_key() now returns [u8; 32] instead of hex String, avoiding a
  64-byte allocation per lookup
- HashMap::with_capacity(max_entries) avoids incremental reallocation
- Fix pre-existing staging compilation error in cli/routines.rs
  (missing max_tool_rounds/use_tools fields)

[skip-regression-check]

* fix: make cache accessors sync and update doc for [u8;32] keys

Address Copilot review:
- len(), is_empty(), clear() are now sync since they only take a
  std::sync::Mutex lock with no .await points
- Update cache_size doc comment to reflect [u8;32] keys instead of
  String keys

[skip-regression-check]

* fix: remove clone_on_copy for [u8; 32] cache keys

[skip-regression-check]

* ci: add safety comments to test code for no-panics check

The CI no-panics grep check cannot distinguish test code inside
src/ files from production code. Add // safety: test annotations
to .unwrap(), .expect(), and assert!() calls in #[cfg(test)] modules.

* fix: correct cache doc and demote hit/miss logs to trace

- Fix misleading "String keys" in memory comment (cache uses [u8; 32])
- Demote per-request hit/miss logs from debug to trace to reduce noise
  on hot paths (batch summary stays at trace too)

* docs: add missing Arc import in workspace README example

* perf: batch eviction in embed_batch to avoid O(n×m) cost

Replace per-insert evict_lru call with a single evict_k_oldest pass
that computes eviction count upfront and removes the k oldest entries
in one O(n) scan. Avoids O(n×m) HashMap iterations while holding the
mutex during batch inserts.

* fix: cap batch cache inserts at max_entries and use O(n) selection

- evict_k_oldest now uses select_nth_unstable_by_key for O(n) average
  partial selection instead of O(n log n) full sort
- embed_batch caps cached entries at max_entries when misses exceed
  capacity, preventing the cache from growing unbounded
- Added test: batch_exceeding_capacity_respects_max_entries
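The evict_k_oldest pass described above can be sketched as follows. This is an assumed, simplified signature (the real cache maps keys to full entries, not bare timestamps): collect (key, last_accessed) pairs, partition the k oldest to the front with select_nth_unstable_by_key in O(n) average time, then remove exactly those keys.

```rust
use std::collections::HashMap;
use std::time::Instant;

// Remove the k entries with the oldest timestamps in a single O(n) average
// pass, instead of k full scans (O(n * k)) or a full sort (O(n log n)).
fn evict_k_oldest(cache: &mut HashMap<u64, Instant>, k: usize) {
    if k == 0 || cache.is_empty() {
        return;
    }
    let k = k.min(cache.len());
    let mut entries: Vec<(u64, Instant)> = cache.iter().map(|(&key, &t)| (key, t)).collect();
    if k < entries.len() {
        // After this call, entries[..k] hold the k smallest timestamps
        // (unordered among themselves, which is all eviction needs).
        entries.select_nth_unstable_by_key(k - 1, |&(_, t)| t);
    }
    for &(key, _) in &entries[..k] {
        cache.remove(&key);
    }
}
```

The second fix in the commit, capping inserts at max_entries, then bounds how large k can ever get for a single batch.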

* fix: flatten test assert for fmt compatibility

Shorten assert message to fit single line so cargo fmt doesn't
split the safety annotation onto a separate line.

* fix: address review feedback and improve embedding cache (takeover #235)

- Fix merge conflict: add missing allow_always field in PendingApproval
- Thread EmbeddingCacheConfig through CLI memory commands so they respect
  EMBEDDING_CACHE_SIZE instead of silently using default (fixes #235 review)
- Cap HashMap pre-allocation at min(max_entries, 1024) to avoid upfront
  memory waste at large cache sizes
- Fix FailThenSucceedMock race: replace load+store with atomic fetch_update
- Remove noisy '// safety: test' comments (40+ lines of diff noise)
- Fix collapsed lines from comment removal
- Simplify redundant Ok(...collect()?) to just collect()

Co-Authored-By: ztsalexey <ztsalexey@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(embedding-cache): skip eviction on concurrent duplicate insert

When the lock is released for the HTTP call, another caller may insert
the same key. Re-check under lock and just update the existing entry
without evicting, avoiding unnecessary cache churn under concurrency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: ztsalexey <alexthebuildr@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: ztsalexey <ztsalexey@users.noreply.github.com>
* feat: structured fallback deliverables for failed/stuck jobs (#221)

When a job fails or gets stuck, build a FallbackDeliverable that captures
partial results, action statistics, cost, timing, and repair attempts.
This replaces opaque error strings with structured data users can act on.

- Add FallbackDeliverable, LastAction, ActionStats types in context/fallback.rs
- Store fallback in JobContext.metadata["fallback_deliverable"] on failure
- Surface fallback in job_status tool output and SSE job_result events
- Update mark_failed() and mark_stuck() in worker to build fallback
- 8 unit tests covering zero/mixed actions, truncation, timing, serialization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
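The shape of that structured data can be sketched as below. These field names are assumptions reconstructed from the commit message; the real definitions live in context/fallback.rs and may differ.

```rust
use std::time::Duration;

// Hypothetical shape of the structured fallback (field names assumed).
pub struct ActionStats {
    pub attempted: u32,
    pub succeeded: u32,
    pub failed: u32,
}

pub struct FallbackDeliverable {
    pub partial_output: Option<String>,
    pub stats: ActionStats,
    pub cost_usd: f64,
    pub elapsed: Duration,
    pub failure_reason: String,
}

impl FallbackDeliverable {
    // A one-line summary a user can act on, in place of an opaque error string.
    pub fn summary(&self) -> String {
        format!(
            "{} ({} of {} actions succeeded, ${:.2}, {}s elapsed)",
            self.failure_reason,
            self.stats.succeeded,
            self.stats.attempted,
            self.cost_usd,
            self.elapsed.as_secs()
        )
    }
}
```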

* fix: address review comments on fallback deliverables

- Fix doc comment: "200 chars" -> "200 bytes (UTF-8 safe)" since
  truncate_str operates on byte length, not character count.
- Add code comment documenting that SSE fallback_deliverable is
  currently always None (forward-compatible infrastructure).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: take Option<&FallbackDeliverable> instead of &Option<…>

Addresses Gemini review feedback: idiomatic Rust prefers
Option<&T> over &Option<T> for borrowed optional values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
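A minimal sketch of the signature change, with a stand-in type (Deliverable is hypothetical): Option<&T> lets callers pass either a borrowed Option's contents via .as_ref() or a plain reference, while &Option<T> forces the caller to materialize an owned Option.

```rust
struct Deliverable {
    summary: String,
}

// Preferred: borrow the inner value if present, rather than &Option<Deliverable>.
fn describe(fallback: Option<&Deliverable>) -> String {
    match fallback {
        Some(d) => format!("fallback: {}", d.summary),
        None => "no fallback".to_string(),
    }
}
```

Callers with an owned `Option<Deliverable>` adapt with `opt.as_ref()`; callers holding a bare `&Deliverable` pass `Some(value)` with no Option allocation at all.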

* fix: guard against non-object metadata and add fallback test

- store_fallback_in_metadata now resets metadata to {} when it's any
  non-object type (string, array, number), not just null. Prevents
  panic on index assignment.
- Add test_job_status_includes_fallback_deliverable to verify the
  fallback field is surfaced in job_status tool output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use sanitized output in fallback preview + add integration tests

Security fix: FallbackDeliverable::build() now uses output_sanitized
instead of output_raw, preventing secrets/PII from leaking through
the job_status tool and SSE job_result events.

Also adds:
- test_fallback_uses_sanitized_output: proves raw secrets don't leak
- test_store_fallback_in_metadata_roundtrip: full serialize/deserialize
- test_store_fallback_handles_non_object_metadata: edge case coverage
- test_store_fallback_none_is_noop: None input is safe

Addresses serrrfirat review feedback on PR #236.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: harden fallback deliverables against review findings

- Truncate failure_reason to 1000 bytes to prevent metadata bloat
- Add tracing::warn on fallback serialization failure (was silently discarded)
- Fix module/struct docs to cover stuck jobs, remove stale SSE claim
- Fix job.rs test to use real FallbackDeliverable field names
- Add tests for failure_reason truncation and completed_at=None elapsed time
- Fix pre-existing clippy warning in settings.rs (field_reassign_with_default)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review findings on fallback deliverables

- Fix output_raw/output_sanitized field swap in ActionRecord::succeed()
  so sanitized data actually goes into the sanitized field (security)
- Return None instead of empty Memory when get_memory fails in
  build_fallback, with tracing::warn for observability
- Replace manual elapsed calculation with ctx.elapsed() which already
  clamps negative durations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve rebase conflicts and update tests for parameter swap

- Add fallback field to SseEvent::JobResult in job_monitor
- Fix type annotation in fallback deliverable test
- Update test_action_record_succeed_sets_fields for new parameter order
- Use create_job_for_user in test (API changed on main)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: trigger CI re-check after rebase

* fix: fall back to error message for failed action output_preview

When the last action is a failed tool call, output_sanitized is None,
leaving output_preview empty. Now falls back to the action's error
message so users see what went wrong.

[skip-regression-check]

* ci: add safety comments to test code for no-panics check

The CI no-panics grep check cannot distinguish test code inside
src/ files from production code. Add // safety: test annotations
to .unwrap(), .expect(), and assert!() calls in #[cfg(test)] modules.

* fix: clarify succeed() doc and avoid clone in output_preview

- Fix doc comment: output_raw is stored as pretty-printed JSON string,
  not a raw JSON value
- Borrow string slice directly in fallback preview to avoid cloning
  potentially large sanitized outputs before truncation

* refactor: reuse floor_char_boundary in truncate_str

Replace hand-rolled UTF-8 boundary logic with existing
crate::util::floor_char_boundary to reduce duplication.
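The boundary logic works roughly as follows. This is a hand-rolled sketch in the spirit of the crate::util::floor_char_boundary helper named above, not its actual source (std's str::floor_char_boundary is still unstable at the time of writing, hence the crate-local helper).

```rust
// Largest index <= max that falls on a UTF-8 char boundary. Since a char is
// at most 4 bytes, this walks back at most 3 bytes.
fn floor_char_boundary(s: &str, max: usize) -> usize {
    if max >= s.len() {
        return s.len();
    }
    (0..=max).rev().find(|&i| s.is_char_boundary(i)).unwrap_or(0)
}

// Byte-length truncation that never slices through a multi-byte char
// (a plain &s[..max_bytes] would panic mid-character).
fn truncate_str(s: &str, max_bytes: usize) -> &str {
    &s[..floor_char_boundary(s, max_bytes)]
}
```

This is why the earlier doc fix reads "200 bytes (UTF-8 safe)" rather than "200 chars": the cap is on byte length, and the truncation may land a few bytes short of it.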

* fix: rename SSE fallback field to fallback_deliverable for consistency

The SSE JobResult field was named `fallback` while everywhere else
(metadata key, job_status tool) uses `fallback_deliverable`. Align
the SSE wire format to avoid forcing clients to handle two names.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added scope: agent Agent core (agent loop, router, scheduler) scope: channel/cli TUI / CLI channel scope: channel/web Web gateway channel scope: tool/builtin Built-in tools scope: workspace Persistent memory / workspace scope: orchestrator Container orchestrator scope: worker Container worker scope: docs Documentation size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Mar 19, 2026

claude bot commented Mar 19, 2026

Code review

Found 9 issues:

  1. [CRITICAL:100] Embedding cache clones embeddings on every hit

https://github.com/anthropics/ironclaw/blob/7bbc0463b2444f726e3c916e2e22e293466e1da5/src/workspace/embedding_cache.rs#L169-L179

The cache holds Vec&lt;f32&gt; embeddings (typically 1536 elements = ~6KB each). Every cache hit calls .clone() on the entire vector (entry.embedding.clone()), causing significant memory churn. With a 10,000-entry cache under sustained access, this defeats the purpose of caching.

  2. [HIGH:100] O(n) LRU eviction on every insertion at capacity

https://github.com/anthropics/ironclaw/blob/7bbc0463b2444f726e3c916e2e22e293466e1da5/src/workspace/embedding_cache.rs#L108-L121

The evict_lru() function iterates the entire HashMap to find minimum last_accessed time. At 10,000 entries, this is 10,000 iterations per eviction. A TODO comment acknowledges this but it's a production performance issue.

  3. [HIGH:100] Thundering herd: concurrent requests for same uncached key

https://github.com/anthropics/ironclaw/blob/7bbc0463b2444f726e3c916e2e22e293466e1da5/src/workspace/embedding_cache.rs#L2235-L2245

Multiple concurrent callers requesting the same uncached embedding each call the inner provider independently. This is documented as acceptable but under load (concurrent workspace searches on same text), it multiplies API calls by concurrency factor. Consider request coalescing.

  4. [HIGH:75] Unnecessary cloning of strings in embed_batch miss path

https://github.com/anthropics/ironclaw/blob/7bbc0463b2444f726e3c916e2e22e293466e1da5/src/workspace/embedding_cache.rs#L246-L250

Cloning entire strings just to fetch missing embeddings. For large batch operations, pass slice indices or borrowed references instead.

  5. [MEDIUM:100] No timeout on Mutex::lock() in cache operations

https://github.com/anthropics/ironclaw/blob/7bbc0463b2444f726e3c916e2e22e293466e1da5/src/workspace/embedding_cache.rs#L167

Using Mutex::lock() without timeout. If mutex becomes poisoned, unwrap_or_else(|e| e.into_inner()) will still block indefinitely on subsequent attempts. Use try_lock() with explicit error handling or a timeout.

  6. [MEDIUM:100] Sequential async DB calls in failure path

https://github.com/anthropics/ironclaw/blob/7bbc0463b2444f726e3c916e2e22e293466e1da5/src/worker/job.rs#L1024-L1041

Two sequential DB calls (get_memory, get_context) during job failure/stuck transition. If these are independent, use tokio::join! for parallelism instead.

  7. [MEDIUM:75] No proactive memory limit tracking for embeddings

https://github.com/anthropics/ironclaw/blob/7bbc0463b2444f726e3c916e2e22e293466e1da5/src/workspace/embedding_cache.rs#L70

At 10,000 entries × 1536 dimensions × 4 bytes/float ≈ 58MB+ (before HashMap overhead, per-entry Instant timestamps, etc.). No mechanism to prevent or warn about hitting memory limits. Consider proactive capacity checks.

  8. [MEDIUM:75] FallbackDeliverable serialization without size cap on action metadata

https://github.com/anthropics/ironclaw/blob/7bbc0463b2444f726e3c916e2e22e293466e1da5/src/context/fallback.rs#L56-L100

Truncates failure_reason (1000 bytes) and output_preview (200 bytes), but serializes full ActionStats untruncated. Edge case: jobs with millions of actions could produce very large metadata JSON, impacting database storage and SSE event sizes.

  9. [MEDIUM:50] Incomplete forward-compatibility documentation

https://github.com/anthropics/ironclaw/blob/7bbc0463b2444f726e3c916e2e22e293466e1da5/src/orchestrator/api.rs#L338-L342

Comment states fallback_deliverable is "currently always None" but it's already being populated from JobContext.metadata. Update the comment to reflect actual behavior.

* Make hosted OAuth and MCP auth generic

* Address PR feedback and lint issues

* Suppress built-in Google secret in hosted proxy flows

* Align hosted OAuth secret suppression with proxy config

* Harden hosted OAuth callback helpers

* Tighten hosted OAuth URL rewriting
…4063

chore: promote staging to staging-promote/65062f3c-23317058602 (2026-03-19 23:06 UTC)
@github-actions github-actions bot added scope: llm LLM integration scope: extensions Extension management labels Mar 20, 2026
@henrypark133 henrypark133 merged commit 2326302 into staging-promote/52ca9d65-23312673755 Mar 20, 2026
13 checks passed
@henrypark133 henrypark133 deleted the staging-promote/65062f3c-23317058602 branch March 20, 2026 17:11