chore: promote staging to staging-promote/52ca9d65-23312673755 (2026-03-19 21:10 UTC) #1428
* feat: LRU embedding cache for workspace search (#165)

  Add CachedEmbeddingProvider that wraps any EmbeddingProvider with an in-memory LRU cache keyed by SHA-256(model_name + text). This avoids redundant HTTP calls when the same text is embedded multiple times (common during reindexing and repeated searches).
  - Cache uses HashMap + last_accessed tracking with manual LRU eviction (same pattern as llm::response_cache::CachedProvider)
  - Lock is never held during HTTP calls to prevent blocking
  - embed_batch() partitions into hits/misses and only fetches misses
  - Default 10,000 entries (~58 MB for 1536-dim vectors)
  - Configurable via EMBEDDING_CACHE_SIZE env var
  - Workspace.with_embeddings() auto-wraps; with_embeddings_uncached() available for tests

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review comments on embedding cache

  - Validate embed_batch return count matches expected miss count
  - Replace unwrap_or_default() with proper error propagation
  - Fix batch eviction: run final eviction pass after insert to enforce cap
  - Fix test: use different-length inputs to verify ordering correctness
  - Reject EMBEDDING_CACHE_SIZE=0 in config validation (minimum is 1)

* fix: replace .expect() with proper error handling in embed_batch

  The all-cache-hits early-return path used .expect("all cache hits"), which violates the project convention of no .unwrap()/.expect() in production code. Replaced with the same ok_or_else pattern used in the normal path.

* fix: clarify memory sizing docs and use saturating_add for eviction

  - Update memory comments in embedding_cache.rs, config/embeddings.rs, and workspace/mod.rs to note the ~58 MB figure is payload-only (actual memory is higher due to HashMap/key/allocation overhead)
  - Use saturating_add(1) instead of + 1 for the eviction threshold to prevent overflow if max_entries is usize::MAX

* fix: address Copilot review on embedding cache

  - Avoid double-clone per miss in embed_batch: move embedding into results, clone only for the cache entry
  - Evict per-insert instead of after all inserts to keep peak memory bounded during large batches
  - Clamp max_entries to at least 1 in the constructor to prevent unexpected eviction behavior when set to 0 via the public API

* fix: reduce embedding_cache module visibility to private

  Types are already re-exported via `pub use`, so the module itself doesn't need to be public. Reduces unnecessary API surface.

* fix: address serrrfirat review feedback on embedding cache

  - Add TODO comment for O(n) LRU eviction scalability
  - Add thundering herd note at lock release in embed()
  - Warn when cache max_entries exceeds 100k
  - Use with_embeddings_uncached() in integration test
  - Add tests: error_does_not_pollute_cache, embed_batch_empty_input
  - Update README with cache-aware with_embeddings() docs

* fix: prevent u32 wrapping in FailThenSucceedMock failure counter

  fetch_sub(1) wraps to u32::MAX when called past zero, silently breaking the mock for 3+ calls. Switch to load-then-store to avoid the wrapping bug in both embed() and embed_batch().

* fix: address Copilot and serrrfirat review findings on embedding cache

  - Switch tokio::sync::Mutex to std::sync::Mutex (the lock is never held across .await, so the cheaper synchronous lock suffices)
  - Extract DEFAULT_EMBEDDING_CACHE_SIZE constant to avoid 10_000 duplication between EmbeddingCacheConfig and EmbeddingsConfig

* test: add all-misses batch test for embedding cache

  Adds embed_batch_all_misses test covering the case where every text in a batch is a cache miss, fulfilling the commitment from serrrfirat's review.

* chore: trigger CI re-check after rebase

* fix: use raw [u8; 32] cache keys and pre-allocate HashMap capacity

  Address Copilot review findings:
  - cache_key() now returns [u8; 32] instead of a hex String, avoiding a 64-byte allocation per lookup
  - HashMap::with_capacity(max_entries) avoids incremental reallocation
  - Fix pre-existing staging compilation error in cli/routines.rs (missing max_tool_rounds/use_tools fields)

  [skip-regression-check]

* fix: make cache accessors sync and update doc for [u8; 32] keys

  Address Copilot review:
  - len(), is_empty(), clear() are now sync since they only take a std::sync::Mutex lock with no .await points
  - Update cache_size doc comment to reflect [u8; 32] keys instead of String keys

  [skip-regression-check]

* fix: remove clone_on_copy for [u8; 32] cache keys

  [skip-regression-check]

* ci: add safety comments to test code for no-panics check

  The CI no-panics grep check cannot distinguish test code inside src/ files from production code. Add // safety: test annotations to .unwrap(), .expect(), and assert!() calls in #[cfg(test)] modules.

* fix: correct cache doc and demote hit/miss logs to trace

  - Fix misleading "String keys" in memory comment (cache uses [u8; 32])
  - Demote per-request hit/miss logs from debug to trace to reduce noise on hot paths (batch summary stays at trace too)

* docs: add missing Arc import in workspace README example

* perf: batch eviction in embed_batch to avoid O(n×m) cost

  Replace the per-insert evict_lru call with a single evict_k_oldest pass that computes the eviction count upfront and removes the k oldest entries in one O(n) scan. Avoids O(n×m) HashMap iterations while holding the mutex during batch inserts.

* fix: cap batch cache inserts at max_entries and use O(n) selection

  - evict_k_oldest now uses select_nth_unstable_by_key for O(n) average partial selection instead of an O(n log n) full sort
  - embed_batch caps cached entries at max_entries when misses exceed capacity, preventing the cache from growing unbounded
  - Added test: batch_exceeding_capacity_respects_max_entries

* fix: flatten test assert for fmt compatibility

  Shorten the assert message to fit on a single line so cargo fmt doesn't split the safety annotation onto a separate line.

* fix: address review feedback and improve embedding cache (takeover #235)

  - Fix merge conflict: add missing allow_always field in PendingApproval
  - Thread EmbeddingCacheConfig through CLI memory commands so they respect EMBEDDING_CACHE_SIZE instead of silently using the default (fixes #235 review)
  - Cap HashMap pre-allocation at min(max_entries, 1024) to avoid upfront memory waste at large cache sizes
  - Fix FailThenSucceedMock race: replace load+store with atomic fetch_update
  - Remove noisy '// safety: test' comments (40+ lines of diff noise)
  - Fix collapsed lines from comment removal
  - Simplify redundant Ok(...collect()?) to just collect()

  Co-Authored-By: ztsalexey <ztsalexey@users.noreply.github.com>
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(embedding-cache): skip eviction on concurrent duplicate insert

  When the lock is released for the HTTP call, another caller may insert the same key. Re-check under lock and just update the existing entry without evicting, avoiding unnecessary cache churn under concurrency.

---------

Co-authored-by: ztsalexey <alexthebuildr@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: ztsalexey <ztsalexey@users.noreply.github.com>
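The embed_batch() flow these commits describe — partition inputs into cache hits and misses, fetch only the misses in one call, and cache the results under a capacity cap — can be sketched roughly as follows. This is a simplified, synchronous, std-only sketch: the real CachedEmbeddingProvider is async, keys entries by SHA-256 [u8; 32], and evicts rather than skipping inserts at the cap; `EmbeddingCache` and the `fetch` closure here are stand-ins.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real cache: text key -> embedding.
struct EmbeddingCache {
    entries: HashMap<String, Vec<f32>>,
    max_entries: usize,
}

impl EmbeddingCache {
    fn new(max_entries: usize) -> Self {
        // Clamp to at least 1 and cap pre-allocation, mirroring the commits.
        let max_entries = max_entries.max(1);
        Self {
            entries: HashMap::with_capacity(max_entries.min(1024)),
            max_entries,
        }
    }

    /// Partition `texts` into cache hits and misses, fetch only the misses,
    /// and return embeddings in the original input order.
    fn embed_batch(
        &mut self,
        texts: &[&str],
        fetch: impl Fn(&[&str]) -> Vec<Vec<f32>>,
    ) -> Vec<Vec<f32>> {
        let mut results: Vec<Option<Vec<f32>>> = vec![None; texts.len()];
        let mut miss_idx = Vec::new();

        for (i, t) in texts.iter().enumerate() {
            match self.entries.get(*t) {
                Some(e) => results[i] = Some(e.clone()), // hit
                None => miss_idx.push(i),                // miss
            }
        }

        if !miss_idx.is_empty() {
            let miss_texts: Vec<&str> = miss_idx.iter().map(|&i| texts[i]).collect();
            let fetched = fetch(&miss_texts); // one provider call for all misses
            for (&i, emb) in miss_idx.iter().zip(fetched) {
                // Only cache while under the cap (the real code evicts instead).
                if self.entries.len() < self.max_entries {
                    self.entries.insert(texts[i].to_string(), emb.clone());
                }
                results[i] = Some(emb);
            }
        }

        // Every slot was filled above, so flatten drops nothing.
        results.into_iter().flatten().collect()
    }
}

fn main() {
    let mut cache = EmbeddingCache::new(10);
    // Toy "provider": embedding is just the text length.
    let fetch = |ts: &[&str]| ts.iter().map(|t| vec![t.len() as f32]).collect::<Vec<_>>();

    let first = cache.embed_batch(&["a", "bb"], fetch);
    assert_eq!(first, vec![vec![1.0], vec![2.0]]);

    // Second call: both inputs are now cache hits, served without fetching.
    let second = cache.embed_batch(&["bb", "a"], fetch);
    assert_eq!(second, vec![vec![2.0], vec![1.0]]);
}
```

The ordering detail matters: results are written back by original index, which is what the "use different-length inputs to verify ordering correctness" test commit above guards.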
* feat: structured fallback deliverables for failed/stuck jobs (#221)

  When a job fails or gets stuck, build a FallbackDeliverable that captures partial results, action statistics, cost, timing, and repair attempts. This replaces opaque error strings with structured data users can act on.
  - Add FallbackDeliverable, LastAction, ActionStats types in context/fallback.rs
  - Store fallback in JobContext.metadata["fallback_deliverable"] on failure
  - Surface fallback in job_status tool output and SSE job_result events
  - Update mark_failed() and mark_stuck() in worker to build fallback
  - 8 unit tests covering zero/mixed actions, truncation, timing, serialization

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review comments on fallback deliverables

  - Fix doc comment: "200 chars" -> "200 bytes (UTF-8 safe)" since truncate_str operates on byte length, not character count.
  - Add code comment documenting that SSE fallback_deliverable is currently always None (forward-compatible infrastructure).

* refactor: take Option<&FallbackDeliverable> instead of &Option<…>

  Addresses Gemini review feedback: idiomatic Rust prefers Option<&T> over &Option<T> for borrowed optional values.

* fix: guard against non-object metadata and add fallback test

  - store_fallback_in_metadata now resets metadata to {} when it's any non-object type (string, array, number), not just null. Prevents panic on index assignment.
  - Add test_job_status_includes_fallback_deliverable to verify the fallback field is surfaced in job_status tool output.

* fix: use sanitized output in fallback preview + add integration tests

  Security fix: FallbackDeliverable::build() now uses output_sanitized instead of output_raw, preventing secrets/PII from leaking through the job_status tool and SSE job_result events.

  Also adds:
  - test_fallback_uses_sanitized_output: proves raw secrets don't leak
  - test_store_fallback_in_metadata_roundtrip: full serialize/deserialize
  - test_store_fallback_handles_non_object_metadata: edge case coverage
  - test_store_fallback_none_is_noop: None input is safe

  Addresses serrrfirat review feedback on PR #236.

* fix: harden fallback deliverables against review findings

  - Truncate failure_reason to 1000 bytes to prevent metadata bloat
  - Add tracing::warn on fallback serialization failure (was silently discarded)
  - Fix module/struct docs to cover stuck jobs, remove stale SSE claim
  - Fix job.rs test to use real FallbackDeliverable field names
  - Add tests for failure_reason truncation and completed_at=None elapsed time
  - Fix pre-existing clippy warning in settings.rs (field_reassign_with_default)

* fix: address Copilot review findings on fallback deliverables

  - Fix output_raw/output_sanitized field swap in ActionRecord::succeed() so sanitized data actually goes into the sanitized field (security)
  - Return None instead of an empty Memory when get_memory fails in build_fallback, with tracing::warn for observability
  - Replace manual elapsed calculation with ctx.elapsed(), which already clamps negative durations

* fix: resolve rebase conflicts and update tests for parameter swap

  - Add fallback field to SseEvent::JobResult in job_monitor
  - Fix type annotation in fallback deliverable test
  - Update test_action_record_succeed_sets_fields for new parameter order
  - Use create_job_for_user in test (API changed on main)

* chore: trigger CI re-check after rebase

* fix: fall back to error message for failed action output_preview

  When the last action is a failed tool call, output_sanitized is None, leaving output_preview empty. Now falls back to the action's error message so users see what went wrong.

  [skip-regression-check]

* ci: add safety comments to test code for no-panics check

  The CI no-panics grep check cannot distinguish test code inside src/ files from production code. Add // safety: test annotations to .unwrap(), .expect(), and assert!() calls in #[cfg(test)] modules.

* fix: clarify succeed() doc and avoid clone in output_preview

  - Fix doc comment: output_raw is stored as a pretty-printed JSON string, not a raw JSON value
  - Borrow the string slice directly in the fallback preview to avoid cloning potentially large sanitized outputs before truncation

* refactor: reuse floor_char_boundary in truncate_str

  Replace hand-rolled UTF-8 boundary logic with the existing crate::util::floor_char_boundary to reduce duplication.

* fix: rename SSE fallback field to fallback_deliverable for consistency

  The SSE JobResult field was named `fallback` while everywhere else (metadata key, job_status tool) uses `fallback_deliverable`. Align the SSE wire format to avoid forcing clients to handle two names.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
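The "200 bytes (UTF-8 safe)" truncation these commits keep refining can be sketched as a standalone pair of helpers. This is an illustrative analogue, not the crate's code: the real implementation lives in crate::util::floor_char_boundary (std's `str::floor_char_boundary` is still nightly-only, so a hand-rolled boundary scan is shown).

```rust
/// Largest index <= max that is a UTF-8 character boundary in `s`.
/// (Standalone analogue of the crate's floor_char_boundary helper.)
fn floor_char_boundary(s: &str, max: usize) -> usize {
    if max >= s.len() {
        return s.len();
    }
    let mut i = max;
    while !s.is_char_boundary(i) {
        i -= 1; // never underflows: index 0 is always a boundary
    }
    i
}

/// Truncate to at most `max_bytes` bytes without splitting a code point.
/// Operates on byte length, not character count, hence "200 bytes" in the docs.
fn truncate_str(s: &str, max_bytes: usize) -> &str {
    &s[..floor_char_boundary(s, max_bytes)]
}

fn main() {
    // 'é' is 2 bytes; cutting at byte 2 would split it, so we back up to 1.
    assert_eq!(truncate_str("héllo", 2), "h");
    assert_eq!(truncate_str("héllo", 3), "hé");
    // Shorter than the limit: returned unchanged.
    assert_eq!(truncate_str("hi", 200), "hi");
}
```

Slicing with a naive `&s[..200]` would panic on multi-byte boundaries, which is exactly why the doc-comment fix above distinguishes bytes from chars.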
Code review: found 9 issues:
The cache holds Vec embeddings (typically 1536 elements = ~6KB each). Every cache hit calls .clone() on the entire vector (entry.embedding.clone()), causing significant memory churn. With a 10,000-entry cache under sustained access, this defeats the purpose of caching.
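One way to address this, sketched below as a suggestion rather than the crate's current code: store `Arc<[f32]>` instead of `Vec<f32>`, so a cache hit clones a 16-byte reference-counted pointer rather than the ~6 KB payload. The `Cache` type here is a hypothetical stand-in; the tradeoff is that callers must accept `Arc<[f32]>` in their signatures.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Storing Arc<[f32]> means a cache hit clones a reference-counted
// pointer instead of copying the whole embedding vector.
struct Cache {
    entries: HashMap<String, Arc<[f32]>>,
}

impl Cache {
    fn get(&self, key: &str) -> Option<Arc<[f32]>> {
        self.entries.get(key).cloned() // bumps the refcount only
    }

    fn insert(&mut self, key: String, embedding: Vec<f32>) {
        self.entries.insert(key, embedding.into()) // Vec<f32> -> Arc<[f32]>
            ;
    }
}

fn main() {
    let mut cache = Cache { entries: HashMap::new() };
    cache.insert("k".into(), vec![0.1, 0.2, 0.3]);

    let a = cache.get("k").unwrap();
    let b = cache.get("k").unwrap();
    // Both hits share the same allocation: no per-hit vector copy.
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(a.len(), 3);
}
```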
The evict_lru() function iterates the entire HashMap to find minimum last_accessed time. At 10,000 entries, this is 10,000 iterations per eviction. A TODO comment acknowledges this but it's a production performance issue.
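The evict_k_oldest approach adopted in the commits above (O(n) average partial selection via `select_nth_unstable_by_key`) can be sketched like this. Simplifications relative to the real code: `last_accessed` is a plain u64 tick instead of an `Instant`, and keys are cloned into a scratch vector for clarity.

```rust
use std::collections::HashMap;

/// Evict the k entries with the smallest last_accessed value in one
/// O(n) average pass, instead of an O(n log n) full sort or k separate
/// O(n) min-scans.
fn evict_k_oldest(entries: &mut HashMap<String, u64>, k: usize) {
    if k == 0 || entries.is_empty() {
        return;
    }
    let k = k.min(entries.len());
    let mut by_age: Vec<(String, u64)> =
        entries.iter().map(|(key, &t)| (key.clone(), t)).collect();
    // Partition so the k smallest timestamps occupy by_age[..k].
    by_age.select_nth_unstable_by_key(k - 1, |&(_, t)| t);
    for (key, _) in &by_age[..k] {
        entries.remove(key);
    }
}

fn main() {
    let mut entries: HashMap<String, u64> = [("a", 3u64), ("b", 1), ("c", 4), ("d", 2)]
        .into_iter()
        .map(|(k, v)| (k.to_string(), v))
        .collect();

    evict_k_oldest(&mut entries, 2); // removes "b" (tick 1) and "d" (tick 2)
    assert!(entries.contains_key("a") && entries.contains_key("c"));
    assert_eq!(entries.len(), 2);
}
```

This still scans the whole map once per batch, so the TODO about true O(1) LRU (e.g. an intrusive linked list or a generation counter with an ordered index) remains valid for very large caches.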
Multiple concurrent callers requesting the same uncached embedding each call the inner provider independently. This is documented as acceptable but under load (concurrent workspace searches on same text), it multiplies API calls by concurrency factor. Consider request coalescing.
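One way the suggested coalescing could look, sketched with std primitives only: keep a map of per-key `OnceLock` cells, so the first caller runs the fetch inside `get_or_init()` while concurrent callers for the same key block on the same cell instead of issuing their own request. This is a hypothetical synchronous sketch; the real provider is async, so a production version would need an async-aware cell or an in-flight futures map.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

/// Coalesces concurrent requests for the same key into one fetch.
struct Coalescer {
    cells: Mutex<HashMap<String, Arc<OnceLock<Vec<f32>>>>>,
}

impl Coalescer {
    fn embed(&self, text: &str, fetch: impl FnOnce() -> Vec<f32>) -> Vec<f32> {
        let cell = {
            let mut map = self.cells.lock().unwrap(); // sketch only: unwrap ok
            map.entry(text.to_string())
                .or_insert_with(|| Arc::new(OnceLock::new()))
                .clone()
        }; // map lock released before the (slow) fetch runs
        // Only one initializer runs; other callers block until it finishes.
        cell.get_or_init(fetch).clone()
    }
}

fn main() {
    let fetches = Arc::new(AtomicUsize::new(0));
    let coalescer = Arc::new(Coalescer { cells: Mutex::new(HashMap::new()) });

    let handles: Vec<_> = (0..8)
        .map(|_| {
            let (c, f) = (coalescer.clone(), fetches.clone());
            thread::spawn(move || {
                c.embed("same text", || {
                    f.fetch_add(1, Ordering::SeqCst); // count real fetches
                    vec![1.0, 2.0]
                })
            })
        })
        .collect();

    for h in handles {
        assert_eq!(h.join().unwrap(), vec![1.0, 2.0]);
    }
    // All eight concurrent callers were served by a single underlying fetch.
    assert_eq!(fetches.load(Ordering::SeqCst), 1);
}
```

A real implementation would also need to evict completed cells back into the main cache so the coalescing map doesn't grow without bound.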
embed_batch() clones entire input strings just to fetch the missing embeddings. For large batch operations, pass slice indices or borrowed references instead.
Mutex::lock() is used without a timeout. If the mutex becomes poisoned, unwrap_or_else(|e| e.into_inner()) will still block indefinitely on subsequent attempts. Consider try_lock() with explicit error handling, or a timeout.
The job failure/stuck transition makes two sequential DB calls (get_memory, get_context). If these are independent, run them concurrently with tokio::join! instead.
At 10,000 entries × 1536 dimensions × 4 bytes/float ≈ 58MB+ (before HashMap overhead, per-entry Instant timestamps, etc.). No mechanism to prevent or warn about hitting memory limits. Consider proactive capacity checks.
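The payload arithmetic behind the ~58 MB figure checks out and is easy to verify; as the commit docs note, this excludes HashMap buckets, the 32-byte keys, Vec headers, timestamps, and allocator overhead.

```rust
fn main() {
    let entries: usize = 10_000;
    let dims: usize = 1536;
    let bytes_per_f32: usize = 4;

    // Payload only: raw vector data for a full cache.
    let payload = entries * dims * bytes_per_f32;
    assert_eq!(payload, 61_440_000);

    // 61,440,000 bytes / 1,048,576 = ~58.6 MiB, matching the docs' ~58 MB.
    println!("payload ~= {:.1} MiB", payload as f64 / (1024.0 * 1024.0));
}
```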
Truncates failure_reason (1000 bytes) and output_preview (200 bytes), but serializes full ActionStats untruncated. Edge case: jobs with millions of actions could produce very large metadata JSON, impacting database storage and SSE event sizes.
Comment states fallback_deliverable is "currently always None" but it's already being populated from JobContext.metadata. Update the comment to reflect actual behavior.
* Make hosted OAuth and MCP auth generic
* Address PR feedback and lint issues
* Suppress built-in Google secret in hosted proxy flows
* Align hosted OAuth secret suppression with proxy config
* Harden hosted OAuth callback helpers
* Tighten hosted OAuth URL rewriting
…4063 chore: promote staging to staging-promote/65062f3c-23317058602 (2026-03-19 23:06 UTC)
Commit 2326302 merged into staging-promote/52ca9d65-23312673755
Auto-promotion from staging CI
Batch range: 71f41dd12363497372864bc6eb3f7c334e05fd52..65062f3cc069ebbd29f6d9be874ae5eff796e43a
Promotion branch: staging-promote/65062f3c-23317058602
Base: staging-promote/52ca9d65-23312673755
Triggered by: Staging CI batch at 2026-03-19 21:10 UTC
Commits in this batch (4):
Current commits in this promotion (3)
Current base: staging-promote/52ca9d65-23312673755
Current head: staging-promote/65062f3c-23317058602
Current range: origin/staging-promote/52ca9d65-23312673755..origin/staging-promote/65062f3c-23317058602
Auto-updated by staging promotion metadata workflow
Waiting for gates:
Auto-created by staging-ci workflow