chore: sync public OSS from private main (2026-04-03) by abbudjoe · Pull Request #13 · fawxai/fawx

abbudjoe · 2026-04-03T02:58:32Z

Syncs 164 commits of changes from the private downstream repo.

Highlights

Step 13 refactor (slices 13.1-13.4b): streaming bridge, retry policy, request builder, compaction extraction
Detached headless stabilization: replay-safe tool history, structured evidence, workspace-root rebinding
Skill lifecycle release fixes: sign command, built/installed/server-loaded semantics, docs alignment, e2e verification
Chat attachments support
Tool error surfacing + loop hardening
Backlog cleanup: roadmap reconciliation

Cleanup

llama-cpp-sys removed from workspace (private-only crate)
cfg(feature = "llama-cpp") gates stripped from fx-llm
Author metadata normalized
docs/architecture excluded (proprietary)

Syncs 164 commits worth of changes from private main: - Step 13 refactor (streaming bridge, retry policy, request builder, compaction extraction) - Detached headless stabilization - Skill lifecycle release fixes (sign command, semantics, docs, e2e verification) - Chat attachments support - Tool error surfacing + loop hardening - Backlog cleanup and roadmap reconciliation

Critical fixes: - Replace unwrap() with expect() in create_user_input() helper (issue #1) - Add comprehensive documentation that WASM tests verify infrastructure only, not execution (issue #2) - Add TODO comments referencing PR #179 for real WASM runtime (issue #2) - Add timeout and multi_thread flavor to concurrent audit test to prevent deadlocks (issue #3) - Fix tautological assertions in edge case tests - now verify specific behavior (issue #4) High priority fixes: - Extend MockLlmProvider with 4 error types: ServiceUnavailable, RateLimitExceeded, Timeout, MalformedResponse (issue #5) - Add comprehensive doc comments to MockLlmProvider explaining matching strategy and thread-safety (issue #12) - Strengthen prompt injection test to require IntentCategory::Conversation (issue #6) - Add test_audit_hash_chain_tampering_detection test (issue #7) - Add test_skill_network_capability_denied test for runtime capability enforcement (issue #8) Medium priority fixes: - Extract all magic number encryption keys to named constants (issue #9) - Add comment in Cargo.toml explaining intentional E2E test dependencies (issue #10) - Make retry backoff timing test more robust with 80ms threshold instead of 100ms (issue #11) Low priority improvements: - Rename test_policy_allow_with_confirmation → test_policy_requires_confirmation_for_destructive_actions (issue #13) - Add task IDs to concurrent audit test events for better debugging (issue #14) - Remove unused test_storage_round_trip helper (was addressing issue #15 but simplified instead) All tests pass (28/28), clippy clean with -D warnings, formatted with rustfmt.

* docs: implementation spec for #1102 — kernel/loadable boundary tests * docs: implementation spec for #1104 — budget attack surface analysis * fix(docs): add issue links to #1104 spec exclusions * fix(docs): address R1 review comments on #1104 spec * feat(security): update security model with budget attack surface (#1104) * fix(security): add surface #13 defense posture row, remove out-of-scope spec (#1104)

* feat: wire kernel streaming to TUI for token-by-token rendering (#930) (#1095) * feat: wire kernel streaming to TUI for token-by-token rendering (#930) Add StreamRenderer component that subscribes to EventBus streaming events and renders LLM output token-by-token as it arrives. The existing batch rendering path continues to work for providers that don't support streaming. Changes: - StreamRenderer struct with handle_started/handle_delta/handle_finished - run_with_streaming_display replaces the thinking spinner when streaming events arrive (spinner shows until first StreamDelta, then switches to token-by-token output) - EventBus created in build_loop_engine_with_config and wired through TuiApp constructor - handle_message returns Option<String> — None when streaming rendered inline, Some(rendered) for batch display - Multi-phase support (Reason → Synthesize) with single prefix per cycle - Interrupt handling: [interrupted] marker on UserStopped during streaming - format_loop_metadata_for_result extracts metadata line for streaming mode - 5 unit tests: prefix_once_per_cycle, multi_phase, flush_on_each_delta, interrupted_marker, resets_between_turns All existing tests pass (208 fx-cli, 369 fx-kernel). * fix: address review findings for PR #1095 (TUI streaming) - Guard stderr flush in stop_spinner_if_active (only when spinner active) - Extract print_interrupted_marker as free function (stateless, no self) - Rename stream_renderer_flush_on_each_delta to match actual assertion - Rename stream_interrupted_shows_marker to _does_not_panic - Log tracing::warn on broadcast Lagged with skipped event count - Document cycle_active spec divergence in wave2-b spec * refactor: introduce DisplayLoopState struct + add async display loop test - Extract mutable loop state (renderer, spinner_active, frame_index, streamed) into DisplayLoopState struct, reducing dispatch_streaming_event from 6 parameters to 3 (ENGINEERING.md §2 compliance). - Add display_loop_processes_streaming_events_end_to_end integration test exercising the async wiring: EventBus → dispatch_streaming_event → StreamRenderer state transitions → stop signal exit. * fix: wire RouterLoopLlmProvider to real complete_stream Root cause: RouterLoopLlmProvider did not override complete_stream(), so the kernel LlmProvider trait default kicked in — calling complete() then wrapping the full response in stream::once(). This produced a single chunk with the entire response instead of real streaming deltas. Fix: delegate complete_stream() to self.router.complete_stream(), matching how generate() already uses the router streaming path. Also fix CompletionTestProvider::complete_stream to return a proper stream chunk from the completion response (was returning empty stream). * fix: Anthropic streaming drops tool calls — content_block_stop not handled Root cause: The Anthropic streaming SSE parser never emitted arguments_done=true. It handled content_block_start (id/name) and content_block_delta (partial_json) but silently ignored content_block_stop. The kernel's finalize_stream_tool_calls filters out any tool call where arguments_done is false, so ALL streaming tool calls from Anthropic were silently dropped. Symptom: model calls list_directory but the tool call vanishes, decide() sees only text content → Decision::Respond with stale conversation context (context bleed between iterations). Fix: emit a ToolUseDelta with arguments_done=true on content_block_stop events. For text blocks this creates a harmless entry (no id/name → filtered by finalize). For tool_use blocks it correctly marks the accumulated arguments as complete. Regression test: content_block_stop_marks_tool_arguments_done * fix: reprint streamed output with markdown formatting on completion * fix: address streaming review findings (NB1, NB2, NTH1) * fix: add carriage return before ANSI clear in streaming finalize The ANSI escape \x1b[{n}A moves cursor up N lines but stays at the same column. Without returning to column 0 first, \x1b[J clears from mid-line, leaving the original 'assistant ›' prefix visible — resulting in a double prefix on the reprinted line. Fix: insert \r between cursor-up and clear-to-end. * fix: inject tool ID into Anthropic streaming argument deltas for multi-tool calls * fix: drop cursor-based markdown reprint, defer to incremental (#1097) The buffer+reprint approach (clear raw output via ANSI cursor-up, reprint with render_markdown) is fundamentally unreliable: terminal scrollback, varying widths, and shared prefix lines make line estimation fragile, causing double-prefix artifacts. Replace with a simple trailing newline. Raw streamed text is kept as-is — readable but unformatted. Full markdown formatting during streaming is deferred to incremental rendering (#1097). * debug: add SSE event tracing for multi-tool diagnosis (temporary) * fix: remove dead estimate_rendered_lines (reprint dropped) * Changes to be committed: new file: docs/specs/budget-soft-ceiling.md * fix: remove debug traces, fix clippy blockers, clean up scope - Remove temporary SSE debug eprintln! tracing (multi-tool diagnosis complete) - Extract LoopEngineBundle struct from 4-tuple return (clippy type-complexity) - Remove dead estimate_rendered_lines method and its test (reprint dropped) - Remove budget-soft-ceiling.md (moved to feat/loop-resilience branch, PR #1098) - All tests passing, clippy clean with -D warnings * feat: loop resilience — budget soft-ceiling, fan-out cap, tool result truncation (#1098) * docs: add loop resilience spec (budget soft-ceiling, fan-out cap, tool result truncation) Three independent fixes for production loop failures: 1. Budget soft-ceiling at 80% with BudgetState enum and wrap-up mode 2. Fan-out cap (default 4) to prevent context overflow from parallel tools 3. Tool result truncation (default 16KB) to bound per-result context cost Addresses Fawx self-diagnosed improvements from 2026-03-03 smoke testing. * docs(spec): address all review findings on loop-resilience spec Blocking fixes: - B1: Corrected all file paths to actual crate locations (fx-kernel/src/budget.rs, fx-kernel/src/loop_engine.rs). No loop-engine crate exists. - B2: Specified cost_cents as primary trigger, llm_calls as secondary. Wall time excluded. Rationale documented. - B3: Reduced BudgetState to Normal + Low only. Exhausted already handled by BudgetExceeded. Critical removed (YAGNI — no distinct behavior). - B4: Clarified wrap-up is a new LLM call via perceive() directive injection. 20% headroom at default 0.80 fraction is sufficient. Decision::Respond routing explained. Non-blocking fixes: - NB1: Wrap-up directive injected in perceive() — documented as cleanest injection point. - NB2: Monotonic scope clarified to single run_cycle() call. - NB3: soft_ceiling_fraction added to BudgetConfig (configurable, consistent with existing pattern). - NB4: Split test case 2 into unit-testable parts (directive presence, tool blocking) vs smoke test (coherent output). - NB5: Line estimates adjusted to ~400, risk assessment kept. Nice-to-haves: - NTH1: Using existing SignalKind::Performance instead of new variant. - NTH2: Added section documenting ConversationBudget vs BudgetTracker orthogonality. - NTH3: Per-resource thresholds noted as future design space in 'What This Does NOT Cover'. * feat: loop resilience — budget soft-ceiling, fan-out cap, tool result truncation Implements three independent resilience fixes for the loop engine: 1. Budget Soft-Ceiling (BudgetState enum): - BudgetState::Normal / BudgetState::Low based on cost_cents and llm_calls - Configurable soft_ceiling_fraction (default 0.80) - When Low: tools blocked, decomposition blocked, wrap-up directive injected - Performance signal emitted on Normal→Low transition 2. Fan-Out Cap: - max_fan_out (default 4) limits tool calls per LLM response - Excess calls deferred with descriptive message - Friction signal emitted when cap is hit 3. Tool Result Truncation: - max_tool_result_bytes (default 16384) caps individual results - UTF-8 char boundary safe truncation - Marker: [truncated — N bytes omitted, M total] All 16 test cases from the spec implemented plus additional edge cases. 392 tests passing, clippy clean. Closes #1098 * fix: address all review findings on PR #1098 NB1: Deferred tool messages were dead code — append_deferred_message pushed to continuation_messages but synthesize_tool_fallback builds from all_tool_results. Fix: inject deferred tools as synthetic ToolResult entries into all_tool_results immediately after fan-out cap, ensuring they appear in every return path (finalize_tool_response and synthesize_tool_fallback). NB2: Performance signal lacked exactly-once guard — emitted on every perceive() call while budget Low. Fix: add budget_low_signaled bool to LoopEngine, set on first emission, reset in prepare_cycle(). NB3: Deferred message used Message::assistant(). Fix: replaced with synthetic ToolResult entries (combined with NB1 fix). NB4: Fan-out cap not applied to continuation tool calls. Fix: apply apply_fan_out_cap() when state.current_calls is reassigned from response.tool_calls in the continuation loop. NB5: Eq trait removed from BudgetConfig/AllocationPlan/BudgetTracker by f64 field. Fix: replace soft_ceiling_fraction: f64 with soft_ceiling_percent: u8 (default 80), convert to f64 only in state(). Restores Eq derivation on all three types. NTH1: Added budget_state_monotonicity_through_thresholds test — records cost to 50% (Normal), 81% (Low), checks Low stays Low. NTH2: Added budget_state_boundary_at_exactly_80_percent test — verifies 80/100 = Normal (exceeds_fraction uses >) and 81/100 = Low. NTH3: Wrap-up directive changed from Message::user() to Message::system() in perceive() — it's an engine directive, not user input. NTH4: truncate_tool_result returns Cow<str> instead of String to avoid allocation when no truncation needed. Callers use .into_owned(). All tests: cargo test -p fx-kernel — 400 passed, 0 failed. Clippy: cargo clippy -p fx-kernel -- -D warnings — clean. * fix(kernel): inject deferred tool results for continuation rounds, avoid needless alloc in truncation NB1: Continuation-round fan-out now calls append_deferred_tool_results() for capped tool calls, matching the initial-round behavior. The model receives synthetic ToolResult entries for deferred calls so it knows which calls were not executed. NTH1: truncate_tool_results() now guards with a length check before calling truncate_tool_result(), avoiding the Cow::Borrowed -> into_owned() allocation when no truncation is needed. Updated continuation_tool_calls_capped_by_fan_out test to verify deferred results are present and correctly marked. * docs: add specs for decompose gate (#1100), per-tool retry (#1101), memory decay (#1103) * fix: remove temporary SSE/TRACK debug traces from anthropic streaming Multi-tool streaming confirmed working via live test (6 read_file calls, fan-out cap correctly limited to 4, model re-requested deferred 2). Debug eprintln! calls were interleaving with stdout making response text unreadable. Marked 'TEMPORARY DEBUG — remove after multi-tool diagnosis' in the code. * fix: address R4 review findings — NB1 done marker, NTH1-3 comments and cohesion NB1: content_block_stop now emits arguments_delta: None instead of Some("") — the done marker is a completion signal, not a delta. Updated is_done_marker predicate to check arguments_delta.is_none() and added regression test assertion. NB2: estimate_rendered_lines was already removed in d3556e37 — N/A. NTH1: Added inline comment noting future [...{n} events skipped...] marker for broadcast::Receiver lagged events. NTH2: Added comment on finalize() noting threshold-based skip as future enhancement for long response reprint flash. NTH3: Moved print_interrupted_marker() from free function to associated function on StreamRenderer for cohesion. B1-B3: Already resolved in prior commits (estimate_rendered_lines removed in d3556e37, LoopEngineBundle struct in ee4da96b). * fix(kernel): check budget soft-ceiling within tool round loop (#1105) (#1108) The budget soft-ceiling (BudgetState::Low at 80%) was only checked at entry to act_with_tools(), not within the inner tool round loop. When budget crossed 80% mid-loop (e.g., reading 49 files batched at 4/round), rounds continued until hard exhaustion, producing truncated output. Changes: - Record tool execution and LLM continuation costs incrementally in execute_tool_round() so budget.state() reflects actual consumption - Check budget.state() at the top of each loop iteration in act_with_tools(); break to synthesis fallback when Low - Check budget.state() before request_tool_continuation() in execute_tool_round(); skip LLM call when Low (BudgetLow variant) - Adjust execute_action_and_finalize() to avoid double-recording costs for tool actions (already recorded incrementally) - Extract compact_tool_continuation(), record_tool_execution_cost(), record_continuation_cost(), emit_budget_low_break_signal() helpers Regression test verifies: budget at 76%, 3 tool calls, after round 1 budget crosses 80%, loop breaks, falls through to synthesis. * feat(tui): incremental markdown rendering during streaming (#1097) (#1107) * feat(tui): incremental markdown rendering during streaming (#1097) Add MarkdownRenderer struct that maintains parse state across streaming deltas, supporting headers, bold, italic, inline code, code blocks, and lists with ANSI formatting. Wire into StreamRenderer::handle_delta() so each token gets markdown-aware rendering. On finalize(), flush pending state. Graceful fallback: unclosed markers pass through as raw text. Never crashes or swallows content. * fix(tui): address PR #1107 review findings (NB1, NB2, NTH1-3) NB1: Replace raw ANSI escape codes for bold/italic with crossterm Stylize trait for consistent formatting throughout. NB2: Use header level parameter - h1 gets underline+bright, h2 bold+medium, h3 bold+dim for visual differentiation. NTH1: Add + list prefix support alongside - and *. NTH2: Tighten code fence detection - only accept triple backtick followed by valid language identifier or nothing. NTH3: Add tests for multi-format lines, flush inside code block, nested formatting edge cases, plus prefix, fence rejection, header levels. * fix(llm): use per-block index tracking to prevent streaming tool argument concatenation (#1106) (#1109) * fix(llm): use per-block index tracking to prevent streaming tool argument concatenation (#1106) Replace single current_tool_use_id with HashMap<usize, String> keyed by Anthropic's content-block index. This prevents argument concatenation when multiple tool_use blocks stream concurrently — deltas for tool A arriving after tool B's content_block_start no longer get stamped with the wrong ID. Changes: - AnthropicSseState: tool_ids_by_index replaces tool_use_active + current_tool_use_id - parse_sse_data: accepts &mut tool_ids_by_index, dispatches to focused handler functions (handle_block_start/delta/stop/message_start/delta) - track_tool_use_state removed — index-based routing makes it obsolete - content_block_stop done markers now carry the correct tool ID - All call sites updated (stream_from_sse, parse_sse_payload) Tests: - interleaved_tool_deltas_route_to_correct_tool_id: tool A delta after tool B start still carries tool A's ID - four_concurrent_tool_blocks_all_arguments_clean: 4 blocks with out-of-order deltas all route correctly - Existing tests updated with index fields to match real API * fix: remove duplicate #[test] attribute * test(kernel): loop engine error path coverage (#1099) (#1112) * test(kernel): loop engine error path coverage (#1099) Add 11 tests covering 5 critical error paths in the loop engine: 1. Budget exhaustion mid-tool-call (2 tests) - budget_exhaustion_mid_tool_execution_returns_budget_exhausted - budget_exhaustion_preserves_partial_response 2. Decomposition depth >2 integration (3 tests) - decompose_depth_three_chain_respects_depth_cap - decompose_at_max_depth_returns_fallback - decompose_depth_cap_prevents_infinite_recursion_end_to_end 3. Tool friction → escalation (2 tests) - repeated_tool_failures_synthesize_without_infinite_retry - tool_friction_caps_at_max_iterations 4. Context overflow during tool round (2 tests) - context_overflow_during_tool_round_returns_error - context_overflow_mid_tool_round_is_recoverable 5. Cancellation during decomposition (2 tests) - cancellation_during_decomposition_returns_user_stopped - cancellation_during_slow_tool_in_decomposition_is_clean * fix(kernel): address PR #1112 review findings on error-path tests NB1: Replace catch-all Ok(_)/{} arms in context overflow tests with explicit variant matches (Complete, Error, BudgetExhausted) and content assertions — no silent acceptance of unexpected results. NB2: Fix budget_exhaustion_preserves_partial_response to allow 1 tool invocation before exhaustion, then assert partial_response.is_some() with non-empty content. NB3: Rename decompose_depth_three_chain_respects_depth_cap to decompose_at_depth_zero_with_cap_three_completes to match actual behavior (single-level decomposition at depth 0). NB4: Remove BudgetExhausted acceptance from cancellation test — with 20 LLM calls of budget, only UserStopped or Complete are valid. NTH1: Extract ~150 lines of shared test helpers (ScriptedLlm, StubToolExecutor, CancelAfterNthCallLlm, factory functions) into test_fixtures module to reduce duplication. NTH2: Tighten budget_exhaustion_mid_tool test to use max_llm_calls: 1 for deterministic budget exhaustion. NTH3: Remove unexercised scripted LLM responses from tool_friction_caps_at_max_iterations and other tests. * docs: spec for synthesis context guard (#1110) (#1111) * docs: spec for synthesis context guard (#1110) * feat(kernel): synthesis context guard — three-layer defense against context overflow (#1110) Layer 1: Round loop aggregate check — accumulated result bytes tracked in BudgetTracker and fed into BudgetState::Low, which already triggers the round loop break in act_with_tools(). Layer 2: Eviction before synthesis — evict_oldest_results() in synthesize_tool_fallback() replaces oldest results with stubs (preserving tool_call_id and tool_name) until under max_synthesis_tokens. Single oversized results are truncated via existing truncate_tool_result(). Layer 3: Budget state trigger — accumulated_result_bytes field on BudgetTracker, recorded per round in execute_tool_round(). Exceeding max_aggregate_result_bytes triggers BudgetState::Low. New BudgetConfig fields: - max_aggregate_result_bytes (default 400KB): Layer 3 threshold - max_synthesis_tokens (default 100K): Layer 2 eviction limit Tests: eviction reduces token count, oldest evicted first, stubs include tool_name, single oversized result truncated, no eviction when under limit, budget state trigger, reset clears accumulated bytes. * fix(fx-kernel): address review findings on synthesis context guard NB1: Clamp max_synthesis_tokens to floor of 1000 so zero config doesn't evict all results, leaving nothing for synthesis. NB2: Add comment explaining inverse token estimate undershoot is intentional (conservative eviction preferred over over-eviction). NB3: Add comment documenting that evicted stubs in ActionResult are intentional (synthesis-only path, results not used downstream). NTH1: Add tracing::debug! for total accumulated bytes when eviction is NOT triggered (aids 'why didn't it evict?' debugging). NTH2: Add future-proofing comment on max_aggregate_result_bytes noting it should derive from model context window. Includes regression test for zero-max-tokens floor clamping. All 412 fx-kernel tests pass, clippy clean. * fix: lower default max_synthesis_tokens from 100K to 50K 100K of tool results + prompt overhead exceeded 200K context window. 50K gives comfortable headroom for synthesis prompt + instructions. * feat(kernel): per-tool retry budget with blocked escalation (#1101) (#1117) * feat(memory): time-based memory decay and pruning (#1103) (#1116) * feat(memory): time-based memory decay and pruning (#1103) * fix(memory): address R1 review — remove duplicate decay_config, dead code, visibility (#1103) * fix(memory): stabilize snapshot_returns_all test — sort before comparing (#1103) * feat(kernel): per-tool retry budget with blocked escalation (#1101) * fix(kernel): address R1 review — clippy, test coverage, reorder perf (#1101) * chore(docs): move deprecated android/ios/h2/sprint docs to docs/deprecated/ (#1127) * docs: spec for #1104 — budget manipulation attack surface (#1120) * docs: implementation spec for #1102 — kernel/loadable boundary tests * docs: implementation spec for #1104 — budget attack surface analysis * fix(docs): add issue links to #1104 spec exclusions * fix(docs): address R1 review comments on #1104 spec * feat(security): update security model with budget attack surface (#1104) * fix(security): add surface #13 defense posture row, remove out-of-scope spec (#1104) * feat(kernel): decompose complexity gate — batch detection, floor, cost (#1100) (#1121) * feat(kernel): decompose complexity gate — batch detection, floor, cost (#1100) * fix(kernel): address R1 review — extract helper, remove clone, fix noop, boundary test (#1100) * fix(kernel): update test 6 for == 1 tool requirement in complexity floor (#1100) * fix(ci): remove obsolete H2 spec contract jobs, fix clippy field_reassign_with_default * feat: add GitHub skill for PR management (#1129) * feat(security): kernel/loadable boundary tests (#1102) Rebased onto current staging after #1121 (decompose gate) and #1120 (budget attack surface) merged. Clean cherry-pick of test additions from c10928f6 — no production code changes. Tests cover: - Kernel isolation: tool results cannot influence budget/depth decisions - Budget immutability: loadable layer cannot modify kernel budget state - Context integrity: tool outputs are truncated, not trusted as instructions - Recursion depth enforcement: tools cannot bypass depth limits - Registry immutability: concurrent registration, duplicate rejection - Skill trait boundaries: lifecycle isolation, error containment - Memory validation: key format enforcement, size limits, path traversal * feat(loadable): GitHubSkill with create_pr and comment_pr (#1128) * fix: remove unrelated #1102 boundary tests and unused kv_set - Restore engine crate files (fx-kernel, fx-loadable, fx-memory) to staging state — boundary test code belongs in PR #1119, not here - Remove unused kv_set function and its FFI import (YAGNI) - Branch now contains ONLY github-skill files Fixes R2 review findings B1 and NB1 on PR #1129 * fix(clippy): allow field_reassign_with_default in test code, fix unused vars * feat(security): kernel/loadable boundary tests (#1102) (#1119) Rebased onto current staging after #1121 (decompose gate) and #1120 (budget attack surface) merged. Clean cherry-pick of test additions from c10928f6 — no production code changes. Tests cover: - Kernel isolation: tool results cannot influence budget/depth decisions - Budget immutability: loadable layer cannot modify kernel budget state - Context integrity: tool outputs are truncated, not trusted as instructions - Recursion depth enforcement: tools cannot bypass depth limits - Registry immutability: concurrent registration, duplicate rejection - Skill trait boundaries: lifecycle isolation, error containment - Memory validation: key format enforcement, size limits, path traversal

Address all 13 items from the review comment: Blocking: - #1: Replace serde_json::Value metadata with typed ThoughtMetadata enum - #2: Edge classification uses explicit caller declaration (add_edge vs add_back_edge) with index-order sanity checks, not index arithmetic - #3: Define Generate partial failure as all-or-nothing per parent Non-blocking: - #4: ThoughtIdAllocator uses plain u64, not AtomicU64 - #5: Remove created_at ghost field (Instant not serializable) - #6: Replace raw usize with GraphNodeId wrapper throughout - #7: Fix line count estimate to ~1,900 (was ~1,100) - #8: Replace (usize, usize, bool) tuple with named EdgeSpec struct - #9: GoT + sub_goals combination returns error instead of silent ignore Nice-to-have: - #10: LLM score parsing uses regex extraction with fallback - #11: Each operation emits tracing::info_span! with node/op/cycle - #12: Single budget mechanism (session-level), removed max_total_tokens - #13: Document refine() sets last_node to final internal node Step 6 gains 4 new test cases (12-15) covering partial failure, parameter conflict rejection, refine wiring, and score parsing.

abbudjoe force-pushed the chore/release-sync-2026-04-03 branch from ab19708 to 0b458bb Compare April 3, 2026 03:02

abbudjoe closed this Apr 3, 2026

abbudjoe deleted the chore/release-sync-2026-04-03 branch April 3, 2026 03:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: sync public OSS from private main (2026-04-03)#13

chore: sync public OSS from private main (2026-04-03)#13
abbudjoe wants to merge 1 commit intomainfrom
chore/release-sync-2026-04-03

abbudjoe commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant