
fix(agents): detect and break tool-call reflex loops (#658) #664

Merged
penso merged 4 commits into main from evergreen-paper
Apr 12, 2026
penso (Collaborator) commented Apr 11, 2026

Summary

Fixes #658.

The runner previously dispatched tool calls with empty or malformed args straight to tool.execute without pre-validation, and had no detection for repeated identical failures. A model stuck in a reflex-retry loop (e.g. exec({}) on every iteration) burned through all 25 iterations before max_iterations fired, producing a ~4 minute dead zone with no visible progress.

Three defensive layers are added at the runner boundary — any one alone would have prevented the reported scenario; together they harden the runner against the whole class.

1. Pre-dispatch schema validation (Fix B from the issue)

New crates/agents/src/tool_arg_validator.rs checks each tool call's arguments against the tool's own parameters_schema() before execute runs. Missing required fields and top-level type mismatches short-circuit to a directive error message that names the failure, echoes the sent args, and explicitly tells the model not to retry with identical arguments:

Tool call rejected before execution by `exec`.
Missing required field(s): `command`.
You sent: {}
Do not retry with the same arguments. If you do not know what arguments to use,
respond in plain text and ask the user for clarification.

Deliberately narrow in scope — only catches the reflex-retry class (missing-required / wrong-type at the top level). Tools still own deeper semantic validation.
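
As an illustration of the narrow scope described above, here is a minimal sketch of a top-level check, not the code in tool_arg_validator.rs. It uses only the standard library, so `Json`, `Schema`, `validate`, and the wording of the type-mismatch line are hypothetical stand-ins (the real validator reads the tool's parameters_schema() and presumably works on serde_json values). The rejection message is abbreviated.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON value (assumption: the real code uses serde_json).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Hypothetical flattened view of a tool's parameter schema.
pub struct Schema {
    pub required: Vec<&'static str>,
    /// Expected top-level type per property, e.g. ("command", "string").
    pub types: Vec<(&'static str, &'static str)>,
}

fn type_matches(expected: &str, value: &Json) -> bool {
    match (expected, value) {
        ("string", Json::Str(_))
        | ("boolean", Json::Bool(_))
        | ("number", Json::Number(_))
        | ("array", Json::Array(_))
        | ("object", Json::Object(_)) => true,
        ("string" | "boolean" | "number" | "array" | "object", _) => false,
        // Other type names (including "integer") are not enforced in this sketch.
        _ => true,
    }
}

/// Returns a directive error message when the call should be rejected.
pub fn validate(tool: &str, schema: &Schema, args: &BTreeMap<String, Json>) -> Result<(), String> {
    // A field that is absent or explicitly null counts as missing.
    let missing: Vec<String> = schema
        .required
        .iter()
        .filter(|f| matches!(args.get(**f), None | Some(Json::Null)))
        .map(|f| format!("`{f}`"))
        .collect();
    let problem = if !missing.is_empty() {
        format!("Missing required field(s): {}.", missing.join(", "))
    } else if let Some((name, ty)) = schema.types.iter().find(|(n, t)| {
        args.get(*n).is_some_and(|v| *v != Json::Null && !type_matches(t, v))
    }) {
        format!("Field `{name}` must be of type `{ty}`.")
    } else {
        return Ok(());
    };
    Err(format!(
        "Tool call rejected before execution by `{tool}`.\n{problem}\nYou sent: {args:?}\n\
         Do not retry with the same arguments."
    ))
}
```

Nested properties are deliberately left unchecked here, mirroring the statement that tools still own deeper semantic validation.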

2. Loop detector with escalating intervention (Fix A + Fix C)

New crates/agents/src/tool_loop_detector.rs tracks a ring buffer of recent (tool_name, args_hash, error_hash) outcomes and fires when N consecutive failures (N = agent_loop_detector_window, default 3) share the same tool and either the same args or the same error.

Two escalation stages:

  1. Stage 1 — Nudge: inject a directive intervention message into the conversation history listing the exact repeated calls, explicitly forbidding another tool call, and telling the model to respond in plain text.
  2. Stage 2 — Strip tools: if a fourth consecutive failure lands after the nudge, pass schemas_for_api = vec![] for a single turn so the model physically cannot emit another tool call. After that forced-text turn, normal schemas are restored.

Any successful tool call resets both the ring buffer and the escalation stage, so legitimate retry patterns (fail → retry with different args → succeed) do not trip the detector.
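
The escalation described above can be sketched as a small state machine. This is a simplified illustration, not the contents of tool_loop_detector.rs: the type and method names (`LoopDetector`, `record_failure`, `record_success`) are hypothetical, hashes are passed in as plain `u64` values, and "same args OR same error" is interpreted as "all entries in the window share the args hash, or all share the error hash", which is an assumption.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage { Idle, Nudged, StripTools }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action { InjectNudge, StripTools }

/// One recorded failure: (tool name, args hash, error hash).
type Failure = (String, u64, u64);

pub struct LoopDetector {
    window: usize, // 0 disables detection
    strip_on_second_fire: bool,
    recent: VecDeque<Failure>,
    stage: Stage,
}

impl LoopDetector {
    pub fn new(window: usize, strip_on_second_fire: bool) -> Self {
        Self { window, strip_on_second_fire, recent: VecDeque::new(), stage: Stage::Idle }
    }

    /// Any success clears both the window and the escalation stage.
    pub fn record_success(&mut self) {
        self.recent.clear();
        self.stage = Stage::Idle;
    }

    /// Records a failure and returns an intervention if the loop pattern holds.
    pub fn record_failure(&mut self, tool: &str, args_hash: u64, error_hash: u64) -> Option<Action> {
        if self.window == 0 {
            return None;
        }
        // Keep only the most recent `window` failures.
        if self.recent.len() == self.window {
            self.recent.pop_front();
        }
        self.recent.push_back((tool.to_string(), args_hash, error_hash));
        if self.recent.len() < self.window {
            return None;
        }
        let (t0, a0, e0) = &self.recent[0];
        let same_tool = self.recent.iter().all(|(t, _, _)| t == t0);
        let same_args = self.recent.iter().all(|(_, a, _)| a == a0);
        let same_error = self.recent.iter().all(|(_, _, e)| e == e0);
        if !(same_tool && (same_args || same_error)) {
            return None;
        }
        match self.stage {
            Stage::Idle => {
                self.stage = Stage::Nudged;
                Some(Action::InjectNudge)
            }
            Stage::Nudged if self.strip_on_second_fire => {
                self.stage = Stage::StripTools;
                Some(Action::StripTools)
            }
            // Already escalated as far as the configuration allows.
            _ => None,
        }
    }
}
```

With a window of 3, the third identical failure yields the nudge and the fourth yields the strip-tools action, while a fail, retry-with-different-args, succeed sequence never fills the window with matching entries.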

3. Event reorder + debug logging (Fix D)

  • RunnerEvent::ToolCallStart is now emitted only for calls that pass validation. Rejected calls emit a new ToolCallRejected event instead, so the UI stops showing 💻 Executing command... for calls that never executed.
  • The streaming tool-call accumulator now logs each finalized args string at debug! level so future variants of "default to {} because no deltas arrived" can be diagnosed from a single log file.

Config

Two new fields in [tools] (defaults are opt-out, per CLAUDE.md):

agent_loop_detector_window = 3                         # 0 = disable
agent_loop_detector_strip_tools_on_second_fire = true
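
For orientation, the two fields and their defaults might look roughly like this on the Rust side. This is only a sketch: the field types and the `loop_detection_enabled` helper are assumptions, and the serde attributes the real ToolsConfig would carry are omitted to keep the example dependency-free.

```rust
/// Sketch of the two new `[tools]` fields (types are assumed, not confirmed).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolsConfig {
    /// Number of consecutive matching failures that trigger the detector; 0 disables it.
    pub agent_loop_detector_window: usize,
    /// Whether a further failure after the nudge strips tool schemas for one turn.
    pub agent_loop_detector_strip_tools_on_second_fire: bool,
}

impl Default for ToolsConfig {
    fn default() -> Self {
        Self {
            agent_loop_detector_window: 3,
            agent_loop_detector_strip_tools_on_second_fire: true,
        }
    }
}

impl ToolsConfig {
    /// Hypothetical helper: detection is on unless the window is set to 0.
    pub fn loop_detection_enabled(&self) -> bool {
        self.agent_loop_detector_window > 0
    }
}
```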

New RunnerEvent variants

Both surfaced through crates/chat/src/lib.rs event forwarder so the UI and channels get appropriate signals:

  • ToolCallRejected { id, name, arguments, error } — reported as a tool_call_end with rejected: true
  • LoopInterventionFired { stage, tool_name } — reported as a notice with loopInterventionStage + stuckTool
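
A rough sketch of how the two variants could map onto outgoing payloads follows. The variant and payload key names come from the list above; the field types, the `Payload` struct, and the `forward` function are hypothetical and do not reflect the actual forwarder in crates/chat/src/lib.rs.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum RunnerEvent {
    ToolCallRejected { id: String, name: String, arguments: String, error: String },
    LoopInterventionFired { stage: u8, tool_name: String },
}

/// Hypothetical wire payload: an event kind plus key/value fields.
#[derive(Debug, PartialEq)]
pub struct Payload {
    pub kind: &'static str,
    pub fields: Vec<(&'static str, String)>,
}

/// Maps a runner event to the payload a UI or channel would receive.
pub fn forward(event: &RunnerEvent) -> Payload {
    match event {
        // Rejected calls never started, so they are reported as an already-ended call.
        RunnerEvent::ToolCallRejected { id, name, error, .. } => Payload {
            kind: "tool_call_end",
            fields: vec![
                ("id", id.clone()),
                ("name", name.clone()),
                ("error", error.clone()),
                ("rejected", "true".to_string()),
            ],
        },
        RunnerEvent::LoopInterventionFired { stage, tool_name } => Payload {
            kind: "notice",
            fields: vec![
                ("loopInterventionStage", stage.to_string()),
                ("stuckTool", tool_name.clone()),
            ],
        },
    }
}
```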

Test plan

Automated

New tests (all passing):

  • tool_arg_validator::tests::* — 13 unit tests covering empty schema, missing required, null-as-missing, type mismatch, non-object args, array/object types, unknown types, LLM error message formatting
  • tool_loop_detector::tests::* — 11 unit tests covering window=0 disabled, 3-identical-fires-nudge, 4th-strips-tools, strip-disabled-stays-nudged, success resets state, same-error-different-args still fires, different tools do not fire, legitimate-retry does not fire, canonicalization stability, intervention message content
  • runner::tests::reflex_loop_fires_detector_and_terminates_non_streaming — end-to-end: reflex exec({}) → validation rejects → detector fires → intervention → model returns text → run terminates at iter ≤5
  • runner::tests::reflex_loop_fires_detector_and_terminates_streaming — same scenario on the streaming path (uses stream_with_tools + mid-stream tool_use with no argument deltas)
  • runner::tests::legitimate_retry_does_not_fire_loop_detector — regression: fail once with a real error, retry with different args, succeed. Detector must not fire.

Validation

Completed

  • cargo test -p moltis-agents — 352 passed, 0 failed
  • cargo test -p moltis-config — 185 passed
  • cargo test -p moltis-chat — 173 passed
  • cargo test --workspace --exclude moltis-providers --exclude moltis-gateway — all green
  • cargo +nightly-2025-11-30 fmt --all -- --check — clean
  • cargo +nightly-2025-11-30 clippy -p moltis-agents -p moltis-config -p moltis-chat --all-targets -- -D warnings — clean

Remaining

  • just lint — local env missing CUDA toolkit for llama-cpp-sys-2; CI will run the full matrix
  • just test — same
  • Swift/iOS build steps — Darwin-only gates, will be exercised by CI

Manual QA

  • Reproduce the original #658 scenario ("[Bug]: Runner dispatches empty-args tool calls, no loop detection on repeated identical failures (25-iter dead zone)") with a real Claude Haiku session (ambiguous prompt that triggers empty exec args) and confirm the run terminates within ~4 iterations instead of hanging for 25 iterations
  • Confirm the UI activity log shows the new loop-detected notice and that rejected calls no longer display "Executing command..."
  • Verify a legitimate failure-then-retry flow (e.g. ls /nonexistent → ls /tmp) does not emit any LoopInterventionFired event
  • Confirm agent_loop_detector_window = 0 in moltis.toml disables detection end-to-end

Runner previously dispatched tool calls with empty or malformed args
straight to tool.execute without pre-validation, then had no detection
for repeated identical failures. A model stuck in a reflex-retry loop
(e.g. exec({}) on every iteration) would burn through all 25 iterations
before max_iterations fired, producing a ~4 minute dead zone with no
visible progress.

Three defensive layers added at the runner boundary:

1. Pre-dispatch schema validation (tool_arg_validator.rs): each tool
   call's arguments are checked against the tool's parameters_schema
   before execute() runs. Missing required fields or top-level type
   mismatches short-circuit to a structured, directive error that
   names the failure, echoes the args, and tells the model not to
   retry with identical arguments.

2. Loop detector with escalating intervention (tool_loop_detector.rs):
   tracks a ring buffer of recent (tool, args_hash, error_hash)
   outcomes. Three consecutive failures sharing the same tool and
   (args OR error) fire stage 1 (inject a strong directive
   intervention message). A fourth consecutive failure after the
   nudge fires stage 2 (strip tool schemas for one turn, forcing a
   text response). Any successful tool call resets the state.

3. Event reordering + raw-args debug logging: ToolCallStart is now
   emitted only after validation passes; rejected calls emit a new
   ToolCallRejected event so the UI stops showing "Executing..." for
   calls that never executed. The streaming tool-call accumulator
   logs each finalized args string at debug level to aid diagnosis
   of future variants.

New config fields in [tools]:
- agent_loop_detector_window (default 3, 0 = off)
- agent_loop_detector_strip_tools_on_second_fire (default true)

New RunnerEvent variants surfaced through the chat event forwarder:
- ToolCallRejected: reported as a tool_call_end with rejected=true
- LoopInterventionFired: reported as a notice with stage + stuck tool

Integration tests cover the reflex-loop scenario end-to-end in both
the non-streaming and streaming paths, plus a legitimate one-shot
retry regression test to ensure normal failure/recovery patterns do
not trip the detector.

Entire-Checkpoint: c441f764a037
codspeed-hq bot (Contributor) commented Apr 11, 2026

Merging this PR will not alter performance

✅ 39 untouched benchmarks
⏩ 5 skipped benchmarks (see footnote 1)


Comparing evergreen-paper (158085a) with main (c3da499)


Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

codecov bot commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 87.55328% with 146 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/agents/src/runner.rs | 83.62% | 85 Missing ⚠️ |
| crates/chat/src/lib.rs | 5.26% | 36 Missing ⚠️ |
| crates/agents/src/tool_arg_validator.rs | 94.96% | 13 Missing ⚠️ |
| crates/agents/src/tool_loop_detector.rs | 96.62% | 12 Missing ⚠️ |


greptile-apps bot (Contributor) commented Apr 11, 2026

Greptile Summary

This PR adds three layered defenses against tool-call reflex loops in the agent runner: pre-dispatch schema validation (tool_arg_validator.rs), a rolling loop detector with two escalation stages (tool_loop_detector.rs), and event reordering so the UI never shows "Executing…" for calls that never ran. The implementation is well-tested with 24+ new unit tests and three end-to-end integration tests covering the non-streaming path, the streaming path, and the legitimate-retry regression.

All three issues flagged in the previous Greptile round have been resolved: clear_strip_tools now flushes the deque (not just transitions the stage), integer type-checking accepts 30.0 as a valid integer-valued float, and format_strip_tools_message now returns String for API consistency with format_intervention_message.

Confidence Score: 5/5

Safe to merge — all three previously-flagged issues are resolved and no new P0/P1 defects found.

All prior Greptile concerns addressed: clear_strip_tools flushes deque, integer check accepts 30.0, format_strip_tools_message returns String. 24+ unit tests plus 3 E2E integration tests cover all edge cases. No remaining P0/P1 findings.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| crates/agents/src/tool_loop_detector.rs | New rolling loop detector with two escalation stages (nudge / strip-tools). consume_pending_action correctly handles trailing-success suppression and stage-skip edge cases. clear_strip_tools now flushes the deque to prevent oscillation. |
| crates/agents/src/tool_arg_validator.rs | New lightweight schema validator. Correctly handles required fields, null-as-missing, type mismatches, and integer-valued floats. 13 unit tests cover all branches. |
| crates/agents/src/runner.rs | Integration of schema validation and loop detector into both streaming and non-streaming loops. apply_loop_detector_intervention correctly derives post-batch action via consume_pending_action. |
| crates/chat/src/lib.rs | New RunnerEvent variants forwarded to WebSocket clients correctly. ToolCallRejected emits terminal tool_call_end with rejected: true. |
| crates/config/src/schema.rs | Two new ToolsConfig fields with sensible defaults and serde helpers, Default impl updated consistently. |

Sequence Diagram

sequenceDiagram
    participant R as Runner Loop
    participant V as tool_arg_validator
    participant LD as ToolLoopDetector
    participant T as Tool.execute
    participant E as on_event callback

    R->>V: validate_tool_args(schema, args)
    alt Validation fails
        V-->>R: Err(ToolArgError)
        R->>E: ToolCallRejected { rejected: true }
        R->>LD: record(failure fingerprint)
    else Validation passes
        V-->>R: Ok(())
        R->>E: ToolCallStart
        R->>T: execute(args)
        T-->>R: (success/failure, result)
        R->>E: ToolCallEnd
        R->>LD: record(success/failure fingerprint)
    end

    R->>LD: consume_pending_action()
    alt No intervention
        LD-->>R: None
    else Stage 1 - Nudge
        LD-->>R: InjectNudge
        R->>E: LoopInterventionFired { stage: 1 }
        R->>R: messages.push(directive user message)
    else Stage 2 - Strip tools
        LD-->>R: StripTools
        R->>E: LoopInterventionFired { stage: 2 }
        R->>R: strip_tools_next_iter = true
        Note over R: Next iter: schemas_for_api = vec![]
        R->>LD: clear_strip_tools() [flushes deque + resets]
    end

Reviews (5): Last reviewed commit: "fix(agents): treat success=false without..."

penso added 2 commits April 11, 2026 20:37
Three P2 findings from code review:

1. Loop detector oscillation — `clear_strip_tools()` used to leave the
   `recent` deque full of `window` matching failures while only
   transitioning the stage from StripTools → Nudged. A single identical
   failure after tools were restored would immediately re-fire stage 2
   with `stage: Nudged` + `strip_on_second_fire: true`, creating a
   strip → text → restore → single-fail → strip oscillation that burned
   through iterations with almost no runway for the model. Treat the
   forced-text turn as a full reset: clear both the stage AND the deque.
   Added a dedicated regression test (`post_strip_single_failure_does_not_immediately_refire`).

2. Integer type check rejected valid integer-valued floats. Some LLMs
   serialize integers with a trailing decimal (e.g. `"timeout": 30.0`)
   and `serde_json` stores those as f64-backed Numbers whose `as_i64()`
   / `as_u64()` return `None`. The validator now accepts any float whose
   fractional part is zero, so `30.0` passes while `30.5` is still
   rejected. Covered by `integer_accepts_integer_valued_floats`.

3. API asymmetry — `format_strip_tools_message` returned `&'static str`
   while `format_intervention_message` returned `String`. Both are
   consumed identically at the call site via `Into<String>`. Changed
   the former to return `String` for uniformity.

Entire-Checkpoint: 73323a6c31e5
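
The integer rule from finding 2 comes down to a fractional-part test. A minimal sketch, assuming the number has already been extracted as an `f64` (the function name is hypothetical, and the explicit finiteness guard is an addition of this sketch rather than something the commit states):

```rust
/// Returns true when `n` can stand in for a JSON-schema `integer`:
/// `30.0` qualifies, `30.5` does not.
pub fn is_integer_valued(n: f64) -> bool {
    // NaN and infinities are rejected explicitly (assumption of this sketch).
    n.is_finite() && n.fract() == 0.0
}
```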

Two P2 edge cases surfaced by the second Greptile review (PR #664):

1. **False intervention after trailing success in the same batch.** When a
   batch was `[fail, fail, success]` and the detector was one failure away
   from the window, the fail that pushed the window full would set a
   pending nudge, the trailing success would reset the detector, and the
   runner would still inject the stale intervention after the batch.

2. **Stage-skip when both escalations fire within one batch.** A parallel
   batch like `[fail, fail, fail, fail]` would fire `InjectNudge` on the
   third call and `StripTools` on the fourth. The old per-call accumulator
   shadowed the nudge with the strip action, robbing the model of its
   chance to recover via plain text before tools were stripped.

The fix changes both result-processing loops to derive the intervention
action from the detector's *post-batch state* rather than from per-call
`record()` return values:

- Add `consume_pending_action()` on the detector. It looks at the current
  `stage` plus a new internal `nudge_delivered` flag and returns the
  correct single action, one-shotting stage transitions so the same
  intervention can't be applied twice. When it would advance to
  `StripTools` without having delivered a nudge first, it demotes back to
  `Nudged` and returns `InjectNudge` so the nudge lands first; strip-tools
  can fire on the next batch if the pattern persists.

- Extract the intervention-application logic into a single helper,
  `apply_loop_detector_intervention`, shared by the streaming and
  non-streaming runner loops. This eliminates duplicated match arms and
  ensures both paths stay in sync.

Regression coverage:

- `tool_loop_detector::tests` — 5 new unit tests covering
  `consume_pending_action` behaviour for all stage/delivered combinations,
  trailing-success suppression, and the stage-skip guard.
- `runner::tests::mixed_batch_with_trailing_success_does_not_fire_intervention`
  — end-to-end, a parallel batch `[exec({}), exec({}), exec("true")]`
  must not emit any `LoopInterventionFired` event because the trailing
  success recovers cleanly.
- `runner::tests::parallel_batch_with_stage_skip_delivers_nudge_first` —
  end-to-end, a parallel batch of four identical `exec({})` calls must
  emit a stage-1 `LoopInterventionFired` first, not jump straight to
  stage 2.

Entire-Checkpoint: 37184e642bc0
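
The post-batch derivation described in this commit can be sketched as follows. Only `consume_pending_action`, the stage names, and the `nudge_delivered` flag come from the commit message; the `DetectorState` struct, the `reset` method, and the `strip_delivered` flag used to one-shot the strip action are hypothetical simplifications.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage { Idle, Nudged, StripTools }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action { InjectNudge, StripTools }

pub struct DetectorState {
    pub stage: Stage,
    pub nudge_delivered: bool,
    /// Hypothetical flag so the strip action is also returned only once.
    pub strip_delivered: bool,
}

impl DetectorState {
    pub fn new() -> Self {
        Self { stage: Stage::Idle, nudge_delivered: false, strip_delivered: false }
    }

    /// What a successful call does: nothing stays pending afterwards.
    pub fn reset(&mut self) {
        *self = Self::new();
    }

    /// Called once per batch, after every result has been recorded.
    pub fn consume_pending_action(&mut self) -> Option<Action> {
        match self.stage {
            // A trailing success left the detector idle: no stale intervention.
            Stage::Idle => None,
            Stage::Nudged if !self.nudge_delivered => {
                self.nudge_delivered = true;
                Some(Action::InjectNudge)
            }
            Stage::Nudged => None,
            // Stage-skip guard: both escalations fired in one batch, so demote
            // and deliver the nudge first.
            Stage::StripTools if !self.nudge_delivered => {
                self.stage = Stage::Nudged;
                self.nudge_delivered = true;
                Some(Action::InjectNudge)
            }
            Stage::StripTools if !self.strip_delivered => {
                self.strip_delivered = true;
                Some(Action::StripTools)
            }
            Stage::StripTools => None,
        }
    }
}
```

Because the action is read from state after the whole batch, a `[fail, fail, success]` batch ends in `Idle` and yields nothing, and a `[fail, fail, fail, fail]` batch yields the nudge before any strip.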
penso (Collaborator, Author) commented Apr 11, 2026

@greptile-apps review

Tools that return `{success: false}` without an `error` key were treated
as successes by the loop detector because `is_failure` derived from
`error_hash.is_some()`. This means e.g. a `BrowserResponse` with
`success: false` and no error text would silently reset the detector
instead of contributing to the reflex-loop window.

Split `ToolCallFingerprint` into explicit `success()` / `failure()`
constructors and store a `failed: bool` field. The runner now passes
`!success` from the dispatch result, so logical failures without an
error string are correctly counted.

New test: `failure_without_error_string_still_counts_as_failure`.
Entire-Checkpoint: 7aff7471302c
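
A sketch of the split described in this commit, under stated assumptions: the `ToolCallFingerprint` name, the `success()` / `failure()` constructors, and the `failed` field come from the commit message, while the remaining fields, the `fingerprint` helper, and the use of the standard library's `DefaultHasher` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToolCallFingerprint {
    pub tool_name: String,
    pub args_hash: u64,
    pub error_hash: Option<u64>,
    /// Explicit outcome flag, no longer inferred from `error_hash`.
    pub failed: bool,
}

impl ToolCallFingerprint {
    pub fn success(tool_name: &str, args_hash: u64) -> Self {
        Self { tool_name: tool_name.to_string(), args_hash, error_hash: None, failed: false }
    }

    /// A failure may legitimately carry no error text.
    pub fn failure(tool_name: &str, args_hash: u64, error_hash: Option<u64>) -> Self {
        Self { tool_name: tool_name.to_string(), args_hash, error_hash, failed: true }
    }

    pub fn is_failure(&self) -> bool {
        self.failed
    }
}

fn hash_str(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: builds the fingerprint from a dispatch result,
/// passing `!success` through instead of checking for an error string.
pub fn fingerprint(tool: &str, args_hash: u64, success: bool, error: Option<&str>) -> ToolCallFingerprint {
    if success {
        ToolCallFingerprint::success(tool, args_hash)
    } else {
        ToolCallFingerprint::failure(tool, args_hash, error.map(hash_str))
    }
}
```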
penso merged commit 8828cc5 into main on Apr 12, 2026
40 checks passed
penso deleted the evergreen-paper branch on April 12, 2026 08:35