fix(agents): detect and break tool-call reflex loops (#658) #664
Conversation
Runner previously dispatched tool calls with empty or malformed args
straight to tool.execute without pre-validation, then had no detection
for repeated identical failures. A model stuck in a reflex-retry loop
(e.g. exec({}) on every iteration) would burn through all 25 iterations
before max_iterations fired, producing a ~4 minute dead zone with no
visible progress.
Three defensive layers added at the runner boundary:
1. Pre-dispatch schema validation (tool_arg_validator.rs): each tool
call's arguments are checked against the tool's parameters_schema
before execute() runs. Missing required fields or top-level type
mismatches short-circuit to a structured, directive error that
names the failure, echoes the args, and tells the model not to
retry with identical arguments.
2. Loop detector with escalating intervention (tool_loop_detector.rs):
tracks a ring buffer of recent (tool, args_hash, error_hash)
outcomes. Three consecutive failures sharing the same tool and
(args OR error) fire stage 1 (inject a strong directive
intervention message). A fourth consecutive failure after the
nudge fires stage 2 (strip tool schemas for one turn, forcing a
text response). Any successful tool call resets the state.
3. Event reordering + raw-args debug logging: ToolCallStart is now
emitted only after validation passes; rejected calls emit a new
ToolCallRejected event so the UI stops showing "Executing..." for
calls that never executed. The streaming tool-call accumulator
logs each finalized args string at debug level to aid diagnosis
of future variants.
New config fields in [tools]:
- agent_loop_detector_window (default 3, 0 = off)
- agent_loop_detector_strip_tools_on_second_fire (default true)
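In `moltis.toml` these fields would sit under the `[tools]` table; a minimal sketch using the defaults stated above (exact key placement is an assumption):

```toml
# Sketch of the new [tools] fields with their stated defaults.
[tools]
agent_loop_detector_window = 3                         # 0 disables detection
agent_loop_detector_strip_tools_on_second_fire = true  # enables the stage-2 strip
```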
New RunnerEvent variants surfaced through the chat event forwarder:
- ToolCallRejected: reported as a tool_call_end with rejected=true
- LoopInterventionFired: reported as a notice with stage + stuck tool
Integration tests cover the reflex-loop scenario end-to-end in both
the non-streaming and streaming paths, plus a legitimate one-shot
retry regression test to ensure normal failure/recovery patterns do
not trip the detector.
Entire-Checkpoint: c441f764a037
Greptile Summary

This PR adds three layered defenses against tool-call reflex loops in the agent runner: pre-dispatch schema validation, a loop detector with escalating intervention, and event reordering with raw-args debug logging. All three issues flagged in the previous Greptile round have been resolved.

Confidence Score: 5/5 — safe to merge; all three previously-flagged issues are resolved and no new P0/P1 defects found. All prior Greptile concerns addressed: `clear_strip_tools` flushes the deque, the integer check accepts `30.0`, `format_strip_tools_message` returns `String`. 24+ unit tests plus 3 E2E integration tests cover the edge cases. No files require special attention.
Sequence Diagram

```mermaid
sequenceDiagram
    participant R as Runner Loop
    participant V as tool_arg_validator
    participant LD as ToolLoopDetector
    participant T as Tool.execute
    participant E as on_event callback
    R->>V: validate_tool_args(schema, args)
    alt Validation fails
        V-->>R: Err(ToolArgError)
        R->>E: ToolCallRejected { rejected: true }
        R->>LD: record(failure fingerprint)
    else Validation passes
        V-->>R: Ok(())
        R->>E: ToolCallStart
        R->>T: execute(args)
        T-->>R: (success/failure, result)
        R->>E: ToolCallEnd
        R->>LD: record(success/failure fingerprint)
    end
    R->>LD: consume_pending_action()
    alt No intervention
        LD-->>R: None
    else Stage 1 - Nudge
        LD-->>R: InjectNudge
        R->>E: LoopInterventionFired { stage: 1 }
        R->>R: messages.push(directive user message)
    else Stage 2 - Strip tools
        LD-->>R: StripTools
        R->>E: LoopInterventionFired { stage: 2 }
        R->>R: strip_tools_next_iter = true
        Note over R: Next iter: schemas_for_api = vec![]
        R->>LD: clear_strip_tools() [flushes deque + resets]
    end
```
Reviews (5). Last reviewed commit: "fix(agents): treat success=false without..."
Three P2 findings from code review:

1. Loop detector oscillation — `clear_strip_tools()` used to leave the `recent` deque full of `window` matching failures while only transitioning the stage from StripTools → Nudged. A single identical failure after tools were restored would immediately re-fire stage 2 with `stage: Nudged` + `strip_on_second_fire: true`, creating a strip → text → restore → single-fail → strip oscillation that burned through iterations with almost no runway for the model. Treat the forced-text turn as a full reset: clear both the stage AND the deque. Added a dedicated regression test (`post_strip_single_failure_does_not_immediately_refire`).

2. Integer type check rejected valid integer-valued floats. Some LLMs serialize integers with a trailing decimal (e.g. `"timeout": 30.0`) and `serde_json` stores those as f64-backed Numbers whose `as_i64()` / `as_u64()` return `None`. The validator now accepts any float whose fractional part is zero, so `30.0` passes while `30.5` is still rejected. Covered by `integer_accepts_integer_valued_floats`.

3. API asymmetry — `format_strip_tools_message` returned `&'static str` while `format_intervention_message` returned `String`. Both are consumed identically at the call site via `Into<String>`. Changed the former to return `String` for uniformity.

Entire-Checkpoint: 73323a6c31e5
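The integer-valued-float relaxation in finding 2 reduces to a single predicate. This is a simplified stand-in for the validator's JSON-number handling, not the crate's actual API; the function name is illustrative:

```rust
// Sketch of the relaxed integer check: accept any finite float whose
// fractional part is zero, so 30.0 validates as an integer while 30.5
// is still rejected. (The real validator applies this only to JSON
// numbers whose as_i64()/as_u64() return None.)
fn float_is_integer_valued(x: f64) -> bool {
    x.is_finite() && x.fract() == 0.0
}
```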
) Two P2 edge cases surfaced by the second Greptile review (PR #664): 1. **False intervention after trailing success in the same batch.** When a batch was `[fail, fail, success]` and the detector was one failure away from the window, the fail that pushed the window full would set a pending nudge, the trailing success would reset the detector, and the runner would still inject the stale intervention after the batch. 2. **Stage-skip when both escalations fire within one batch.** A parallel batch like `[fail, fail, fail, fail]` would fire `InjectNudge` on the third call and `StripTools` on the fourth. The old per-call accumulator shadowed the nudge with the strip action, robbing the model of its chance to recover via plain text before tools were stripped. The fix changes both result-processing loops to derive the intervention action from the detector's *post-batch state* rather than from per-call `record()` return values: - Add `consume_pending_action()` on the detector. It looks at the current `stage` plus a new internal `nudge_delivered` flag and returns the correct single action, one-shotting stage transitions so the same intervention can't be applied twice. When it would advance to `StripTools` without having delivered a nudge first, it demotes back to `Nudged` and returns `InjectNudge` so the nudge lands first; strip-tools can fire on the next batch if the pattern persists. - Extract the intervention-application logic into a single helper, `apply_loop_detector_intervention`, shared by the streaming and non-streaming runner loops. This eliminates duplicated match arms and ensures both paths stay in sync. Regression coverage: - `tool_loop_detector::tests` — 5 new unit tests covering `consume_pending_action` behaviour for all stage/delivered combinations, trailing-success suppression, and the stage-skip guard. 
- `runner::tests::mixed_batch_with_trailing_success_does_not_fire_intervention` — end-to-end, a parallel batch `[exec({}), exec({}), exec("true")]` must not emit any `LoopInterventionFired` event because the trailing success recovers cleanly. - `runner::tests::parallel_batch_with_stage_skip_delivers_nudge_first` — end-to-end, a parallel batch of four identical `exec({})` calls must emit a stage-1 `LoopInterventionFired` first, not jump straight to stage 2. Entire-Checkpoint: 37184e642bc0
@greptile-apps review
Tools that return `{success: false}` without an `error` key were treated
as successes by the loop detector because `is_failure` derived from
`error_hash.is_some()`. This means e.g. a `BrowserResponse` with
`success: false` and no error text would silently reset the detector
instead of contributing to the reflex-loop window.
Split `ToolCallFingerprint` into explicit `success()` / `failure()`
constructors and store a `failed: bool` field. The runner now passes
`!success` from the dispatch result, so logical failures without an
error string are correctly counted.
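The constructor split described above can be sketched as follows. This is a std-only illustration of the shape of the change, not the crate's actual definition; field names follow the commit message:

```rust
// Sketch of the explicit fingerprint split: `failed` is stored directly
// instead of being derived from `error_hash.is_some()`, so a tool result
// with success: false and no error text still counts as a failure.
#[derive(Debug, Clone, PartialEq)]
struct ToolCallFingerprint {
    tool_name: String,
    args_hash: u64,
    error_hash: Option<u64>,
    failed: bool,
}

impl ToolCallFingerprint {
    fn success(tool_name: &str, args_hash: u64) -> Self {
        Self { tool_name: tool_name.into(), args_hash, error_hash: None, failed: false }
    }

    fn failure(tool_name: &str, args_hash: u64, error_hash: Option<u64>) -> Self {
        // error_hash may be None: a logical failure without an error string.
        Self { tool_name: tool_name.into(), args_hash, error_hash, failed: true }
    }
}
```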
New test: `failure_without_error_string_still_counts_as_failure`.
Entire-Checkpoint: 7aff7471302c
Summary
Fixes #658.
The runner previously dispatched tool calls with empty or malformed args straight to `tool.execute` without pre-validation, and had no detection for repeated identical failures. A model stuck in a reflex-retry loop (e.g. `exec({})` on every iteration) burned through all 25 iterations before `max_iterations` fired, producing a ~4 minute dead zone with no visible progress.

Three defensive layers are added at the runner boundary — any one alone would have prevented the reported scenario; together they harden the runner against the whole class.
1. Pre-dispatch schema validation (Fix B from the issue)
New `crates/agents/src/tool_arg_validator.rs` checks each tool call's arguments against the tool's own `parameters_schema()` before `execute` runs. Missing required fields and top-level type mismatches short-circuit to a directive error message that names the failure, echoes the sent args, and explicitly tells the model not to retry with identical arguments.

Deliberately narrow in scope — only catches the reflex-retry class (missing-required / wrong-type at the top level). Tools still own deeper semantic validation.
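The required-field half of that check can be sketched in a few lines. This is a std-only simplification (the real validator operates on JSON values from the tool's schema); the `Arg` enum and `validate_top_level` name are illustrative:

```rust
use std::collections::BTreeMap;

// Hypothetical simplified argument value: just enough shape for the
// top-level check (the real validator works on serde_json values and
// also verifies top-level types against the schema).
#[derive(Debug, PartialEq)]
enum Arg {
    Null,
    Str(String),
}

// Sketch of the pre-dispatch check: every required field must be present
// and non-null, otherwise return a directive error the model can act on.
fn validate_top_level(required: &[&str], args: &BTreeMap<String, Arg>) -> Result<(), String> {
    for field in required {
        match args.get(*field) {
            None | Some(Arg::Null) => {
                return Err(format!(
                    "missing required argument `{field}`; sent args: {args:?}. \
                     Do not retry with identical arguments."
                ));
            }
            Some(_) => {}
        }
    }
    Ok(())
}
```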
2. Loop detector with escalating intervention (Fix A + Fix C)
New `crates/agents/src/tool_loop_detector.rs` tracks a ring buffer of recent `(tool_name, args_hash, error_hash)` outcomes and fires when N consecutive failures share the same tool and (args OR error). Two escalation stages:

1. **Nudge** — inject a strong directive intervention message into the conversation.
2. **Strip tools** — set `schemas_for_api = vec![]` for a single turn so the model physically cannot emit another tool call. After that forced-text turn, normal schemas are restored.

Any successful tool call resets both the ring buffer and the escalation stage, so legitimate retry patterns (fail → retry with different args → succeed) do not trip the detector.
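The ring-buffer-plus-escalation mechanics can be sketched std-only. This is an illustration of the described behaviour, not the crate's actual API; names like `LoopDetector`, `record_failure`, and `Intervention` are assumptions:

```rust
use std::collections::VecDeque;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Escalation action the runner would apply (names are illustrative).
#[derive(Debug, PartialEq)]
enum Intervention {
    None,
    InjectNudge,
    StripTools,
}

fn fingerprint(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

// Minimal sketch of the detector: N consecutive failures on the same tool
// with matching (args OR error) fingerprints escalate nudge -> strip.
struct LoopDetector {
    window: usize,
    recent: VecDeque<(u64, u64, u64)>, // (tool_hash, args_hash, error_hash)
    nudged: bool,
}

impl LoopDetector {
    fn new(window: usize) -> Self {
        Self { window, recent: VecDeque::new(), nudged: false }
    }

    // Any success fully resets both the buffer and the escalation stage.
    fn record_success(&mut self) {
        self.recent.clear();
        self.nudged = false;
    }

    fn record_failure(&mut self, tool: &str, args: &str, error: &str) -> Intervention {
        if self.window == 0 {
            return Intervention::None; // detector disabled
        }
        self.recent.push_back((fingerprint(tool), fingerprint(args), fingerprint(error)));
        if self.recent.len() > self.window {
            self.recent.pop_front();
        }
        if self.recent.len() < self.window {
            return Intervention::None;
        }
        let (t0, a0, e0) = self.recent[0];
        let looped = self.recent.iter().all(|&(t, a, e)| t == t0 && (a == a0 || e == e0));
        match (looped, self.nudged) {
            (false, _) => Intervention::None,
            (true, false) => {
                self.nudged = true;
                Intervention::InjectNudge
            }
            (true, true) => Intervention::StripTools,
        }
    }
}
```

With `window = 3`, three identical failures fire the nudge, a fourth fires the strip, and any success in between resets everything — matching the escalation described above.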
3. Event reorder + debug logging (Fix D)
`RunnerEvent::ToolCallStart` is now emitted only for calls that pass validation. Rejected calls emit a new `ToolCallRejected` event instead, so the UI stops showing 💻 Executing command... for calls that never executed. The streaming tool-call accumulator logs each finalized args string at `debug!` level so future variants of "default to `{}` because no deltas arrived" can be diagnosed from a single log file.

Config
Two new fields in `[tools]` (defaults are opt-out, per CLAUDE.md):

- `agent_loop_detector_window` (default 3, 0 = off)
- `agent_loop_detector_strip_tools_on_second_fire` (default true)

New RunnerEvent variants
Both surfaced through the `crates/chat/src/lib.rs` event forwarder so the UI and channels get appropriate signals:

- `ToolCallRejected { id, name, arguments, error }` — reported as a `tool_call_end` with `rejected: true`
- `LoopInterventionFired { stage, tool_name }` — reported as a `notice` with `loopInterventionStage` + `stuckTool`

Test plan
Automated
New tests (all passing):
- `tool_arg_validator::tests::*` — 13 unit tests covering empty schema, missing required, null-as-missing, type mismatch, non-object args, array/object types, unknown types, LLM error message formatting
- `tool_loop_detector::tests::*` — 11 unit tests covering window=0 disabled, 3-identical-fires-nudge, 4th-strips-tools, strip-disabled-stays-nudged, success resets state, same-error-different-args still fires, different tools do not fire, legitimate-retry does not fire, canonicalization stability, intervention message content
- `runner::tests::reflex_loop_fires_detector_and_terminates_non_streaming` — end-to-end: reflex `exec({})` → validation rejects → detector fires → intervention → model returns text → run terminates at iter ≤ 5
- `runner::tests::reflex_loop_fires_detector_and_terminates_streaming` — same scenario on the streaming path (uses `stream_with_tools` + mid-stream tool_use with no argument deltas)
- `runner::tests::legitimate_retry_does_not_fire_loop_detector` — regression: fail once with a real error, retry with different args, succeed. Detector must not fire.

Validation
Completed
- `cargo test -p moltis-agents` — 352 passed, 0 failed
- `cargo test -p moltis-config` — 185 passed
- `cargo test -p moltis-chat` — 173 passed
- `cargo test --workspace --exclude moltis-providers --exclude moltis-gateway` — all green
- `cargo +nightly-2025-11-30 fmt --all -- --check` — clean
- `cargo +nightly-2025-11-30 clippy -p moltis-agents -p moltis-config -p moltis-chat --all-targets -- -D warnings` — clean

Remaining
- `just lint` — local env missing CUDA toolkit for `llama-cpp-sys-2`; CI will run the full matrix
- `just test` — same

Manual QA
- Reproduce the reflex loop (empty `exec` args) and confirm the run terminates within ~4 iterations instead of hanging for 25 iterations
- Confirm the UI shows the loop-detected notice and that rejected calls no longer display "Executing command..."
- Confirm a legitimate retry (`ls /nonexistent` → `ls /tmp`) does not emit any `LoopInterventionFired` event
- Confirm `agent_loop_detector_window = 0` in `moltis.toml` disables detection end-to-end