Conversation replay loop after high-volume tool operations (F025 sweep) #179
Description
Problem
After running the F025 retrieval sweep (350 API calls via run_python), Nous re-executes the entire sweep from scratch when asked simple follow-up questions like "push raw data to git." It replays completed steps, re-checks APIs, re-builds scripts, and re-runs queries instead of recognizing the work is done.
Observed Behavior
- Tim asks Nous to push results to git
- Instead of running `git add && git push`, Nous starts: "Let me check the repo state and then run the full sweep"
- It re-verifies the API, re-examines the SQL, and rebuilds the sweep script
- Runs another 350 queries
- Reports results again
Root Cause Analysis
The conversation history contains hundreds of near-identical tool call/response pairs from the sweep. The model pattern-matches on this dominant context and continues the "sweep" behavior instead of responding to the new message.
Why compaction doesn't save us:
- Compaction threshold: 60% of context window (~120K tokens for 200K window)
- Tool pruning tiers: soft-trim at age 3, metadata-degrade at age 8, hard-clear at age 12
- `run_python` profile: "standard" (ages 3/8/12)
The problem: 350 tool calls happen within a SINGLE run_python execution. From the pruning system's perspective, that's ONE tool result, not 350. The entire sweep output (1.8MB of JSON) sits in one tool result block. Tool pruning operates on individual tool_result messages, not on the size of individual results.
Even after soft-trimming (keeping first 1500 + last 1500 chars), the conversation still has:
- The full sweep script in a `run_python` tool_use block
- A trimmed but still-present result showing it was a sweep
- All the assistant reasoning around it ("Let me run 350 queries...")
The assistant messages describing the sweep plan and methodology are never pruned (pruning only touches tool results). So the model sees "here's how to run a sweep" instructions in its own prior messages and follows them again.
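As a concrete illustration of why the soft-trim leaves the sweep recognizable, here is a minimal sketch of the first-1500/last-1500 trim described above. The helper name and the trim marker are hypothetical, not taken from the actual pruning code:

```python
def soft_trim(text: str, keep: int = 1500) -> str:
    """Soft-trim a tool result: keep the first and last `keep` characters.

    A 1.8MB sweep result passed through this still begins with the sweep
    script's output header and ends with its summary, so the model can
    still pattern-match it as "a sweep".
    """
    if len(text) <= 2 * keep:
        return text  # small results pass through untouched
    omitted = len(text) - 2 * keep
    return text[:keep] + f"\n...[{omitted} chars trimmed]...\n" + text[-keep:]
```

Note that the trim is per tool_result block, so one giant result shrinks but never disappears, and nothing here touches the assistant messages around it.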
Proposed Fixes
1. Repetitive operation detection (new)
Detect when the model is re-executing a pattern that already exists in conversation history:
- Track tool call signatures (name + key args hash)
- If the same signature was used >N times in recent history, inject a system hint: "This operation was already completed. Results are at [location]. Proceed with the user's current request."
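A minimal sketch of the signature tracking described above, assuming a hash over the tool name plus a stable subset of its arguments. All names here (`call_signature`, `RepetitionDetector`, the `key_fields` tuple) are hypothetical, not existing APIs:

```python
import hashlib
import json
from collections import Counter

def call_signature(name, args, key_fields=("query", "endpoint")):
    """Hash the tool name plus a stable subset of its key arguments."""
    key_args = {k: args[k] for k in key_fields if k in args}
    payload = json.dumps([name, key_args], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

class RepetitionDetector:
    """Count repeated tool-call signatures and emit a hint past a threshold."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = Counter()

    def observe(self, name, args):
        """Record a call; return a system hint once it repeats too often."""
        sig = call_signature(name, args)
        self.counts[sig] += 1
        if self.counts[sig] > self.threshold:
            return ("This operation was already completed. "
                    "Proceed with the user's current request.")
        return None
```

In practice the counter would be scoped to "recent history" (a sliding window) rather than the whole session, and the hint would include the actual result location.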
2. Aggressive pruning for bulk operations
- Add a `bulk` or `sweep` decay profile: (1, 2, 4), to aggressively clear repetitive operations
- Auto-detect bulk patterns: same tool called >10 times with similar args
- Summarize the entire sequence into one line: "[Ran 350 search queries across 7 weight ratios. Results saved to docs/F025-sweep-raw-results.json]"
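A sketch of the auto-detect-and-collapse step, under the assumption that history is available as a flat list of (tool name, result) pairs; the function name, the history shape, and the summary wording are all illustrative:

```python
from collections import defaultdict

BULK_THRESHOLD = 10  # cutoff from the proposal: same tool called >10 times

def collapse_bulk_calls(history):
    """Collapse runs of >BULK_THRESHOLD calls to the same tool into one line.

    `history` is a list of (tool_name, result) pairs (assumed shape).
    A real version would also compare argument similarity, not just names.
    """
    counts = defaultdict(int)
    for name, _ in history:
        counts[name] += 1

    out, summarized = [], set()
    for name, result in history:
        if counts[name] > BULK_THRESHOLD:
            if name not in summarized:  # emit one summary per bulk tool
                summarized.add(name)
                out.append((name, f"[Ran {counts[name]} {name} calls; "
                                  f"results saved]"))
        else:
            out.append((name, result))
    return out
```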
3. Task completion markers
- After large multi-tool operations, explicitly inject a completion marker into the conversation: "TASK COMPLETE: F025 sweep finished. 350 queries, results saved."
- The model can use this as a boundary to avoid replaying
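The marker itself can be as simple as one injected system-role message; the message schema below is an assumption, not the actual conversation format:

```python
def completion_marker(task: str, stats: str) -> dict:
    """Build a system-role message marking a bulk task as done (assumed schema).

    Injected once after the final tool result of a large operation, it gives
    the model an explicit boundary: everything before it is finished work.
    """
    return {"role": "system",
            "content": f"TASK COMPLETE: {task}. {stats}"}
```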
4. Assistant message pruning (careful)
- Currently only tool results are pruned. Old assistant messages describing completed plans persist forever.
- Consider summarizing old assistant planning messages alongside their tool results
- Risk: losing important context. Needs careful design.
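One conservative shape this could take: stub out only assistant messages older than the hard-clear age, keeping their first line as a breadcrumb. Everything here (the message schema, the age metric, the stub format) is a hypothetical sketch, not a design decision:

```python
def summarize_old_plans(messages, age_cutoff=12):
    """Replace old assistant messages with one-line stubs (sketch).

    `messages` is a list of {"role": ..., "content": ...} dicts (assumed
    schema); "age" is simply distance from the end of the conversation,
    mirroring the age-based tiers used for tool-result pruning.
    """
    out = []
    for i, msg in enumerate(messages):
        age = len(messages) - 1 - i
        if msg["role"] == "assistant" and age > age_cutoff:
            first_line = msg["content"].splitlines()[0][:80]
            out.append({"role": "assistant",
                        "content": f"[plan summarized: {first_line}...]"})
        else:
            out.append(msg)
    return out
```

Keeping the first line preserves a trace of what was planned while removing the step-by-step methodology the model would otherwise replay.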
5. Compaction awareness of repetitive content
- When `should_compact` triggers, the summarizer should detect repetitive patterns and compress them aggressively
- "350 similar API calls" should become one sentence in the summary, not 350 entries
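A sketch of repetition-aware compression at summarization time: normalize away the parts that vary (numbers) so near-identical entries group together, then emit one line per group. The function name and the repetition cutoff are assumptions:

```python
import re
from collections import Counter

def compress_repetitive(entries, min_repeats=5):
    """Collapse near-identical summary entries before compaction.

    Numbers are normalized to 'N' so "query 17 returned 200" and
    "query 18 returned 200" count as the same pattern; each pattern seen
    at least `min_repeats` times becomes a single summary line.
    """
    normalized = [re.sub(r"\d+", "N", e) for e in entries]
    counts = Counter(normalized)

    out, emitted = [], set()
    for entry, norm in zip(entries, normalized):
        if counts[norm] >= min_repeats:
            if norm not in emitted:  # one line per repeated pattern
                emitted.add(norm)
                out.append(f"{counts[norm]} similar entries: {entry}")
        else:
            out.append(entry)
    return out
```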
Immediate Workaround
Starting a fresh session with `/new` clears the issue.
Priority
P1 — this makes Nous unusable after any bulk operation until the session is reset.
— ⚡ Emerson