
Conversation replay loop after high-volume tool operations (F025 sweep) #179

@tfatykhov

Description

Problem

After running the F025 retrieval sweep (350 API calls via run_python), Nous re-executes the entire sweep from scratch when asked simple follow-up questions like "push raw data to git." It replays completed steps, re-checks APIs, re-builds scripts, and re-runs queries instead of recognizing the work is done.

Observed Behavior

  1. Tim asks Nous to push results to git
  2. Instead of git add && git push, Nous starts: "Let me check the repo state and then run the full sweep"
  3. It re-verifies the API, re-examines the SQL, and rebuilds the sweep script
  4. Runs another 350 queries
  5. Reports results again

Root Cause Analysis

The conversation history contains hundreds of near-identical tool call/response pairs from the sweep. The model pattern-matches on this dominant context and continues the "sweep" behavior instead of responding to the new message.

Why compaction doesn't save us:

  • Compaction threshold: 60% of context window (~120K tokens for 200K window)
  • Tool pruning tiers: soft-trim at age 3, metadata-degrade at age 8, hard-clear at age 12
  • run_python profile: "standard" (3/8/12 ages)

The problem: all 350 API calls happen within a SINGLE run_python execution. From the pruning system's perspective, that is ONE tool result, not 350: the entire sweep output (1.8MB of JSON) sits in one tool_result block. Tool pruning operates on individual tool_result messages; it never looks at the size or internal repetitiveness of a single result.

Even after soft-trimming (keeping first 1500 + last 1500 chars), the conversation still has:

  • The full sweep script in a run_python tool_use block
  • A trimmed but still-present result showing it was a sweep
  • All the assistant reasoning around it ("Let me run 350 queries...")

The assistant messages describing the sweep plan and methodology are never pruned (pruning only touches tool results). So the model sees "here's how to run a sweep" instructions in its own prior messages and follows them again.
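To make the soft-trim behavior concrete, here is a minimal sketch. The function name and the 1500-character figure come from this issue; everything else (marker format, the stand-in sweep output) is illustrative, not Nous's actual implementation:

```python
def soft_trim(result: str, keep: int = 1500) -> str:
    """Keep only the first and last `keep` characters of a tool result."""
    if len(result) <= 2 * keep:
        return result
    omitted = len(result) - 2 * keep
    return result[:keep] + f"\n...[{omitted} chars trimmed]...\n" + result[-keep:]

# Stand-in for ~1.8MB of sweep output: the trimmed version is tiny, but its
# head and tail still read unmistakably as sweep output, so the model keeps
# pattern-matching on "sweep" even after trimming.
sweep_output = "ratio=0.7 query=42 result=... " * 80_000
trimmed = soft_trim(sweep_output)
```

This is exactly why trimming alone doesn't break the replay loop: the surviving head and tail still advertise what the operation was.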

Proposed Fixes

1. Repetitive operation detection (new)

Detect when the model is re-executing a pattern that already exists in conversation history:

  • Track tool call signatures (name + key args hash)
  • If the same signature was used >N times in recent history, inject a system hint: "This operation was already completed. Results are at [location]. Proceed with the user's current request."

2. Aggressive pruning for bulk operations

  • Add a bulk or sweep decay profile: (1, 2, 4) — aggressively clear repetitive operations
  • Auto-detect bulk patterns: same tool called >10 times with similar args
  • Summarize the entire sequence into one line: "[Ran 350 search queries across 7 weight ratios. Results saved to docs/F025-sweep-raw-results.json]"
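A sketch of the auto-detect-and-summarize step, assuming call history is a list of dicts with a `tool` key; the threshold and message shapes are assumptions, not Nous internals:

```python
from collections import Counter

BULK_THRESHOLD = 10  # same tool called >10 times, per the heuristic above

def collapse_bulk(calls: list[dict]) -> list[dict]:
    """Replace a run of many same-tool calls with one summary entry."""
    counts = Counter(c["tool"] for c in calls)
    out, summarized = [], set()
    for c in calls:
        tool = c["tool"]
        if counts[tool] > BULK_THRESHOLD:
            # Emit one summary line for the whole bulk run, then drop the rest.
            if tool not in summarized:
                out.append({"role": "system",
                            "content": f"[Ran {counts[tool]} {tool} calls; "
                                       "full outputs pruned]"})
                summarized.add(tool)
        else:
            out.append(c)
    return out
```

A production version would also compare argument similarity, not just tool name, so that 11 unrelated `run_python` calls aren't collapsed together.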

3. Task completion markers

  • After large multi-tool operations, explicitly inject a completion marker into the conversation: "TASK COMPLETE: F025 sweep finished. 350 queries, results saved."
  • The model can use this as a boundary to avoid replaying
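One possible shape for the marker injection, using the example text from above; the message role and dict format are assumptions, and a real implementation would need to decide where in the history the marker lives:

```python
def completion_marker(task: str, detail: str) -> dict:
    """Synthesize a message that marks a bulk task as finished."""
    return {"role": "user",
            "content": f"TASK COMPLETE: {task}. {detail}"}

marker = completion_marker("F025 sweep finished", "350 queries, results saved")
```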

4. Assistant message pruning (careful)

  • Currently only tool results are pruned. Old assistant messages describing completed plans persist forever.
  • Consider summarizing old assistant planning messages alongside their tool results
  • Risk: losing important context. Needs careful design.

5. Compaction awareness of repetitive content

  • When should_compact triggers, the summarizer should detect repetitive patterns and compress them aggressively
  • "350 similar API calls" should become one sentence in the summary, not 350 entries
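One way the summarizer's input could be pre-processed to achieve this. The digit-masking similarity test is a crude stand-in for whatever metric the real summarizer would use:

```python
import re
from itertools import groupby

def collapse_repetitive(lines: list[str], min_run: int = 4) -> list[str]:
    """Collapse runs of structurally similar lines into one summary line."""
    def shape(line: str) -> str:
        # Treat lines as "similar" if they match after masking out digits.
        return re.sub(r"\d+", "#", line)
    out = []
    for _, group in groupby(lines, key=shape):
        run = list(group)
        if len(run) >= min_run:
            out.append(f"[{len(run)} similar entries, e.g. {run[0]!r}]")
        else:
            out.extend(run)
    return out
```

With this kind of pass, 350 near-identical API-call lines enter the summarizer as a single sentence rather than 350 entries.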

Immediate Workaround

Running /new to start a fresh session clears the issue.

Priority

P1 — this makes Nous unusable after any bulk operation until the session is reset.

— ⚡ Emerson
