Skip to content

hotfix: retry on path/read_path/domain LLM hallucinations#35

Merged
AVADSA25 merged 1 commit intomainfrom
fix/retry-all-permission-types
May 3, 2026
Merged

hotfix: retry on path/read_path/domain LLM hallucinations#35
AVADSA25 merged 1 commit intomainfrom
fix/retry-all-permission-types

Conversation

@AVADSA25
Copy link
Copy Markdown
Owner

@AVADSA25 AVADSA25 commented May 3, 2026

Summary

PR #34 only retried skill_not_authorized violations. Real-world Qwen drift hits the other three PermissionViolation reasons too — confirmed by today's anchor-example run where the plan listed api.exchangerate-api.com but the model emitted bare exchangerate-api.com and the agent went blocked_on_permission.

This extends the same single-shot correction-nudge pattern to all four reasons.

What changed

codec_agent_runner.py

  • New helper _build_correction_nudge(pv, action, agent_grants, global_grants) that emits a closed-world allowlist string per reason:
    • skill_not_authorized → list allowed skills
    • path_not_authorized → list allowed write_paths globs
    • read_path_not_authorized → list allowed read_paths globs
    • domain_not_authorized → list allowed network_domains
  • _execute_checkpoint retry block dispatches on pv.reason via the helper instead of the if/else that only matched skills.
  • SECOND consecutive miss still raises → blocked_on_permission (unchanged user-visible escape hatch).

tests/test_agent_runner.py — 3 new tests next to the existing skill retry test:

  • test_domain_hallucination_retries_with_corrected_domain_list — the forex scenario
  • test_write_path_hallucination_retries_with_corrected_path_list
  • test_read_path_hallucination_retries_with_corrected_path_list

Test plan

🤖 Generated with Claude Code

PR #34 only retried `skill_not_authorized`. Real-world Qwen drift hits
`domain_not_authorized` and path violations just as often (e.g. plan
allows api.exchangerate-api.com, model emits bare exchangerate-api.com).

Refactor the retry block in _execute_checkpoint to dispatch on all
four PermissionViolation reasons via _build_correction_nudge():

  - skill_not_authorized       -> list allowed skills
  - path_not_authorized        -> list allowed write_paths globs
  - read_path_not_authorized   -> list allowed read_paths globs
  - domain_not_authorized      -> list allowed network_domains

Each nudge appended to history with _skill_correction_nudge marker so
_qwen_next_action sees the corrected closed-world allowlist on the
retry. SECOND consecutive miss still raises -> blocked_on_permission.

Tests (4 total in this slice, 46 total in test_agent_runner.py):
- existing skill retry test still green
- domain retry test (forex anchor scenario)
- write_path retry test
- read_path retry test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@AVADSA25 AVADSA25 merged commit 9369bad into main May 3, 2026
1 check passed
AVADSA25 added a commit that referenced this pull request May 4, 2026
User repro 2026-05-04 09:58:
  Project: "Read all markdown files in ~/codec-repo/docs/ and create
           an index.md that lists each file with its first heading and
           a one-line description"
  Result:  Plan failed: plan invalid: plan references unknown skills:
           ['file_read']

Same hallucination CLASS as PR #35 but at a different LAYER.
PR #35 fixed retries during execution (codec_agent_runner). This is
failing earlier — at plan validation, before the plan is even saved.
The user never even got to the approve-or-reject step.

Root cause:
Qwen drafts plans naming skills that don't exist. `file_read` and
`fetch_url` are the two we've seen. The actual file-reading skill is
`file_ops` (which reads, writes, appends, lists). The actual URL fetch
is `web_fetch`. The user-visible result was the same as PR #35 —
project mode dies before doing anything useful.

Fix (mirrors PR #35's pattern at planning layer):
1. After validate_plan_skills returns ok=False, instead of raising,
   build a corrective prompt listing the missing skills, the FULL
   allowed registry, and the three most common confusions
   (file_read→file_ops, fetch_url→web_fetch, read_file→file_ops).
2. Re-call _qwen_chat ONCE with the appended correction.
3. Re-validate the second draft. If valid, use it. If not, raise
   with BOTH attempts in the message so the user sees the model is
   consistently confused (vs a one-off transient miss).
4. If the retry call itself fails (Qwen flakes between attempts),
   raise with the ORIGINAL validation error — more diagnostic than
   "qwen flaked on retry".

Also:
- Strengthen _PLAN_SYSTEM_PROMPT with the same three confusion hints
  so the FIRST draft is more likely to succeed (cuts the retry rate).

Tests (3 new in tests/test_agent_plan.py — all pass):
- test_draft_plan_retries_on_hallucinated_skill_then_succeeds
  Reproduces the exact user case: file_read on attempt 1, file_ops
  on attempt 2, plan succeeds.
- test_draft_plan_retry_also_fails_raises_with_both_attempts
  Both attempts hallucinate (file_read, then read_file): error
  message contains both for diagnostic value.
- test_draft_plan_retry_qwen_unavailable_surfaces_original_error
  Retry call raises ConnectionError: original validation error
  surfaces with "retry failed" appended.

All 3 existing draft_plan tests still pass — backward-compat preserved.
The existing test_draft_plan_rejects_unknown_skill now exercises BOTH
attempts (fake_qwen_chat returns same bad plan each time) and still
raises with the missing skill in the message.

Total: 35/35 file pass + 7 pre-existing pynput env failures (unchanged).

Co-authored-by: Mickael Farina <farina.mickael@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants