hotfix: retry on path/read_path/domain LLM hallucinations by AVADSA25 · Pull Request #35 · AVADSA25/codec

AVADSA25 · 2026-05-03T20:49:56Z

Summary

PR #34 only retried skill_not_authorized violations. Real-world Qwen drift hits the other three PermissionViolation reasons too — confirmed by today's anchor-example run where the plan listed api.exchangerate-api.com but the model emitted bare exchangerate-api.com and the agent went blocked_on_permission.

This extends the same single-shot correction-nudge pattern to all four reasons.

What changed

codec_agent_runner.py

New helper _build_correction_nudge(pv, action, agent_grants, global_grants) that emits a closed-world allowlist string per reason:
- skill_not_authorized → list allowed skills
- path_not_authorized → list allowed write_paths globs
- read_path_not_authorized → list allowed read_paths globs
- domain_not_authorized → list allowed network_domains
_execute_checkpoint retry block dispatches on pv.reason via the helper instead of the if/else that only matched skills.
SECOND consecutive miss still raises → blocked_on_permission (unchanged user-visible escape hatch).

tests/test_agent_runner.py — 3 new tests next to the existing skill retry test:

test_domain_hallucination_retries_with_corrected_domain_list — the forex scenario
test_write_path_hallucination_retries_with_corrected_path_list
test_read_path_hallucination_retries_with_corrected_path_list

Test plan

All 46 tests in tests/test_agent_runner.py pass locally (43 prior + 3 new + 1 skill retry from hotfix: notification visibility (Reports tab filter) + LLM skill-hallucination retry #34)
python -c "import codec_agent_runner" clean
Re-run forex anchor example after merge — expected: agent fetches rates, writes to project folder, no blocked_on_permission

🤖 Generated with Claude Code

PR #34 only retried `skill_not_authorized`. Real-world Qwen drift hits `domain_not_authorized` and path violations just as often (e.g. plan allows api.exchangerate-api.com, model emits bare exchangerate-api.com). Refactor the retry block in _execute_checkpoint to dispatch on all four PermissionViolation reasons via _build_correction_nudge(): - skill_not_authorized -> list allowed skills - path_not_authorized -> list allowed write_paths globs - read_path_not_authorized -> list allowed read_paths globs - domain_not_authorized -> list allowed network_domains Each nudge appended to history with _skill_correction_nudge marker so _qwen_next_action sees the corrected closed-world allowlist on the retry. SECOND consecutive miss still raises -> blocked_on_permission. Tests (4 total in this slice, 46 total in test_agent_runner.py): - existing skill retry test still green - domain retry test (forex anchor scenario) - write_path retry test - read_path retry test Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

User repro 2026-05-04 09:58: Project: "Read all markdown files in ~/codec-repo/docs/ and create an index.md that lists each file with its first heading and a one-line description" Result: Plan failed: plan invalid: plan references unknown skills: ['file_read'] Same hallucination CLASS as PR #35 but at a different LAYER. PR #35 fixed retries during execution (codec_agent_runner). This is failing earlier — at plan validation, before the plan is even saved. The user never even got to the approve-or-reject step. Root cause: Qwen drafts plans naming skills that don't exist. `file_read` and `fetch_url` are the two we've seen. The actual file-reading skill is `file_ops` (which reads, writes, appends, lists). The actual URL fetch is `web_fetch`. The user-visible result was the same as PR #35 — project mode dies before doing anything useful. Fix (mirrors PR #35's pattern at planning layer): 1. After validate_plan_skills returns ok=False, instead of raising, build a corrective prompt listing the missing skills, the FULL allowed registry, and the three most common confusions (file_read→file_ops, fetch_url→web_fetch, read_file→file_ops). 2. Re-call _qwen_chat ONCE with the appended correction. 3. Re-validate the second draft. If valid, use it. If not, raise with BOTH attempts in the message so the user sees the model is consistently confused (vs a one-off transient miss). 4. If the retry call itself fails (Qwen flakes between attempts), raise with the ORIGINAL validation error — more diagnostic than "qwen flaked on retry". Also: - Strengthen _PLAN_SYSTEM_PROMPT with the same three confusion hints so the FIRST draft is more likely to succeed (cuts the retry rate). Tests (3 new in tests/test_agent_plan.py — all pass): - test_draft_plan_retries_on_hallucinated_skill_then_succeeds Reproduces the exact user case: file_read on attempt 1, file_ops on attempt 2, plan succeeds. - test_draft_plan_retry_also_fails_raises_with_both_attempts Both attempts hallucinate (file_read, then read_file): error message contains both for diagnostic value. - test_draft_plan_retry_qwen_unavailable_surfaces_original_error Retry call raises ConnectionError: original validation error surfaces with "retry failed" appended. All 3 existing draft_plan tests still pass — backward-compat preserved. The existing test_draft_plan_rejects_unknown_skill now exercises BOTH attempts (fake_qwen_chat returns same bad plan each time) and still raises with the missing skill in the message. Total: 35/35 file pass + 7 pre-existing pynput env failures (unchanged). Co-authored-by: Mickael Farina <farina.mickael@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

AVADSA25 merged commit 9369bad into main May 3, 2026
1 check passed

AVADSA25 mentioned this pull request May 4, 2026

hotfix: plan-time retry on hallucinated skill names #41

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hotfix: retry on path/read_path/domain LLM hallucinations#35

hotfix: retry on path/read_path/domain LLM hallucinations#35
AVADSA25 merged 1 commit intomainfrom
fix/retry-all-permission-types

AVADSA25 commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AVADSA25 commented May 3, 2026

Summary

What changed

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants