hotfix: retry on path/read_path/domain LLM hallucinations#35
Merged
Conversation
PR #34 only retried `skill_not_authorized`. Real-world Qwen drift hits `domain_not_authorized` and path violations just as often (e.g. plan allows api.exchangerate-api.com, model emits bare exchangerate-api.com). Refactor the retry block in _execute_checkpoint to dispatch on all four PermissionViolation reasons via _build_correction_nudge(): - skill_not_authorized -> list allowed skills - path_not_authorized -> list allowed write_paths globs - read_path_not_authorized -> list allowed read_paths globs - domain_not_authorized -> list allowed network_domains Each nudge appended to history with _skill_correction_nudge marker so _qwen_next_action sees the corrected closed-world allowlist on the retry. SECOND consecutive miss still raises -> blocked_on_permission. Tests (4 total in this slice, 46 total in test_agent_runner.py): - existing skill retry test still green - domain retry test (forex anchor scenario) - write_path retry test - read_path retry test Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 tasks
AVADSA25
added a commit
that referenced
this pull request
May 4, 2026
User repro 2026-05-04 09:58:
Project: "Read all markdown files in ~/codec-repo/docs/ and create
an index.md that lists each file with its first heading and
a one-line description"
Result: Plan failed: plan invalid: plan references unknown skills:
['file_read']
Same hallucination CLASS as PR #35 but at a different LAYER.
PR #35 fixed retries during execution (codec_agent_runner). This is
failing earlier — at plan validation, before the plan is even saved.
The user never even got to the approve-or-reject step.
Root cause:
Qwen drafts plans naming skills that don't exist. `file_read` and
`fetch_url` are the two we've seen. The actual file-reading skill is
`file_ops` (which reads, writes, appends, lists). The actual URL fetch
is `web_fetch`. The user-visible result was the same as PR #35 —
project mode dies before doing anything useful.
Fix (mirrors PR #35's pattern at planning layer):
1. After validate_plan_skills returns ok=False, instead of raising,
build a corrective prompt listing the missing skills, the FULL
allowed registry, and the three most common confusions
(file_read→file_ops, fetch_url→web_fetch, read_file→file_ops).
2. Re-call _qwen_chat ONCE with the appended correction.
3. Re-validate the second draft. If valid, use it. If not, raise
with BOTH attempts in the message so the user sees the model is
consistently confused (vs a one-off transient miss).
4. If the retry call itself fails (Qwen flakes between attempts),
raise with the ORIGINAL validation error — more diagnostic than
"qwen flaked on retry".
Also:
- Strengthen _PLAN_SYSTEM_PROMPT with the same three confusion hints
so the FIRST draft is more likely to succeed (cuts the retry rate).
Tests (3 new in tests/test_agent_plan.py — all pass):
- test_draft_plan_retries_on_hallucinated_skill_then_succeeds
Reproduces the exact user case: file_read on attempt 1, file_ops
on attempt 2, plan succeeds.
- test_draft_plan_retry_also_fails_raises_with_both_attempts
Both attempts hallucinate (file_read, then read_file): error
message contains both for diagnostic value.
- test_draft_plan_retry_qwen_unavailable_surfaces_original_error
Retry call raises ConnectionError: original validation error
surfaces with "retry failed" appended.
All 3 existing draft_plan tests still pass — backward-compat preserved.
The existing test_draft_plan_rejects_unknown_skill now exercises BOTH
attempts (fake_qwen_chat returns same bad plan each time) and still
raises with the missing skill in the message.
Total: 35/35 file pass + 7 pre-existing pynput env failures (unchanged).
Co-authored-by: Mickael Farina <farina.mickael@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR #34 only retried
skill_not_authorizedviolations. Real-world Qwen drift hits the other three PermissionViolation reasons too — confirmed by today's anchor-example run where the plan listedapi.exchangerate-api.combut the model emitted bareexchangerate-api.comand the agent wentblocked_on_permission.This extends the same single-shot correction-nudge pattern to all four reasons.
What changed
codec_agent_runner.py_build_correction_nudge(pv, action, agent_grants, global_grants)that emits a closed-world allowlist string per reason:skill_not_authorized→ list allowed skillspath_not_authorized→ list allowedwrite_pathsglobsread_path_not_authorized→ list allowedread_pathsglobsdomain_not_authorized→ list allowednetwork_domains_execute_checkpointretry block dispatches onpv.reasonvia the helper instead of the if/else that only matched skills.blocked_on_permission(unchanged user-visible escape hatch).tests/test_agent_runner.py— 3 new tests next to the existing skill retry test:test_domain_hallucination_retries_with_corrected_domain_list— the forex scenariotest_write_path_hallucination_retries_with_corrected_path_listtest_read_path_hallucination_retries_with_corrected_path_listTest plan
tests/test_agent_runner.pypass locally (43 prior + 3 new + 1 skill retry from hotfix: notification visibility (Reports tab filter) + LLM skill-hallucination retry #34)python -c "import codec_agent_runner"cleanblocked_on_permission🤖 Generated with Claude Code