Conversation
Two related issues found during user's first real anchor-example run: ## 1. Notifications counted but invisible User reported: "I keep getting notifications but I don't see what it is, doesn't show in history or report." Root cause: codec_tasks.html:913 was filtering the Reports tab to ONLY n.type === 'task_report'. All Phase 1+2+3+3.5 notification types (question, shift_report, agent_update, agent_blocked, agent_question, agent_done, agent_aborted, proactive_suggestion) got dropped from view. Fix: introduce REPORT_NOTIF_TYPES whitelist with all 8 CODEC-generated types. Reports tab now shows everything. Newest-first sort added. ## 2. LLM skill-name hallucination → unrecoverable block User's forex-briefing agent (agent_07b2b6eeab02) failed at checkpoint 1 because Qwen-3.6 chose to call skill `fetch_url` (doesn't exist) instead of `web_fetch` (in manifest). Permission gate correctly blocked, but there was no recovery path — agent just sat in blocked_on_permission needing a useless /grant call for a non-existent skill. Fix in `codec_agent_runner._execute_checkpoint`: - On `PermissionViolation(reason="skill_not_authorized")`, append a correction-nudge entry to history (lists allowed skills), re-call _qwen_next_action ONCE - If the second action passes the gate, execution continues - If the second action ALSO violates, fall through to the original block path Naming drift recovers in one retry instead of stalling. Defensive: non-skill violations (path, domain) still block immediately because those are scoped, real permission gaps the user must grant. ## Tests +1 test in test_agent_runner.py: test_skill_hallucination_retries_with_corrected_skill_list — fakes Qwen returning fetch_url first, then weather (allowed), then checkpoint_done; asserts only weather actually ran, nudge appears in history, no exception raised. 42 → 43 in test_agent_runner.py. No regressions in other suites.
3 tasks
AVADSA25
added a commit
that referenced
this pull request
May 3, 2026
PR #34 only retried `skill_not_authorized`. Real-world Qwen drift hits `domain_not_authorized` and path violations just as often (e.g. plan allows api.exchangerate-api.com, model emits bare exchangerate-api.com). Refactor the retry block in _execute_checkpoint to dispatch on all four PermissionViolation reasons via _build_correction_nudge(): - skill_not_authorized -> list allowed skills - path_not_authorized -> list allowed write_paths globs - read_path_not_authorized -> list allowed read_paths globs - domain_not_authorized -> list allowed network_domains Each nudge appended to history with _skill_correction_nudge marker so _qwen_next_action sees the corrected closed-world allowlist on the retry. SECOND consecutive miss still raises -> blocked_on_permission. Tests (4 total in this slice, 46 total in test_agent_runner.py): - existing skill retry test still green - domain retry test (forex anchor scenario) - write_path retry test - read_path retry test Co-authored-by: Mickael Farina <farina.mickael@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related issues from your first real anchor-example run.
Issue 1 — notifications counted but invisible
Root cause:
codec_tasks.html:913was filtering the Reports tab to ONLYn.type === 'task_report'. All Phase 1+2+3+3.5 notification types got silently dropped.Fix: introduce
REPORT_NOTIF_TYPESwhitelist with all 8 CODEC types:task_report(existing crews)question(Phase 1 Step 3 ask_user)shift_report(Phase 2 Step 7)agent_update,agent_blocked,agent_question,agent_done,agent_aborted(Phase 3 Step 10)proactive_suggestion(Phase 3.5)Reports tab now shows everything. Sorted newest-first.
Issue 2 — LLM skill-hallucination → unrecoverable block
Your forex-briefing agent (
agent_07b2b6eeab02) failed at checkpoint 1 because Qwen-3.6 chose to call skillfetch_url(doesn't exist) instead ofweb_fetch(which IS in the manifest). The permission gate correctly blocked — but there was no recovery, so the agent stalled inblocked_on_permissionwaiting for a useless/grantfor a non-existent skill.Fix in
codec_agent_runner._execute_checkpoint:PermissionViolation(reason="skill_not_authorized"), append a correction nudge to history (lists allowed skills) and re-call_qwen_next_actionONCENaming drift now self-corrects in one retry. Most LLM hallucinations are surface naming — this gives Qwen a chance to course-correct without needing user intervention.
Tests
+1 test:
test_skill_hallucination_retries_with_corrected_skill_listfetch_url(hallucinated) →weather(real) →checkpoint_doneweatheractually ran via_run_skilltest_agent_runner.py: 42 → 43. No regressions.What to do after merge
Then:
/chatProject mode → click [abort] on the status pill, orcurl -X POST http://localhost:8090/api/agents/agent_07b2b6eeab02/abort)/tasks#reports→ you'll now see all the agent_update / agent_blocked / agent_done events from this run + previous🤖 Generated with Claude Code