hotfix: notification visibility (Reports tab filter) + LLM skill-hallucination retry by AVADSA25 · Pull Request #34 · AVADSA25/codec

AVADSA25 · 2026-05-03T20:23:05Z

Summary

Two related issues from your first real anchor-example run.

Issue 1 — notifications counted but invisible

"I keep getting notifications but I don't see what it is — doesn't show in history or report."

Root cause: codec_tasks.html:913 was filtering the Reports tab to ONLY n.type === 'task_report'. All Phase 1+2+3+3.5 notification types got silently dropped.

Fix: introduce REPORT_NOTIF_TYPES whitelist with all 8 CODEC types:

task_report (existing crews)
question (Phase 1 Step 3 ask_user)
shift_report (Phase 2 Step 7)
agent_update, agent_blocked, agent_question, agent_done, agent_aborted (Phase 3 Step 10)
proactive_suggestion (Phase 3.5)

Reports tab now shows everything. Sorted newest-first.

Issue 2 — LLM skill-hallucination → unrecoverable block

Your forex-briefing agent (agent_07b2b6eeab02) failed at checkpoint 1 because Qwen-3.6 chose to call skill fetch_url (doesn't exist) instead of web_fetch (which IS in the manifest). The permission gate correctly blocked — but there was no recovery, so the agent stalled in blocked_on_permission waiting for a useless /grant for a non-existent skill.

Fix in codec_agent_runner._execute_checkpoint:

On PermissionViolation(reason="skill_not_authorized"), append a correction nudge to history (lists allowed skills) and re-call _qwen_next_action ONCE
If the second action passes the gate → execution continues
If the second action also violates → falls through to the original block path
Path/domain violations still block immediately (those are real scoped gaps the user has to grant)

Naming drift now self-corrects in one retry. Most LLM hallucinations are surface naming — this gives Qwen a chance to course-correct without needing user intervention.

Tests

+1 test: test_skill_hallucination_retries_with_corrected_skill_list

Fakes Qwen returning fetch_url (hallucinated) → weather (real) → checkpoint_done
Asserts only weather actually ran via _run_skill
Asserts correction nudge appears in history
No exception raised

test_agent_runner.py: 42 → 43. No regressions.

What to do after merge

cd ~/codec-repo
git pull
pm2 restart codec-dashboard codec-agent-runner

Then:

Abort the stalled agent (in /chat Project mode → click [abort] on the status pill, or curl -X POST http://localhost:8090/api/agents/agent_07b2b6eeab02/abort)
Re-drop the forex briefing prompt — should run further this time (will retry on hallucinations)
Click the bell at top → navigate to /tasks#reports → you'll now see all the agent_update / agent_blocked / agent_done events from this run + previous

🤖 Generated with Claude Code

Two related issues found during user's first real anchor-example run: ## 1. Notifications counted but invisible User reported: "I keep getting notifications but I don't see what it is, doesn't show in history or report." Root cause: codec_tasks.html:913 was filtering the Reports tab to ONLY n.type === 'task_report'. All Phase 1+2+3+3.5 notification types (question, shift_report, agent_update, agent_blocked, agent_question, agent_done, agent_aborted, proactive_suggestion) got dropped from view. Fix: introduce REPORT_NOTIF_TYPES whitelist with all 8 CODEC-generated types. Reports tab now shows everything. Newest-first sort added. ## 2. LLM skill-name hallucination → unrecoverable block User's forex-briefing agent (agent_07b2b6eeab02) failed at checkpoint 1 because Qwen-3.6 chose to call skill `fetch_url` (doesn't exist) instead of `web_fetch` (in manifest). Permission gate correctly blocked, but there was no recovery path — agent just sat in blocked_on_permission needing a useless /grant call for a non-existent skill. Fix in `codec_agent_runner._execute_checkpoint`: - On `PermissionViolation(reason="skill_not_authorized")`, append a correction-nudge entry to history (lists allowed skills), re-call _qwen_next_action ONCE - If the second action passes the gate, execution continues - If the second action ALSO violates, fall through to the original block path Naming drift recovers in one retry instead of stalling. Defensive: non-skill violations (path, domain) still block immediately because those are scoped, real permission gaps the user must grant. ## Tests +1 test in test_agent_runner.py: test_skill_hallucination_retries_with_corrected_skill_list — fakes Qwen returning fetch_url first, then weather (allowed), then checkpoint_done; asserts only weather actually ran, nudge appears in history, no exception raised. 42 → 43 in test_agent_runner.py. No regressions in other suites.

PR #34 only retried `skill_not_authorized`. Real-world Qwen drift hits `domain_not_authorized` and path violations just as often (e.g. plan allows api.exchangerate-api.com, model emits bare exchangerate-api.com). Refactor the retry block in _execute_checkpoint to dispatch on all four PermissionViolation reasons via _build_correction_nudge(): - skill_not_authorized -> list allowed skills - path_not_authorized -> list allowed write_paths globs - read_path_not_authorized -> list allowed read_paths globs - domain_not_authorized -> list allowed network_domains Each nudge appended to history with _skill_correction_nudge marker so _qwen_next_action sees the corrected closed-world allowlist on the retry. SECOND consecutive miss still raises -> blocked_on_permission. Tests (4 total in this slice, 46 total in test_agent_runner.py): - existing skill retry test still green - domain retry test (forex anchor scenario) - write_path retry test - read_path retry test Co-authored-by: Mickael Farina <farina.mickael@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

AVADSA25 merged commit 6901647 into main May 3, 2026
1 check passed

AVADSA25 mentioned this pull request May 3, 2026

hotfix: retry on path/read_path/domain LLM hallucinations #35

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hotfix: notification visibility (Reports tab filter) + LLM skill-hallucination retry#34

hotfix: notification visibility (Reports tab filter) + LLM skill-hallucination retry#34
AVADSA25 merged 1 commit intomainfrom
fix/notification-types-render

AVADSA25 commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AVADSA25 commented May 3, 2026

Summary

Issue 1 — notifications counted but invisible

Issue 2 — LLM skill-hallucination → unrecoverable block

Tests

What to do after merge

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants