Skip to content

hotfix: notification visibility (Reports tab filter) + LLM skill-hallucination retry#34

Merged
AVADSA25 merged 1 commit intomainfrom
fix/notification-types-render
May 3, 2026
Merged

hotfix: notification visibility (Reports tab filter) + LLM skill-hallucination retry#34
AVADSA25 merged 1 commit intomainfrom
fix/notification-types-render

Conversation

@AVADSA25
Copy link
Copy Markdown
Owner

@AVADSA25 AVADSA25 commented May 3, 2026

Summary

Two related issues from your first real anchor-example run.

Issue 1 — notifications counted but invisible

"I keep getting notifications but I don't see what it is — doesn't show in history or report."

Root cause: codec_tasks.html:913 was filtering the Reports tab to ONLY n.type === 'task_report'. All Phase 1+2+3+3.5 notification types got silently dropped.

Fix: introduce REPORT_NOTIF_TYPES whitelist with all 8 CODEC types:

  • task_report (existing crews)
  • question (Phase 1 Step 3 ask_user)
  • shift_report (Phase 2 Step 7)
  • agent_update, agent_blocked, agent_question, agent_done, agent_aborted (Phase 3 Step 10)
  • proactive_suggestion (Phase 3.5)

Reports tab now shows everything. Sorted newest-first.

Issue 2 — LLM skill-hallucination → unrecoverable block

Your forex-briefing agent (agent_07b2b6eeab02) failed at checkpoint 1 because Qwen-3.6 chose to call skill fetch_url (doesn't exist) instead of web_fetch (which IS in the manifest). The permission gate correctly blocked — but there was no recovery, so the agent stalled in blocked_on_permission waiting for a useless /grant for a non-existent skill.

Fix in codec_agent_runner._execute_checkpoint:

  • On PermissionViolation(reason="skill_not_authorized"), append a correction nudge to history (lists allowed skills) and re-call _qwen_next_action ONCE
  • If the second action passes the gate → execution continues
  • If the second action also violates → falls through to the original block path
  • Path/domain violations still block immediately (those are real scoped gaps the user has to grant)

Naming drift now self-corrects in one retry. Most LLM hallucinations are surface naming — this gives Qwen a chance to course-correct without needing user intervention.

Tests

+1 test: test_skill_hallucination_retries_with_corrected_skill_list

  • Fakes Qwen returning fetch_url (hallucinated) → weather (real) → checkpoint_done
  • Asserts only weather actually ran via _run_skill
  • Asserts correction nudge appears in history
  • No exception raised

test_agent_runner.py: 42 → 43. No regressions.

What to do after merge

cd ~/codec-repo
git pull
pm2 restart codec-dashboard codec-agent-runner

Then:

  1. Abort the stalled agent (in /chat Project mode → click [abort] on the status pill, or curl -X POST http://localhost:8090/api/agents/agent_07b2b6eeab02/abort)
  2. Re-drop the forex briefing prompt — should run further this time (will retry on hallucinations)
  3. Click the bell at top → navigate to /tasks#reports → you'll now see all the agent_update / agent_blocked / agent_done events from this run + previous

🤖 Generated with Claude Code

Two related issues found during user's first real anchor-example run:

## 1. Notifications counted but invisible

User reported: "I keep getting notifications but I don't see what it is,
doesn't show in history or report."

Root cause: codec_tasks.html:913 was filtering the Reports tab to ONLY
n.type === 'task_report'. All Phase 1+2+3+3.5 notification types
(question, shift_report, agent_update, agent_blocked, agent_question,
agent_done, agent_aborted, proactive_suggestion) got dropped from view.

Fix: introduce REPORT_NOTIF_TYPES whitelist with all 8 CODEC-generated
types. Reports tab now shows everything. Newest-first sort added.

## 2. LLM skill-name hallucination → unrecoverable block

User's forex-briefing agent (agent_07b2b6eeab02) failed at
checkpoint 1 because Qwen-3.6 chose to call skill `fetch_url`
(doesn't exist) instead of `web_fetch` (in manifest). Permission
gate correctly blocked, but there was no recovery path — agent
just sat in blocked_on_permission needing a useless /grant call
for a non-existent skill.

Fix in `codec_agent_runner._execute_checkpoint`:
- On `PermissionViolation(reason="skill_not_authorized")`, append
  a correction-nudge entry to history (lists allowed skills),
  re-call _qwen_next_action ONCE
- If the second action passes the gate, execution continues
- If the second action ALSO violates, fall through to the original
  block path

Naming drift recovers in one retry instead of stalling. Defensive:
non-skill violations (path, domain) still block immediately because
those are scoped, real permission gaps the user must grant.

## Tests

+1 test in test_agent_runner.py:
  test_skill_hallucination_retries_with_corrected_skill_list — fakes
  Qwen returning fetch_url first, then weather (allowed), then
  checkpoint_done; asserts only weather actually ran, nudge appears
  in history, no exception raised.

42 → 43 in test_agent_runner.py. No regressions in other suites.
@AVADSA25 AVADSA25 merged commit 6901647 into main May 3, 2026
1 check passed
AVADSA25 added a commit that referenced this pull request May 3, 2026
PR #34 only retried `skill_not_authorized`. Real-world Qwen drift hits
`domain_not_authorized` and path violations just as often (e.g. plan
allows api.exchangerate-api.com, model emits bare exchangerate-api.com).

Refactor the retry block in _execute_checkpoint to dispatch on all
four PermissionViolation reasons via _build_correction_nudge():

  - skill_not_authorized       -> list allowed skills
  - path_not_authorized        -> list allowed write_paths globs
  - read_path_not_authorized   -> list allowed read_paths globs
  - domain_not_authorized      -> list allowed network_domains

Each nudge appended to history with _skill_correction_nudge marker so
_qwen_next_action sees the corrected closed-world allowlist on the
retry. SECOND consecutive miss still raises -> blocked_on_permission.

Tests (4 total in this slice, 46 total in test_agent_runner.py):
- existing skill retry test still green
- domain retry test (forex anchor scenario)
- write_path retry test
- read_path retry test

Co-authored-by: Mickael Farina <farina.mickael@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants