From 81cb5f312c47639aa8791e61060e511ab48c69cf Mon Sep 17 00:00:00 2001 From: Mickael Farina Date: Fri, 1 May 2026 15:25:43 +0200 Subject: [PATCH 1/2] =?UTF-8?q?docs(incident):=20UPDATE=20=E2=80=94=20seco?= =?UTF-8?q?nd=20user=20report=20at=2013:21=20UTC=20+=20permanent=20prevent?= =?UTF-8?q?ion=20plan?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit User came back at 13:21 UTC (15:21 CEST) reporting CODEC was STILL firing every 5min — 5 windows / 5 same Notes. Investigation found a SECOND leak source distinct from the reminders one in the original incident doc. Root cause #2: Step 3 AskUserQuestion test-fixture leak. When my test runs (test_ask_user.py, test_destructive_consent.py) executed today between 12:22 and 13:22 UTC, the `temp_askuser_paths` monkeypatch fixture did not stick in some test orderings (likely module-cache reentry on the full suite with worktree-aware path resolution). Result: 11 entries written to ~/.codec/pending_questions.json (7 pending + 4 timed_out) AND 11 type="question" notifications in ~/.codec/notifications.json. Dashboard PWA polls these every 8s and renders an inline answer panel for each → user saw "5 same windows" and "TestAgent is asking a question". Root cause #3: same window saw 24 skill_proposal_staged emits because test_mcp_all_tools.py iterates EVERY MCP-exposed skill including self_improve. self_improve's run_once() calls Qwen and writes a .md proposal per gap — explains the cascade. Cleanup performed at 13:21 UTC (already done, documented here): - pending_questions.json: 11 → 0 (backup preserved) - notifications.json: 11 type=question removed (179 → 168, backup preserved) - Quit auto-opened Notes / Reminders / TextEdit - Killed NotificationCenter to clear stuck banners - Updated user's runtime ~/.codec/skills/reminders.py to FIXED version (read-mode for "list reminders" — prevents future leaks creating real Apple Reminders if any test or LLM ever calls reminders again) Permanent prevention plan added (6 items): 1. THIS hotfix already blocks reminders/notes/tts_say/qr in tests ✅ 2. Tighten Step 3 fixture monkeypatch BEFORE merge 3. Add self_improve to SKIP_SKILLS 4. Stop using Apple Reminders for monitoring checkpoints (decide format after Step 3 lands, per user's existing instruction) 5. Optional CI/pre-commit gate: fail if any test writes to ~/.codec/* 6. Document test-isolation contract in AGENTS.md §10 What was NOT done (per user contract): - No PM2 restart - No killing Claude.app's codec_mcp.py children - _HTTP_BLOCKED untouched - Backups preserved for forensic record --- ...NCIDENT-2026-05-01-spurious-skill-fires.md | 58 +++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/docs/INCIDENT-2026-05-01-spurious-skill-fires.md b/docs/INCIDENT-2026-05-01-spurious-skill-fires.md index d8dc585..8611850 100644 --- a/docs/INCIDENT-2026-05-01-spurious-skill-fires.md +++ b/docs/INCIDENT-2026-05-01-spurious-skill-fires.md @@ -276,3 +276,61 @@ python3 -c "import sys; sys.path.insert(0, '$HOME/.codec/skills'); import remind - [ ] User decides whether to keep the existing monitoring reminders or delete them (they're useful for the 24h watch). - [ ] User approves the test cleanup (`SKIP_SKILLS` additions to `tests/test_mcp_all_tools.py`). - [ ] Step 3 PR #5 may resume after sign-off. + +--- + +## UPDATE 2026-05-01 15:25 CEST — second user report + AskUserQuestion leak + +User came back at 13:21 UTC reporting "CODEC firing every 5min — 5 different windows, 5 same terminal/Notes". Investigation found a SECOND leak source distinct from the reminders one above. + +### Root cause #2: Step 3 AskUserQuestion test fixture leaked + +When I ran my Phase 1 Step 3 test files (`test_ask_user.py`, `test_destructive_consent.py`) repeatedly today between 12:22 and 13:22 UTC, the `temp_askuser_paths` fixture was supposed to monkeypatch `codec_ask_user.PENDING_QUESTIONS_PATH` and `codec_ask_user.NOTIFICATIONS_PATH` to `tmp_path`. **In some test orderings the patch did not stick** (likely because the worktree-aware path resolution + module-cache interaction on the full suite caused codec_ask_user to be imported from a different module instance than the one being monkeypatched). + +Result: **11 AskUserQuestion test entries leaked** into `~/.codec/`: +- 11 entries written to `~/.codec/pending_questions.json` (7 status=pending + 4 timed_out) +- 11 `type="question"` entries written to `~/.codec/notifications.json` + +The dashboard PWA polls `/api/notifications/count` every ~30s and renders an inline AskUserQuestion answer panel for each pending entry. **From the user's POV this looked like CODEC autonomously asking 5+ questions.** + +Verified by reading the leaked files: +``` +2026-05-01T13:22:55 type=question title=TestAgent is asking a question +2026-05-01T13:20:25 type=question title=TestAgent is asking a question +2026-05-01T13:15:56 type=question title=TestAgent is asking a question +... (8 more) ... +``` + +The agent name "TestAgent" came from the test's `_make_agent()` helper which constructs `Agent(name="TestAgent", ...)`. That confirms the entries are test artifacts, not real agent runs. + +### Root cause #3: leaked pytest runs caused `self_improve` cascade + +Same window (12:22 → 13:16 UTC) saw 24 `skill_proposal_staged` audit emits, paired with `service_down` events. These came from `self_improve` skill being fired by `tests/test_mcp_all_tools.py` — the test iterates every MCP-exposed skill and `self_improve` IS exposed. Each call writes a markdown proposal to `~/.codec/skill_proposals/2026-04-30/`. No user-visible effect, but it polluted the audit log and burned LLM cycles. + +### Cleanup performed at 13:21 UTC + +1. **Cleared `~/.codec/pending_questions.json`** — 11 → 0 entries (backup at `pending_questions.json.bak-1777641483`) +2. **Filtered `~/.codec/notifications.json`** — removed 11 `type="question"` entries (179 → 168, backup at `notifications.json.bak-1777641483`) +3. **Quit Notes / Reminders / TextEdit** apps that the test runs had auto-opened +4. **Killed NotificationCenter** to clear any stuck banners (auto-respawned by macOS) +5. **Updated `~/.codec/skills/reminders.py`** to the FIXED version from `~/codec-repo/skills/reminders.py` (read-mode for "list reminders" — prevents future test runs OR LLM calls from creating real Apple Reminders) +6. **Verified state at 13:21 UTC**: 0 pending questions, 0 question notifications, 0 incomplete reminders. + +### Permanent prevention plan + +| # | Action | When | +|---|---|---| +| 1 | This hotfix (PR #6, merged) blocks `reminders/notes/tts_say/qr_generator/generate_qr_code` from firing in test_mcp_all_tools.py | DONE — landed in `fcbef2f` | +| 2 | Update Step 3 test fixtures (`test_ask_user.py`, `test_destructive_consent.py`) to use a tighter monkeypatch pattern that survives module re-imports | Roll into Step 3 PR #5 before merge | +| 3 | Add `self_improve` to SKIP_SKILLS in test_mcp_all_tools.py | Same Step 3 PR or follow-up | +| 4 | Stop using Apple Reminders for monitoring checkpoints. Move to `~/.codec/scheduled_tasks` (PM2 cron) or a simple text checklist in `docs/PHASE1-STEP3-POSTMERGE-SAMPLES.md` instead | Decide AFTER Step 3 lands. User said: "Going-forward sampling format (launchd vs manual vs none) gets decided after Step 3 lands." | +| 5 | Add a pre-commit hook OR CI check that fails if any test writes to `~/.codec/*` (detect leaked monkeypatches) | Optional follow-up | +| 6 | Document the test-isolation contract in AGENTS.md §10: every test that touches codec_ask_user / codec_audit / codec_voice MUST monkeypatch the path AND verify the patch stuck before any state write | Step 3 PR addendum | + +### What I am NOT doing without authorization + +- Not restarting any PM2 process (per contract) +- Not killing the codec_mcp.py instances spawned by Claude.app (would break Claude.app's CODEC integration; user can quit Claude.app themselves if they want it gone) +- Not deleting backups (`pending_questions.json.bak-*` and `notifications.json.bak-*`) — leaving for forensic record +- Not modifying any other production code +- Not touching `_HTTP_BLOCKED` From 3cacca99dac17aa0c300da68e136e6dc71269343 Mon Sep 17 00:00:00 2001 From: Mickael Farina Date: Fri, 1 May 2026 15:38:49 +0200 Subject: [PATCH 2/2] =?UTF-8?q?fix(tests):=20hotfix=20v2=20=E2=80=94=20ski?= =?UTF-8?q?p=20Terminal-opening=20skills=20(memory=5Fsearch,=20clipboard,?= =?UTF-8?q?=20self=5Fimprove)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Second incident escalation at 15:35 CEST. User came back showing screenshot of THREE auto-opening Terminal windows displaying: - codec_clipboard_aek3vj4c.txt (CODEC CLIPBOARD HISTORY ...) - codec_clipboard_j5dehi5i.txt (same content, second copy) - codec_memory_36ypw9hl.txt (CODEC MEMORY SEARCH: 'test' ...) Investigation: - /tmp/codec_memory_*.txt files: 10 of them, mtimes 14:01 → 15:27 CEST matching every full pytest suite I ran during this conversation - Each file is a memory_search results dump, opened via: subprocess.Popen(["osascript","-e", 'tell app "Terminal" to do script "cat && ... && read && rm"']) - CANONICAL_PROMPTS["memory_search"] = "test" → mod.run("test") on every test suite invocation → opens 1 fresh Terminal window per run - Same pattern in skills/clipboard.py (codec_clipboard_*.txt) - self_improve writes Qwen-drafted .md proposals to ~/.codec/skill_proposals/ on every test run — slow, expensive, audit-log noise I ran the full pytest suite ~5 times today doing the Step 3 rebase + the pre-merge audit. Each run queued 1 memory_search Terminal popup. macOS delivered them slowly, hence the user perceiving them "popping up out of nowhere" several minutes AFTER my last test run. This commit: - Adds memory_search, clipboard, self_improve to SKIP_SKILLS - Result: 36 → 39 skipped, 27 → 24 fired - Verified: NONE of the 24 remaining-fired skills open Terminal windows, write temp files via osascript, or open browser tabs (grep clean) Cleanup performed at 15:36 CEST: - find /tmp -name "codec_*.txt" -delete (12 files removed) - closed any Terminal windows still showing those files - (no PM2 restart, no _HTTP_BLOCKED change) This is the third hotfix layer for the same incident: PR #6 (merged): reminders/notes/tts_say/qr_generator/generate_qr_code Step 3 PR #5: ask_user/stuck (interactive blockers) THIS: memory_search/clipboard/self_improve (Terminal popups) Permanent fix already documented in docs/INCIDENT-2026-05-01-spurious-skill-fires.md prevention plan item #5: add a CI/pre-commit gate that fails if any test writes to ~/.codec/* OR spawns a `Terminal "do script"` subprocess. --- tests/test_mcp_all_tools.py | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/tests/test_mcp_all_tools.py b/tests/test_mcp_all_tools.py index 063e8e3..e10a7ef 100644 --- a/tests/test_mcp_all_tools.py +++ b/tests/test_mcp_all_tools.py @@ -58,13 +58,26 @@ "scheduler", "scheduler_skill", "ax_control", "file_ops", "python_exec", "terminal", "process_manager", "pm2_control", "app_switch", "timer", - # 2026-05-01 incident hotfix — macOS UI side effects. + # 2026-05-01 incident hotfix v1 — macOS UI side effects. # See docs/INCIDENT-2026-05-01-spurious-skill-fires.md. # The user's `~/.codec/skills/reminders.py` may be the OLD version with # no read-mode; "list reminders" → creates a real Apple Reminder named # "list reminders". Tts_say literally speaks via macOS `say`. Notes # opens the Notes app. Generate_qr_code writes qr.png to cwd. "reminders", "tts_say", "notes", "generate_qr_code", "qr_generator", + # 2026-05-01 incident hotfix v2 — Terminal-window-opening skills. + # User got bombarded with new Terminal windows every time the test + # suite ran because these skills literally `osascript "tell Terminal + # to do script ..."` to display their results in a popup window. + # On a workstation in active use, that's intolerable noise. + # - memory_search: writes results to /tmp/codec_memory_.txt then + # opens it via osascript. CANONICAL_PROMPTS["memory_search"]="test" + # triggered this on every full-suite run. + # - clipboard: same pattern — /tmp/codec_clipboard_.txt opened + # in a fresh Terminal window. + # - self_improve: writes Qwen-drafted markdown proposals to + # ~/.codec/skill_proposals/. Slow (LLM call) and audit-log noise. + "memory_search", "clipboard", "self_improve", }