From 81cb5f312c47639aa8791e61060e511ab48c69cf Mon Sep 17 00:00:00 2001
From: Mickael Farina <farina.mickael@gmail.com>
Date: Fri, 1 May 2026 15:25:43 +0200
Subject: [PATCH 1/2] =?UTF-8?q?docs(incident):=20UPDATE=20=E2=80=94=20seco?=
 =?UTF-8?q?nd=20user=20report=20at=2013:21=20UTC=20+=20permanent=20prevent?=
 =?UTF-8?q?ion=20plan?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

User came back at 13:21 UTC (15:21 CEST) reporting that CODEC was STILL
firing every 5 min: 5 windows, 5 identical Notes. Investigation found a
SECOND leak source, distinct from the reminders one in the original
incident doc.

Root cause #2: Step 3 AskUserQuestion test-fixture leak. When I ran the
Step 3 tests (test_ask_user.py, test_destructive_consent.py) today
between 12:22 and 13:22 UTC, the `temp_askuser_paths` monkeypatch fixture
did not take effect in some test orderings. The likely cause is that, on
the full suite, worktree-aware path resolution plus module caching left
the code under test using a different codec_ask_user module instance
from the one the fixture patched. Result: 11 entries written to
~/.codec/pending_questions.json (7 pending + 4 timed_out) AND 11
type="question" notifications in ~/.codec/notifications.json. The
dashboard PWA polls these every 8s and renders an inline answer panel
for each, so the user saw "5 same windows" and "TestAgent is asking a
question".
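
One way to make the fixture survive a second module instance is sketched
below. It assumes codec_ask_user keeps its paths in the module-level
attributes PENDING_QUESTIONS_PATH and NOTIFICATIONS_PATH, which are the
names the fixture targets. `redirect_module_paths` is a hypothetical
helper, not existing repo code. It patches every loaded module whose
source file is `codec_ask_user.py`, then verifies the patch before the
test writes anything. It only sees modules imported before it runs.

```python
import contextlib
import os
import sys
from pathlib import Path


def loaded_modules_from(basename):
    """Return every distinct module in sys.modules whose file is `basename`.

    The same source can end up loaded twice (main checkout vs. git
    worktree, or under two import names). Matching on the file's
    basename finds both copies, as long as both are still registered.
    """
    seen, mods = set(), []
    for mod in list(sys.modules.values()):
        path = getattr(mod, "__file__", None)
        if path and os.path.basename(path) == basename and id(mod) not in seen:
            seen.add(id(mod))
            mods.append(mod)
    return mods


@contextlib.contextmanager
def redirect_module_paths(basename, attrs, target_dir):
    """Point `attrs` on every matching module into `target_dir`, then restore."""
    target_dir = Path(target_dir)
    mods = loaded_modules_from(basename)
    if not mods:
        raise RuntimeError(f"{basename} is not imported yet; import it first")
    saved = []
    try:
        for mod in mods:
            for attr in attrs:
                original = getattr(mod, attr)
                saved.append((mod, attr, original))
                setattr(mod, attr, target_dir / Path(original).name)
        # Fail loudly if any instance still points outside target_dir.
        for mod in mods:
            for attr in attrs:
                if Path(getattr(mod, attr)).parent != target_dir:
                    raise RuntimeError(f"{mod.__name__}.{attr} escaped the sandbox")
        yield mods
    finally:
        for mod, attr, original in reversed(saved):
            setattr(mod, attr, original)
```

In the pytest fixture, one would `import codec_ask_user` first. Then wrap
the `yield` in `with redirect_module_paths("codec_ask_user.py",
["PENDING_QUESTIONS_PATH", "NOTIFICATIONS_PATH"], tmp_path):`.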

Root cause #3: in the same time window, 24 skill_proposal_staged audit
events were emitted because test_mcp_all_tools.py invokes EVERY
MCP-exposed skill, including self_improve. self_improve's run_once()
calls Qwen and writes one .md proposal per detected gap, which explains
the cascade.
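
Deciding which skills fire is a set partition. The sketch below assumes
the test builds a list of exposed skill names and compares it against
SKIP_SKILLS. `partition_skills` is hypothetical. The third bucket is
there because a SKIP_SKILLS entry that matches no exposed name (a typo
or a renamed skill) silently skips nothing.

```python
def partition_skills(exposed, skip):
    """Split exposed skill names into (fired, skipped, unknown_skip_entries)."""
    exposed, skip = set(exposed), set(skip)
    fired = sorted(exposed - skip)
    skipped = sorted(exposed & skip)
    unknown = sorted(skip - exposed)  # skip entries that protect nothing
    return fired, skipped, unknown
```

Reporting `unknown` in the test output would show a stale entry before
it lets a side-effecting skill such as self_improve fire again.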

Cleanup performed at 13:21 UTC (already done, documented here):
- pending_questions.json: 11 → 0 (backup preserved)
- notifications.json: 11 type=question removed (179 → 168, backup preserved)
- Quit auto-opened Notes / Reminders / TextEdit
- Killed NotificationCenter to clear stuck banners
- Updated the user's runtime ~/.codec/skills/reminders.py to the FIXED
  version (read-mode for "list reminders", which stops future leaks from
  creating real Apple Reminders if any test or LLM ever calls reminders again)
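
The notifications.json step (back up, then drop type="question" entries)
could look like the sketch below. It is not necessarily what was run. It
assumes the file is a top-level JSON array of objects with a "type" key,
and the `.bak-<epoch>` suffix mirrors the backup names in the incident
doc. `drop_question_notifications` is a hypothetical helper.

```python
import json
import shutil
import time
from pathlib import Path


def drop_question_notifications(path):
    """Back up `path`, remove entries whose type is "question", return counts.

    Assumes the file holds a JSON array of notification objects.
    """
    path = Path(path)
    backup = path.with_name(f"{path.name}.bak-{int(time.time())}")
    shutil.copy2(path, backup)  # keep the original for the forensic record
    entries = json.loads(path.read_text(encoding="utf-8"))
    kept = [e for e in entries if e.get("type") != "question"]
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(kept, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename over the original so readers never see a partial file
    return len(entries), len(kept), backup
```

The sketch takes no lock. If the live dashboard appends a notification
between the read and the rename, that entry is lost.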

Permanent prevention plan added (6 items):
1. THIS hotfix already blocks reminders/notes/tts_say/qr in tests ✅
2. Tighten Step 3 fixture monkeypatch BEFORE merge
3. Add self_improve to SKIP_SKILLS
4. Stop using Apple Reminders for monitoring checkpoints (decide format
   after Step 3 lands, per user's existing instruction)
5. Optional CI/pre-commit gate: fail if any test writes to ~/.codec/*
6. Document test-isolation contract in AGENTS.md §10
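
The gate in item 5 could start as a before/after snapshot of ~/.codec,
using only the standard library. `snapshot_tree` and `changed_files` are
hypothetical names. Live CODEC processes also write to ~/.codec
(notifications, audit log), so on an active workstation this will report
false positives. It fits best on CI with a throwaway HOME, or with an
allowlist.

```python
from pathlib import Path


def snapshot_tree(root):
    """Map each file under `root` (relative POSIX path) to (size, mtime_ns)."""
    root = Path(root)
    snap = {}
    if not root.is_dir():
        return snap
    for p in root.rglob("*"):
        try:
            if p.is_file():
                st = p.stat()
                snap[p.relative_to(root).as_posix()] = (st.st_size, st.st_mtime_ns)
        except FileNotFoundError:
            continue  # file vanished mid-walk; treat it as absent
    return snap


def changed_files(before, after):
    """Relative paths added, removed, or modified between two snapshots."""
    return sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))
```

In a conftest.py, a session-scoped autouse fixture could snapshot
`Path.home() / ".codec"` before its `yield`. Afterwards it would call
`pytest.fail(...)` with the `changed_files` list if that list is
non-empty.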

What was NOT done (per user contract):
- No PM2 restart
- No killing Claude.app's codec_mcp.py children
- _HTTP_BLOCKED untouched
- Backups preserved for forensic record
---
 ...NCIDENT-2026-05-01-spurious-skill-fires.md | 58 +++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/docs/INCIDENT-2026-05-01-spurious-skill-fires.md b/docs/INCIDENT-2026-05-01-spurious-skill-fires.md
index d8dc585..8611850 100644
--- a/docs/INCIDENT-2026-05-01-spurious-skill-fires.md
+++ b/docs/INCIDENT-2026-05-01-spurious-skill-fires.md
@@ -276,3 +276,61 @@ python3 -c "import sys; sys.path.insert(0, '$HOME/.codec/skills'); import remind
 - [ ] User decides whether to keep the existing monitoring reminders or delete them (they're useful for the 24h watch).
 - [ ] User approves the test cleanup (`SKIP_SKILLS` additions to `tests/test_mcp_all_tools.py`).
 - [ ] Step 3 PR #5 may resume after sign-off.
+
+---
+
+## UPDATE 2026-05-01 15:25 CEST — second user report + AskUserQuestion leak
+
+User came back at 13:21 UTC reporting "CODEC firing every 5min — 5 different windows, 5 same terminal/Notes". Investigation found a SECOND leak source distinct from the reminders one above.
+
+### Root cause #2: Step 3 AskUserQuestion test fixture leaked
+
+When I ran my Phase 1 Step 3 test files (`test_ask_user.py`, `test_destructive_consent.py`) repeatedly today between 12:22 and 13:22 UTC, the `temp_askuser_paths` fixture was supposed to monkeypatch `codec_ask_user.PENDING_QUESTIONS_PATH` and `codec_ask_user.NOTIFICATIONS_PATH` to `tmp_path`. **In some test orderings the patch did not stick.** The likely cause is that, on the full suite, worktree-aware path resolution interacting with the module cache made the code under test use a different `codec_ask_user` module instance from the one the fixture had monkeypatched.
+
+Result: **11 AskUserQuestion test entries leaked** into `~/.codec/`:
+- 11 entries written to `~/.codec/pending_questions.json` (7 status=pending + 4 timed_out)
+- 11 `type="question"` entries written to `~/.codec/notifications.json`
+
+The dashboard PWA polls `/api/notifications/count` every ~30s and renders an inline AskUserQuestion answer panel for each pending entry. **From the user's POV this looked like CODEC autonomously asking 5+ questions.**
+
+Verified by reading the leaked files:
+```
+2026-05-01T13:22:55 type=question title=TestAgent is asking a question
+2026-05-01T13:20:25 type=question title=TestAgent is asking a question
+2026-05-01T13:15:56 type=question title=TestAgent is asking a question
+... (8 more) ...
+```
+
+The agent name "TestAgent" comes from the test's `_make_agent()` helper, which constructs `Agent(name="TestAgent", ...)`. That confirms the entries are test artifacts, not real agent runs.
+
+### Root cause #3: pytest runs triggered a `self_improve` cascade
+
+The same window (12:22 → 13:16 UTC) saw 24 `skill_proposal_staged` audit events, paired with `service_down` events. They came from the `self_improve` skill being invoked by `tests/test_mcp_all_tools.py`: the test iterates over every MCP-exposed skill, and `self_improve` IS exposed. Each call writes a markdown proposal to `~/.codec/skill_proposals/2026-04-30/`. There was no user-visible effect, but it polluted the audit log and burned LLM cycles.
+
+### Cleanup performed at 13:21 UTC
+
+1. **Cleared `~/.codec/pending_questions.json`** — 11 → 0 entries (backup at `pending_questions.json.bak-1777641483`)
+2. **Filtered `~/.codec/notifications.json`** — removed 11 `type="question"` entries (179 → 168, backup at `notifications.json.bak-1777641483`)
+3. **Quit Notes / Reminders / TextEdit** apps that the test runs had auto-opened
+4. **Killed NotificationCenter** to clear any stuck banners (auto-respawned by macOS)
+5. **Updated `~/.codec/skills/reminders.py`** to the FIXED version from `~/codec-repo/skills/reminders.py` (read-mode for "list reminders" — prevents future test runs OR LLM calls from creating real Apple Reminders)
+6. **Verified state at 13:21 UTC**: 0 pending questions, 0 question notifications, 0 incomplete reminders.
+
+### Permanent prevention plan
+
+| # | Action | When |
+|---|---|---|
+| 1 | This hotfix (PR #6, merged) blocks `reminders/notes/tts_say/qr_generator/generate_qr_code` from firing in test_mcp_all_tools.py | DONE — landed in `fcbef2f` |
+| 2 | Update Step 3 test fixtures (`test_ask_user.py`, `test_destructive_consent.py`) to use a tighter monkeypatch pattern that survives module re-imports | Roll into Step 3 PR #5 before merge |
+| 3 | Add `self_improve` to SKIP_SKILLS in test_mcp_all_tools.py | Same Step 3 PR or follow-up |
+| 4 | Stop using Apple Reminders for monitoring checkpoints. Move to `~/.codec/scheduled_tasks` (PM2 cron) or a simple text checklist in `docs/PHASE1-STEP3-POSTMERGE-SAMPLES.md` instead | Decide AFTER Step 3 lands. User said: "Going-forward sampling format (launchd vs manual vs none) gets decided after Step 3 lands." |
+| 5 | Add a pre-commit hook OR CI check that fails if any test writes to `~/.codec/*` (detect leaked monkeypatches) | Optional follow-up |
+| 6 | Document the test-isolation contract in AGENTS.md §10: every test that touches codec_ask_user / codec_audit / codec_voice MUST monkeypatch the path AND verify the patch stuck before any state write | Step 3 PR addendum |
+
+### What I am NOT doing without authorization
+
+- Not restarting any PM2 process (per contract)
+- Not killing the codec_mcp.py instances spawned by Claude.app (would break Claude.app's CODEC integration; user can quit Claude.app themselves if they want it gone)
+- Not deleting backups (`pending_questions.json.bak-*` and `notifications.json.bak-*`) — leaving for forensic record
+- Not modifying any other production code
+- Not touching `_HTTP_BLOCKED`

From 3cacca99dac17aa0c300da68e136e6dc71269343 Mon Sep 17 00:00:00 2001
From: Mickael Farina <farina.mickael@gmail.com>
Date: Fri, 1 May 2026 15:38:49 +0200
Subject: [PATCH 2/2] =?UTF-8?q?fix(tests):=20hotfix=20v2=20=E2=80=94=20ski?=
 =?UTF-8?q?p=20Terminal-opening=20skills=20(memory=5Fsearch,=20clipboard,?=
 =?UTF-8?q?=20self=5Fimprove)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Second incident escalation at 15:35 CEST. The user came back with a
screenshot of THREE Terminal windows that had opened on their own, showing:
  - codec_clipboard_aek3vj4c.txt   (CODEC CLIPBOARD HISTORY ...)
  - codec_clipboard_j5dehi5i.txt   (same content, second copy)
  - codec_memory_36ypw9hl.txt      (CODEC MEMORY SEARCH: 'test' ...)

Investigation:
- /tmp/codec_memory_*.txt files: 10 of them, mtimes 14:01 → 15:27 CEST,
  matching the full pytest suite runs I did during this conversation
- Each file is a memory_search results dump, opened via:
    subprocess.Popen(["osascript","-e",
      'tell app "Terminal" to do script "cat <tmp> && ... && read && rm"'])
- CANONICAL_PROMPTS["memory_search"] = "test" → mod.run("test") on every
  test suite invocation → opens 1 fresh Terminal window per run
- Same pattern in skills/clipboard.py (codec_clipboard_*.txt)
- self_improve writes Qwen-drafted .md proposals to ~/.codec/skill_proposals/
  on every test run — slow, expensive, audit-log noise

I ran the full pytest suite ~5 times today during the Step 3 rebase and
the pre-merge audit. Each run queued 1 memory_search Terminal popup.
macOS delivered them slowly, which is why the user saw them "popping up
out of nowhere" several minutes AFTER my last test run.

This commit:
- Adds memory_search, clipboard, self_improve to SKIP_SKILLS
- Result: 36 → 39 skipped, 27 → 24 fired
- Verified (grep clean): NONE of the 24 skills that still fire open
  Terminal windows, write temp files via osascript, or open browser tabs
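
The "grep clean" check could be kept as a repeatable script, sketched
below. It assumes each skill lives in skills/<name>.py with the file
stem equal to the skill name, which is not guaranteed for every skill.
The pattern list is an assumption drawn from the side effects described
above, and `scan_skills` is hypothetical. It is a text heuristic: a hit
flags a pattern, not a proven popup.

```python
import re
from pathlib import Path

UI_SIDE_EFFECTS = {
    "terminal_do_script": re.compile(r"do script"),  # Terminal popup via AppleScript
    "osascript": re.compile(r"\bosascript\b"),  # any AppleScript launch
    "macos_say": re.compile(r"""["']say["']"""),  # spoken output via `say`
    "browser_tab": re.compile(r"\bwebbrowser\.open"),  # new browser tab
}


def scan_skills(skills_dir, skip=()):
    """Return {skill_stem: [matched labels]} for skill files not in `skip`."""
    hits = {}
    for path in sorted(Path(skills_dir).glob("*.py")):
        if path.stem in skip:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        found = sorted(label for label, rx in UI_SIDE_EFFECTS.items() if rx.search(text))
        if found:
            hits[path.stem] = found
    return hits
```

Calling `scan_skills("skills", skip=SKIP_SKILLS)` and expecting an empty
dict would turn the one-off grep into an assertion.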

Cleanup performed at 15:36 CEST:
- find /tmp -name "codec_*.txt" -delete   (12 files removed)
- closed any Terminal windows still showing those files
- (no PM2 restart, no _HTTP_BLOCKED change)

This is the third hotfix layer for the same incident:
  PR #6 (merged): reminders/notes/tts_say/qr_generator/generate_qr_code
  Step 3 PR #5:   ask_user/stuck (interactive blockers)
  THIS:           memory_search/clipboard/self_improve (Terminal popups)

The permanent fix extends prevention plan item #5 in
docs/INCIDENT-2026-05-01-spurious-skill-fires.md: add a CI/pre-commit
gate that fails if any test writes to ~/.codec/* OR spawns a
`Terminal "do script"` subprocess.
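
The "do script" half of that gate could also be enforced at runtime, as
sketched below. The sketch assumes skills launch osascript through
subprocess.Popen, or through helpers such as subprocess.run that look up
subprocess.Popen at call time. `block_terminal_popups` and
`BlockedSideEffect` are hypothetical names. The guard does not catch
code that bound Popen via `from subprocess import Popen` before it was
installed, nor os.system.

```python
import contextlib
import subprocess


class BlockedSideEffect(RuntimeError):
    """Raised when code under test tries to open a Terminal window."""


def is_terminal_popup(args):
    """True if Popen args look like `osascript ... do script ...`."""
    if isinstance(args, (list, tuple)):
        joined = " ".join(str(a) for a in args)
    else:
        joined = str(args)
    return "osascript" in joined and "do script" in joined


@contextlib.contextmanager
def block_terminal_popups():
    """Temporarily make subprocess.Popen refuse Terminal `do script` calls."""
    real_popen = subprocess.Popen

    class GuardedPopen(real_popen):
        def __init__(self, args, *rest, **kwargs):
            if is_terminal_popup(args):
                raise BlockedSideEffect(f"test tried to open Terminal: {args!r}")
            super().__init__(args, *rest, **kwargs)

    subprocess.Popen = GuardedPopen
    try:
        yield
    finally:
        subprocess.Popen = real_popen
```

An autouse fixture could wrap each test in `with block_terminal_popups():
yield`. A blocked skill then raises instead of opening a window. Any temp
file it wrote before the call (such as /tmp/codec_memory_*.txt) is left
behind, because the `rm` inside the osascript command never runs.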
---
 tests/test_mcp_all_tools.py | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tests/test_mcp_all_tools.py b/tests/test_mcp_all_tools.py
index 063e8e3..e10a7ef 100644
--- a/tests/test_mcp_all_tools.py
+++ b/tests/test_mcp_all_tools.py
@@ -58,13 +58,26 @@
     "scheduler", "scheduler_skill", "ax_control", "file_ops",
     "python_exec", "terminal", "process_manager", "pm2_control",
     "app_switch", "timer",
-    # 2026-05-01 incident hotfix — macOS UI side effects.
+    # 2026-05-01 incident hotfix v1 — macOS UI side effects.
     # See docs/INCIDENT-2026-05-01-spurious-skill-fires.md.
     # The user's `~/.codec/skills/reminders.py` may be the OLD version with
     # no read-mode; "list reminders" → creates a real Apple Reminder named
     # "list reminders". Tts_say literally speaks via macOS `say`. Notes
     # opens the Notes app. Generate_qr_code writes qr.png to cwd.
     "reminders", "tts_say", "notes", "generate_qr_code", "qr_generator",
+    # 2026-05-01 incident hotfix v2 — Terminal-window-opening skills.
+    # User got bombarded with new Terminal windows every time the test
+    # suite ran because these skills literally `osascript "tell Terminal
+    # to do script ..."` to display their results in a popup window.
+    # On a workstation in active use, that's intolerable noise.
+    # - memory_search: writes results to /tmp/codec_memory_<hash>.txt then
+    #   opens it via osascript. CANONICAL_PROMPTS["memory_search"]="test"
+    #   triggered this on every full-suite run.
+    # - clipboard: same pattern — /tmp/codec_clipboard_<hash>.txt opened
+    #   in a fresh Terminal window.
+    # - self_improve: writes Qwen-drafted markdown proposals to
+    #   ~/.codec/skill_proposals/. Slow (LLM call) and audit-log noise.
+    "memory_search", "clipboard", "self_improve",
 }