Skip to content

Commit 09b85da

Browse files
committed
docs(phase1-step3): T+0 sample + sampling-commands reference (no reminders)
Two docs, both plain markdown references, NO Apple Reminders / Notes / Calendar events / notifications. PHASE1-STEP3-POSTMERGE-SAMPLES.md: - T+0 captured at 2026-05-01 13:47 UTC after PM2 restart of all 4 codec services with Step 3 code loaded (codec_ask_user.ASKUSER_EVENT_EMIT importable, codec_ask_user.ask importable) - Status=ok (sample sparse — n=0 with duration in 30-min window matches Step 1 + Step 2 T+0 captures; no breach of any revert threshold) - 5 pending sample slots for T+4h through T+20h - All state files clean: 0 pending_questions, 0 question notifications, 0 /tmp/codec_*.txt, 0 Apple Reminders PHASE1-STEP3-SAMPLING-COMMANDS.md (new approach per user request): - Plain checklist user can OPEN if they want — no nagging - Suggested cadence: T+4h / T+8h / T+12h / T+16h / T+20h (informational) - Single paste-able Python one-liner that captures + threshold-checks - Quick-check commands for service health + state-file health - Per-feature kill switch instructions (ASKUSER_ENABLED / STUCK_DETECTION_ENABLED / STEP_BUDGET_ENABLED) - Sign-off contract for marking Step 3 production-stable - Explicit "What this file does NOT do" section: no reminders, no notes, no calendar events, no notifications, no timer Per user instruction: this replaces the Apple Reminders mechanism that caused the 2026-05-01 incident (10 reminders + UI bombardment + Terminal popups). Reference file only.
1 parent 59bfbda commit 09b85da

2 files changed

Lines changed: 210 additions & 0 deletions

File tree

Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
# Phase 1 Step 3 — post-merge samples
2+
3+
Tracker for the 24-hour post-merge watch on `main` after `59bfbda` (Merge PR #5: Step 3 — AskUserQuestion + stuck detection + step budget).
4+
5+
**Anchor baseline (from `docs/PHASE1-STEP3-BASELINE.md`):** Reusing Step 1's 8-day production sample window — `avg=987.96 ms`, `p95=1907.78 ms` over `n=203`.
6+
7+
**Hard-revert thresholds:**
8+
- `p95 > 3815.56 ms` (2× baseline)
9+
- `avg > 1975.92 ms` (2× baseline)
10+
- Step 3 audit-event flood (any of `ask_user_question_*`, `stuck_*`, `step_budget_exhausted` fires `>10×` normal volume)
11+
12+
---
13+
14+
## Sample T+0 — 2026-05-01 13:47 UTC (15:47 CEST)
15+
16+
**Status:** ok
17+
18+
| Metric | Value |
19+
|---|---|
20+
| Records in 30-min window | 15 |
21+
| Records with `duration_ms` | 0 |
22+
| avg duration_ms | n/a (no duration records yet — process restart window) |
23+
| p95 duration_ms | n/a |
24+
| Step 3 audit events emitted | 0 (`ask_user_question_*`, `stuck_*`, `step_budget_exhausted`) |
25+
| Source distribution | codec-heartbeat: 10, codec-agents: 3, codec-self-improve: 2 |
26+
27+
**Service health (all green):**
28+
- `Dashboard /api/health`: HTTP 200
29+
- `MCP-HTTP /health`: HTTP 200
30+
- `codec_audit.ASKUSER_EVENT_EMIT` importable: ✓ (`ask_user_question_emit`)
31+
- `codec_audit.STUCK_EVENT_WARNING` importable: ✓ (`stuck_warning`)
32+
- `codec_audit.STEP_BUDGET_EXHAUSTED` importable: ✓ (`step_budget_exhausted`)
33+
- `codec_ask_user.ask` importable: ✓
34+
- Step 3 production files present: `codec_ask_user.py` (665 LOC), `skills/ask_user.py` (51 LOC), `skills/stuck.py` (43 LOC)
35+
36+
**State files clean:**
37+
- `~/.codec/pending_questions.json`: 0 entries
38+
- `~/.codec/notifications.json` `type="question"`: 0 entries
39+
- `/tmp/codec_*.txt`: 0 files
40+
- Apple Reminders incomplete: 0 (per user request — no reminders for monitoring this time)
41+
42+
**T+0 sample is sparse (n=0 with duration) because:**
43+
1. PM2 restart cycle finished at ~13:47 UTC; window starts at 13:17 UTC
44+
2. Most heartbeat / self_improve / shell_blocked emits don't carry `duration_ms`
45+
3. The hot path (chat / voice / MCP tool calls) hasn't seen substantial traffic in this short window
46+
47+
This shape matches Step 1 + Step 2 T+0 captures (also `n_with_duration=0`). Status: **ok** because there is no breach of any revert threshold — there is no traffic AT ALL to breach.
48+
49+
**Next sample:** T+4h target = 2026-05-01 17:47 UTC (19:47 CEST). See `docs/PHASE1-STEP3-SAMPLING-COMMANDS.md` for the one-liner to capture it. **No Apple Reminder created** — per user instruction.
50+
51+
---
52+
53+
## Sample T+4h — pending
54+
55+
(Append after running the capture command at T+4h.)
56+
57+
---
58+
59+
## Sample T+8h — pending
60+
61+
---
62+
63+
## Sample T+12h — pending
64+
65+
---
66+
67+
## Sample T+16h — pending
68+
69+
---
70+
71+
## Sample T+20h — pending
72+
73+
---
74+
75+
## 24h sign-off
76+
77+
When all six samples are within thresholds AND there is no Step 3 audit-event flood AND no `tests/test_ask_user.py::test_ambiguous_consent_two_strikes_times_out`-style regression on live load: append a single line to `docs/known-issues.md` marking Phase 1 Step 3 as production-stable. Until that line lands, Phase 1 Step 4 (codec_self_improve plugin migration to a `codec_hooks`-based plugin) does not start.
Lines changed: 133 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,133 @@
1+
# Phase 1 Step 3 — sampling commands (plain reference)
2+
3+
> **No Apple Reminders. No Notes. No Calendar events. No notifications.** This file is a plain checklist the user can open IF they want to capture a sample. Nothing pings or interrupts.
4+
5+
## When to capture (suggested cadence — informational only, no enforcement)
6+
7+
Step 3 merged at **2026-05-01 13:47 UTC (15:47 CEST)** as commit `59bfbda`.
8+
9+
| Label | Target time (UTC) | Target time (CEST) |
10+
|---|---|---|
11+
| T+0 | 2026-05-01 13:47 | 15:47 — DONE |
12+
| T+4h | 2026-05-01 17:47 | 19:47 |
13+
| T+8h | 2026-05-01 21:47 | 23:47 |
14+
| T+12h | 2026-05-02 01:47 | 03:47 |
15+
| T+16h | 2026-05-02 05:47 | 07:47 |
16+
| T+20h | 2026-05-02 09:47 | 11:47 |
17+
18+
If you skip a sample, just capture the next one. The cadence is a guideline, not a contract.
19+
20+
## How to capture a sample (one paste-able command)
21+
22+
```bash
23+
LABEL="T+4h"
24+
cd ~/codec-repo
25+
/Library/Frameworks/Python.framework/Versions/3.13/bin/python3.13 << 'PYEOF'
26+
import json
27+
from datetime import datetime, timezone, timedelta
28+
records = []
29+
with open('/Users/mickaelfarina/.codec/audit.log') as f:
30+
for line in f:
31+
if line.strip():
32+
try: records.append(json.loads(line))
33+
except: pass
34+
records.sort(key=lambda r: r.get('ts',''))
35+
now = datetime.now(timezone.utc)
36+
cutoff = (now - timedelta(minutes=30)).isoformat(timespec='milliseconds')
37+
window = [r for r in records if r.get('ts','') >= cutoff]
38+
durs = [r['duration_ms'] for r in window if r.get('duration_ms') is not None]
39+
print(f'Records in 30-min window: {len(window)}')
40+
print(f' with duration: {len(durs)}')
41+
if durs:
42+
durs_s = sorted(durs)
43+
n = len(durs_s)
44+
avg = sum(durs)/n
45+
p95 = durs_s[int(0.95*(n-1))]
46+
print(f' avg duration_ms: {avg:.2f}')
47+
print(f' p95 duration_ms: {p95:.2f}')
48+
# Threshold check
49+
HARD_AVG = 1975.92
50+
HARD_P95 = 3815.56
51+
if avg > HARD_AVG or p95 > HARD_P95:
52+
print(f'⚠️ HARD-REVERT THRESHOLD BREACHED — investigate immediately')
53+
elif avg > 987.96 * 1.3 or p95 > 1907.78 * 1.3:
54+
print(f'⚠️ INVESTIGATE — over 1.3× baseline')
55+
else:
56+
print('✓ within baseline')
57+
step3_events = {'ask_user_question_emit', 'ask_user_question_answer', 'ask_user_question_timeout',
58+
'stuck_warning', 'stuck_escalated', 'step_budget_exhausted'}
59+
hits = [r for r in window if r.get('event','') in step3_events]
60+
print(f'Step 3 audit events emitted in window: {len(hits)}')
61+
if len(hits) > 50:
62+
print(f'⚠️ Step 3 audit-event flood — >10× normal volume; check for runaway agent / config issue')
63+
PYEOF
64+
```
65+
66+
Expected output for a normal sample:
67+
```
68+
Records in 30-min window: <some_int>
69+
with duration: <some_int>
70+
avg duration_ms: <some_float>
71+
p95 duration_ms: <some_float>
72+
✓ within baseline
73+
Step 3 audit events emitted in window: <small_int>
74+
```
75+
76+
## How to append a sample to the tracker
77+
78+
Open `docs/PHASE1-STEP3-POSTMERGE-SAMPLES.md` in any editor and replace one of the `## Sample T+Xh — pending` blocks with the captured numbers. Format matches the `## Sample T+0` block at the top of that file.
79+
80+
Then `git add docs/PHASE1-STEP3-POSTMERGE-SAMPLES.md && git commit && git push` if you want it on remote.
81+
82+
## Service health quick-check (anytime)
83+
84+
```bash
85+
curl -s -o /dev/null -w "Dashboard: %{http_code}\n" http://127.0.0.1:8090/api/health
86+
curl -s -o /dev/null -w "MCP-HTTP: %{http_code}\n" http://127.0.0.1:8091/health
87+
pm2 list | grep -E "codec-(dashboard|mcp-http|heartbeat|open-codec)"
88+
```
89+
90+
Both endpoints should return `200`. All four PM2 processes should show `online`.
91+
92+
## State-file health quick-check (anytime)
93+
94+
```bash
95+
echo "pending_questions: $(/Library/Frameworks/Python.framework/Versions/3.13/bin/python3.13 -c "import json; print(len(json.load(open('/Users/mickaelfarina/.codec/pending_questions.json')).get('pending_questions', [])))")"
96+
echo "question notifs: $(/Library/Frameworks/Python.framework/Versions/3.13/bin/python3.13 -c "import json; print(len([n for n in json.load(open('/Users/mickaelfarina/.codec/notifications.json')) if n.get('type')=='question']))")"
97+
echo "/tmp/codec_*.txt: $(find /tmp -maxdepth 1 -name 'codec_*.txt' 2>/dev/null | wc -l | tr -d ' ')"
98+
echo "Apple Reminders: $(osascript -e 'tell application "Reminders" to count reminders whose completed is false' 2>&1)"
99+
```
100+
101+
All four should be `0` in normal operation. Non-zero `pending_questions` is OK if a real agent is waiting on a real user answer; non-zero `/tmp/codec_*.txt` is suspicious and points to a regression in the SKIP_SKILLS hotfix.
102+
103+
## Per-feature kill switches (if anything misbehaves)
104+
105+
Set the env var BEFORE PM2 restart of the affected process:
106+
107+
```bash
108+
ASKUSER_ENABLED=false pm2 restart codec-dashboard codec-mcp-http
109+
STUCK_DETECTION_ENABLED=false pm2 restart codec-dashboard
110+
STEP_BUDGET_ENABLED=false pm2 restart codec-dashboard
111+
```
112+
113+
Each disables its feature without modifying production code or restarting other services.
114+
115+
## When to mark Phase 1 Step 3 production-stable
116+
117+
When all six samples (T+0 through T+20h) are within 1.3× baseline AND no Step 3 audit-event flood AND no two-strike consent or stuck-warning regression on live load: append a single line to `docs/known-issues.md`:
118+
119+
```
120+
- Phase 1 Step 3 (commit 59bfbda) — production-stable as of <date>
121+
```
122+
123+
Until that line lands, Phase 1 Step 4 does NOT start.
124+
125+
## What this file does NOT do
126+
127+
- Does not create Apple Reminders.
128+
- Does not create Apple Notes.
129+
- Does not create Calendar events.
130+
- Does not send notifications.
131+
- Does not run on a timer.
132+
133+
It is a plain markdown reference. Open it when you want; ignore it otherwise.

0 commit comments

Comments
 (0)