feat(observer): continuous observation loop (Phase 2 Step 5) by AVADSA25 · Pull Request #9 · AVADSA25/codec

AVADSA25 · 2026-05-01T22:01:41Z

Summary

Phase 2 Step 5 — turns CODEC from "AI assistant that responds to commands" into "AI colleague that observes context." A background PM2 service polls four cheap signals (active window, screenshot OCR, clipboard delta, recent files) into a 10-min RAM-only ring buffer; chat + voice handlers conditionally prepend a ≤200-token summary to the LLM's system prompt, gated per the §X "Observation injection contract" so we don't leak screen content to cloud LLMs on every turn.

Implements docs/PHASE2-STEP5-DESIGN.md as designed. Locked Q1-Q6 user resolutions (60s/5min cadence, 10-min buffer, declarative skill triggers, CGEventSource idle, gated injection, no per-app blocklist) honored verbatim. Q5.1-Q5.7 design-phase open questions all resolved with my recommendations (already approved before implementation began).

Commits (a-e)

| sha | what |
| --- | --- |
| `eac65c9` | `docs/PHASE2-BLUEPRINT.md` — Phase 2 plan with Q5 override folded in as §X |
| `cf48384` | `docs/PHASE2-STEP5-DESIGN.md` — Step 5 design doc (§0-§10 + §9 open questions) |
| `d799e61` | `codec_audit.py` — 4 new event constants + `PHASE2_STEP5_EVENTS` frozenset + 3 extra-field docs |
| `02b87cb` | `codec_observer.py` NEW — RingBuffer + poll + injection helper + run_daemon + Q5.7 forward-compat API |
| `09d8bb3` | `tests/test_observer.py` NEW — 30 tests, 100% passing in 0.18s |
| `62d6a4e` | `codec_dashboard.py` chat handler + `codec_voice.py` voice handler integration + PWA `/api/observer/buffer?debug=1` (Q5.6) |
| `38b398e` | `ecosystem.config.js` PM2 entry + `AGENTS.md` §3+§6+§10 updates |

§X Observation injection contract (Q5 override)

Cheap text-pattern gating, no relevance model:

transport == "local"   → INJECT (always)         reason="always_local"
transport == "mcp"     → SKIP (client owns it)   reason="skipped_no_match"
transport ∈ {"chat",   ┌─ possessive_match*  → INJECT  reason="possessive_match"
            "voice",   ├─ continuation_match → INJECT  reason="continuation_match"
            "http"}    ├─ skill_flag         → INJECT  reason="skill_flag"
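                       └─ otherwise          → SKIP    reason="skipped_no_match"

* possessive-without-context match, filtered by the Q5.3 stop-noun list. Among the gated checks, the SKILL_NEEDS_OBSERVATION skill flag takes highest priority.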
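The decision tree can be sketched as a single gating function. This is a minimal sketch, not the shipped `maybe_inject_observation_summary`. The regexes, the `STOP_NOUNS` set and the `gate()` name are hypothetical. It also assumes that unknown transports are treated like `"mcp"`. It checks the skill flag first because the module description gives it the highest priority among the gated checks.

```python
import re
from typing import Tuple

# Hypothetical stop-noun list (Q5.3); the real list lives in codec_observer.py.
STOP_NOUNS = {"question", "answer", "conversation"}

# Hypothetical patterns; the shipped regexes may differ.
POSSESSIVE_RE = re.compile(r"\b(?:my|this|that)\s+(\w+)", re.IGNORECASE)
CONTINUATION_RE = re.compile(
    r"\b(?:continue|where was i|pick up where|keep going)\b", re.IGNORECASE
)

CLOUD_TRANSPORTS = {"chat", "voice", "http"}


def gate(prompt: str, transport: str, skill_needs_observation: bool = False) -> Tuple[bool, str]:
    """Return (inject?, reason) following the §X decision tree."""
    if transport == "local":
        return True, "always_local"  # local model: cheap and private
    if transport not in CLOUD_TRANSPORTS:
        # "mcp" (and, by assumption, anything unknown): the client owns its context.
        return False, "skipped_no_match"
    if skill_needs_observation:
        return True, "skill_flag"  # highest priority among the gated checks
    for match in POSSESSIVE_RE.finditer(prompt):
        if match.group(1).lower() not in STOP_NOUNS:
            return True, "possessive_match"  # e.g. "my Stripe balance", "this PR"
    if CONTINUATION_RE.search(prompt):
        return True, "continuation_match"  # e.g. "continue the email", "where was I"
    return False, "skipped_no_match"
```

Because the patterns are plain regexes, a cloud turn costs only a few string scans before the inject/skip decision. This matches the "cheap text-pattern gating, no relevance model" intent.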

Test plan

pytest tests/test_observer.py → 30 passed in 0.18s
All polling primitives mocked (osascript / pbpaste / Quartz / screencapture) — tests pass on any platform, NO Apple state, NO Terminal popups
METADATA-only audit-emit verified by serializing extra dict and asserting raw content strings absent
Per Step 4 contract: did NOT run full pytest suite (avoids destructive-skill cascade)
Post-test state files clean: pending_questions=0, Apple Reminders=0, /tmp/codec_*.txt=0, ~/.codec/observation_summaries=0

Privacy contract — 4 layers

RAM only. collections.deque(maxlen=N). Process restart wipes. By design.
METADATA-only audit emits. observation_tick carries lengths, counts, content_type tags but NEVER raw window titles, OCR text, clipboard content, or file paths. Audit log is for operators, not for re-deriving screen state.
Cloud-transport injection gating per §X above.
No new system permissions. Reads via existing skills (active_window, screenshot_text) and existing primitives (pbpaste, getmtime, Quartz). No new install-time prompts.
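The sketch below illustrates the METADATA-only layer. One poll is reduced to lengths, counts and tags. A helper then serializes the result and checks that no raw string survived, which mirrors the serialize-and-assert check in the test plan. All field names and both function names are hypothetical; the real `observation_tick` schema is documented in `OBSERVATION_TICK_EXTRA_FIELDS`.

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def build_tick_extra(
    window_title: str,
    ocr_text: Optional[str],
    clipboard_text: Optional[str],
    clipboard_kind: str,
    recent_files: List[str],
    duration_ms: float,
) -> Dict[str, Any]:
    """Summarize one poll as lengths, counts and tags only (hypothetical field names)."""
    return {
        "window_title_len": len(window_title),
        "ocr_skipped": ocr_text is None,  # None means OCR timed out or was disabled
        "ocr_chars": len(ocr_text or ""),
        "clipboard_changed": clipboard_text is not None,  # None means no delta this poll
        "clipboard_len": len(clipboard_text or ""),
        "clipboard_kind": clipboard_kind,  # a tag such as url/json/code/text
        "recent_files_count": len(recent_files),  # count only, never the paths
        "poll_duration_ms": round(duration_ms, 1),
    }


def leaks_raw_content(extra: Dict[str, Any], raw_values: Iterable[str]) -> bool:
    """Serialize the extra dict and report whether any non-empty raw string survived."""
    blob = json.dumps(extra)
    return any(value and value in blob for value in raw_values)
```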

Q5.7 forward-compat API for Steps 6 + 7

Step 5 ships the API surface Steps 6 and 7 will need (per Q5.7 recommendation — locks the contract now, prevents Step 5→6 churn later):

get_global_buffer() -> RingBuffer — Step 6 (Triggers) reads .snapshot() for trigger evaluation
persist_for_shift_report() -> Path | None — Step 7 (Shift Report) calls at assembly time; writes summary to ~/.codec/observation_summaries/<ts>.md (the ONLY persistent observer output)

Cost: ~+30 LOC in codec_observer.py. Worth it.
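A minimal sketch of the buffer and the two forward-compat entry points follows. It uses a `collections.deque(maxlen=N)` guarded by a lock, a lazily created module-level buffer, and a Markdown write under a caller-supplied directory. Several details are assumptions. The `maxlen` of 10 assumes 10 minutes at the 60s active cadence. The shipped `persist_for_shift_report()` takes no arguments and writes to `~/.codec/observation_summaries/`; here `summary` and `out_dir` are explicit parameters so the sketch is testable.

```python
import threading
from collections import deque
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Deque, Dict, List, Optional


class RingBuffer:
    """Bounded, thread-safe, RAM-only buffer of poll snapshots."""

    def __init__(self, maxlen: int) -> None:
        self._entries: Deque[Dict[str, Any]] = deque(maxlen=maxlen)
        self._lock = threading.Lock()

    def append(self, entry: Dict[str, Any]) -> None:
        with self._lock:
            self._entries.append(entry)  # deque drops the oldest entry when full

    def snapshot(self) -> List[Dict[str, Any]]:
        with self._lock:
            return list(self._entries)  # a copy: callers cannot mutate the buffer

    def clear(self) -> None:
        with self._lock:
            self._entries.clear()

    def __len__(self) -> int:
        with self._lock:
            return len(self._entries)


_GLOBAL_BUFFER: Optional[RingBuffer] = None
_GLOBAL_LOCK = threading.Lock()


def get_global_buffer(maxlen: int = 10) -> RingBuffer:
    """Lazily create the shared buffer; importing the module creates nothing."""
    global _GLOBAL_BUFFER
    with _GLOBAL_LOCK:
        if _GLOBAL_BUFFER is None:
            _GLOBAL_BUFFER = RingBuffer(maxlen)
        return _GLOBAL_BUFFER


def persist_for_shift_report(summary: str, out_dir: Path) -> Optional[Path]:
    """Write the only on-disk observer output; return None when there is nothing to write."""
    if not summary:
        return None
    out_dir.mkdir(parents=True, exist_ok=True)
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = out_dir / f"{ts}.md"
    path.write_text(summary, encoding="utf-8")
    return path
```

Returning a copy from `snapshot()` lets Step 6 evaluate triggers without holding the lock or racing the poller.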

PWA debug endpoint (Q5.6)

GET /api/observer/buffer?debug=1 (auth-gated by existing dashboard middleware):

Returns metadata-only summary, NOT raw entries (raw contains titles + OCR + clipboard content — too sensitive even for authed callers)
Every call emits observer_buffer_inspected audit event with client_ip + buffer_entries_returned
NOT linked from main UI; only for explicit debugging

Kill switch

OBSERVER_ENABLED=false env var → polling AND injection both no-op. No separate injection kill switch — buffer is always populated when enabled, only injection is gated by §X.
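The kill switch can be sketched as a single predicate that both the poll loop and the injection helper would consult. It reads the environment on each call, so flipping the variable takes effect without a restart. It defaults to enabled when `OBSERVER_ENABLED` is unset and accepts case-insensitive off-aliases, following the alias list in the Step 5 test descriptions. The function name is hypothetical.

```python
import os
from typing import Mapping, Optional

_OFF_VALUES = {"false", "0", "no", "off"}


def observer_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True unless OBSERVER_ENABLED is set to an off-alias (case-insensitive)."""
    env = os.environ if env is None else env
    raw = env.get("OBSERVER_ENABLED")
    if raw is None:
        return True  # default when unset: enabled
    return raw.strip().lower() not in _OFF_VALUES
```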

Out of scope (explicit, per intake contract)

No Apple Reminders / Notes / Calendar entries created anywhere in this PR.
No osascript for user-facing UI — observer's osascript usage is bounded to System Events (active window read) and Vision (OCR), both backend probes.
_HTTP_BLOCKED not touched.
Step 6 + Step 7 are not in this PR — Phase 2 ships per-step. Step 6 (Triggers) and Step 7 (Shift Report) will follow as separate PRs after this lands and burns in.

After merge

The cp plugin pattern from Phase 1 Step 4 doesn't apply here: this is a PM2 service, not a hook plugin. After merge + pull, just run pm2 start ecosystem.config.js --only codec-observer.
pm2 restart codec-dashboard codec-voice so the chat/voice integration loads codec_observer.
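Watch ~/.codec/audit.log for the first observation_tick (per-poll), which should appear within 60s of activity. Also watch for observation_summary_injected (gated inject), which appears on the first chat/voice turn that passes the §X gate (every turn for local transport).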
Per intake: NO Apple Reminders for any post-merge watch. If a watch is needed, launchd plist or skip entirely — your call.

🤖 Generated with Claude Code

Captures the user-supplied Phase 2 plan as a version-controlled design doc, with the Q1-Q6 reviewer resolutions locked at intake. Source: planning session in claude-chat (separate from this Claude Code session); user handed me the blueprint with explicit Q5 override.

Three steps:
- Step 5 — Continuous Observation Loop (codec_observer.py + new PM2 service, ~400 LOC + 30 tests)
- Step 6 — Trigger System (codec_triggers.py + skill_registry extension, ~300 LOC + 35 tests)
- Step 7 — Shift Report Crew (built-in crew + assembly skill, ~350 LOC + 20 tests)

Locked decisions (§Y resolution log):
- Q1 Observer cadence: 60s active, 5min idle (CGEventSource)
- Q2 Ring buffer depth: 10 minutes
- Q3 Trigger storage: SKILL_OBSERVATION_TRIGGER in skill files
- Q4 Idle definition: no keyboard/mouse for 30 min
- Q5 Prompt injection: GATED (§X "Observation injection contract")
- Q6 Per-app blocklist: none

§X Observation injection contract — Q5 override folded in:
- Always inject for transport="local" (Qwen, free + private)
- For cloud transports: gate on cheap text patterns (possessive-without-context / continuation language / skill-flag SKILL_NEEDS_OBSERVATION=True)
- Never inject for transport="mcp" (the LLM client decides its own context)
- Skipped injection emits NO audit line; injection emits observation_summary_injected with reason / tokens / transport

7 new audit events extending Step 1 §1.2 schema:1:
- observation_tick, observation_summary_injected
- trigger_evaluated, trigger_fired, trigger_blocked
- shift_report_started, shift_report_completed

3 independent kill switches: OBSERVER_ENABLED, TRIGGERS_ENABLED, SHIFT_REPORT_ENABLED.

Diff inventory: ~+2,345 (functional + tests).

Sequencing: same as Phase 1 — design → review → implement → pre-merge audit → merge → sign off, per step.

NO Apple Reminders for any monitoring (per the 2026-05-01 incident contract). Watches run as launchd plists writing silently to file or are skipped entirely; the user decides per step.
Step 5 detailed design follows in docs/PHASE2-STEP5-DESIGN.md (next commit). Steps 6+7 detailed designs deferred until Step 5 implementation lands.
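The locked cadence decisions can be sketched as two small predicates plus a graceful idle probe. The design has 60s active, 5 min idle, and a 30 min long-idle reset (Q1, Q4). The 60s active/idle switch point is taken from the Step 5 cadence tests. `CadenceConfig`, `next_cadence()` and `should_reset_buffer()` are hypothetical names. The Quartz call is the real `CGEventSourceSecondsSinceLastEventType` exposed by pyobjc. The sketch falls back to 0.0 when Quartz is unavailable, so non-mac runners stay in the active cadence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CadenceConfig:
    active_s: int = 60  # Q1: poll every 60s while the user is active
    idle_s: int = 300  # Q1: back off to 5 min while idle
    idle_threshold_s: float = 60.0  # switch point used in the Step 5 cadence tests
    long_idle_reset_s: float = 1800.0  # Q4: 30 min without keyboard/mouse input
    reset_on_long_idle: bool = True


def idle_seconds() -> float:
    """Seconds since the last input event; 0.0 where Quartz is unavailable."""
    try:
        import Quartz  # pyobjc, macOS only

        return float(
            Quartz.CGEventSourceSecondsSinceLastEventType(
                Quartz.kCGEventSourceStateHIDSystemState, Quartz.kCGAnyInputEventType
            )
        )
    except (ImportError, AttributeError):
        return 0.0  # non-mac CI: stay in the active cadence


def next_cadence(idle: float, cfg: CadenceConfig = CadenceConfig()) -> int:
    return cfg.active_s if idle < cfg.idle_threshold_s else cfg.idle_s


def should_reset_buffer(idle: float, cfg: CadenceConfig = CadenceConfig()) -> bool:
    return cfg.reset_on_long_idle and idle > cfg.long_idle_reset_s
```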

…RingBuffer)

Detailed design for Phase 2 Step 5 following the §0-§10 pattern of Phase 1 design docs. Resolves the locked Q1-Q6 reviewer decisions into concrete spec; surfaces 7 new open questions (Q5.1-Q5.7) that need answering before implementation begins.

Structure:
- §0 Why this exists — observer is the foundation Steps 6 + 7 build on
- §1 Design — PM2 service spec, ring buffer schema, polling primitives, idle classifier, injection contract, summary rendering, buffer reset, configuration
- §2 Implementation outline — file map, module API, integration points in chat/voice handlers
- §3 Audit envelope additions — observation_tick + observation_summary_injected with metadata-only privacy contract
- §7 Test plan — 30 tests across 5 sub-areas (ring buffer / poll primitives / cadence / injection / kill switch)
- §8 Rollback plan — three layers (env-var / config / hard revert)
- §9 Open questions for reviewer (7 items, recommendations included)
- §10 Diff inventory — ~+880 LOC vs Phase 1 step sizes

Key spec decisions folded in:
- Per-poll budget refined to <150ms p95 (OCR is the heavy step; the blueprint's <50ms was unrealistic, and the doc owns this transparently)
- Injection contract decision tree fully specified, including transport="mcp" → never-inject (MCP clients bring their own context)
- Observation_tick audit emit is METADATA-ONLY — no titles, no OCR text, no clipboard content, no file paths. Audit log is for operators, not for re-deriving screen state. Privacy by default at the audit layer too.
- Step 6/7 forward-compat API committed in Step 5 (per §9 Q5.7 recommendation) — RingBuffer.snapshot() + render_summary() + persist_for_shift_report() shipped now even though only the injection path is exercised.

7 open questions for reviewer:
- Q5.1 OCR retry-on-timeout?
- Q5.2 Clipboard image handling — redact or OCR?
- Q5.3 Stop-noun list for possessive-without-context regex
- Q5.4 Injection-reason audit cardinality — fold into audit_report?
- Q5.5 Cadence-degradation strategy under poll-overrun load
- Q5.6 PWA buffer-inspect endpoint — debug-only or always?
- Q5.7 Step 5 ships full Step 6/7 forward-compat API now?

Each with my recommendation + 1-line rationale. Reviewer can approve all-recommended, override any, or push back.

Stop point per intake: design phase only, no code. Surfacing §9 open questions for review. Implementation begins after sign-off.
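A minimal sketch of `render_summary()` follows. It lists entries most-recent-first with an "Ns ago" recency marker, caps output at `max_tokens * 4` characters (the ratio implied by the "≤ 200*4 chars" test), and cuts the middle rather than the end when over budget. The entry shape (`ts`, `window`) and the ellipsis marker are assumptions; the shipped renderer summarizes more signals.

```python
import time
from typing import Any, Dict, List, Optional

CHARS_PER_TOKEN = 4  # rough heuristic implied by the "<= 200*4 chars" test
_MARKER = "\n…\n"  # inserted where the middle was cut


def render_summary(
    entries: List[Dict[str, Any]], max_tokens: int = 200, now: Optional[float] = None
) -> str:
    now = time.time() if now is None else now
    lines = []
    for entry in reversed(entries):  # most recent first
        age = int(now - entry["ts"])
        lines.append(f"[{age}s ago] {entry.get('window', '?')}")
    text = "\n".join(lines)
    budget = max_tokens * CHARS_PER_TOKEN
    if len(text) <= budget:
        return text
    if budget <= len(_MARKER):
        return text[:budget]  # degenerate budget: hard cut
    keep = budget - len(_MARKER)
    head = keep - keep // 2  # newest context gets the extra char when keep is odd
    tail = keep // 2
    return text[:head] + _MARKER + (text[-tail:] if tail else "")
```

Cutting the middle keeps both the newest context and the oldest anchor. The result is always exactly `budget` characters once truncation kicks in.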

…ary_injected (a)

Adds 4 new event-name constants to codec_audit.py for the Continuous Observation Loop introduced in Phase 2 Step 5:
- OBSERVATION_TICK — every poll cycle (info)
- OBSERVATION_TICK_SLOW — poll exceeded budget (warning, Q5.5)
- OBSERVATION_SUMMARY_INJECTED — gated injection fired (info, inherits cid)
- OBSERVER_BUFFER_INSPECTED — PWA debug-gated buffer read (Q5.6)

Plus PHASE2_STEP5_EVENTS frozenset for analyzer / introspection.

Plus 3 documentation tuples for extra-namespace fields:
- OBSERVATION_TICK_EXTRA_FIELDS — METADATA-ONLY (no titles, no OCR content, no clipboard text, no file paths) per design §3 "What we deliberately do NOT emit"
- OBSERVATION_INJECTION_EXTRA_FIELDS — tokens_used / injection_reason / buffer_entries_summarized
- OBSERVER_BUFFER_INSPECT_EXTRA_FIELDS — client_ip / buffer_entries_returned

No behavior change. Just constants. Step 5 module + tests in next commits.
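The constants commit can be pictured roughly as follows. The string values and the two injection/inspect field tuples come from the description above. `OBSERVATION_TICK_EXTRA_FIELDS` is omitted because its field names aren't listed here. `tick_event_name()` is a hypothetical helper showing how Q5.5 picks between the normal and slow tick.

```python
OBSERVATION_TICK = "observation_tick"
OBSERVATION_TICK_SLOW = "observation_tick_slow"
OBSERVATION_SUMMARY_INJECTED = "observation_summary_injected"
OBSERVER_BUFFER_INSPECTED = "observer_buffer_inspected"

# Frozen so analyzers can rely on a stable, hashable set of Step 5 event names.
PHASE2_STEP5_EVENTS = frozenset(
    {
        OBSERVATION_TICK,
        OBSERVATION_TICK_SLOW,
        OBSERVATION_SUMMARY_INJECTED,
        OBSERVER_BUFFER_INSPECTED,
    }
)

OBSERVATION_INJECTION_EXTRA_FIELDS = ("tokens_used", "injection_reason", "buffer_entries_summarized")
OBSERVER_BUFFER_INSPECT_EXTRA_FIELDS = ("client_ip", "buffer_entries_returned")


def tick_event_name(duration_ms: float, poll_slow_threshold_ms: float) -> str:
    """Pick the tick event for one poll (Q5.5): over-budget polls get their own name."""
    return OBSERVATION_TICK_SLOW if duration_ms > poll_slow_threshold_ms else OBSERVATION_TICK
```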

…tract (b)

Phase 2 Step 5 main module (~525 LOC). Implements the Continuous Observation Loop per docs/PHASE2-STEP5-DESIGN.md.

Components:

RingBuffer
- bounded deque, threadsafe append/snapshot/clear/__len__
- render_summary(max_tokens=200) — most-recent-first, char-capped
- render_summary truncates middle when over budget

poll(buffer=None, cfg=None, emit_audit=True)
- one poll cycle: active_window + screenshot OCR + clipboard delta + recent_files + idle_seconds
- bypasses run_with_hooks (observer plumbing, not user-driven tool calls — avoids Step 4 plugin self-recursion)
- emit_audit=False for tests
- emits observation_tick (or observation_tick_slow if duration > poll_slow_threshold_ms — Q5.5)
- METADATA-ONLY audit emit (lengths, counts, content_type tags; NO titles, NO OCR text, NO clipboard content, NO file paths)

Polling primitives (each independently monkeypatchable for tests):
- _get_active_window — osascript via System Events
- _get_clipboard_now — pbpaste subprocess
- _classify_clipboard_kind — url/json/code/text/image_blob_redacted (Q5.2: image clipboards always redacted)
- _get_screenshot_ocr — screencapture + Vision OCR with Q5.1 retry (single retry at 200ms after first 100ms timeout)
- _get_recent_files — getmtime walk over Documents/Downloads/Desktop/codec-repo, 5min window, max 5 entries
- _idle_seconds — Quartz CGEventSourceSecondsSinceLastEventType; returns 0.0 on non-mac (graceful degrade)

maybe_inject_observation_summary(prompt, transport, skill_name=None, skill_module=None) -> (summary|None, reason)
- implements §X observation injection contract (Q5 override)
- transport="local" → always inject (cheap + private)
- transport="mcp" → never inject (MCP client brings own context)
- transport in {"chat","voice","http"} → §X.1 pattern gate:
  * possessive-without-context regex with stop-noun filter (Q5.3)
  * continuation language regex
  * SKILL_NEEDS_OBSERVATION skill flag (highest priority among gated)
- emits observation_summary_injected ONLY when summary non-None; skipped paths are silent (no audit-log spam)

Q5.7 forward-compat API for Steps 6 + 7:
- get_global_buffer() — Step 6 reads buffer.snapshot() for trigger evaluation
- persist_for_shift_report() — Step 7 calls at shift-report assembly; writes summary to ~/.codec/observation_summaries/<ts>.md (the ONLY persistent observer output)

run_daemon()
- PM2 entry; forever loop with config reload each iteration
- kill switch checked at start AND every iteration (env var OBSERVER_ENABLED) — disables poll WITHOUT requiring restart
- long-idle buffer reset (config-flagged, default true): wipe buffer when idle > 1800s
- cadence: 60s active / 300s idle per Q1

Privacy contract (4 layers):
1. RAM only (deque, process restart wipes)
2. METADATA-ONLY audit emits
3. Cloud-transport injection gating (§X)
4. NO new system permissions (existing skills/primitives only)

Module import has NO side effects — no thread spawn, no poll fire, no audit emit, no buffer init. PM2 entry calls run_daemon() explicitly. Buffer is lazy-init on first get_or_init_buffer() call. Quartz import handled gracefully — non-mac CI env returns idle=0 (stays in active cadence) so tests don't break on linux runners.

Verified: import codec_observer succeeds on the worktree. 14 names exported in __all__. Tests follow in next commit.
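The clipboard primitives can be sketched as a kind classifier plus a sha1-based delta tracker, so unchanged clipboards produce no delta. The classification heuristics are hypothetical, as are the names `classify_clipboard_kind` (standing in for `_classify_clipboard_kind`) and `ClipboardDelta`. Per Q5.2, image clipboards are always tagged `image_blob_redacted` and never OCR'd.

```python
import hashlib
import json
import re
from typing import Optional, Tuple

_URL_RE = re.compile(r"^https?://\S+$")
# Hypothetical code heuristics: common keywords, arrows, trailing ';' or '{'.
_CODE_HINTS = re.compile(r"(\bdef |\bclass |\bimport |\bfunction\b|=>|;\s*$|\{\s*$)", re.MULTILINE)


def classify_clipboard_kind(text: Optional[str], is_image: bool = False) -> str:
    if is_image:
        return "image_blob_redacted"  # Q5.2: images are redacted, never OCR'd
    if not text or not text.strip():
        return "empty"
    s = text.strip()
    if _URL_RE.match(s):
        return "url"
    if s[0] in "{[":
        try:
            json.loads(s)
            return "json"
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            pass
    if _CODE_HINTS.search(s):
        return "code"
    return "text"


class ClipboardDelta:
    """Report a change only when the clipboard's sha1 differs from the last poll."""

    def __init__(self) -> None:
        self._last_sha1: Optional[str] = None

    def check(self, text: Optional[str]) -> Tuple[bool, str]:
        digest = hashlib.sha1((text or "").encode("utf-8")).hexdigest()
        changed = digest != self._last_sha1
        self._last_sha1 = digest
        return changed, digest
```

Hashing lets the poller detect a change without keeping the previous clipboard text around.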

Covers everything in design §7:

§7.1 Ring buffer (6):
- append under capacity / wraparound drops oldest / snapshot is copy
- render_summary under token cap (≤ 200*4 chars)
- render_summary middle-truncates when over budget (50-token tiny case)
- render_summary includes recency markers ("12s ago")

§7.2 Polling primitives (6):
- poll() writes snapshot to buffer
- poll() emits observation_tick with METADATA-ONLY fields (verified by serializing extra and asserting content strings absent — title, OCR text, clipboard content, file paths all redacted from emit)
- clipboard delta only emits when sha1 changes (Q5.2 redaction verified via _classify_clipboard_kind)
- OCR timeout sets ocr_skipped=True in both snapshot AND audit emit
- recent_files counts surface in audit but paths NEVER leak
- _classify_clipboard_kind covers url/json/code/text/empty/image_blob_redacted

§7.3 Idle classifier + cadence (4):
- idle < 60s → cadence_used_s=60 (active)
- idle ≥ 60s → cadence_used_s=300 (idle)
- User config overrides default cadence values
- _idle_seconds returns 0.0 when Quartz unavailable (graceful degrade on non-mac CI runners)

§7.4 Injection contract §X (10):
- transport="local" always injects (reason=always_local)
- transport="mcp" never injects, never emits
- "my Stripe balance" → possessive_match
- "this PR" → possessive_match
- "this question" → skipped (question is in stop-noun list)
- "continue the email" → continuation_match
- "where was I" → continuation_match
- SKILL_NEEDS_OBSERVATION=True overrides patterns → skill_flag
- skipped path emits ZERO audit events (no observation_summary_injected)
- successful inject emits with extra.{injection_reason, tokens_used, buffer_entries_summarized} + top-level transport (transport is _RESERVED_TOP — stripped from extra and set at top level)

§7.5 Kill switch + integration (4):
- OBSERVER_ENABLED=false → reason="skipped_disabled"
- default (env unset) → enabled=True
- all off-aliases (false/0/no/off/FALSE/Off) → enabled=False
- empty buffer → reason="skipped_empty_buffer", no audit emit

Test isolation:
- codec_audit._AUDIT_LOG redirected to tmp_path (no real audit writes)
- All polling primitives monkeypatched (NO osascript, NO subprocess, NO Quartz, NO screencapture) — tests pass on any platform
- _GLOBAL_BUFFER reset to None per test via fixture
- NO Apple Reminders / Notes / Calendar entries created during run

Result: 30 passed in 0.18s. Zero side effects:
- pending_questions: 0
- Apple Reminders: 0
- /tmp/codec_*.txt: 0
- ~/.codec/observation_summaries: 0

One failure-then-fix cycle: transport is _RESERVED_TOP (Phase 1 Step 3 gotcha rediscovered) — codec_audit.audit() strips it from the extra dict and sets it at top level. Test asserts on emits[0]["transport"], not on extra["transport"]. Same pattern as the stuck_warning tool/agent fields.
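The `_RESERVED_TOP` behavior that caused the failure-then-fix cycle can be sketched as an envelope builder. Reserved keys passed in `extra` are promoted to the top level and removed from `extra`. The exact contents of `_RESERVED_TOP` beyond `transport`, `tool` and `agent` (the three this PR mentions) and the envelope layout are assumptions. `build_envelope()` is a hypothetical stand-in for `codec_audit.audit()`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Assumed subset: the PR names transport, and tool/agent via stuck_warning.
_RESERVED_TOP = frozenset({"transport", "tool", "agent"})


def build_envelope(
    event: str, extra: Optional[Dict[str, Any]] = None, cid: Optional[str] = None
) -> Dict[str, Any]:
    extra = dict(extra or {})  # copy so the caller's dict is not mutated
    envelope: Dict[str, Any] = {"ts": datetime.now(timezone.utc).isoformat(), "event": event}
    if cid is not None:
        envelope["cid"] = cid
    for key in _RESERVED_TOP:
        if key in extra:
            envelope[key] = extra.pop(key)  # promoted: assert on envelope[key], not extra[key]
    envelope["extra"] = extra
    return envelope
```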

…ffer-inspect (d)

Three integration points for Phase 2 Step 5:

1. codec_dashboard.py chat_completion (~+18 LOC):
   - Just before messages.insert(0, system) the handler calls maybe_inject_observation_summary(prompt, transport, ...)
   - Transport detection: "local" if config.llm_base_url contains "localhost", else "chat" (cloud-routed). The §X gating then applies as designed.
   - Helper handles its own audit emit + decides whether to inject. Caller just appends the returned string to sys_prompt.
   - try/except wraps the whole thing — observer failure is non-fatal, logs to debug + chat continues without injection.
2. codec_voice.py generate_response (~+18 LOC):
   - After the memory-context injection block, conditionally appends the observer summary to the in-place system message.
   - Transport: "voice" if VISION_PROVIDER=="gemini" (cloud), else "local". Local-Qwen voice always injects (cheap + private); cloud-routed voice gates per §X.
   - Same try/except non-fatal pattern.
3. codec_dashboard.py @app.get("/api/observer/buffer") (~+40 LOC):
   - Q5.6 debug-gated buffer-inspect endpoint
   - Requires `?debug=1` query param OR returns a hint message
   - Auth-gated (dashboard's existing /api/* middleware)
   - Returns metadata + redacted summary, NOT the raw entries (raw contains titles, OCR text, clipboard content — too sensitive even for the auth'd user inspector)
   - Every call emits observer_buffer_inspected audit event with client_ip + buffer_entries_returned per the design doc

Both integration points are SAFE on import-time failure of codec_observer (lazy import inside the handler; bare except wraps the call so a missing module doesn't break chat/voice).

Verified:
- codec_dashboard.py + codec_voice.py both AST-parse
- tests/test_observer.py 30/30 still passing
- No new test failures introduced (observer module changes only; no chat/voice handler tests need updating since the helper is self-contained and observe-only)

Net diff: codec_dashboard.py +59 LOC (chat injection + buffer endpoint); codec_voice.py +18 LOC (voice injection).
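The handler-side pattern is small enough to sketch. It picks the transport from configuration, lazily imports the helper, and treats any observer failure as non-fatal. The call to `maybe_inject_observation_summary(prompt, transport)` follows the signature in commit (b). `chat_transport`, `voice_transport`, `append_observation`, the `inject_fn` test seam and the `"\n\n"` separator are hypothetical. `except Exception` is used instead of a bare `except` so `KeyboardInterrupt`/`SystemExit` still propagate.

```python
import logging
from typing import Callable, Optional, Tuple

log = logging.getLogger("observer_integration")  # hypothetical logger name


def chat_transport(llm_base_url: Optional[str]) -> str:
    return "local" if llm_base_url and "localhost" in llm_base_url else "chat"


def voice_transport(vision_provider: Optional[str]) -> str:
    return "voice" if vision_provider == "gemini" else "local"


def append_observation(
    sys_prompt: str,
    prompt: str,
    transport: str,
    inject_fn: Optional[Callable[[str, str], Tuple[Optional[str], str]]] = None,
) -> str:
    try:
        if inject_fn is None:
            # Lazy import: a missing or broken codec_observer cannot break chat/voice.
            from codec_observer import maybe_inject_observation_summary as inject_fn
        summary, _reason = inject_fn(prompt, transport)
    except Exception:
        log.debug("observer injection failed; continuing without it", exc_info=True)
        return sys_prompt
    return f"{sys_prompt}\n\n{summary}" if summary else sys_prompt
```

Under the substring rule, a base URL such as `http://127.0.0.1:…` would be classified as `"chat"`. That errs in the conservative direction: the turn is gated rather than always injected.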

ecosystem.config.js — new codec-observer service entry:
- script: /usr/local/bin/python3.13
- args: -u codec_observer.py
- max_memory_restart: 128M (per design budget)
- env: { OBSERVER_ENABLED: "true" } — kill switch baseline
- autorestart: true, restart_delay: 5000, max_restarts: 10

AGENTS.md updates:

§3 — new "Continuous Observation Loop (Phase 2 Step 5)" sub-section:
- Architecture summary (4 polled signals, RAM ring buffer)
- §X injection contract decision tree (local always / mcp never / cloud-gated)
- Privacy contract (4 layers: RAM-only, METADATA-only audit, cloud-gating, no new permissions)
- Cadence (60s active / 5min idle / 30min long-idle reset)
- Kill switch
- 4 audit events
- Forward-compat API for Steps 6 + 7

§6 — new "Phase 2 Step 5 audit events" table:
- observation_tick / observation_tick_slow / observation_summary_injected / observer_buffer_inspected
- METADATA-only emit policy documented
- PHASE2_STEP5_EVENTS frozenset reference

§10 — new don't-touch entries:
- ~/.codec/observation_summaries/ (Phase 2 Step 5 only-write target, populated by persist_for_shift_report)
- OBSERVER_ENABLED env var
- ~/.codec/config.json:observer.{...} tunables (don't go below 30s cadence without considering OCR cost)

Total Step 5 PR diff (a-e):

| File | Lines | Contents |
| --- | --- | --- |
| codec_audit.py | ~+49 | event constants + extra-field docs |
| codec_observer.py | ~+810 | new file: RingBuffer + poll + injection + run_daemon + Q5.7 API |
| tests/test_observer.py | ~+494 | 30 tests |
| codec_dashboard.py | ~+59 | chat injection + buffer-inspect endpoint |
| codec_voice.py | ~+18 | voice injection |
| ecosystem.config.js | ~+18 | PM2 entry |
| AGENTS.md | ~+44 | §3 + §6 + §10 |
| docs/PHASE2-BLUEPRINT.md | ~+257 | committed earlier on this branch |
| docs/PHASE2-STEP5-DESIGN.md | ~+473 | committed earlier on this branch |
| **Total** | **~+2,222** | functional + tests + docs |

Note: this is larger than the design's stated ~+880 budget because it includes the design docs themselves, which the budget didn't count. Functional + tests alone: ~+1,448, in line with Phase 1 Step 2's ~+1,540 baseline.

…(Steps 5/6/7)

Phase 2 (Observer + Triggers + Shift Report) merged and production-stable:
- Step 5 (PR #9 824a52f) + hotfix PR #10 (26e6add) — `observer.ocr_enabled` flag
- Step 6 (PR #11 2d2ff3f) — Trigger System (matcher + cooldown + consent)
- Step 7 (PR #12 0e40687) — end-of-day shift report

Net: +91 passing tests (823/20/73), 0 new failures, 0 new skips. Live audit proof captured: shift_report_started + shift_report_completed paired emits at 2026-05-02T18:49:40Z with shared cid=5f188e5485e5.

Mikarina13 added 7 commits May 1, 2026 22:55

AVADSA25 merged commit 824a52f into main May 2, 2026
1 check passed

This was referenced May 2, 2026

feat(triggers): Trigger System (Phase 2 Step 6) #11

Merged

feat(shift_report): end-of-day shift report (Phase 2 Step 7) #12

Merged

AVADSA25 merged 7 commits into main from phase2-step5-observer

AVADSA25 commented May 1, 2026


2 participants
