
feat(observer): continuous observation loop (Phase 2 Step 5)#9

Merged
AVADSA25 merged 7 commits into main from phase2-step5-observer
May 2, 2026

@AVADSA25 AVADSA25 commented May 1, 2026

Summary

Phase 2 Step 5 — turns CODEC from "AI assistant that responds to commands" into "AI colleague that observes context." A background PM2 service polls four cheap signals (active window, screenshot OCR, clipboard delta, recent files) into a 10-min RAM-only ring buffer; chat + voice handlers conditionally prepend a ≤200-token summary to the LLM's system prompt, gated per the §X "Observation injection contract" so we don't leak screen content to cloud LLMs on every turn.

Implements docs/PHASE2-STEP5-DESIGN.md as designed. Locked Q1-Q6 user resolutions (60s/5min cadence, 10-min buffer, declarative skill triggers, CGEventSource idle, gated injection, no per-app blocklist) honored verbatim. Q5.1-Q5.7 design-phase open questions all resolved with my recommendations (already approved before implementation began).

Commits (a-e)

sha      what
eac65c9  docs/PHASE2-BLUEPRINT.md: Phase 2 plan with Q5 override folded in as §X
cf48384  docs/PHASE2-STEP5-DESIGN.md: Step 5 design doc (§0-§10 + §9 open questions)
d799e61  codec_audit.py: 4 new event constants + PHASE2_STEP5_EVENTS frozenset + 3 extra-field docs
02b87cb  codec_observer.py (new): RingBuffer + poll + injection helper + run_daemon + Q5.7 forward-compat API
09d8bb3  tests/test_observer.py (new): 30 tests, 100% passing in 0.18s
62d6a4e  codec_dashboard.py chat handler + codec_voice.py voice handler integration + PWA /api/observer/buffer?debug=1 (Q5.6)
38b398e  ecosystem.config.js PM2 entry + AGENTS.md §3+§6+§10 updates

§X Observation injection contract (Q5 override)

Cheap text-pattern gating, no relevance model:

transport == "local"   → INJECT (always)         reason="always_local"
transport == "mcp"     → SKIP (client owns it)   reason="skipped_no_match"
transport ∈ {"chat",   ┌─ possessive_match*  → INJECT  reason="possessive_match"
            "voice",   ├─ continuation_match → INJECT  reason="continuation_match"
            "http"}    ├─ skill_flag         → INJECT  reason="skill_flag"
                       └─ otherwise          → SKIP    reason="skipped_no_match"

* Possessive uses (my|this|that|these|those|the) <noun> filtered against a 25-word stop-noun list (question, time, thing, etc.) — ~/.codec/config.json:observer.stop_nouns overrides.
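
The decision tree above can be sketched in a few lines of Python. This is a minimal illustration, not the code in codec_observer.py: the function name `injection_decision`, the three-word stop-noun subset, and the continuation phrases are all assumptions. The order of the cloud-side checks (skill flag, then continuation, then possessive) is also an assumption, chosen so that the examples named in the test list ("continue the email", "this question") produce the reasons stated there.

```python
import re

# Hypothetical subset: the PR mentions a 25-word list but names only these three.
STOP_NOUNS = {"question", "time", "thing"}

# Determiner followed by a candidate noun, per the footnote above.
POSSESSIVE_RE = re.compile(
    r"\b(my|this|that|these|those|the)\s+([A-Za-z][\w-]*)", re.IGNORECASE
)
# Hypothetical continuation phrases, modelled on the two examples in the tests.
CONTINUATION_RE = re.compile(r"\b(continue|where was i)\b", re.IGNORECASE)

CLOUD_TRANSPORTS = {"chat", "voice", "http"}


def injection_decision(prompt, transport, skill_needs_observation=False):
    """Return (inject, reason) for one turn, following the tree above."""
    if transport == "local":
        return True, "always_local"
    if transport not in CLOUD_TRANSPORTS:
        # Covers "mcp" (the client owns its context). Treating unknown
        # transports the same way is an assumption of this sketch.
        return False, "skipped_no_match"
    if skill_needs_observation:
        return True, "skill_flag"
    if CONTINUATION_RE.search(prompt):
        return True, "continuation_match"
    for match in POSSESSIVE_RE.finditer(prompt):
        # A determiner only counts when the noun is not a stop-noun.
        if match.group(2).lower() not in STOP_NOUNS:
            return True, "possessive_match"
    return False, "skipped_no_match"
```

For example, `injection_decision("this PR", "chat")` yields `(True, "possessive_match")`, while the same prompt over `"mcp"` is skipped.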

Test plan

  • pytest tests/test_observer.py → 30 passed in 0.18s
  • All polling primitives mocked (osascript / pbpaste / Quartz / screencapture) — tests pass on any platform, NO Apple state, NO Terminal popups
  • METADATA-only audit-emit verified by serializing extra dict and asserting raw content strings absent
  • Per Step 4 contract: did NOT run full pytest suite (avoids destructive-skill cascade)
  • Post-test state files clean: pending_questions=0, Apple Reminders=0, /tmp/codec_*.txt=0, ~/.codec/observation_summaries=0

Privacy contract — 4 layers

  1. RAM only. collections.deque(maxlen=N). Process restart wipes. By design.
  2. METADATA-only audit emits. observation_tick carries lengths, counts, content_type tags but NEVER raw window titles, OCR text, clipboard content, or file paths. Audit log is for operators, not for re-deriving screen state.
  3. Cloud-transport injection gating per §X above.
  4. No new system permissions. Reads via existing skills (active_window, screenshot_text) and existing primitives (pbpaste, getmtime, Quartz). No new install-time prompts.
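
Layer 2 is the easiest one to get wrong, so here is a rough sketch of what a metadata-only reduction could look like. The function name `tick_metadata` and every field name in the returned dict are hypothetical; the real extra-field names live in codec_audit.py and are not shown in this PR description.

```python
import json


def tick_metadata(window_title, ocr_text, clipboard_text, recent_paths):
    """Reduce raw observations to lengths and counts only.

    No argument value is copied into the result, so serializing the
    result cannot reveal titles, OCR text, clipboard content, or paths.
    """
    return {
        "window_title_len": len(window_title),
        "ocr_chars": len(ocr_text),
        "clipboard_len": len(clipboard_text),
        "recent_files_count": len(recent_paths),
    }


# The same check the test plan describes: serialize, then assert that
# none of the raw strings appear in the serialized form.
extra = tick_metadata(
    "Invoice 4417 - Mail", "Total due 912.50", "sk_live_abc",
    ["/Users/x/Documents/secret.txt"],
)
serialized = json.dumps(extra)
```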

Q5.7 forward-compat API for Steps 6 + 7

Step 5 ships the API surface Steps 6 and 7 will need (per Q5.7 recommendation — locks the contract now, prevents Step 5→6 churn later):

  • get_global_buffer() -> RingBuffer — Step 6 (Triggers) reads .snapshot() for trigger evaluation
  • persist_for_shift_report() -> Path | None — Step 7 (Shift Report) calls at assembly time; writes summary to ~/.codec/observation_summaries/<ts>.md (the ONLY persistent observer output)

Cost: ~+30 LOC in codec_observer.py. Worth it.

PWA debug endpoint (Q5.6)

GET /api/observer/buffer?debug=1 (auth-gated by existing dashboard middleware):

  • Returns metadata-only summary, NOT raw entries (raw contains titles + OCR + clipboard content — too sensitive even for authed callers)
  • Every call emits observer_buffer_inspected audit event with client_ip + buffer_entries_returned
  • NOT linked from main UI; only for explicit debugging

Kill switch

OBSERVER_ENABLED=false env var → polling AND injection both no-op. No separate injection kill switch — buffer is always populated when enabled, only injection is gated by §X.
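
A minimal sketch of how such a kill switch is typically parsed, assuming the off-aliases listed in the test commit (false/0/no/off, case-insensitive) and enabled-by-default when the variable is unset. The helper name `observer_enabled` is hypothetical, and treating an empty string as "enabled" is an assumption of this sketch.

```python
import os

# Off-aliases from the test list; comparison is case-insensitive.
_OFF_VALUES = {"false", "0", "no", "off"}


def observer_enabled(environ=None):
    """Return False only when OBSERVER_ENABLED is set to an off-alias."""
    env = os.environ if environ is None else environ
    raw = env.get("OBSERVER_ENABLED")
    if raw is None:
        return True  # unset means enabled
    return raw.strip().lower() not in _OFF_VALUES
```

Reading the variable on every call, rather than caching it at import time, is what would let a daemon loop notice a change without a restart.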

Out of scope (explicit, per intake contract)

  • No Apple Reminders / Notes / Calendar entries created anywhere in this PR.
  • No osascript for user-facing UI — observer's osascript usage is bounded to System Events (active window read) and Vision (OCR), both backend probes.
  • _HTTP_BLOCKED not touched.
  • Step 6 + Step 7 are not in this PR — Phase 2 ships per-step. Step 6 (Triggers) and Step 7 (Shift Report) will follow as separate PRs after this lands and burns in.

After merge

  1. The cp plugin pattern from Phase 1 Step 4 doesn't apply here: this is a PM2 service, not a hook plugin. After merge + pull, just run pm2 start ecosystem.config.js --only codec-observer.
  2. pm2 restart codec-dashboard codec-voice so the chat/voice integration loads codec_observer.
  3. Watch ~/.codec/audit.log for first observation_tick (per-poll) and observation_summary_injected (gated inject) — both should appear within 60s of activity.
  4. Per intake: NO Apple Reminders for any post-merge watch. If a watch is needed, launchd plist or skip entirely — your call.

🤖 Generated with Claude Code

Mikarina13 added 7 commits May 1, 2026 22:55
Captures the user-supplied Phase 2 plan as a version-controlled
design doc, with the Q1-Q6 reviewer resolutions locked at intake.
Source: planning session in claude-chat (separate from this Claude
Code session); user handed me the blueprint with explicit Q5
override.

Three steps:
  Step 5 — Continuous Observation Loop (codec_observer.py + new
           PM2 service, ~400 LOC + 30 tests)
  Step 6 — Trigger System (codec_triggers.py + skill_registry
           extension, ~300 LOC + 35 tests)
  Step 7 — Shift Report Crew (built-in crew + assembly skill,
           ~350 LOC + 20 tests)

Locked decisions (§Y resolution log):
  Q1 Observer cadence:   60s active, 5min idle (CGEventSource)
  Q2 Ring buffer depth:  10 minutes
  Q3 Trigger storage:    SKILL_OBSERVATION_TRIGGER in skill files
  Q4 Idle definition:    no keyboard/mouse for 30 min
  Q5 Prompt injection:   GATED (§X "Observation injection contract")
  Q6 Per-app blocklist:  none

§X Observation injection contract — Q5 override folded in:
  - Always inject for transport="local" (Qwen, free + private)
  - For cloud transports: gate on cheap text patterns
    (possessive-without-context / continuation language /
    skill-flag SKILL_NEEDS_OBSERVATION=True)
  - Never inject for transport="mcp" (the LLM client decides
    its own context)
  - Skipped injection emits NO audit line; injection emits
    observation_summary_injected with reason / tokens / transport

7 new audit events extending Step 1 §1.2 schema:1:
  observation_tick, observation_summary_injected
  trigger_evaluated, trigger_fired, trigger_blocked
  shift_report_started, shift_report_completed

3 independent kill switches:
  OBSERVER_ENABLED, TRIGGERS_ENABLED, SHIFT_REPORT_ENABLED

Diff inventory: ~+2,345 (functional + tests).

Sequencing: same as Phase 1 — design → review → implement →
pre-merge audit → merge → sign off, per step. NO Apple
Reminders for any monitoring (per the 2026-05-01 incident
contract). Watches run as launchd plists writing silently to
file or skipped entirely — user decides per step.

Step 5 detailed design follows in docs/PHASE2-STEP5-DESIGN.md
(next commit). Steps 6+7 detailed designs deferred until Step 5
implementation lands.
…RingBuffer)

Detailed design for Phase 2 Step 5 following the §0-§10 pattern of
Phase 1 design docs. Resolves the locked Q1-Q6 reviewer decisions
into concrete spec; surfaces 7 new open questions (Q5.1-Q5.7) that
need answering before implementation begins.

Structure:
  §0  Why this exists — observer is the foundation Steps 6 + 7
      build on
  §1  Design — PM2 service spec, ring buffer schema, polling
      primitives, idle classifier, injection contract, summary
      rendering, buffer reset, configuration
  §2  Implementation outline — file map, module API, integration
      points in chat/voice handlers
  §3  Audit envelope additions — observation_tick + observation_
      summary_injected with metadata-only privacy contract
  §7  Test plan — 30 tests across 5 sub-areas (ring buffer / poll
      primitives / cadence / injection / kill switch)
  §8  Rollback plan — three layers (env-var / config / hard revert)
  §9  Open questions for reviewer (7 items, recommendations included)
  §10 Diff inventory — ~+880 LOC vs Phase 1 step sizes

Key spec decisions folded in:
  - Per-poll budget refined to <150ms p95 (OCR is the heavy step,
    blueprint's <50ms was unrealistic — owns this transparently)
  - Injection contract decision tree fully specified including
    transport="mcp" → never-inject (MCP clients bring their own
    context)
  - Observation_tick audit emit is METADATA-ONLY — no titles,
    no OCR text, no clipboard content, no file paths. Audit log
    is for operators, not for re-deriving screen state. Privacy
    by default at the audit layer too.
  - Step 6/7 forward-compat API committed in Step 5 (per §9 Q5.7
    recommendation) — RingBuffer.snapshot() + render_summary()
    + persist_for_shift_report() shipped now even though only
    injection path exercised.

7 open questions for reviewer:
  Q5.1 OCR retry-on-timeout?
  Q5.2 Clipboard image handling — redact or OCR?
  Q5.3 Stop-noun list for possessive-without-context regex
  Q5.4 Injection-reason audit cardinality — fold into audit_report?
  Q5.5 Cadence-degradation strategy under poll-overrun load
  Q5.6 PWA buffer-inspect endpoint — debug-only or always?
  Q5.7 Step 5 ships full Step 6/7 forward-compat API now?

Each with my recommendation + 1-line rationale. Reviewer can
approve all-recommended, override any, or push back.

Stop point per intake: design phase only, no code. Surfacing
§9 open questions for review. Implementation begins after
sign-off.
…ary_injected (a)

Adds 4 new event-name constants to codec_audit.py for the
Continuous Observation Loop introduced in Phase 2 Step 5:

  OBSERVATION_TICK              — every poll cycle (info)
  OBSERVATION_TICK_SLOW         — poll exceeded budget (warning, Q5.5)
  OBSERVATION_SUMMARY_INJECTED  — gated injection fired (info, inherits cid)
  OBSERVER_BUFFER_INSPECTED     — PWA debug-gated buffer read (Q5.6)

Plus PHASE2_STEP5_EVENTS frozenset for analyzer / introspection.

Plus 3 documentation tuples for extra-namespace fields:
  OBSERVATION_TICK_EXTRA_FIELDS         — METADATA-ONLY (no titles, no OCR
                                          content, no clipboard text, no
                                          file paths) per design §3 "What
                                          we deliberately do NOT emit"
  OBSERVATION_INJECTION_EXTRA_FIELDS    — tokens_used / injection_reason /
                                          buffer_entries_summarized
  OBSERVER_BUFFER_INSPECT_EXTRA_FIELDS  — client_ip / buffer_entries_returned

No behavior change. Just constants. Step 5 module + tests in next commits.
…tract (b)

Phase 2 Step 5 main module (~525 LOC). Implements the Continuous
Observation Loop per docs/PHASE2-STEP5-DESIGN.md.

Components:

RingBuffer
  - bounded deque, threadsafe append/snapshot/clear/__len__
  - render_summary(max_tokens=200) — most-recent-first, char-capped
  - render_summary truncates middle when over budget

poll(buffer=None, cfg=None, emit_audit=True)
  - one poll cycle: active_window + screenshot OCR + clipboard delta
    + recent_files + idle_seconds
  - bypasses run_with_hooks (observer plumbing, not user-driven tool
    calls — avoids Step 4 plugin self-recursion)
  - emit_audit=False for tests
  - emits observation_tick (or observation_tick_slow if
    duration > poll_slow_threshold_ms — Q5.5)
  - METADATA-ONLY audit emit (lengths, counts, content_type tags;
    NO titles, NO OCR text, NO clipboard content, NO file paths)

Polling primitives (each independently monkeypatchable for tests):
  _get_active_window  — osascript via System Events
  _get_clipboard_now  — pbpaste subprocess
  _classify_clipboard_kind — url/json/code/text/image_blob_redacted
                              (Q5.2: image clipboards always redacted)
  _get_screenshot_ocr — screencapture + Vision OCR with Q5.1 retry
                        (single retry at 200ms after first 100ms timeout)
  _get_recent_files   — getmtime walk over Documents/Downloads/Desktop/
                        codec-repo, 5min window, max 5 entries
  _idle_seconds       — Quartz CGEventSourceSecondsSinceLastEventType;
                        returns 0.0 on non-mac (graceful degrade)
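
To make one of these primitives concrete, the recent-files walk could look roughly like the following. This is a sketch only: the name `recent_files`, its parameters, and the injectable `now` argument are assumptions. The 5-minute window and 5-entry cap come from the description above.

```python
import os
import time


def recent_files(roots, window_s=300, max_entries=5, now=None):
    """Return up to max_entries paths modified within window_s, newest first."""
    now = time.time() if now is None else now
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    mtime = os.path.getmtime(path)
                except OSError:
                    continue  # file vanished or is unreadable
                if now - mtime <= window_s:
                    hits.append((mtime, path))
    hits.sort(reverse=True)  # newest modification time first
    return [path for _mtime, path in hits[:max_entries]]
```

Passing `now` explicitly keeps the function deterministic under test, which fits the "each independently monkeypatchable" goal stated above.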

maybe_inject_observation_summary(prompt, transport, skill_name=None,
                                  skill_module=None) -> (summary|None, reason)
  - implements §X observation injection contract (Q5 override)
  - transport="local" → always inject (cheap + private)
  - transport="mcp" → never inject (MCP client brings own context)
  - transport in {"chat","voice","http"} → §X.1 pattern gate:
    * possessive-without-context regex with stop-noun filter (Q5.3)
    * continuation language regex
    * SKILL_NEEDS_OBSERVATION skill flag (highest priority among gated)
  - emits observation_summary_injected ONLY when summary non-None;
    skipped paths are silent (no audit-log spam)

Q5.7 forward-compat API for Steps 6 + 7:
  get_global_buffer()        - Step 6 reads buffer.snapshot() for
                                trigger evaluation
  persist_for_shift_report() - Step 7 calls at shift-report assembly;
                                writes summary to
                                ~/.codec/observation_summaries/<ts>.md
                                (the ONLY persistent observer output)

run_daemon()
  - PM2 entry; forever loop with config reload each iteration
  - kill switch checked at start AND every iteration (env var
    OBSERVER_ENABLED) — disables poll WITHOUT requiring restart
  - long-idle buffer reset (config-flagged, default true): wipe
    buffer when idle > 1800s
  - cadence: 60s active / 300s idle per Q1
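
The cadence and long-idle rules reduce to two small predicates. The sketch below assumes the thresholds stated in this PR (idle ≥ 60s switches to the 300s cadence, idle > 1800s triggers a buffer wipe); the function and parameter names are hypothetical.

```python
def next_cadence_s(idle_s, active_s=60, idle_cadence_s=300, idle_threshold_s=60):
    """Pick the sleep interval for the next poll from the current idle time."""
    return idle_cadence_s if idle_s >= idle_threshold_s else active_s


def should_reset_buffer(idle_s, long_idle_s=1800, enabled=True):
    """True when the config flag is on and idle time exceeds the long-idle limit."""
    return enabled and idle_s > long_idle_s
```

On a platform where idle time is reported as 0.0, `next_cadence_s(0.0)` returns 60, so the loop stays on the active cadence.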

Privacy contract (4 layers):
  1. RAM only (deque, process restart wipes)
  2. METADATA-ONLY audit emits
  3. Cloud-transport injection gating (§X)
  4. NO new system permissions (existing skills/primitives only)

Module import has NO side effects — no thread spawn, no poll fire,
no audit emit, no buffer init. PM2 entry calls run_daemon() explicitly.
Buffer is lazy-init on first get_or_init_buffer() call.

Quartz import handled gracefully — non-mac CI env returns idle=0
(stays in active cadence) so tests don't break on linux runners.

Verified: import codec_observer succeeds on the worktree. 14 names
exported in __all__. Tests follow in next commit.
Covers everything in design §7:

§7.1 Ring buffer (6):
  - append under capacity / wraparound drops oldest / snapshot is copy
  - render_summary under token cap (≤ 200*4 chars)
  - render_summary middle-truncates when over budget (50-token tiny case)
  - render_summary includes recency markers ("12s ago")

§7.2 Polling primitives (6):
  - poll() writes snapshot to buffer
  - poll() emits observation_tick with METADATA-ONLY fields (verified
    by serializing extra and asserting content strings absent — title,
    OCR text, clipboard content, file paths all redacted from emit)
  - clipboard delta only emits when sha1 changes (Q5.2 redaction
    verified via _classify_clipboard_kind)
  - OCR timeout sets ocr_skipped=True in both snapshot AND audit emit
  - recent_files counts surface in audit but paths NEVER leak
  - _classify_clipboard_kind covers url/json/code/text/empty/
    image_blob_redacted
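
The sha1-based clipboard delta mentioned in §7.2 can be sketched as follows. The class name `ClipboardDelta` is hypothetical; the idea is simply to remember the digest of the last clipboard text and report a change only when the digest differs, so the raw text never needs to be retained for comparison.

```python
import hashlib


class ClipboardDelta:
    """Track clipboard changes by digest rather than by stored content."""

    def __init__(self):
        self._last_digest = None

    def changed(self, text):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest == self._last_digest:
            return False  # same content as the previous poll
        self._last_digest = digest
        return True
```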

§7.3 Idle classifier + cadence (4):
  - idle < 60s → cadence_used_s=60 (active)
  - idle ≥ 60s → cadence_used_s=300 (idle)
  - User config overrides default cadence values
  - _idle_seconds returns 0.0 when Quartz unavailable (graceful degrade
    on non-mac CI runners)

§7.4 Injection contract §X (10):
  - transport="local" always injects (reason=always_local)
  - transport="mcp" never injects, never emits
  - "my Stripe balance" → possessive_match
  - "this PR" → possessive_match
  - "this question" → skipped (question is in stop-noun list)
  - "continue the email" → continuation_match
  - "where was I" → continuation_match
  - SKILL_NEEDS_OBSERVATION=True overrides patterns → skill_flag
  - skipped path emits ZERO audit events (no observation_summary_injected)
  - successful inject emits with extra.{injection_reason, tokens_used,
    buffer_entries_summarized} + top-level transport (transport is
    _RESERVED_TOP — stripped from extra and set at top level)

§7.5 Kill switch + integration (4):
  - OBSERVER_ENABLED=false → reason="skipped_disabled"
  - default (env unset) → enabled=True
  - all off-aliases (false/0/no/off/FALSE/Off) → enabled=False
  - empty buffer → reason="skipped_empty_buffer", no audit emit

Test isolation:
  - codec_audit._AUDIT_LOG redirected to tmp_path (no real audit writes)
  - All polling primitives monkeypatched (NO osascript, NO subprocess,
    NO Quartz, NO screencapture) — tests pass on any platform
  - _GLOBAL_BUFFER reset to None per test via fixture
  - NO Apple Reminders / Notes / Calendar entries created during run

Result: 30 passed in 0.18s. Zero side effects:
  pending_questions: 0
  Apple Reminders:   0
  /tmp/codec_*.txt:  0
  ~/.codec/observation_summaries: 0

One failure-then-fix cycle: transport is _RESERVED_TOP (Phase 1 Step 3
gotcha rediscovered) — codec_audit.audit() strips it from extra dict
and sets at top level. Test asserts on emits[0]["transport"], not on
extra["transport"]. Same pattern as the stuck_warning tool/agent fields.
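
The gotcha is easier to see in code. The sketch below is not codec_audit.audit(); it is a hypothetical `build_record` that mimics the described behaviour. The membership of `RESERVED_TOP` beyond "transport" is an assumption inferred from the mention of the tool/agent fields.

```python
# Assumed membership: "transport" is confirmed above; "tool" and "agent"
# are inferred from the stuck_warning remark.
RESERVED_TOP = {"transport", "tool", "agent"}


def build_record(event, extra):
    """Hoist reserved keys out of extra and onto the top level of the record."""
    record = {"event": event}
    remaining = {}
    for key, value in extra.items():
        if key in RESERVED_TOP:
            record[key] = value
        else:
            remaining[key] = value
    record["extra"] = remaining
    return record


rec = build_record(
    "observation_summary_injected",
    {"transport": "chat", "tokens_used": 120},
)
# Assertions must therefore read rec["transport"], not rec["extra"]["transport"].
```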
…ffer-inspect (d)

Three integration points for Phase 2 Step 5:

1. codec_dashboard.py chat_completion (~+18 LOC):
   - Just before messages.insert(0, system) the handler calls
     maybe_inject_observation_summary(prompt, transport, ...)
   - Transport detection: "local" if config.llm_base_url contains
     "localhost", else "chat" (cloud-routed). The §X gating then
     applies as designed.
   - Helper handles its own audit emit + decides whether to inject.
     Caller just appends the returned string to sys_prompt.
   - try/except wraps the whole thing — observer failure is non-fatal,
     logs to debug + chat continues without injection.

2. codec_voice.py generate_response (~+18 LOC):
   - After the memory-context injection block, conditionally appends
     the observer summary to the in-place system message.
   - Transport: "voice" if VISION_PROVIDER=="gemini" (cloud), else
     "local". Local-Qwen voice always injects (cheap + private);
     cloud-routed voice gates per §X.
   - Same try/except non-fatal pattern.

3. codec_dashboard.py @app.get("/api/observer/buffer") (~+40 LOC):
   - Q5.6 debug-gated buffer-inspect endpoint
   - Requires `?debug=1` query param OR returns a hint message
   - Auth-gated (dashboard's existing /api/* middleware)
   - Returns metadata + redacted summary, NOT the raw entries (raw
     contains titles, OCR text, clipboard content — too sensitive
     even for the auth'd user inspector)
   - Every call emits observer_buffer_inspected audit event with
     client_ip + buffer_entries_returned per the design doc

The chat and voice integration points (1 and 2) are SAFE on import-time
failure of codec_observer (lazy import inside the handler; bare except
wraps the call so a missing module doesn't break chat/voice).
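
The lazy-import, non-fatal pattern can be sketched like this. The wrapper name `observation_suffix` and the `module_name` parameter are hypothetical; the call to `maybe_inject_observation_summary(prompt, transport)` and its `(summary|None, reason)` return shape follow the signature given in the module commit. The sketch catches `Exception` rather than using a bare except, which is a deliberate difference from the description above.

```python
import importlib
import logging

log = logging.getLogger("codec.sketch")


def observation_suffix(prompt, transport, module_name="codec_observer"):
    """Return the observer summary, or "" if anything at all goes wrong."""
    try:
        # Imported inside the handler so a missing module cannot break startup.
        observer = importlib.import_module(module_name)
        summary, _reason = observer.maybe_inject_observation_summary(
            prompt, transport
        )
    except Exception:
        log.debug("observer unavailable; continuing without injection",
                  exc_info=True)
        return ""
    return summary or ""
```

A caller would then append the returned string to its system prompt unconditionally, since an empty string is a no-op.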

Verified:
  - codec_dashboard.py + codec_voice.py both AST-parse
  - tests/test_observer.py 30/30 still passing
  - No new test failures introduced (observer module changes only;
    no chat/voice handler tests need updating since the helper is
    self-contained and observe-only)

Net diff:
  codec_dashboard.py  +59 LOC (chat injection + buffer endpoint)
  codec_voice.py      +18 LOC (voice injection)
ecosystem.config.js — new codec-observer service entry:
  - script: /usr/local/bin/python3.13
  - args: -u codec_observer.py
  - max_memory_restart: 128M (per design budget)
  - env: { OBSERVER_ENABLED: "true" } — kill switch baseline
  - autorestart: true, restart_delay: 5000, max_restarts: 10

AGENTS.md updates:
  §3 — new "Continuous Observation Loop (Phase 2 Step 5)" sub-section:
    - Architecture summary (4 polled signals, RAM ring buffer)
    - §X injection contract decision tree (local always /
      mcp never / cloud-gated)
    - Privacy contract (4 layers: RAM-only, METADATA-only audit,
      cloud-gating, no new permissions)
    - Cadence (60s active / 5min idle / 30min long-idle reset)
    - Kill switch
    - 4 audit events
    - Forward-compat API for Steps 6 + 7

  §6 — new "Phase 2 Step 5 audit events" table:
    - observation_tick / observation_tick_slow / observation_summary_
      injected / observer_buffer_inspected
    - METADATA-only emit policy documented
    - PHASE2_STEP5_EVENTS frozenset reference

  §10 — new don't-touch entries:
    - ~/.codec/observation_summaries/ (Phase 2 Step 5 only-write
      target, populated by persist_for_shift_report)
    - OBSERVER_ENABLED env var
    - ~/.codec/config.json:observer.{...} tunables (don't go
      below 30s cadence without considering OCR cost)

Total Step 5 PR diff (a-e):
  codec_audit.py             ~+49  (event constants + extra-field docs)
  codec_observer.py         ~+810  (new file: RingBuffer + poll +
                                    injection + run_daemon + Q5.7 API)
  tests/test_observer.py    ~+494  (30 tests)
  codec_dashboard.py         ~+59  (chat injection + buffer-inspect endpoint)
  codec_voice.py             ~+18  (voice injection)
  ecosystem.config.js        ~+18  (PM2 entry)
  AGENTS.md                  ~+44  (§3 + §6 + §10)
  docs/PHASE2-BLUEPRINT.md  ~+257  (committed earlier on this branch)
  docs/PHASE2-STEP5-DESIGN.md ~+473 (committed earlier on this branch)
  ────────────────────────  ─────
  Total                     ~+2,222 (functional + tests + docs)

Note: this is larger than the design's stated ~+880 budget because
it includes the design docs themselves which the budget didn't count.
Functional + tests alone: ~+1,448, in line with Phase 1 Step 2's
~+1,540 baseline.
@AVADSA25 AVADSA25 merged commit 824a52f into main May 2, 2026
1 check passed
AVADSA25 pushed a commit that referenced this pull request May 2, 2026
…(Steps 5/6/7)

Phase 2 (Observer + Triggers + Shift Report) merged and production-stable:
- Step 5 (PR #9 824a52f) + hotfix PR #10 (26e6add) — `observer.ocr_enabled` flag
- Step 6 (PR #11 2d2ff3f) — Trigger System (matcher + cooldown + consent)
- Step 7 (PR #12 0e40687) — end-of-day shift report

Net: +91 passing tests (823/20/73), 0 new failures, 0 new skips.
Live audit proof captured: shift_report_started+_completed paired emits at
2026-05-02T18:49:40Z with shared cid=5f188e5485e5.