39 changes: 37 additions & 2 deletions AGENTS.md
@@ -111,9 +111,29 @@ CODEC agents can pause and ask the user a structured question, self-detect when

**Audit envelope**: all Step 3 events use `outcome="warning"`, `level="warning"`. They are NOT `outcome="error"` because each is an operational signal, not an operation failure (same Q4 tightening as Step 2's `hook_error`). `correlation_id` inherits from the wrapping operation per Step 1 §1.4.

### Other known gaps (tracked for Phase 2)
### Continuous Observation Loop (Phase 2 Step 5)

CODEC has a background process (`codec-observer` PM2 service, `codec_observer.py`) that polls four cheap signals — frontmost window, screenshot OCR, clipboard delta, recent file changes — and keeps the last 10 minutes of state in a RAM-only ring buffer. On every chat / voice request, an injection helper decides whether to prepend a ≤200-token summary to the LLM's system prompt, gated per the §X "Observation injection contract":

- **`transport="local"`** (local Qwen) → always inject. Cheap + private.
- **`transport="mcp"`** → never inject. The MCP client (claude.ai, Claude.app) brings its own context.
- **`transport in {"chat", "voice", "http"}`** → gated on cheap text-pattern checks: possessive-without-context (`"my X"`/`"this Y"` filtered against a stop-noun list), continuation language (`"continue"`, `"where was I"`), or skill-flag (`SKILL_NEEDS_OBSERVATION = True` on a resolved skill module).
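The three transport rules above can be sketched as a single decision function. This is a minimal illustration, not the code in `codec_observer.py`: the function name `injection_reason`, the regexes, the contents of `STOP_NOUNS`, and the order in which the cloud-path checks run are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical stop-noun list: possessives followed by these words are
# assumed to need no screen context ("my name", "this time").
STOP_NOUNS = frozenset({"name", "question", "opinion", "time", "way", "point"})
POSSESSIVE_RE = re.compile(r"\b(?:my|this)\s+([a-z][\w-]*)", re.IGNORECASE)
CONTINUATION_RE = re.compile(r"\b(?:continue|where was i)\b", re.IGNORECASE)


def injection_reason(
    prompt: str, transport: str, skill_needs_observation: bool = False
) -> Optional[str]:
    """Return the reason to inject an observation summary, or None to skip."""
    if transport == "local":
        return "always_local"
    if transport == "mcp":
        return None  # the MCP client brings its own context
    if transport not in {"chat", "voice", "http"}:
        return None  # assumption: unknown transports fail closed
    if skill_needs_observation:
        return "skill_flag"
    if CONTINUATION_RE.search(prompt):
        return "continuation_match"
    for match in POSSESSIVE_RE.finditer(prompt):
        if match.group(1).lower() not in STOP_NOUNS:
            return "possessive_match"
    return None
```

The returned strings match the `injection_reason` values listed for `observation_summary_injected`; a `None` result means no summary is prepended and no audit event fires.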

**Privacy contract**: 4 layers. (1) RAM only — `collections.deque` wiped on process restart. (2) Audit emits are METADATA-ONLY: lengths, counts, `content_type` tags, but NEVER raw window titles, OCR text, clipboard content, or file paths. (3) Cloud-transport injection gating per §X. (4) NO new system permissions — uses existing skills + primitives (osascript, pbpaste, Quartz, getmtime).
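Layers (1) and (2) can be illustrated with a small sketch: a `collections.deque` that holds only a time window of entries, plus a projection that reduces a raw entry to lengths and counts before anything is audited. The class layout, the `audit_metadata` helper, and the raw entry keys (`title`, `ocr_text`, `recent_files`) are hypothetical; only the `ts` key and the `snapshot()` method are taken from how the dashboard code reads the buffer.

```python
import time
from collections import deque
from typing import Any, Deque, Dict, List, Optional


class RingBuffer:
    """RAM-only, time-windowed buffer; contents vanish when the process exits."""

    def __init__(self, window_s: float = 600.0) -> None:
        self.window_s = window_s
        self._entries: Deque[Dict[str, Any]] = deque()

    def append(self, entry: Dict[str, Any], now: Optional[float] = None) -> None:
        ts = time.time() if now is None else now
        self._entries.append({**entry, "ts": ts})
        # Drop entries older than the window, oldest first.
        while self._entries and ts - self._entries[0]["ts"] > self.window_s:
            self._entries.popleft()

    def snapshot(self) -> List[Dict[str, Any]]:
        return list(self._entries)

    def clear(self) -> None:
        self._entries.clear()


def audit_metadata(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Project a raw entry onto lengths and counts only (no raw content)."""
    return {
        "active_app": entry.get("active_app", ""),
        "active_title_len": len(entry.get("title", "")),
        "ocr_chars": len(entry.get("ocr_text", "")),
        "recent_files_count": len(entry.get("recent_files", [])),
    }
```

Keeping the projection in one function makes the metadata-only rule easy to test: the audit path never sees the raw entry, only the dict this helper returns.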

**Cadence**: 60s when active (`CGEventSourceSecondsSinceLastEventType` reports < 60s); drops to 5min when idle. A long-idle reset wipes the buffer after 30min of idle time.
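As a rough sketch of the cadence rule, the poll interval and the long-idle reset reduce to two comparisons on the idle time. The constant and function names below are hypothetical, and the exact boundary behavior (`<` versus `<=`) is an assumption; the numeric values are the ones stated above.

```python
ACTIVE_CADENCE_S = 60          # poll interval while the user is active
IDLE_CADENCE_S = 300           # poll interval once the user is idle
IDLE_THRESHOLD_S = 60          # idle time separating "active" from "idle"
RESET_IDLE_THRESHOLD_S = 1800  # long-idle point at which the buffer is wiped


def select_cadence(idle_seconds: float) -> int:
    """Pick the next poll interval from the seconds since the last input event."""
    return ACTIVE_CADENCE_S if idle_seconds < IDLE_THRESHOLD_S else IDLE_CADENCE_S


def should_reset_buffer(idle_seconds: float) -> bool:
    """True once the user has been idle long enough to wipe the buffer."""
    return idle_seconds >= RESET_IDLE_THRESHOLD_S
```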

**Kill switch**: `OBSERVER_ENABLED=false` env var disables polling AND injection.
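A minimal sketch of how such a flag might be read, assuming the value is compared case-insensitively and that anything other than `false` leaves the observer on. The helper name `observer_enabled` and the optional `env` parameter (added so the check can be exercised without touching the real environment) are hypothetical.

```python
import os
from typing import Mapping, Optional


def observer_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False only when OBSERVER_ENABLED is explicitly set to "false"."""
    source = os.environ if env is None else env
    raw = source.get("OBSERVER_ENABLED", "true")
    return raw.strip().lower() != "false"
```

Both the polling loop and the injection helper would consult the same check, which is what makes a single env var sufficient to disable both.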

**Audit events** (4 new): `observation_tick` (per poll, info), `observation_tick_slow` (poll > 150ms, warning), `observation_summary_injected` (gated inject fired, info, inherits cid), `observer_buffer_inspected` (debug-gated PWA read).

**Forward-compat API for Steps 6 + 7**: `get_global_buffer()` exposes the live ring buffer (Step 6 Triggers reads `.snapshot()` for trigger evaluation); `persist_for_shift_report()` writes a summary to `~/.codec/observation_summaries/<ts>.md` (the only persistent observer output, called by Step 7 shift-report assembly).
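The persistence half of this API amounts to writing one time-stamped Markdown file per call. The sketch below is not `persist_for_shift_report()` itself: the function name `persist_summary`, its parameters, and the timestamp format are assumptions, since the source only specifies the `<ts>.md` naming pattern.

```python
import time
from pathlib import Path
from typing import Optional


def persist_summary(summary: str, base_dir: Path, now: Optional[float] = None) -> Path:
    """Write `summary` to <base_dir>/<ts>.md and return the path."""
    # Assumed timestamp format: sortable UTC, e.g. 19700101T000000Z.
    ts = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    base_dir.mkdir(parents=True, exist_ok=True)
    path = base_dir / f"{ts}.md"
    path.write_text(summary, encoding="utf-8")
    return path
```

A sortable timestamp keeps directory listings in chronological order, which suits a consumer that assembles reports from the newest files.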

Implementation: `codec_observer.py` (RingBuffer + poll + injection helper + run_daemon), wired into `codec_dashboard.py:chat_completion` and `codec_voice.py:generate_response`. Debug PWA endpoint at `GET /api/observer/buffer?debug=1` returns metadata-only summary (raw entries never exposed even to authed callers; emits `observer_buffer_inspected` per call).

### Other known gaps (tracked for Phase 2 follow-on)
- No formal teammate / sub-agent recursion — Crew is the only multi-agent primitive
- Self-improve agent doesn't yet emit memory facts on Phase 1 events (Step 4 work)
- Step 6 (Triggers) and Step 7 (Shift Report Crew) — Phase 2 Steps still pending

## 4. Skill system

@@ -230,6 +250,18 @@ Six new event names exported from `codec_audit.py` as module constants. All `out

The constants are also exposed as frozensets for analyzer / introspection: `ASKUSER_EVENTS`, `STUCK_EVENTS`, `STEP3_EVENTS`. `audit_report.py` ingests them as additive event types — no schema bump.

### Phase 2 Step 5 audit events (continuous observation)
Four new event names exported from `codec_audit.py` for the Continuous Observation Loop. Each carries a `correlation_id` per §1.4: the inject event reuses the wrapping chat/voice op's cid, while the tick events generate per-poll cids.

| Event | Source | level | extra fields |
|---|---|---|---|
| `observation_tick` | `codec-observer` | info | METADATA-ONLY: `active_app`, `active_title_len`, `ocr_chars`, `ocr_skipped`, `clipboard_changed`, `clipboard_kind`, `recent_files_count`, `idle_seconds`, `cadence_used_s`, `buffer_depth`, `poll_duration_ms` |
| `observation_tick_slow` | `codec-observer` | warning | Same as `observation_tick` — emitted instead when `poll_duration_ms > poll_slow_threshold_ms` (default 150ms). Q5.5 flag for visibility, no behavior change. |
| `observation_summary_injected` | `codec-observer` | info | `tokens_used`, `injection_reason` (`always_local`\|`possessive_match`\|`continuation_match`\|`skill_flag`), `buffer_entries_summarized`. `transport` is top-level (reserved). |
| `observer_buffer_inspected` | `codec-dashboard` | info | `client_ip`, `buffer_entries_returned`. Q5.6 PWA `?debug=1` audit. |

`PHASE2_STEP5_EVENTS` frozenset exposed for analyzer breakdown. `observation_tick` is METADATA-ONLY by design — no titles, no OCR text, no clipboard content, no file paths leak to `~/.codec/audit.log`.
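The choice between the two tick events in the table can be sketched as one comparison against the slow-poll threshold. The helper name `tick_event` is hypothetical; the event names, levels, and the 150ms default come from the table above.

```python
from typing import Tuple

OBSERVATION_TICK = "observation_tick"
OBSERVATION_TICK_SLOW = "observation_tick_slow"


def tick_event(
    poll_duration_ms: float, poll_slow_threshold_ms: float = 150.0
) -> Tuple[str, str]:
    """Return (event_name, level) for one poll; the slow event replaces the normal one."""
    if poll_duration_ms > poll_slow_threshold_ms:
        return OBSERVATION_TICK_SLOW, "warning"
    return OBSERVATION_TICK, "info"
```

Because the slow event is emitted instead of the normal tick, each poll produces exactly one audit line, so counting ticks of either kind still gives the poll count.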

### Notifications (`~/.codec/notifications.json`)
Four sources can produce notifications: scheduler (crew completion), heartbeat (threshold alert), autopilot (ambient trigger), and Phase 1 Step 3's AskUserQuestion (`type="question"`). All write through `routes/_shared.py:51-127` except AskUserQuestion which writes via `codec_ask_user._write_question_notification`.

@@ -368,6 +400,9 @@ These zones break running infrastructure if changed without coordination. NEVER
- `~/.codec/voice_session.json` (Phase 1 Step 3) — voice-session active-marker; `VoicePipeline.run` owns its lifecycle.
- Phase 1 Step 3 feature-flag env vars — `ASKUSER_ENABLED`, `STUCK_DETECTION_ENABLED`, `STEP_BUDGET_ENABLED` (default true). Set to `false` to disable a feature in production; tests use these to bypass during isolated unit testing. Don't toggle them globally without coordinating — they alter agent behavior across all paths (chat / voice / crew / MCP).
- `~/.codec/config.json:ask_user.{timeout_seconds, consent_strict_max_attempts}` and `:stuck.{window, repeat_threshold, escalation_action}` and `:step_budget.{chat, voice}` — Phase 1 Step 3 tunables. Bumping `step_budget.chat` to 8 or 10 is the documented "tune up before tuning out" pressure-relief valve, but don't touch the others without referencing the design doc rationale (§1.2 Q1, §1.7, §2.3, §3.2).
- `~/.codec/observation_summaries/` (Phase 2 Step 5) — populated only by `codec_observer.persist_for_shift_report()`. Do not add files manually; the Step 7 shift-report assembly relies on the time-stamped naming convention. Safe to delete the whole directory if you want to wipe the persisted history.
- `OBSERVER_ENABLED` env var (Phase 2 Step 5, default `true`). Setting it to `false` disables both the polling loop AND the prompt injection. There is no separate injection kill switch: when the observer is enabled the buffer is always populated, and only injection is gated.
- `~/.codec/config.json:observer.{...}` — Phase 2 Step 5 tunables (cadence_active_s, cadence_idle_s, idle_threshold_s, buffer_depth_min, ocr_timeout_ms, ocr_retry_timeout_ms, reset_on_long_idle, reset_idle_threshold_s, summary_max_tokens, poll_slow_threshold_ms, stop_nouns). Don't tune the cadences below 30s without considering OCR cost.
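One way to read the `observer` tunables is to overlay the user's block on a table of defaults. This is a sketch under stated assumptions: the loader name `load_observer_config` is hypothetical, the defaults are inferred from values quoted elsewhere in this document (60s/5min cadence, 10-minute buffer, 30min reset, ≤200-token summary, 150ms slow threshold), `reset_on_long_idle` defaulting to on is a guess, and tunables whose defaults are not stated here (OCR timeouts, `stop_nouns`) are left out.

```python
import json
from typing import Any, Dict

# Assumed defaults, inferred from the figures quoted in this document.
OBSERVER_DEFAULTS: Dict[str, Any] = {
    "cadence_active_s": 60,
    "cadence_idle_s": 300,
    "idle_threshold_s": 60,
    "buffer_depth_min": 10,
    "reset_on_long_idle": True,
    "reset_idle_threshold_s": 1800,
    "summary_max_tokens": 200,
    "poll_slow_threshold_ms": 150,
}


def load_observer_config(config_text: str) -> Dict[str, Any]:
    """Merge the `observer` block of a config.json payload over the defaults."""
    user = json.loads(config_text).get("observer", {})
    return {**OBSERVER_DEFAULTS, **user}
```

For example, `load_observer_config('{"observer": {"cadence_active_s": 90}}')` overrides only the active cadence and leaves every other tunable at its default.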

## 11. Working with this repo as a coding agent

49 changes: 49 additions & 0 deletions codec_audit.py
@@ -153,6 +153,55 @@
)


# ── Phase 2 Step 5 event names (Continuous Observation Loop) ──────────────────
# Per docs/PHASE2-STEP5-DESIGN.md §3. `observation_tick` is `level="info"`
# (operational signal, fires once per poll cycle). `observation_summary_injected`
# is `level="info"` and inherits the wrapping chat/voice operation's
# correlation_id (per Step 1 §1.4 — this emit is part of that op, not new).
# `observation_tick_slow` (Q5.5) is `level="warning"` to flag poll-overrun
# without changing behavior. `observer_buffer_inspected` (Q5.6) audits any
# debug-gated read of the live buffer state via the PWA endpoint.
OBSERVATION_TICK = "observation_tick"
OBSERVATION_TICK_SLOW = "observation_tick_slow" # Q5.5
OBSERVATION_SUMMARY_INJECTED = "observation_summary_injected"
OBSERVER_BUFFER_INSPECTED = "observer_buffer_inspected" # Q5.6

PHASE2_STEP5_EVENTS = frozenset({
    OBSERVATION_TICK, OBSERVATION_TICK_SLOW,
    OBSERVATION_SUMMARY_INJECTED, OBSERVER_BUFFER_INSPECTED,
})

# Step 5 event-specific extra-field reservations.
# observation_tick / observation_tick_slow are METADATA-ONLY by design —
# no titles, no OCR text, no clipboard content, no file paths.
# (See design §3 "What we deliberately do NOT emit".)
OBSERVATION_TICK_EXTRA_FIELDS = (
    "active_app",          # str, e.g. "Google Chrome"
    "active_title_len",    # int, length only
    "ocr_chars",           # int, length of OCR result
    "ocr_skipped",         # bool, true if OCR timed out
    "clipboard_changed",   # bool
    "clipboard_kind",      # "url" | "text" | "code" | "json" | "image_blob_redacted"
    "recent_files_count",  # int
    "idle_seconds",        # float, at time of poll
    "cadence_used_s",      # int, 60 or 300, selected per Q1
    "buffer_depth",        # int, current ring buffer length
    "poll_duration_ms",    # float, for OBSERVATION_TICK_SLOW threshold
)

OBSERVATION_INJECTION_EXTRA_FIELDS = (
    "tokens_used",                # int
    "injection_reason",           # "always_local" | "possessive_match" |
                                  # "continuation_match" | "skill_flag"
    "buffer_entries_summarized",  # int
)

OBSERVER_BUFFER_INSPECT_EXTRA_FIELDS = (
    "client_ip",                # str, who hit the debug endpoint
    "buffer_entries_returned",  # int
)


# ── Helpers ────────────────────────────────────────────────────────────────────
def _truncate(s, max_len: int = _PREVIEW_MAX) -> str:
    """Truncate a string to `max_len` chars. None/non-str → ''. Never raises."""
64 changes: 64 additions & 0 deletions codec_dashboard.py
@@ -389,6 +389,50 @@ async def status():
}


# Phase 2 Step 5 §Q5.6 — debug-gated buffer-inspect endpoint.
# Anyone with PWA auth can call this with `?debug=1`. Every call emits
# an `observer_buffer_inspected` audit event so privileged reads are
# observable in the audit log. NOT linked from the main UI.
@app.get("/api/observer/buffer")
async def observer_buffer(request: Request, debug: int = 0):
    """Return a metadata-only view of the ring buffer. Q5.6 design:
    debug-only, auth-gated (covered by the dashboard's existing /api/*
    auth middleware), audit-emitting."""
    if int(debug) != 1:
        return {"error": "set ?debug=1 to read live observer buffer"}
    try:
        from codec_observer import get_global_buffer
        from codec_audit import OBSERVER_BUFFER_INSPECTED, log_event as _le
        buf = get_global_buffer()
        snap = buf.snapshot()
        try:
            client_ip = request.client.host if request.client else "unknown"
        except Exception:
            client_ip = "unknown"
        try:
            _le(
                OBSERVER_BUFFER_INSPECTED, "codec-dashboard",
                "observer buffer inspected via /api/observer/buffer",
                extra={
                    "client_ip": client_ip,
                    "buffer_entries_returned": len(snap),
                },
                outcome="ok", level="info",
            )
        except Exception:
            # An audit-emit failure must not break the endpoint.
            pass
        # Return only the metadata + a redacted summary, NOT the raw entries
        # (raw entries contain titles + OCR text + clipboard content).
        return {
            "buffer_depth": len(snap),
            "summary": buf.render_summary(),
            "oldest_ts": snap[0].get("ts") if snap else None,
            "newest_ts": snap[-1].get("ts") if snap else None,
        }
    except Exception as e:
        return {"error": f"observer not available: {e}"}


def _mask_sensitive(value: str) -> str:
    """Mask sensitive field values, showing only last 4 characters."""
    if not value or not isinstance(value, str):
@@ -2557,6 +2601,26 @@ async def _skill_stream():
"DO NOT emit [SKILL:...] tool-calling tags in this response — "
"the answer IS the rewritten text, no tools needed."
)
# Phase 2 Step 5 — Observer summary injection (gated per §X).
# Local Qwen always injects; cloud transports (this chat path uses
# local-by-default but may be cloud-routed by the user — pass the
# detected transport tag) gate on possessive / continuation /
# skill-flag patterns. Returns (summary_or_None, reason); audit
# emit fires inside the helper ONLY when summary non-None.
try:
    from codec_observer import maybe_inject_observation_summary
    _obs_transport = "local" if "localhost" in (config.get("llm_base_url") or "") else "chat"
    _obs_summary, _obs_reason = maybe_inject_observation_summary(
        user_prompt=last_user_text or "",
        transport=_obs_transport,
        skill_name=None,  # post-LLM tag path, no skill resolved yet
        skill_module=None,
    )
    if _obs_summary:
        sys_prompt += f"\n\n{_obs_summary}"
except Exception as _e:
    log.debug(f"[observer] injection failed (non-fatal): {_e}")

# Prepend system message (or replace existing one)
if messages and messages[0].get("role") == "system":
    messages[0]["content"] = sys_prompt + "\n\n" + messages[0]["content"]