
Commit 9579697

Merge pull request #19 from AVADSA25/feat/phase3-step9-implementation
feat(phase3-step9): Background Execution + Permission Gate
2 parents 59f7726 + 1f30bc6 commit 9579697

7 files changed

Lines changed: 1601 additions & 11 deletions


AGENTS.md

Lines changed: 65 additions & 2 deletions
@@ -231,10 +231,51 @@ Drop-a-project planning layer. User describes a project; Qwen-3.6 drafts a struc
231231

232232
Implementation: `codec_agent_plan.py` (~640 LOC), `routes/agents.py` (~250 LOC of new endpoints).
233233

234+
### Background Execution + Permission Gate (Phase 3 Step 9)
235+
236+
`codec_agent_runner.py` is the runtime layer. The PM2-managed daemon `codec-agent-runner` polls `~/.codec/agents/*/state.json` every 5s, picks up plans with `status=approved`, and executes their checkpoints autonomously via Qwen-3.6 ↔ skill loops. The **permission gate** enforces the manifest on every action: an action outside the manifest moves the agent to `blocked_on_permission` and raises an `ask_user` notification.
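A minimal sketch of the scan half of that polling loop, assuming the `~/.codec/agents/<id>/state.json` layout with a top-level `status` field. `find_approved_agents`, `poll_forever`, and the `spawn` callback are hypothetical names; the daemon's real scan code is not part of this diff.

```python
import json
import time
from pathlib import Path
from typing import Callable, List

POLL_INTERVAL_S = 5  # the 5s polling interval described above


def find_approved_agents(agents_root: Path) -> List[str]:
    """Return ids of agents whose state.json reports status == "approved".

    Hypothetical helper: assumes <agents_root>/<agent_id>/state.json holds a
    JSON object with a "status" key.
    """
    approved: List[str] = []
    for state_path in sorted(agents_root.glob("*/state.json")):
        try:
            state = json.loads(state_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Unreadable or half-written file: skip it, the next tick retries.
            continue
        if isinstance(state, dict) and state.get("status") == "approved":
            approved.append(state_path.parent.name)
    return approved


def poll_forever(agents_root: Path, spawn: Callable[[str], None]) -> None:
    """Scan every POLL_INTERVAL_S seconds and hand approved agents to spawn().

    Assumes spawn() moves the agent out of "approved" (e.g. to "running"),
    otherwise the same agent would be picked up again on the next tick.
    """
    while True:
        for agent_id in find_approved_agents(agents_root):
            spawn(agent_id)
        time.sleep(POLL_INTERVAL_S)
```

Skipping unparseable files instead of failing the tick keeps one corrupt agent from stalling every other agent.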
237+
238+
**Per-checkpoint loop** (inside `_execute_checkpoint`):
239+
1. `_qwen_next_action()` returns either `Action(kind="skill_call", ...)` or `Action(kind="checkpoint_done")`
240+
2. `permission_gate(action, agent_grants, global_grants)` raises `PermissionViolation` if outside manifest
241+
3. If `action.is_destructive`: `_enforce_destructive_gate()` calls Step 3 §1.7 strict-consent (literal verb-match required, generic "yes" rejected)
242+
4. `_run_skill()` dispatches via `codec_dispatch.run_skill` (Step 1+2 hooks fire automatically)
243+
5. Append the result to history and loop until `checkpoint_done` is returned or the `step_budget` cap is reached
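The five steps can be sketched as a plain loop with the model, gate, consent check and skill dispatch injected as callables. This is a simplified, hypothetical rendering, not the code of `_execute_checkpoint`: the `skill` and `args` fields on `Action`, the callable parameters, and raising `StepBudgetExhausted` when the cap is hit are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


class PermissionViolation(Exception):
    """Raised by the gate when an action falls outside the manifest."""


class StepBudgetExhausted(Exception):
    """Raised when a checkpoint runs out of steps before checkpoint_done."""


@dataclass
class Action:
    kind: str                               # "skill_call" or "checkpoint_done"
    skill: Optional[str] = None             # hypothetical field
    args: Dict[str, Any] = field(default_factory=dict)  # hypothetical field
    is_destructive: bool = False


History = List[Tuple[Optional[str], Any]]


def execute_checkpoint(
    next_action: Callable[[History], Action],
    gate: Callable[[Action], None],
    enforce_destructive: Callable[[Action], None],
    run_skill: Callable[[Action], Any],
    step_budget: int,
) -> History:
    """Run one checkpoint; every requested action counts against the budget."""
    history: History = []
    for _ in range(step_budget):
        action = next_action(history)            # step 1: ask the model
        if action.kind == "checkpoint_done":
            return history
        gate(action)                             # step 2: may raise PermissionViolation
        if action.is_destructive:
            enforce_destructive(action)          # step 3: strict-consent floor
        result = run_skill(action)               # step 4: dispatch the skill
        history.append((action.skill, result))   # step 5: record and loop
    raise StepBudgetExhausted(f"no checkpoint_done within {step_budget} steps")
```

Injecting the callables keeps the loop testable without a running model or real skills.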
244+
245+
**Resume policy (Q5):** after a PM2 restart, the daemon scans for agents with `status=running`. Any such agent without a live thread is treated as crashed: it is marked `crashed_resumed`, transitioned back to `running`, and respawned. Worst case, one operation re-fires from the last atomic checkpoint save (idempotent skills are safe; destructive ops re-hit strict-consent).
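The crash-detection rule can be sketched over an in-memory status map. `find_crashed` and `resume_crashed` are hypothetical names; the two transition entries are copied from the `_VALID_TRANSITIONS` map in `codec_agent_plan.py`, and persisting each step to `state.json` is left out of this sketch.

```python
from typing import Callable, Dict, FrozenSet, List, Set

# The two _VALID_TRANSITIONS entries (codec_agent_plan.py) that resume relies on.
_RESUME_TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "running": frozenset({"completed", "aborted", "paused",
                          "blocked_on_permission", "blocked_on_destructive",
                          "crashed_resumed"}),
    "crashed_resumed": frozenset({"running", "aborted"}),
}


def find_crashed(statuses: Dict[str, str], live_agent_ids: Set[str]) -> List[str]:
    """Agents recorded as running but with no live thread count as crashed."""
    return sorted(aid for aid, status in statuses.items()
                  if status == "running" and aid not in live_agent_ids)


def resume_crashed(statuses: Dict[str, str], live_agent_ids: Set[str],
                   respawn: Callable[[str], None]) -> List[str]:
    """Walk running -> crashed_resumed -> running for each crashed agent, then respawn."""
    recovered = find_crashed(statuses, live_agent_ids)
    for aid in recovered:
        for target in ("crashed_resumed", "running"):
            if target not in _RESUME_TRANSITIONS[statuses[aid]]:
                raise ValueError(f"{statuses[aid]} -> {target} not allowed")
            # A real implementation would persist each step to state.json here.
            statuses[aid] = target
        respawn(aid)
    return recovered
```

Passing through `crashed_resumed` rather than jumping straight back to `running` leaves a trace that the agent was recovered, not started fresh.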
246+
247+
**Multi-agent concurrency (Q6, Q8):** default `MAX_CONCURRENT=3`, overridable via the `AGENT_RUNNER_MAX_CONCURRENT` env var. Blocked agents (any `blocked_*` state) **occupy a slot**. Trade-off: if 3 agents block overnight, no new agent can start until you resolve at least one of the blocks.
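A sketch of the slot rule, assuming only `running` and `blocked_*` agents hold a slot (the text above does not say whether `paused` agents do) and assuming a fallback to 3 on a missing, non-numeric or non-positive env value. All function names are hypothetical.

```python
import os
from typing import Iterable, Mapping, Optional

DEFAULT_MAX_CONCURRENT = 3


def max_concurrent(env: Optional[Mapping[str, str]] = None) -> int:
    """Read AGENT_RUNNER_MAX_CONCURRENT, falling back to 3 on missing or bad values."""
    env = os.environ if env is None else env
    try:
        value = int(env.get("AGENT_RUNNER_MAX_CONCURRENT", DEFAULT_MAX_CONCURRENT))
    except ValueError:
        return DEFAULT_MAX_CONCURRENT
    return value if value > 0 else DEFAULT_MAX_CONCURRENT


def occupied_slots(statuses: Iterable[str]) -> int:
    """running and any blocked_* state hold a slot (assumption: nothing else does)."""
    return sum(1 for s in statuses if s == "running" or s.startswith("blocked_"))


def can_start_new(statuses: Iterable[str], limit: int) -> bool:
    return occupied_slots(statuses) < limit
```

With three overnight blocks and the default limit, `can_start_new` returns `False`, which is exactly the trade-off described above.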
248+
249+
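**Plan-hash tamper detection (Q13):** at run start, while the agent is still `approved`, `_run_agent` verifies `manifest.plan_hash == sha256(plan.json)`. A mismatch → `aborted(plan_tampered)`; a missing hash also aborts, which is why `approved → aborted` is a valid transition.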
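A sketch of the run-start check, assuming the manifest stores a hex-encoded SHA-256 over the raw bytes of `plan.json` (the exact encoding is not stated). `plan_hash_matches` and `run_start_status` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Optional


def plan_hash_matches(plan_path: Path, manifest_plan_hash: Optional[str]) -> bool:
    """True only if the manifest hash is present and equals sha256(plan.json)."""
    if not manifest_plan_hash:
        # Missing hash: fail closed rather than run an unverifiable plan.
        return False
    digest = hashlib.sha256(plan_path.read_bytes()).hexdigest()
    return digest == manifest_plan_hash.lower()


def run_start_status(plan_path: Path, manifest_plan_hash: Optional[str]) -> str:
    """Next status for an approved agent at run start: running or aborted."""
    return "running" if plan_hash_matches(plan_path, manifest_plan_hash) else "aborted"
```

Hashing raw bytes means any edit to `plan.json`, even whitespace, counts as tampering.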
250+
251+
**Public API (`codec_agent_runner`):**
252+
- `_run_agent(agent_id)` — main per-agent thread function (called by daemon)
253+
- `_daemon_one_tick()` — synchronous test-only wrapper
254+
- `run_daemon()` — production entry point (PM2 `codec-agent-runner`)
255+
- `permission_gate(action, agent_grants, global_grants)` — synchronous gate check
256+
- Dataclasses: `Action`, `ConsentResult`
257+
- Exceptions: `PermissionViolation`, `DestructiveOpRejected`, `StepBudgetExhausted`, `QwenUnavailableError`
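A minimal sketch of the `permission_gate(action, agent_grants, global_grants)` contract: return if either grant set covers the action, raise `PermissionViolation` otherwise. Treating grants as `(kind, value)` pairs mirrors the `/grant` body fields; the `("skill", <name>)` mapping in `required_grant` and the `skill` field on `Action` are hypothetical, not the manifest's real schema.

```python
from dataclasses import dataclass
from typing import Collection, Optional, Tuple

Grant = Tuple[str, str]  # (kind, value), mirroring the /grant request body


class PermissionViolation(Exception):
    def __init__(self, needed: Grant) -> None:
        super().__init__(f"outside manifest: needs {needed[0]}={needed[1]}")
        self.needed = needed  # could feed the `needed` field of the audit event


@dataclass(frozen=True)
class Action:
    kind: str                      # "skill_call" or "checkpoint_done"
    skill: Optional[str] = None    # hypothetical field
    is_destructive: bool = False


def required_grant(action: Action) -> Optional[Grant]:
    """Hypothetical mapping: a skill call needs a ("skill", <name>) grant."""
    if action.kind != "skill_call":
        return None
    return ("skill", action.skill or "")


def permission_gate(action: Action, agent_grants: Collection[Grant],
                    global_grants: Collection[Grant]) -> None:
    """Return silently if allowed; raise PermissionViolation otherwise."""
    needed = required_grant(action)
    if needed is None or needed in agent_grants or needed in global_grants:
        return
    raise PermissionViolation(needed)
```

The gate only answers "is this in the manifest?"; destructive actions still go through strict-consent separately.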
258+
259+
**PWA endpoints (`routes/agents.py` Step 9 additions):** `POST /api/agents/{id}/abort`, `/pause`, `/resume`, `/grant` (body: `kind`, `value`; adds the pair to the agent's grants and unblocks the agent if it is `blocked_on_permission`).
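The `/grant` state change can be sketched as a pure function over the decoded `state.json`. This assumes grants are stored as `[kind, value]` pairs under a `"grants"` key and that unblocking means moving straight to `running` (allowed by `blocked_on_permission → running` in the transition map); `apply_grant` is a hypothetical name.

```python
import copy
from typing import Any, Dict


def apply_grant(state: Dict[str, Any], kind: str, value: str) -> Dict[str, Any]:
    """Return an updated copy of state.json contents after a /grant call.

    Hypothetical storage: grants kept as [kind, value] pairs under "grants".
    """
    new_state = copy.deepcopy(state)
    grants = new_state.setdefault("grants", [])
    if [kind, value] not in grants:        # granting twice is a no-op
        grants.append([kind, value])
    if new_state.get("status") == "blocked_on_permission":
        new_state["status"] = "running"    # allowed: blocked_on_permission -> running
    return new_state
```

Returning a copy lets the endpoint validate and atomically write the new state without mutating what it read.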
260+
261+
**Service supervision:** PM2's built-in `autorestart: true` provides crash recovery. No separate heartbeat HTTP probe is needed, because `codec-agent-runner` is a daemon, not an HTTP service. The PM2 entry sets `max_memory_restart=256M` and `max_restarts=10`.
262+
263+
**8 audit events**, all sharing the correlation_id of the operation envelope that `agent_started` opens (multi-emit op per Step 1 §1.4): `agent_started`, `agent_checkpoint_started`, `agent_checkpoint_completed`, `agent_paused`, `agent_resumed`, `agent_blocked_on_permission`, `agent_completed`, `agent_aborted`.
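A sketch of the shared-envelope idea: one correlation id is minted when the run starts and stamped on every later event. `AgentEnvelope` is a hypothetical stand-in; the real Step 1 audit writer is not shown here, so events are simply collected in a list.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AgentEnvelope:
    """One operation envelope per agent run; every event reuses its correlation_id."""
    agent_id: str
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: List[Dict[str, Any]] = field(default_factory=list)

    def emit(self, event: str, level: str = "info", **extra: Any) -> Dict[str, Any]:
        record = {
            "event": event,
            "level": level,
            "correlation_id": self.correlation_id,
            "agent_id": self.agent_id,
            **extra,
        }
        self.events.append(record)  # stand-in for the real audit writer
        return record
```

Usage: create one envelope per run, emit `agent_started` first, then reuse the same object for checkpoint, pause, block and terminal events so they can be joined on `correlation_id`.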
264+
265+
**Kill switches:**
266+
- `AGENT_RUNNER_ENABLED=false` — daemon idles (still scans, never spawns threads)
267+
- Per-agent: `POST /api/agents/{id}/abort` (atomic state write)
268+
- Per-agent: `POST /api/agents/{id}/pause` / `/resume`
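The global kill switch can be sketched as a guard inside one daemon tick: scanning still happens, spawning does not. Treating only the literal `false` (any case) as "off" is an assumption about how the env var is parsed; `runner_enabled` and `daemon_tick` are hypothetical names, not the real `_daemon_one_tick()`.

```python
import os
from typing import Callable, Iterable, List, Mapping, Optional


def runner_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Default true; only the literal "false" (any case) switches the runner off."""
    env = os.environ if env is None else env
    return env.get("AGENT_RUNNER_ENABLED", "true").strip().lower() != "false"


def daemon_tick(approved_ids: Iterable[str], spawn: Callable[[str], None],
                env: Optional[Mapping[str, str]] = None) -> List[str]:
    """The scan happens regardless; threads are spawned only when enabled."""
    candidates = list(approved_ids)
    if not runner_enabled(env):
        return []  # idle: scanned, nothing spawned
    for agent_id in candidates:
        spawn(agent_id)
    return candidates
```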
269+
270+
**Reuses (no new infrastructure):** Step 1 audit envelope · Step 2 plugin lifecycle hooks (every `run_skill` wrapped automatically) · Step 3 `ask_user` (outside-manifest pause) · Step 3 §1.7 strict-consent (universal floor for destructive ops) · Step 5 observer (passively records agent activity) · Step 7 shift_report (agent activity surfaces in daily summary).
271+
272+
Implementation: `codec_agent_runner.py` (~700 LOC), `routes/agents.py` (+120 for Step 9 endpoints), `ecosystem.config.js` (+22 for PM2 entry).
273+
234274
### Other known gaps (tracked for Phase 3 follow-on)
235-
- Step 8 ships planning ONLY — no execution (Step 9 picks that up)
275+
- No UI yet — Step 10 ships chat mode dropdown + status pills + agent timeline
276+
- No proactive messaging from agent → user (Step 10)
236277
- No formal teammate / sub-agent recursion — Crew is the only multi-agent primitive
237-
- (Phase 3 complete after Steps 9 + 10 ship)
278+
- (Phase 3 complete after Step 10 ships)
238279

239280
## 4. Skill system
240281

@@ -399,6 +440,23 @@ Six event names. All `level="info"` except `_rejected` (warning). Each is a sing
399440

400441
`PHASE3_STEP8_EVENTS` frozenset exposed.
401442

443+
#### Phase 3 Step 9 events — agent runtime lifecycle
444+
445+
Eight event names. `agent_started` opens the per-agent operation envelope; subsequent events all share that single correlation_id (multi-emit op per Step 1 §1.4). `agent_blocked_on_permission` and `agent_paused` are warning level; `agent_aborted` is error or warning depending on cause; the rest are info.
446+
447+
| Event | Source | level | extra fields |
448+
|---|---|---|---|
449+
| `agent_started` | `codec-agent-runner` | info | `agent_id`, `checkpoint_count`, `starting_at` (resume idx) |
450+
| `agent_checkpoint_started` | `codec-agent-runner` | info | `agent_id`, `checkpoint_id`, `checkpoint_idx` |
451+
| `agent_checkpoint_completed` | `codec-agent-runner` | info | `agent_id`, `checkpoint_id`, `checkpoint_idx`, `steps_used` |
452+
| `agent_paused` | `codec-agent-runner` | warning | `agent_id`, `checkpoint_id`, `reason` |
453+
| `agent_resumed` | `codec-agent-runner` | info | `agent_id`, `recovery` (true=PM2-restart) |
454+
| `agent_blocked_on_permission` | `codec-agent-runner` | warning | `agent_id`, `checkpoint_id`, `reason`, `needed` |
455+
| `agent_completed` | `codec-agent-runner` | info | `agent_id`, `total_steps` |
456+
| `agent_aborted` | `codec-agent-runner` | error\|warning | `agent_id`, `reason` |
457+
458+
`PHASE3_STEP9_EVENTS` frozenset exposed.
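The table can be mirrored as a frozenset plus a level lookup. This is a reconstruction from the table, not the module's actual definition; which aborts count as errors is not specified, so the sketch leaves that decision to the caller via the hypothetical `abort_is_error` flag.

```python
PHASE3_STEP9_EVENTS = frozenset({
    "agent_started",
    "agent_checkpoint_started",
    "agent_checkpoint_completed",
    "agent_paused",
    "agent_resumed",
    "agent_blocked_on_permission",
    "agent_completed",
    "agent_aborted",
})

_WARNING_EVENTS = frozenset({"agent_paused", "agent_blocked_on_permission"})


def event_level(event: str, abort_is_error: bool = False) -> str:
    """Level per the table; the caller decides whether an abort is an error."""
    if event not in PHASE3_STEP9_EVENTS:
        raise ValueError(f"not a Step 9 event: {event!r}")
    if event == "agent_aborted":
        return "error" if abort_is_error else "warning"
    return "warning" if event in _WARNING_EVENTS else "info"
```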
459+
402460
### Notifications (`~/.codec/notifications.json`)
403461
Four sources can produce notifications: scheduler (crew completion), heartbeat (threshold alert), autopilot (ambient trigger), and Phase 1 Step 3's AskUserQuestion (`type="question"`). All write through `routes/_shared.py:51-127` except AskUserQuestion which writes via `codec_ask_user._write_question_notification`.
404462

@@ -552,6 +610,11 @@ These zones break running infrastructure if changed without coordination. NEVER
552610
- `~/.codec/agent_global_grants.json` (Phase 3 Step 8) — cross-agent allowlist. Modify only via `add_global_grant()` / `remove_global_grant()` or the `/api/agent_global_grants` endpoints. Atomic-write contract.
553611
- `AGENT_PLANNING_ENABLED` env var (Phase 3 Step 8, default `true`). Setting `false` blocks plan drafting; existing approved plans are untouched.
554612
- `MAX_CLARIFYING_ROUNDS` constant in `codec_agent_plan.py` (default 3) — caps the vague-description clarifying loop. Tune up cautiously; users can get stuck in long Q&A loops if too high.
613+
- `codec_agent_runner.py` (Phase 3 Step 9) — runtime daemon. Don't refactor without re-running the PHASE3-STEP9 design gate. The `MAX_CONCURRENT` constant and `_active_threads` global are mutated under `_threads_lock`; no other code may touch them.
614+
- `_VALID_TRANSITIONS` in `codec_agent_plan.py` (Phase 3 Step 9 extension) — state machine map. Never remove a transition; only add. Step 10 will extend with paused-with-message states.
615+
- `AGENT_RUNNER_ENABLED` and `AGENT_RUNNER_MAX_CONCURRENT` env vars (Phase 3 Step 9, defaults `true` / `3`). `AGENT_RUNNER_ENABLED=false` idles the daemon.
616+
- PM2 `codec-agent-runner` service (Phase 3 Step 9). Stop/restart through PM2; `autorestart: true` provides crash recovery automatically. Don't add HTTP heartbeat probes — daemon doesn't expose HTTP by design.
617+
- `~/.codec/agents/<id>/state.json` after Step 9 deploy — read/written by `codec_agent_runner._run_agent` mid-checkpoint. Manual edits while an agent is `running` will desync the resume mechanism. To pause an agent: `POST /api/agents/{id}/pause`.
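Several bullets above rely on an atomic-write contract for files under `~/.codec/`. One common way to honour such a contract in Python is write-to-temp-then-`os.replace`; this is a generic sketch, not the project's helper (which this diff does not show), and `atomic_write_json` is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def atomic_write_json(path: Path, data: Dict[str, Any]) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory, so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # data on disk before the rename
        os.replace(tmp, path)      # atomic rename over the old file
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

A reader polling every 5s can then never observe a half-written `state.json`.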
555618

556619
## 11. Working with this repo as a coding agent
557620

codec_agent_plan.py

Lines changed: 27 additions & 9 deletions
@@ -490,16 +490,34 @@ class InvalidStatusTransition(ValueError):
490490
"""Disallowed status transition attempted."""
491491

492492

493-
# Step 8 only manages: draft_pending → awaiting_approval → approved/rejected/revised.
494-
# Step 9 introduces: approved → running → checkpoint_completed/blocked_*/aborted/completed.
495-
# This map will be EXTENDED in Step 9.
493+
# Step 8 manages: draft_pending → awaiting_approval → approved/rejected/revised.
494+
# Step 9 adds: approved → running → checkpoint_completed/blocked_*/aborted/completed.
496495
_VALID_TRANSITIONS: Dict[str, frozenset] = {
497-
"draft_pending": frozenset({"awaiting_approval", "plan_failed"}),
498-
"awaiting_approval": frozenset({"approved", "rejected", "revised"}),
499-
"revised": frozenset({"awaiting_approval"}),
500-
"approved": frozenset({"rejected"}), # Step 9 will add: running
501-
"rejected": frozenset(),
502-
"plan_failed": frozenset({"draft_pending"}), # retry path
496+
"draft_pending": frozenset({"awaiting_approval", "plan_failed"}),
497+
"awaiting_approval": frozenset({"approved", "rejected", "revised"}),
498+
"revised": frozenset({"awaiting_approval"}),
499+
# `approved → aborted` (review fix C1): a plan-hash mismatch or
500+
# missing-hash check at run-start fires while the agent is still in
501+
# `approved` status (before transitioning to `running`). We must allow
502+
# that abort path; otherwise the tamper-detection code raises
503+
# InvalidStatusTransition and the bare-except handler papers over it.
504+
# Plan deviation from PHASE3-STEP9-PLAN.md Task 2 — intentional,
505+
# security-critical addition.
506+
"approved": frozenset({"rejected", "running", "aborted"}),
507+
"rejected": frozenset(),
508+
"plan_failed": frozenset({"draft_pending"}), # retry path
509+
510+
# Step 9 runtime states
511+
"running": frozenset({"completed", "aborted", "paused",
512+
"blocked_on_permission",
513+
"blocked_on_destructive",
514+
"crashed_resumed"}),
515+
"paused": frozenset({"running", "aborted"}),
516+
"blocked_on_permission": frozenset({"running", "aborted"}),
517+
"blocked_on_destructive": frozenset({"running", "aborted"}),
518+
"crashed_resumed": frozenset({"running", "aborted"}),
519+
"completed": frozenset(),
520+
"aborted": frozenset(),
503521
}
504522

505523
