fix(phase3-step9): consolidated review fast-follow (I2 + I4 + M1 + M2 + M4) by AVADSA25 · Pull Request #20 · AVADSA25/codec

AVADSA25 · 2026-05-03T11:12:24Z

Summary

Consolidated fast-follow PR addressing all 5 deferred items from the Phase 3 Step 9 code review (PR #19). The items are combined into a single PR at the user's request to keep the workflow simple.

What this fixes

ID	Issue	Fix
I2	`step_budget_exhausted` mapped to `blocked_on_permission` (semantic mismatch — there's no permission to grant)	Now transitions to `paused` with reason. New `POST /api/agents/{id}/extend_budget` endpoint bumps the current checkpoint's budget via `state.json` overrides (plan stays immutable; tamper check intact).
I4	Daemon crash-recovery `AGENT_RESUMED` emit had no `correlation_id`, orphaned from subsequent `_run_agent` chain	`_run_agent(agent_id, cid=None)` now accepts optional cid; daemon mints `recovery_cid` once and threads it through both the audit emit AND the new `_run_agent` invocation.
M1	`permission_gate` had no test for `domain_not_authorized` path	Added `test_permission_gate_blocks_domain_not_in_grants`.
M2	Plan's Task 10 Step 2 said "add codec-agent-runner to heartbeat" but heartbeat is HTTP-only and codec-observer (sibling daemon) isn't probed either — the plan was wrong	Added 4-line note in `check_system_health()` docstring documenting that PM2 `autorestart: true` is the supervision contract for daemons. No behavior change.
M4	`PermissionManifest.read_paths` declared but `permission_gate` only enforces `write_paths` (asymmetry undocumented)	Added 6-line inline comment explaining the deliberate scoping (full read enforcement would need new `Action` field + LLM prompt update — out of scope for Step 9).
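The I2 fix is easiest to follow as a small, self-contained sketch. It assumes a frozen per-checkpoint budget in the plan and a mutable `budget_overrides` map in `state.json`. Every name here (`Checkpoint`, `on_step`, `extend_budget`, `budget_overrides`) is hypothetical, and the trigger for the 409 response (extending an agent that is not paused) is an assumption, because the PR only names the test. The point is the split: exhausting the budget moves the run to `paused`, and an extension only adds to the override, so the plan, and any tamper check over it, is never touched.

```python
"""Hypothetical sketch of I2: pause on step-budget exhaustion, extend via state.json.

Checkpoint, load_state, save_state, on_step, extend_budget and the
"budget_overrides" key are illustrative names, not codec_agent_runner.py's API.
"""
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Checkpoint:
    # frozen=True: the plan (and its per-checkpoint budget) is never mutated.
    id: str
    step_budget: int


def load_state(path: Path) -> dict:
    """Read mutable run state; the plan itself lives elsewhere and stays untouched."""
    if path.exists():
        return json.loads(path.read_text())
    return {"status": "running", "budget_overrides": {}}


def save_state(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state, indent=2))


def effective_budget(cp: Checkpoint, state: dict) -> int:
    # Plan budget plus any operator-granted extension recorded in state.json.
    return cp.step_budget + state["budget_overrides"].get(cp.id, 0)


def on_step(cp: Checkpoint, steps_taken: int, state: dict) -> dict:
    """Pause (not blocked_on_permission) once the checkpoint budget is used up."""
    if steps_taken >= effective_budget(cp, state):
        state["status"] = "paused"
        state["pause_reason"] = "step_budget_exhausted"
    return state


def extend_budget(cp: Checkpoint, extra_steps: int, state: dict) -> dict:
    """Record an extension as an override; assumed to be refused unless paused (the 409 case)."""
    if state.get("status") != "paused":
        raise ValueError("409: agent is not paused")
    if extra_steps <= 0:
        raise ValueError("extra_steps must be positive")
    overrides = state["budget_overrides"]
    overrides[cp.id] = overrides.get(cp.id, 0) + extra_steps
    return state  # resuming the agent is deliberately left out of this sketch


if __name__ == "__main__":
    cp = Checkpoint("cp-1", step_budget=3)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "state.json"
        state = on_step(cp, steps_taken=3, state=load_state(path))
        save_state(path, extend_budget(cp, 2, state))
        print(effective_budget(cp, load_state(path)))  # 5
```

Whether the real endpoint also resumes the agent after bumping the override is not stated in the PR, so the sketch only records the override.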

Tests

test_agent_runner.py: 36 tests after adding 5 (+1 M1 domain check, +1 I4 cid match, +3 I2: paused, extend_budget happy path, extend_budget 409)
Full test_agent_runner.py + test_agent_plan.py suite: 67 passed, 0 new failures
All baseline 20/73 preserved

Files changed

File	Lines	What
`codec_agent_runner.py`	+37 −10	I2 budget-pause, I4 cid threading, M4 comment
`tests/test_agent_runner.py`	+85	M1, I4, I2 tests
`routes/agents.py`	+60	`/extend_budget` endpoint + `ExtendBudgetBody` model
`codec_heartbeat.py`	+6 −1	M2 docstring note

Provenance

This PR consolidates the output of 5 side-chat sessions that the user spawned via review-feedback chips. At the user's request, the work is wrapped up in this main session, so the output of all 5 side-chats is folded in here. I2, which had paused awaiting design confirmation, is implemented per side-chat #3's recommended option 2 (state.json override; the plan stays immutable).

Test plan

🧪 tests/test_agent_runner.py + test_agent_plan.py → 67 passed
All 5 review issues verified fixed via dedicated tests

Post-merge deploy:

cd ~/codec-repo
git pull
pm2 restart codec-dashboard  # picks up /extend_budget endpoint
pm2 restart codec-agent-runner  # picks up I2 + I4 fixes

🤖 Generated with Claude Code

Adds a 5th permission_gate test exercising the network_domains branch (codec_agent_runner.py:109-113), which previously had no coverage. It asserts PermissionViolation with reason='domain_not_authorized' and needed=<domain> when action.network_call=True and the domain is absent from both per-agent and global grants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
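A minimal sketch of the gate branches touched by M1 and M4, assuming simplified `Action` and `PermissionManifest` shapes (the real ones in codec_agent_runner.py are not shown in this PR). The `domain_not_authorized` reason and the `needed` field follow the commit message above; the `path_not_authorized` reason, the field names, and the purely lexical path check are illustrative assumptions. `read_paths` is carried but never consulted, mirroring the asymmetry M4 documents.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath


class PermissionViolation(Exception):
    def __init__(self, reason: str, needed: str) -> None:
        super().__init__(f"{reason}: {needed}")
        self.reason = reason
        self.needed = needed


@dataclass
class PermissionManifest:
    write_paths: list[str] = field(default_factory=list)
    # Declared but not enforced by the gate (the M4 asymmetry).
    read_paths: list[str] = field(default_factory=list)
    network_domains: list[str] = field(default_factory=list)


@dataclass
class Action:
    network_call: bool = False
    domain: str | None = None
    write_path: str | None = None  # no read-target field, hence no read check


def permission_gate(action: Action, manifest: PermissionManifest,
                    global_domains: set[str]) -> None:
    if action.network_call:
        # A domain must appear in the per-agent grants or the global grants.
        granted = set(manifest.network_domains) | set(global_domains)
        if action.domain not in granted:
            raise PermissionViolation("domain_not_authorized", action.domain or "")
    if action.write_path is not None:
        # Lexical prefix check only: this sketch does no realpath resolution.
        target = PurePosixPath(action.write_path)
        if not any(target.is_relative_to(root) for root in manifest.write_paths):
            raise PermissionViolation("path_not_authorized", action.write_path)
```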

PHASE3-STEP9-PLAN.md Task 10 Step 2 prescribed adding codec-agent-runner to a _MONITORED_SERVICES list, but that list does not exist: heartbeat probes are HTTP-only, and codec-observer (also a PM2 daemon) is already absent from the existing services dict. AGENTS.md §3 documents the intentional deferral; this docstring closes the loop in code so future agents reading codec_heartbeat.py see the rationale without rediscovering it from the plan. No behavior change; PM2 autorestart remains the supervision contract for codec-agent-runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): remove /api/save_skill and /api/forge endpoints (D-2 + D-3)

Closes the two write-path criticals from Phase 1 Audit D (see docs/audits/PHASE-1-SECURITY.md):

- D-2 /api/forge: URL fetch → LLM → write skill, no review gate
- D-3 /api/save_skill: direct write with only a substring blocker

Both endpoints are removed entirely. Skill creation now routes exclusively through /api/skill/review → /api/skill/approve (the human-review-and-approve flow). The URL-fetch capability is intentionally dropped per Mickael decision Q1: anyone wanting to import skill code from a URL pastes the source into the editor and goes through the review-and-approve flow like any other skill.

Defense in depth with PR-1A: even if a malicious file reaches ~/.codec/skills/ via some other path, SkillRegistry.load refuses it at load time via the manifest + AST gate.

Live proof of D-2: the initial RED-phase test_post_forge_returns_404 actually called the live LLM, fetched example.com, AND wrote a real fetch_example_domain_html.py to skills/ before the endpoint was removed. The endpoint was a working SSRF + RCE-write primitive on this machine right up to the moment it was deleted.
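To make the review-then-approve contract concrete, here is an in-memory sketch: review stages the source and returns a single-use token without touching disk, and only approve writes the file. The function names, the token scheme and the identifier-only naming rule are assumptions for illustration, not the routes/skills.py handlers; the real skill_approve also runs a write-time AST check, which this sketch leaves out.

```python
import ast
import secrets
import tempfile
from pathlib import Path

# token -> (skill name, source); a real service would persist and expire these.
_PENDING: dict[str, tuple[str, str]] = {}


def review_skill(name: str, source: str) -> str:
    """Stage a skill for human review. Writes nothing to disk."""
    if not source.strip():
        raise ValueError("empty body")
    if not name.isidentifier():
        # Also blocks path tricks such as "../evil".
        raise ValueError("skill name must be a Python identifier")
    ast.parse(source)  # raises SyntaxError on invalid code
    token = secrets.token_hex(16)
    _PENDING[token] = (name, source)
    return token


def approve_skill(token: str, skills_dir: Path) -> Path:
    """Write a previously reviewed skill; each token is single-use."""
    try:
        name, source = _PENDING.pop(token)
    except KeyError:
        raise PermissionError("unknown or already-used review token") from None
    path = skills_dir / f"{name}.py"
    path.write_text(source)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        token = review_skill("hello_skill", "def run():\n    return 'hi'\n")
        print(list(Path(tmp).iterdir()))  # [] (nothing written at review)
        print(approve_skill(token, Path(tmp)).name)  # hello_skill.py
```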
Changes:

Code:
- routes/skills.py:
  * delete async def save_skill (D-3 handler)
  * delete async def forge_skill (D-2 handler)
  * module docstring updated to reflect the review-and-approve-only contract; unused imports (DASHBOARD_DIR, CONFIG_PATH) dropped; import style PEP8'd
- codec_vibe.html:
  * remove the entire Forge Modal (HTML markup + overlay + tabs)
  * remove .fm-* CSS block
  * remove Skill + Forge toolbar buttons (kept Test button; it calls the independent /api/run_code, not the removed endpoints)
  * remove JS: doSkill, doForge, submitForge, closeForgeModal, setForgeMode, _forgeMode, fmOverlay event handlers
  * update Vibe system prompt: no more mention of Skill Forge
- scripts/feature_audit.py:
  * features 10 and 12 now assert 404 (proof of removal) instead of "endpoint responsive"

Docs / UI references:
- AGENTS.md §7: new "Skill creation flow — review-and-approve only" subsection above the Skill load-time safety gate subsection. Documents D-2 + D-3 removal trail.
- codec_cortex.html: CODEC Vibe panel rewritten; "Skill Forge" removed from subtitle, features, description.
- codec_chat.html: Vibe blurb updated.
- README.md: 5 mentions updated (table row, section heading, "Writes its own plugins" comparison row, IDE comparison row, Project Structure description).
- FEATURES.md: Vibe feature list renumbered; Skill Forge items removed; the human-review approval workflow stays as item 17.
- setup_codec.py:457: Vibe install description.
- docs/API.md: /api/forge section removed; /api/skill/review + /api/skill/approve are now the documented contract.
Tests:
- NEW tests/test_skill_routes.py: 9 tests covering removal + replacement:
  * test_save_skill_handler_removed
  * test_forge_skill_handler_removed
  * test_no_route_decorator_strings_for_removed_endpoints
  * test_replacement_handlers_present
  * test_post_save_skill_returns_404
  * test_post_forge_returns_404
  * test_post_skill_review_still_accepts_valid_body
  * test_skill_review_rejects_empty_body
  * test_skill_approve_writes_only_after_review (full review → approve flow exercised via FastAPI TestClient; asserts no disk write at the review step and a disk write at the approve step)
- tests/test_dashboard.py:
  * test_forge_requires_input replaced with test_forge_endpoint_removed (asserts 404)
  * added test_save_skill_endpoint_removed
- tests/test_full_product_audit.py:
  * test_forge_endpoint replaced with test_forge_endpoint_removed
- tests/test_critical_fixes.py:
  * TestSkillForgeBlocklist class removed entirely. The substring blocker it tested is gone with the endpoints. Dangerous-pattern coverage now lives in tests/test_skill_registry.py (load-time AST gate) and routes/skills.py:skill_approve (write-time AST).
  * Note: this class was already failing on main per docs/PHASE1-STEP1-PREMERGE-AUDIT.md row #4 because it referenced codec_dashboard.save_skill, which never existed on main; the deletion is a net positive for the test suite.
- tests/test_security.py:
  * test_save_skill_validates_content removed. Same root cause as above: it referenced codec_dashboard.save_skill and was already failing on main per the pre-merge audit row #20.
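The move from a substring blocker to an AST gate is easier to see in code. The sketch below is a hypothetical stand-in, not codec_config.is_dangerous_skill_code: it walks parsed nodes, so a comment that merely mentions `subprocess` passes, while `from subprocess import run` or a computed `__import__(...)` call is flagged. It returns a `(flag, reason)` pair shaped like the `(True, 'Dangerous import: subprocess')` result quoted in the pynput commit. The module and call denylists are made up; any denylist can still be evaded, which is the motivation for the optional D-17 positive allowlist.

```python
import ast

# Hypothetical denylists for illustration only.
DANGEROUS_MODULES = {"subprocess", "socket", "ctypes", "pty"}
DANGEROUS_CALLS = {"eval", "exec", "compile", "__import__"}


def ast_gate(source: str) -> tuple[bool, str]:
    """Return (True, reason) if the parsed source hits the denylist, else (False, "")."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return True, f"Syntax error: {exc.msg}"
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root in DANGEROUS_MODULES:
                    return True, f"Dangerous import: {root}"
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports like "from . import x".
            root = (node.module or "").split(".")[0]
            if root in DANGEROUS_MODULES:
                return True, f"Dangerous import: {root}"
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DANGEROUS_CALLS:
                return True, f"Dangerous call: {node.func.id}"
    return False, ""
```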
Audit closure:
- docs/audits/PHASE-1-SECURITY.md:
  * D-2 closure footnote appended
  * D-3 closure footnote appended
- docs/audits/PHASE-1-CONSOLIDATED-TRIAGE.md:
  * D-2 row status: W1 → W1 — CLOSED (PR-1B)
  * D-3 row status: W1 → W1 — CLOSED (PR-1B)

Verification:
- pytest tests/test_skill_routes.py: 9 passed (RED watched, then GREEN after endpoint removal)
- pytest tests/test_skill_registry.py: 13 passed (PR-1A tests still green)
- pytest tests/test_skill_contracts.py: 1 passed
- pytest tests/test_oauth_provider.py + test_retry.py: passed
- python3 tests/test_skill_imports.py: 76 skills parsed, 0 errors
- python3 tools/generate_skill_manifest.py --check: manifest current
- ruff check routes/skills.py tests/test_skill_routes.py: clean
- Full regression against PR-1A baseline (stash-and-rerun): same 12 pre-existing failures, no new regressions. Net failure count drops by 2 (the deleted broken tests).

Out of scope (later PRs):
- D-4 file_write block-roots expansion (PR-1C)
- D-5 permission_gate realpath + path-blocklist (PR-1D)
- D-17 positive-allowlist for is_dangerous_skill_code (optional PR-1E)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(audits): record PR-1A merge commit hash in D-1 closure footnote

PR-1A (#42) merged to main as squash commit 48ec5d5. Update the D-1 closure footnote in docs/audits/PHASE-1-SECURITY.md and the D-1 row in docs/audits/PHASE-1-CONSOLIDATED-TRIAGE.md to reference the PR number + commit hash, replacing the branch-name placeholder. This commit lands on the PR-1B branch (fix/pr1b-remove-save-skill-and-forge) so the citation travels with PR-1B's next merge.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(config): handle headless ImportError on pynput gracefully

CI failure root cause for PR-1A (#42) and PR-1B (#43): the smoke job on Ubuntu runners failed at test collection because importing codec_skill_registry → codec_config → `from pynput import keyboard` raised ImportError on a headless display:

ImportError: this platform is not supported: ('failed to acquire X connection: Bad display name ""')

pynput needs a real display (X11 / AppKit / win32). GitHub Actions Linux runners are headless. Other modules import codec_config for its constants and is_dangerous_skill_code; none of them need the keyboard subsystem.

Fix: wrap the pynput import in try/except so codec_config is import-safe in headless contexts. The two `getattr(keyboard.Key, ...)` call sites in `_resolve_key` now early-return None when keyboard is None. The resulting KEY_TOGGLE / KEY_VOICE / KEY_TEXT module constants are None on headless systems; this only matters if the live keyboard listener actually runs, which it doesn't in CI.

This was triggered by the new PR-1A test `tests/test_skill_registry.py`, which CI now runs per the new `Skill registry load-time AST gate tests (D-1)` step. Before PR-1A, no CI-gated test imported codec_config, so the pynput crash was latent. Production paths (codec.py daemon, dashboard, etc.) all run on macOS, where pynput imports fine.

Verified locally:
- pytest tests/test_skill_routes.py tests/test_skill_registry.py tests/test_skill_contracts.py tests/test_oauth_provider.py tests/test_retry.py → 28 passed
- python3 tests/test_skill_imports.py → 76 skills parsed, 0 errors
- python3 tools/generate_skill_manifest.py --check → ok
- Headless simulation: monkey-patch builtins.__import__ to raise on pynput, then `from codec_config import is_dangerous_skill_code` → succeeds, returns (True, 'Dangerous import: subprocess') correctly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Mickael Farina <farina.mickael@gmail.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Mikarina13 and others added 5 commits May 3, 2026 13:08

fix(agent_runner): I4 — propagate cid through daemon resume so AGENT_RESUMED chains with subsequent run emits

a1b45b2

fix(agent_runner): M4 — document read_paths asymmetry in permission_gate

e14985b

fix(agent_runner): I2 — paused on step_budget + /extend_budget endpoint

0d609f8

AVADSA25 merged commit ec14697 into main May 3, 2026
1 check passed

AVADSA25 mentioned this pull request May 3, 2026

docs(phase3): closeout — PHASE3-COMPLETE.md #23

Merged


AVADSA25 mentioned this pull request May 17, 2026

fix(security): D-2 + D-3 — remove /api/save_skill and /api/forge #43

Merged



AVADSA25 merged 5 commits into main from feat/phase3-step9-fastfollow

2 participants
