fix(phase3-step9): consolidated review fast-follow (I2 + I4 + M1 + M2 + M4)#20
Merged
Conversation
Adds 5th permission_gate test exercising the network_domains branch (codec_agent_runner.py:109-113), which previously had no coverage. Asserts PermissionViolation with reason='domain_not_authorized' and needed=<domain> when action.network_call=True and the domain is absent from both per-agent and global grants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PHASE3-STEP9-PLAN.md Task 10 Step 2 prescribed adding codec-agent-runner to a _MONITORED_SERVICES list, but that list does not exist — heartbeat probes are HTTP-only and codec-observer (also a PM2 daemon) is already absent from the existing services dict. AGENTS.md §3 documents the intentional deferral; this docstring closes the loop in code so future agents reading codec_heartbeat.py see the rationale without rediscovering it from the plan. No behavior change; PM2 autorestart remains the supervision contract for codec-agent-runner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…RESUMED chains with subsequent run emits
3 tasks
10 tasks
AVADSA25
pushed a commit
that referenced
this pull request
May 17, 2026
… D-3)
Closes the two write-path criticals from Phase 1 Audit D (see
docs/audits/PHASE-1-SECURITY.md):
D-2 /api/forge — URL fetch → LLM → write skill, no review gate
D-3 /api/save_skill — direct write with only a substring blocker
Both endpoints are removed entirely. Skill creation now routes
exclusively through /api/skill/review → /api/skill/approve (the
human-review-and-approve flow). The URL-fetch capability is
intentionally dropped per Mickael decision Q1 — anyone wanting to
import skill code from a URL pastes the source into the editor and
goes through the review-and-approve flow like any other skill.
Defense in depth with PR-1A: even if a malicious file reaches
~/.codec/skills/ via some other path, SkillRegistry.load refuses
it at load time via the manifest + AST gate.
Live proof of D-2: the initial RED-phase test_post_forge_returns_404
actually called the live LLM, fetched example.com, AND wrote a real
fetch_example_domain_html.py to skills/ before the endpoint was
removed. The endpoint was a working SSRF + RCE-write primitive on
this machine right up to the moment it was deleted.
Changes:
Code:
- routes/skills.py:
* delete async def save_skill (D-3 handler)
* delete async def forge_skill (D-2 handler)
* module docstring updated to reflect the review-and-approve-only
contract; unused imports (DASHBOARD_DIR, CONFIG_PATH) dropped;
import style PEP8'd
- codec_vibe.html:
* remove the entire Forge Modal (HTML markup + overlay + tabs)
* remove .fm-* CSS block
* remove Skill + Forge toolbar buttons (kept Test button — it
calls the independent /api/run_code, not the removed endpoints)
* remove JS: doSkill, doForge, submitForge, closeForgeModal,
setForgeMode, _forgeMode, fmOverlay event handlers
* update Vibe system prompt — no more mention of Skill Forge
- scripts/feature_audit.py:
* features 10 and 12 now assert 404 (proof of removal) instead
of "endpoint responsive"
Docs / UI references:
- AGENTS.md §7: new "Skill creation flow — review-and-approve only"
subsection above the Skill load-time safety gate subsection.
Documents D-2 + D-3 removal trail.
- codec_cortex.html: CODEC Vibe panel rewritten — "Skill Forge"
removed from subtitle, features, description.
- codec_chat.html: Vibe blurb updated.
- README.md: 5 mentions updated (table row, section heading,
"Writes its own plugins" comparison row, IDE comparison row,
Project Structure description).
- FEATURES.md: Vibe feature list renumbered — Skill Forge items
removed; the human-review approval workflow stays as item 17.
- setup_codec.py:457: Vibe install description.
- docs/API.md: /api/forge section removed; /api/skill/review +
/api/skill/approve are now the documented contract.
Tests:
- NEW tests/test_skill_routes.py — 9 tests covering removal +
replacement:
* test_save_skill_handler_removed
* test_forge_skill_handler_removed
* test_no_route_decorator_strings_for_removed_endpoints
* test_replacement_handlers_present
* test_post_save_skill_returns_404
* test_post_forge_returns_404
* test_post_skill_review_still_accepts_valid_body
* test_skill_review_rejects_empty_body
* test_skill_approve_writes_only_after_review (full review →
approve flow exercised via FastAPI TestClient, asserts
no-disk-write at review step and disk-write at approve step)
- tests/test_dashboard.py:
* test_forge_requires_input replaced with
test_forge_endpoint_removed (asserts 404)
* added test_save_skill_endpoint_removed
- tests/test_full_product_audit.py:
* test_forge_endpoint replaced with test_forge_endpoint_removed
- tests/test_critical_fixes.py:
* TestSkillForgeBlocklist class removed entirely. The substring
blocker it tested is gone with the endpoints. Dangerous-pattern
coverage now lives in tests/test_skill_registry.py (load-time
AST gate) and routes/skills.py:skill_approve (write-time AST).
* Note: this class was already failing on main per
docs/PHASE1-STEP1-PREMERGE-AUDIT.md row #4 because it referenced
codec_dashboard.save_skill which never existed on main; the
deletion is a positive net for the test suite.
- tests/test_security.py:
* test_save_skill_validates_content removed. Same root cause as
above — referenced codec_dashboard.save_skill, was already
failing on main per the pre-merge audit row #20.
Audit closure:
- docs/audits/PHASE-1-SECURITY.md:
* D-2 closure footnote appended
* D-3 closure footnote appended
- docs/audits/PHASE-1-CONSOLIDATED-TRIAGE.md:
* D-2 row status: W1 → W1 — CLOSED (PR-1B)
* D-3 row status: W1 → W1 — CLOSED (PR-1B)
Verification:
- pytest tests/test_skill_routes.py — 9 passed (RED watched,
then GREEN after endpoint removal)
- pytest tests/test_skill_registry.py — 13 passed (PR-1A tests
still green)
- pytest tests/test_skill_contracts.py — 1 passed
- pytest tests/test_oauth_provider.py + test_retry.py — passed
- python3 tests/test_skill_imports.py — 76 skills parsed, 0 errors
- python3 tools/generate_skill_manifest.py --check — manifest current
- ruff check routes/skills.py tests/test_skill_routes.py — clean
- Full regression against PR-1A baseline (stash-and-rerun): same
12 pre-existing failures, no new regressions. Net failures count
drops by 2 (the deleted broken tests).
Out of scope (later PRs):
- D-4 file_write block-roots expansion (PR-1C)
- D-5 permission_gate realpath + path-blocklist (PR-1D)
- D-17 positive-allowlist for is_dangerous_skill_code (optional PR-1E)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AVADSA25
added a commit
that referenced
this pull request
May 17, 2026
* fix(security): remove /api/save_skill and /api/forge endpoints (D-2 + D-3)
Closes the two write-path criticals from Phase 1 Audit D (see
docs/audits/PHASE-1-SECURITY.md):
D-2 /api/forge — URL fetch → LLM → write skill, no review gate
D-3 /api/save_skill — direct write with only a substring blocker
Both endpoints are removed entirely. Skill creation now routes
exclusively through /api/skill/review → /api/skill/approve (the
human-review-and-approve flow). The URL-fetch capability is
intentionally dropped per Mickael decision Q1 — anyone wanting to
import skill code from a URL pastes the source into the editor and
goes through the review-and-approve flow like any other skill.
Defense in depth with PR-1A: even if a malicious file reaches
~/.codec/skills/ via some other path, SkillRegistry.load refuses
it at load time via the manifest + AST gate.
Live proof of D-2: the initial RED-phase test_post_forge_returns_404
actually called the live LLM, fetched example.com, AND wrote a real
fetch_example_domain_html.py to skills/ before the endpoint was
removed. The endpoint was a working SSRF + RCE-write primitive on
this machine right up to the moment it was deleted.
Changes:
Code:
- routes/skills.py:
* delete async def save_skill (D-3 handler)
* delete async def forge_skill (D-2 handler)
* module docstring updated to reflect the review-and-approve-only
contract; unused imports (DASHBOARD_DIR, CONFIG_PATH) dropped;
import style PEP8'd
- codec_vibe.html:
* remove the entire Forge Modal (HTML markup + overlay + tabs)
* remove .fm-* CSS block
* remove Skill + Forge toolbar buttons (kept Test button — it
calls the independent /api/run_code, not the removed endpoints)
* remove JS: doSkill, doForge, submitForge, closeForgeModal,
setForgeMode, _forgeMode, fmOverlay event handlers
* update Vibe system prompt — no more mention of Skill Forge
- scripts/feature_audit.py:
* features 10 and 12 now assert 404 (proof of removal) instead
of "endpoint responsive"
Docs / UI references:
- AGENTS.md §7: new "Skill creation flow — review-and-approve only"
subsection above the Skill load-time safety gate subsection.
Documents D-2 + D-3 removal trail.
- codec_cortex.html: CODEC Vibe panel rewritten — "Skill Forge"
removed from subtitle, features, description.
- codec_chat.html: Vibe blurb updated.
- README.md: 5 mentions updated (table row, section heading,
"Writes its own plugins" comparison row, IDE comparison row,
Project Structure description).
- FEATURES.md: Vibe feature list renumbered — Skill Forge items
removed; the human-review approval workflow stays as item 17.
- setup_codec.py:457: Vibe install description.
- docs/API.md: /api/forge section removed; /api/skill/review +
/api/skill/approve are now the documented contract.
Tests:
- NEW tests/test_skill_routes.py — 9 tests covering removal +
replacement:
* test_save_skill_handler_removed
* test_forge_skill_handler_removed
* test_no_route_decorator_strings_for_removed_endpoints
* test_replacement_handlers_present
* test_post_save_skill_returns_404
* test_post_forge_returns_404
* test_post_skill_review_still_accepts_valid_body
* test_skill_review_rejects_empty_body
* test_skill_approve_writes_only_after_review (full review →
approve flow exercised via FastAPI TestClient, asserts
no-disk-write at review step and disk-write at approve step)
- tests/test_dashboard.py:
* test_forge_requires_input replaced with
test_forge_endpoint_removed (asserts 404)
* added test_save_skill_endpoint_removed
- tests/test_full_product_audit.py:
* test_forge_endpoint replaced with test_forge_endpoint_removed
- tests/test_critical_fixes.py:
* TestSkillForgeBlocklist class removed entirely. The substring
blocker it tested is gone with the endpoints. Dangerous-pattern
coverage now lives in tests/test_skill_registry.py (load-time
AST gate) and routes/skills.py:skill_approve (write-time AST).
* Note: this class was already failing on main per
docs/PHASE1-STEP1-PREMERGE-AUDIT.md row #4 because it referenced
codec_dashboard.save_skill which never existed on main; the
deletion is a positive net for the test suite.
- tests/test_security.py:
* test_save_skill_validates_content removed. Same root cause as
above — referenced codec_dashboard.save_skill, was already
failing on main per the pre-merge audit row #20.
Audit closure:
- docs/audits/PHASE-1-SECURITY.md:
* D-2 closure footnote appended
* D-3 closure footnote appended
- docs/audits/PHASE-1-CONSOLIDATED-TRIAGE.md:
* D-2 row status: W1 → W1 — CLOSED (PR-1B)
* D-3 row status: W1 → W1 — CLOSED (PR-1B)
Verification:
- pytest tests/test_skill_routes.py — 9 passed (RED watched,
then GREEN after endpoint removal)
- pytest tests/test_skill_registry.py — 13 passed (PR-1A tests
still green)
- pytest tests/test_skill_contracts.py — 1 passed
- pytest tests/test_oauth_provider.py + test_retry.py — passed
- python3 tests/test_skill_imports.py — 76 skills parsed, 0 errors
- python3 tools/generate_skill_manifest.py --check — manifest current
- ruff check routes/skills.py tests/test_skill_routes.py — clean
- Full regression against PR-1A baseline (stash-and-rerun): same
12 pre-existing failures, no new regressions. Net failures count
drops by 2 (the deleted broken tests).
Out of scope (later PRs):
- D-4 file_write block-roots expansion (PR-1C)
- D-5 permission_gate realpath + path-blocklist (PR-1D)
- D-17 positive-allowlist for is_dangerous_skill_code (optional PR-1E)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(audits): record PR-1A merge commit hash in D-1 closure footnote
PR-1A (#42) merged to main as squash commit 48ec5d5. Update the
D-1 closure footnote in docs/audits/PHASE-1-SECURITY.md and the
D-1 row in docs/audits/PHASE-1-CONSOLIDATED-TRIAGE.md to reference
the PR number + commit hash, replacing the branch-name placeholder.
This commit lands on the PR-1B branch (fix/pr1b-remove-save-skill-and-forge)
so the citation travels with PR-1B's next merge.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(config): handle headless ImportError on pynput gracefully
CI failure root cause for PR-1A (#42) and PR-1B (#43): the smoke job
on Ubuntu runners failed at test collection because importing
codec_skill_registry → codec_config → `from pynput import keyboard`
raised ImportError on a headless display:
ImportError: this platform is not supported:
('failed to acquire X connection: Bad display name ""')
pynput needs a real display (X11 / AppKit / win32). GitHub Actions
Linux runners are headless. Other modules import codec_config for
its constants and is_dangerous_skill_code — none of them need the
keyboard subsystem.
Fix: wrap the pynput import in try/except so codec_config is import-
safe in headless contexts. The two `getattr(keyboard.Key, ...)` call
sites in `_resolve_key` now early-return None when keyboard is None
(the resulting KEY_TOGGLE / KEY_VOICE / KEY_TEXT module constants
are None on headless systems — only matters if the live keyboard
listener actually runs, which it doesn't in CI).
This was triggered by the new PR-1A test
`tests/test_skill_registry.py` which CI now runs per the new
`Skill registry load-time AST gate tests (D-1)` step. Before PR-1A,
no CI-gated test imported codec_config, so the pynput crash was
latent. Production paths (codec.py daemon, dashboard, etc.) all
run on macOS where pynput imports fine.
Verified locally:
- pytest tests/test_skill_routes.py tests/test_skill_registry.py
tests/test_skill_contracts.py tests/test_oauth_provider.py
tests/test_retry.py → 28 passed
- python3 tests/test_skill_imports.py → 76 skills parsed, 0 errors
- python3 tools/generate_skill_manifest.py --check → ok
- Headless simulation: monkey-patch builtins.__import__ to raise on
pynput, then `from codec_config import is_dangerous_skill_code` →
succeeds, returns (True, 'Dangerous import: subprocess') correctly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Mickael Farina <farina.mickael@gmail.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Consolidated fast-follow PR addressing all 5 deferred items from the Phase 3 Step 9 code review (PR #19). Merged into a single PR per user request to keep workflow simple.
What this fixes
step_budget_exhaustedmapped toblocked_on_permission(semantic mismatch — there's no permission to grant)pausedwith reason. NewPOST /api/agents/{id}/extend_budgetendpoint bumps the current checkpoint's budget viastate.jsonoverrides (plan stays immutable; tamper check intact).AGENT_RESUMEDemit had nocorrelation_id, orphaned from subsequent_run_agentchain_run_agent(agent_id, cid=None)now accepts optional cid; daemon mintsrecovery_cidonce and threads it through both the audit emit AND the new_run_agentinvocation.permission_gatehad no test fordomain_not_authorizedpathtest_permission_gate_blocks_domain_not_in_grants.check_system_health()docstring documenting that PM2autorestart: trueis the supervision contract for daemons. No behavior change.PermissionManifest.read_pathsdeclared butpermission_gateonly enforceswrite_paths(asymmetry undocumented)Actionfield + LLM prompt update — out of scope for Step 9).Tests
test_agent_runner.py: 33 → 36 tests (+1 from M1, +1 from I4, +3 from I2 minus 2 not added)test_agent_runner.py + test_agent_plan.pysuite: 67 passed, 0 new failuresFiles changed
codec_agent_runner.pytests/test_agent_runner.pyroutes/agents.py/extend_budgetendpoint +ExtendBudgetBodymodelcodec_heartbeat.pyProvenance
This PR consolidates outputs from 5 side-chat sessions that the user spawned via review-feedback chips. The user requested I wrap the work cleanly in this main session — so all 5 side-chats' output is folded here, with I2 (which paused awaiting design confirmation) implemented per side-chat #3's recommended option 2 (state.json override, plan stays immutable).
Test plan
tests/test_agent_runner.py + test_agent_plan.py→ 67 passed🤖 Generated with Claude Code