safety: implement PLAN_AGENT_SAFETY_NOW #62

Merged
davidahmann merged 1 commit into main from codex/adhoc-agent-safety-now
Feb 24, 2026

@davidahmann
Collaborator

Problem

Implement product/PLAN_AGENT_SAFETY_NOW.md end-to-end: emergency stop preemption, destructive safety budgets/phases, scoped approvals, compaction-resilient invariants, and stop-latency SLO hardening.
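
A minimal sketch of what emergency-stop preemption with blocked-dispatch recording could look like. Every name here (`Job`, `EmergencyStop`, `Dispatch`, `Blocked`, and the reason code string) is hypothetical and chosen for illustration; none of it is taken from the gait codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// ReasonEmergencyStop is a hypothetical stable reason code.
const ReasonEmergencyStop = "emergency_stop_preempted"

// Job is an illustrative stand-in for a runtime job record.
type Job struct {
	mu      sync.Mutex
	stopped bool
	blocked []string // tool calls refused after the stop
}

// EmergencyStop marks the job as stopped; later dispatches are refused.
func (j *Job) EmergencyStop() {
	j.mu.Lock()
	defer j.mu.Unlock()
	j.stopped = true
}

// Dispatch checks the stop flag before running the call. A refused call
// is recorded and reported with the stable reason code.
func (j *Job) Dispatch(tool string, run func()) (reason string, ok bool) {
	j.mu.Lock()
	if j.stopped {
		j.blocked = append(j.blocked, tool)
		j.mu.Unlock()
		return ReasonEmergencyStop, false
	}
	j.mu.Unlock()
	run()
	return "", true
}

// Blocked returns a copy of the refused tool calls.
func (j *Job) Blocked() []string {
	j.mu.Lock()
	defer j.mu.Unlock()
	return append([]string(nil), j.blocked...)
}

func main() {
	job := &Job{}
	job.Dispatch("fs.read", func() {}) // allowed: no stop yet
	job.EmergencyStop()
	reason, ok := job.Dispatch("fs.delete", func() {})
	fmt.Println(ok, reason, job.Blocked())
}
```

The flag is checked before each dispatch, so a stop takes effect at the next dispatch boundary; a call that is already running is not interrupted in this sketch.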

Changes

  • Added job emergency-stop runtime status/reason, blocked-dispatch recording, and helper APIs.
  • Added gait job stop CLI path and MCP emergency-stop preemption with stable reason codes.
  • Added plan/apply phase semantics in gate intent and MCP normalization.
  • Enforced destructive apply approvals and destructive budget checks with deterministic reason codes.
  • Extended approval token schema/logic with max_targets and max_ops claims.
  • Persisted safety invariant ledger/hash in job/runpack/pack schemas and artifacts.
  • Added the knowledge-worker-safe policy example, updated the baseline-highrisk policy example, and synced the docs across README/docs/docs-site llm mirrors.
  • Added integration/e2e stop-latency SLO tests and wired chaos/hardening/release smoke script coverage.
  • Added targeted runtime branch tests for new path/invariant/dispatch validation branches.
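
To illustrate the scoped-approval item above, here is a rough sketch of how `max_targets` and `max_ops` claims might bound an approved operation. The Go type, the field names, the "zero means unlimited" rule, and both reason code strings are assumptions made for this example, not the actual gait schema.

```go
package main

import "fmt"

// ApprovalClaims mirrors the two scope claims named in the PR.
// Assumption for this sketch: a zero value means "no limit".
type ApprovalClaims struct {
	MaxTargets int
	MaxOps     int
}

// Hypothetical reason codes; the real codes in gait may differ.
const (
	ReasonMaxOpsExceeded     = "approval_max_ops_exceeded"
	ReasonMaxTargetsExceeded = "approval_max_targets_exceeded"
)

// CheckScope returns reason codes in a fixed order, so the same input
// always produces the same output.
func CheckScope(c ApprovalClaims, targets, ops int) []string {
	var reasons []string
	if c.MaxOps > 0 && ops > c.MaxOps {
		reasons = append(reasons, ReasonMaxOpsExceeded)
	}
	if c.MaxTargets > 0 && targets > c.MaxTargets {
		reasons = append(reasons, ReasonMaxTargetsExceeded)
	}
	return reasons
}

func main() {
	claims := ApprovalClaims{MaxTargets: 2, MaxOps: 5}
	fmt.Println(CheckScope(claims, 2, 5)) // within scope: no reasons
	fmt.Println(CheckScope(claims, 3, 9)) // both limits exceeded
}
```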

Validation

  • ./gait doctor --json
  • Story command suite:
    • go test ./core/jobruntime ./core/mcp ./cmd/gait -run 'Stop|Cancel|Resume|Preempt' -count=1
    • go test ./core/gate -run 'Budget|RateLimit|Destructive|ReasonCode' -count=1
    • go test ./core/gate ./cmd/gait -run 'Policy|Template|RequireApproval' -count=1
    • go test ./core/gate ./core/mcp ./cmd/gait -run 'Plan|Apply|ApprovalToken' -count=1
    • go test ./core/gate ./cmd/gait -run 'Approval|Scope|Bulk|TTL' -count=1
    • go test ./core/jobruntime ./core/runpack ./core/pack -run 'Invariant|Compaction|Resume|Determinism' -count=1
    • go test ./internal/integration ./internal/e2e -run 'StopLatency|EmergencyStop' -count=1
  • Risk/acceptance/docs/release lanes:
    • make test-runtime-slo
    • make test-chaos
    • make bench-check
    • make test-v2-5-acceptance
    • make test-contracts
    • make test-context-conformance
    • make test-context-chaos
    • make test-hardening-acceptance
    • make test-docs-consistency
    • make test-release-smoke
  • Ship gate:
    • make prepush-full

All commands above passed locally.

davidahmann merged commit 52fe6ca into main on Feb 24, 2026
7 checks passed
davidahmann deleted the codex/adhoc-agent-safety-now branch on February 24, 2026 13:06
chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9375d485d1

result.Violations = mergeUniqueSorted(result.Violations, []string{"rate_limit_exceeded"})
}
}
if outcome.DestructiveBudget.Requests > 0 && gate.IntentContainsDestructiveTarget(intent.Targets) {

P1: Enforce destructive budget for script step targets

runGateEval only applies outcome.DestructiveBudget when IntentContainsDestructiveTarget(intent.Targets) is true, but script evaluations put destructive operations in intent.script.steps[*].targets and often leave top-level intent.targets empty. In that common script context, the destructive budget is computed by policy (evaluateScriptPolicyDetailed aggregates it) but never enforced here, so repeated destructive script executions can bypass the configured destructive budget guard.
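
One possible shape for a fix is a check that walks both the top-level targets and every script step's targets before deciding whether the destructive budget applies. The sketch below uses simplified, hypothetical types (`Target`, `Step`, `Script`, `Intent`) and a hypothetical helper `AnyDestructiveTarget`; the real gait types and the signature of `IntentContainsDestructiveTarget` may differ.

```go
package main

import "fmt"

// Simplified stand-ins for the intent schema; field names are assumptions.
type Target struct {
	Kind        string
	Destructive bool
}

type Step struct{ Targets []Target }

type Script struct{ Steps []Step }

type Intent struct {
	Targets []Target
	Script  *Script
}

// containsDestructive models a top-level-only check.
func containsDestructive(targets []Target) bool {
	for _, t := range targets {
		if t.Destructive {
			return true
		}
	}
	return false
}

// AnyDestructiveTarget also inspects script step targets, so a script
// with empty top-level targets is still subject to the budget.
func AnyDestructiveTarget(in Intent) bool {
	if containsDestructive(in.Targets) {
		return true
	}
	if in.Script == nil {
		return false
	}
	for _, s := range in.Script.Steps {
		if containsDestructive(s.Targets) {
			return true
		}
	}
	return false
}

func main() {
	scriptOnly := Intent{Script: &Script{Steps: []Step{
		{Targets: []Target{{Kind: "path", Destructive: true}}},
	}}}
	fmt.Println(containsDestructive(scriptOnly.Targets)) // false: top-level check misses it
	fmt.Println(AnyDestructiveTarget(scriptOnly))        // true: step targets are inspected
}
```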
