This repository was archived by the owner on Mar 26, 2026. It is now read-only.

docs: HITL gates design spec #296

Open
ciaranRoche wants to merge 2 commits into jsell-rh:main from ciaranRoche:docs/hitl-gates-spec

@ciaranRoche

Summary

Design document proposing a unified Human-in-the-Loop (HITL) gate system for OpenDispatch. This is a spec for review — no code changes.

What it proposes

A Gate is a checkpoint that blocks an action until a human resolves it. Four gate types:

  • Lifecycle gates — require operator approval before spawn/restart/stop (prevents unsupervised agent self-replication)
  • Task transition gates — configurable rules like "review→done requires approval" (prevents agents closing work without sign-off)
  • Decision gates — structured request→poll→receive flow via new request_gate/check_gate MCP tools (replaces the fragile request_decision → tmux paste loop)
  • Tool approval upgrade — models existing tool approval as a gate, adding deny path + reason capture

Key design decisions

  • Opt-in by default — gates only fire when HITL policy is configured (zero breaking changes)
  • MCP tools never block — return gate_id immediately, agents poll check_gate
  • Backend-agnostic — works for tmux and ambient
  • Configurable via fleet YAML (hitl: section on space and agent definitions)
  • Timeout/escalation — configurable auto-approve, auto-deny, or escalate on timeout
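
The "MCP tools never block" decision can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the GateStore type, its method names, and the counter-based IDs are hypothetical, and only the status strings (pending_approval, approved, denied) come from this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// GateStatus values are illustrative; the spec's exact names may differ.
type GateStatus string

const (
	StatusPending  GateStatus = "pending_approval"
	StatusApproved GateStatus = "approved"
	StatusDenied   GateStatus = "denied"
)

// GateStore is a hypothetical in-memory stand-in for the coordinator's gate table.
type GateStore struct {
	mu    sync.Mutex
	next  int
	gates map[string]GateStatus
}

func NewGateStore() *GateStore {
	return &GateStore{gates: make(map[string]GateStatus)}
}

// RequestGate records a pending gate and returns its ID at once.
// It never waits for a human decision.
func (s *GateStore) RequestGate() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.next++
	id := fmt.Sprintf("gate_%d", s.next) // counter ID for this sketch only
	s.gates[id] = StatusPending
	return id
}

// CheckGate reports the current status; ok is false for an unknown ID.
func (s *GateStore) CheckGate(id string) (status GateStatus, ok bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	status, ok = s.gates[id]
	return status, ok
}

// Resolve applies an operator decision. It returns false if the gate is
// unknown or already resolved, so each gate is resolved at most once.
func (s *GateStore) Resolve(id string, decision GateStatus) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.gates[id] != StatusPending {
		return false
	}
	s.gates[id] = decision
	return true
}

func main() {
	store := NewGateStore()
	id := store.RequestGate()
	before, _ := store.CheckGate(id)
	store.Resolve(id, StatusApproved)
	after, _ := store.CheckGate(id)
	fmt.Println(id, before, "->", after)
}
```

The request path only writes a row and returns, so the agent stays free to do other work while the operator decides.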

Open questions for reviewers

  1. Should gate resolution trigger a nudge to the agent?
  2. Should gates support multiple approvers?
  3. Should tool approval deny send Escape or just record the denial?
  4. How should the ignition prompt reference active HITL policies?
  5. Should denied lifecycle gates be retryable?
  6. Should the Approvals tab replace InterruptTracker.vue or run alongside it?

Files

  • docs/design-docs/hitl-gates.md — the full design spec

🤖 Generated with Claude Code

@jsell-rh (Owner) left a comment

Thorough design — this addresses real, compounding problems in the current oversight story. The fact that tmuxApprove sent Enter while request_decision pasted text and hoped for C-m to submit it is exactly the kind of brittleness a unified gate model removes. Commenting on the six open questions and a few implementation notes.

Open Questions

1. Should gate resolution nudge the agent?
Agree with the doc's recommendation: no. Nudging injects tmux-paste back into a flow that was specifically designed to escape it. Polling check_gate is correct. If we want an agent to react faster, the ignition prompt should tell it to poll on a reasonable cadence (10–30s) rather than wait for a push.

2. Multiple approvers?
Phase 1: no. The audit trail the Gate model gives us is sufficient. If multi-approver becomes a requirement, the data model already supports adding an approvals []GateApproval relation without breaking the existing resolved_by field. Leave it open rather than designing it now.
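
To make the compatibility claim concrete, here is a rough sketch of how an approvals relation could sit next to resolved_by. The Go field names and the Approvers helper are hypothetical; the spec only names the approvals []GateApproval relation and the resolved_by field.

```go
package main

import "fmt"

// GateApproval is a hypothetical shape for one approver's vote.
type GateApproval struct {
	Approver string
	Approved bool
}

// Gate keeps the existing single-approver field and adds an optional list.
type Gate struct {
	ID         string
	ResolvedBy string         // existing field, unchanged
	Approvals  []GateApproval // possible later addition; empty in Phase 1
}

// Approvers lists everyone who approved. Gates with no Approvals rows
// fall back to ResolvedBy, so Phase 1 records keep working.
func (g Gate) Approvers() []string {
	if len(g.Approvals) == 0 {
		if g.ResolvedBy == "" {
			return nil
		}
		return []string{g.ResolvedBy}
	}
	var out []string
	for _, a := range g.Approvals {
		if a.Approved {
			out = append(out, a.Approver)
		}
	}
	return out
}

func main() {
	legacy := Gate{ID: "gate_1", ResolvedBy: "alice"}
	multi := Gate{ID: "gate_2", Approvals: []GateApproval{
		{Approver: "alice", Approved: true},
		{Approver: "bob", Approved: false},
	}}
	fmt.Println(legacy.Approvers(), multi.Approvers())
}
```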

3. Tool approval deny: Escape vs. record-only?
Send Escape. Record-only silently leaves Claude Code in a blocked state waiting for the tool call to be answered — that's worse than the retry risk. The denial reason captured in gate.resolution is returned by check_gate, so the agent can see it next time it checks in. Worth noting: Escape behavior may vary across Claude Code versions, so this path should be exercised in CI the same way tmuxApprove is tested.

4. Ignition prompt format?
Brief summary as shown in §12.2 is right. Agents don't need the full rule table at boot — just awareness that gates exist and the two tool names (request_gate, check_gate). Full policy is accessible via boss://protocol or a new boss://hitl-policy/{space} MCP resource. Consider auto-injecting the summary only when HITLPolicy is non-empty to avoid noisy ignition prompts for unconfigured spaces.

5. Denied lifecycle gates retryable?
Yes — new request, new gate. The old gate stays in the audit trail. This is also more consistent with how decision gates work: an agent that receives denied for a spawn can make an architectural adjustment and ask again. The alternative (one-strike denial) creates confusing dead ends.

6. Approvals tab vs. InterruptTracker.vue?
Run both in parallel through Phase 1. The interrupts table isn't going away immediately, and existing operators are familiar with the tracker. Phase 2 unification is the right call. One practical note: when gates supersede an interrupt record, mark the interrupt resolved server-side so InterruptTracker.vue doesn't show it as a duplicate pending action.


Implementation Notes

Gate ID namespace collision. The spec shows "gate_<timestamp>". Millisecond timestamps will collide under concurrent requests. Suggest "gate_" + ulid() or "gate_" + uuid() — the same approach used for agent IDs. Very low cost to fix before any code is written.
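
A minimal sketch of a collision-resistant ID, assuming random hex is acceptable. It uses only the standard library; NewGateID is a hypothetical name, and a real ULID (which is also time-sortable) would need a third-party package that is not shown here.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// NewGateID returns "gate_" followed by 128 random bits in hex.
// Unlike a millisecond timestamp, two calls in the same millisecond
// are not expected to produce the same ID.
func NewGateID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return "gate_" + hex.EncodeToString(b[:]), nil
}

func main() {
	id, err := NewGateID()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```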

check_gate without polling interval guidance. The doc says "10–30 second poll interval is reasonable" but this is hidden in §8.4. The MCP tool response for request_gate should include a poll_interval_sec hint (e.g., 15) so agents don't have to read the spec to know the right cadence.
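
For illustration, the request_gate response could carry the hint roughly like this. Only poll_interval_sec, gate_id, and the pending_approval status appear in this thread; the Go type, the constructor, and the "status" JSON key are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RequestGateResponse is a hypothetical payload for the request_gate tool.
type RequestGateResponse struct {
	GateID          string `json:"gate_id"`
	Status          string `json:"status"`
	PollIntervalSec int    `json:"poll_interval_sec"`
}

// NewRequestGateResponse fills in the suggested 15-second hint.
func NewRequestGateResponse(gateID string) RequestGateResponse {
	return RequestGateResponse{
		GateID:          gateID,
		Status:          "pending_approval",
		PollIntervalSec: 15,
	}
}

// JSON renders the response as it would be returned to the agent.
func (r RequestGateResponse) JSON() (string, error) {
	out, err := json.Marshal(r)
	return string(out), err
}

func main() {
	s, err := NewRequestGateResponse("gate_abc").JSON()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```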

gateTimeoutLoop and liveness loop coupling. §11.1 shows the timeout loop stopping on s.stopLiveness. This is correct for the cleanup path, but name the channel more generically (s.stopBackground) or give it its own done channel — liveness and gate timeouts are separate concerns and shouldn't share stop signals.
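
A sketch of the "own done channel" option, under the assumption that the loop is ticker-driven. The stopGates field, the Start/Stop helpers, and runDemo are hypothetical; only gateTimeoutLoop and s.stopLiveness are named in the spec.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Server struct {
	stopLiveness chan struct{} // existing channel, left to the liveness loop
	stopGates    chan struct{} // hypothetical dedicated channel for gate timeouts
	wg           sync.WaitGroup
}

func NewServer() *Server {
	return &Server{
		stopLiveness: make(chan struct{}),
		stopGates:    make(chan struct{}),
	}
}

// gateTimeoutLoop calls expire on every tick until stopGates is closed.
func (s *Server) gateTimeoutLoop(interval time.Duration, expire func()) {
	defer s.wg.Done()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			expire()
		case <-s.stopGates:
			return
		}
	}
}

func (s *Server) StartGateTimeouts(interval time.Duration, expire func()) {
	s.wg.Add(1)
	go s.gateTimeoutLoop(interval, expire)
}

// StopGateTimeouts stops only the gate loop and waits for it to exit.
func (s *Server) StopGateTimeouts() {
	close(s.stopGates)
	s.wg.Wait()
}

// runDemo runs the loop briefly, stops it, and reports how many ticks
// fired and whether the liveness channel was left open.
func runDemo() (ticks int, livenessOpen bool) {
	s := NewServer()
	s.StartGateTimeouts(time.Millisecond, func() { ticks++ })
	time.Sleep(100 * time.Millisecond)
	s.StopGateTimeouts() // wg.Wait orders the loop's writes before this read
	select {
	case <-s.stopLiveness:
		livenessOpen = false
	default:
		livenessOpen = true
	}
	return ticks, livenessOpen
}

func main() {
	ticks, open := runDemo()
	fmt.Println("ticks > 0:", ticks > 0, "liveness still open:", open)
}
```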

Cascade on space delete. When a space is deleted, its pending gates should either be cancelled or cascade-deleted. The DB schema should include ON DELETE CASCADE on space_name (referencing the spaces table), or the space delete handler needs to explicitly cancel pending gates before teardown.
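
The explicit-cancel alternative might look like the sketch below. It is an in-memory illustration only: the Store type, the "cancelled" status, and ErrNoSpace are assumptions, and the real handler would run against the database.

```go
package main

import (
	"errors"
	"fmt"
)

type Gate struct {
	ID     string
	Space  string
	Status string
}

// Store is a hypothetical stand-in for the spaces and gates tables.
type Store struct {
	spaces map[string]bool
	gates  []*Gate
}

var ErrNoSpace = errors.New("space not found")

// DeleteSpace cancels the space's pending gates before removing the
// space, and returns how many gates were cancelled. Resolved gates are
// left untouched so the audit trail survives.
func (s *Store) DeleteSpace(name string) (int, error) {
	if !s.spaces[name] {
		return 0, ErrNoSpace
	}
	cancelled := 0
	for _, g := range s.gates {
		if g.Space == name && g.Status == "pending_approval" {
			g.Status = "cancelled"
			cancelled++
		}
	}
	delete(s.spaces, name)
	return cancelled, nil
}

func main() {
	s := &Store{
		spaces: map[string]bool{"alpha": true},
		gates: []*Gate{
			{ID: "g1", Space: "alpha", Status: "pending_approval"},
			{ID: "g2", Space: "alpha", Status: "approved"},
		},
	}
	n, err := s.DeleteSpace("alpha")
	fmt.Println(n, err, s.gates[0].Status, s.gates[1].Status)
}
```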

Fleet YAML dry-run output. The example output in §10.3 is exactly the right level of detail. Make sure the diff logic treats "no HITL policy" → "HITL policy set" as a change and "HITL policy set" → "no HITL policy" as a removal — same pattern as how description diffs work in fleet.go.
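
A small sketch of the diff rule described here. The HITLPolicy shape and the DiffPolicy function are hypothetical and are not taken from fleet.go; the sketch also assumes an empty policy counts as "no HITL policy".

```go
package main

import "fmt"

// HITLPolicy is a hypothetical shape; the real struct likely has more fields.
type HITLPolicy struct {
	Rules []string
}

func isSet(p *HITLPolicy) bool { return p != nil && len(p.Rules) > 0 }

func sameRules(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// DiffPolicy returns "change" when a policy is added or edited,
// "removal" when a policy is dropped, and "" when nothing differs.
func DiffPolicy(prev, next *HITLPolicy) string {
	switch {
	case !isSet(prev) && !isSet(next):
		return ""
	case isSet(prev) && !isSet(next):
		return "removal"
	case !isSet(prev) && isSet(next):
		return "change"
	case sameRules(prev.Rules, next.Rules):
		return ""
	default:
		return "change"
	}
}

func main() {
	p := &HITLPolicy{Rules: []string{"review->done requires approval"}}
	fmt.Printf("%q %q %q\n", DiffPolicy(nil, p), DiffPolicy(p, nil), DiffPolicy(p, p))
}
```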


What's well-done

  • The backward-compat guarantees in §12 are solid. Zero breaking changes on unconfigured spaces is the right default.
  • The phased rollout (§13) is realistic. Phase 1b (MCP tools + decision gates) is the highest-value, lowest-risk starting point.
  • Colocating HITLPolicy in existing agents.config JSON avoids a schema migration for the common case.
  • The sequence diagrams in §8 are the clearest part of the doc — should stay through implementation.

Approving as a design spec. Ready to move to Phase 1a once the gate ID format is clarified. Great contribution.

@jsell-rh (Owner)

Operator feedback (additional review pass):


§4.1 — Decision gates: polling is unreliable

"Poll check_gate to check status"

The doc recommends agents poll at a 10–30s interval. In practice, agents don't always reliably poll — they get busy with other work and miss the window, or restart and lose the polling loop entirely. The nudge-on-message pattern already exists and works: when a message is delivered to an agent's inbox, the coordinator sends a tmux nudge prompting the agent to check in.

Recommendation: On gate resolution, fire the existing nudge mechanism to the requesting agent in addition to setting gate.status = approved/denied. This makes resolution a push, not a pull. Polling via check_gate remains as a fallback for agents that want to query explicit state.


§4.2 — Task transition gates: no notification on successful execution

When the operator approves a task gate, the coordinator moves the task server-side — but there is no notification to the agent that the move happened. The agent either has to poll check_gate or notice the task status change on its own.

Recommendation: Pair the server-side task move with a message to the requesting agent: "Task TASK-42 moved from review → done (approved by operator)." Follow it with a nudge so the agent picks it up promptly. This closes the feedback loop without requiring polling.


§4.3 — Decision gates: rely on message + nudge instead of polling

Same as §4.1 — the polling model is fragile. The message + nudge system is the established, reliable pattern.

Recommendation: When a decision gate is resolved, deliver the resolution as a message to the requesting agent's inbox and nudge the agent. The agent reads it in its next check-in cycle — same as any other message. check_gate remains available for agents that want to verify state explicitly (e.g., after a restart), but the primary delivery path should be push, not pull.


§4.4 — Tool approval deny: message instead of Escape

"Should deny send Escape to session?"

Sending Escape stops the tool call but leaves the agent without context — it just sees the call rejected with no explanation. The agent may retry blindly or get confused about why the action was blocked.

Alternative recommendation: On deny, don't send Escape. Instead:

  1. Record the denial rationale in gate.resolution (as the doc already proposes).
  2. Deliver the rationale as a message to the agent's inbox: "Tool call [Bash] denied by operator. Reason: [rationale]." + nudge.
  3. Let the agent decide how to continue — it has the information it needs.

This keeps the agent operational rather than stopping it dead in the water. If the operator wants to cancel the specific tool call they can note that in the rationale. The trade-off is that the tmux tool prompt remains pending — the agent will need to handle or re-navigate it. Worth calling out explicitly in the design whether that's acceptable.


§12.2 — Ignition prompt: same polling concern

The ignition prompt as drafted tells agents to "poll check_gate" as the primary resolution mechanism. Given the feedback above, this should be updated to reflect the push model: agents should expect a message + nudge when a gate resolves, and use check_gate only to query current state or recover after a restart.

@jsell-rh (Owner)

RE: 4.4

Further, the deny path could offer an optional checkbox that stops the agent's execution.

@jsell-rh (Owner)

Operator Feedback — Structured Review

Good design overall. Main cross-cutting concern: polling is unreliable — we've seen agents not poll consistently, especially when busy. The spec should lean on the existing nudge-on-message system for gate resolution rather than requiring agents to poll.


§4.2 Task Transition Gates

Missing: notification to agent on gate resolution.

When the gate fires, the task stays blocked and the agent gets a gate_id back. But how does the agent know when the gate resolves? The current flow has no server-side push — the agent would need to poll check_gate repeatedly. In practice agents don't do this reliably.

Suggestion: On gate resolution (approve or deny), the server should send a message to the agent (same path as send_message) and nudge its tmux session. The agent receives a message like:

"Gate gate_abc resolved: TASK-42 has been moved to done."

This means agents don't need to poll at all — they just handle the resolution like any other inbound message.


§4.3 Decision Gates — Polling Concern

The spec relies on agents polling check_gate — this is the same fragile pattern we're trying to replace.

The current request_decision flow is fragile because agents don't reliably receive the answer. This spec replaces the mechanism (tmux paste → structured JSON) but keeps the same agent-side behavior (poll and hope).

Suggestion: When the operator resolves a decision gate, the backend should:

  1. Update the gate status in SQLite
  2. Send a message to the requesting agent with the resolution
  3. Nudge the agent's tmux session

Agents can optionally call check_gate for a structured response, but the primary delivery path should be message + nudge — not polling. This aligns with how the rest of the system works.
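
The three steps above could be sketched as follows. The Notifier interface, its method names, and the message wording are assumptions standing in for the existing send_message and nudge paths; the in-memory Gate stands in for the SQLite row.

```go
package main

import (
	"errors"
	"fmt"
)

type Gate struct {
	ID, Agent, Status, Resolution string
}

// Notifier abstracts the existing message and nudge paths (names assumed).
type Notifier interface {
	SendMessage(agent, body string) error
	Nudge(agent string) error
}

var ErrNotPending = errors.New("gate is not pending")

// ResolveGate updates the gate, then messages and nudges the requester.
func ResolveGate(g *Gate, status, resolution string, n Notifier) error {
	if g.Status != "pending_approval" {
		return ErrNotPending
	}
	// 1. Update the gate status (a SQLite write in the real system).
	g.Status, g.Resolution = status, resolution
	// 2. Send the resolution to the requesting agent's inbox.
	body := fmt.Sprintf("Gate %s resolved: %s. %s", g.ID, status, resolution)
	if err := n.SendMessage(g.Agent, body); err != nil {
		return err
	}
	// 3. Nudge the agent so it reads the message promptly.
	return n.Nudge(g.Agent)
}

// recorder is a test double that logs calls in order.
type recorder struct{ calls []string }

func (r *recorder) SendMessage(agent, body string) error {
	r.calls = append(r.calls, "message:"+agent+":"+body)
	return nil
}

func (r *recorder) Nudge(agent string) error {
	r.calls = append(r.calls, "nudge:"+agent)
	return nil
}

func main() {
	g := &Gate{ID: "gate_abc", Agent: "dev-1", Status: "pending_approval"}
	r := &recorder{}
	if err := ResolveGate(g, "approved", "ship it", r); err != nil {
		fmt.Println("error:", err)
	}
	for _, c := range r.calls {
		fmt.Println(c)
	}
}
```

Because the status is written before the message goes out, an agent that calls check_gate after a restart still sees the same outcome the message would have carried.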


§4.4 Tool Approval Gates — Deny Behavior

Sending Escape on deny may stop the agent dead in the water.

The current deny path sends Escape to cancel the tool call. This works for tmux, but it interrupts the agent's flow without explanation. The agent has no idea why it was denied or what to do next.

Suggestion: On a deny:

  1. Record the rationale in the gate record (audit trail)
  2. Send the rationale as a message to the agent rather than (or in addition to) sending Escape
  3. This lets the agent understand the denial and continue operating with context

Example message: "Tool bash denied: do not run network scans in this environment. Please use an alternative approach."

The agent can then adapt rather than being stopped cold. If Escape must be sent (e.g. Claude Code won't proceed without it), consider sending it after the message is delivered.

Open question inherited from §14.3: For ambient backends, Escape isn't applicable — the message-based path is the only option anyway. Worth clarifying this in the spec.
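
One way the deny path could be ordered, as a sketch only: record and deliver the rationale first, then send Escape only when the operator asked for an interrupt and the backend is tmux. The Session interface, the Backend constants, and the interrupt flag are hypothetical names; the message wording follows the example given earlier in this thread.

```go
package main

import "fmt"

type Backend string

const (
	Tmux    Backend = "tmux"
	Ambient Backend = "ambient"
)

// Session abstracts delivery to one agent (method names assumed).
type Session interface {
	SendMessage(body string) error
	SendEscape() error
}

// DenyToolCall delivers the rationale, then optionally interrupts.
// Escape is skipped on ambient backends, where it does not apply.
func DenyToolCall(s Session, backend Backend, tool, rationale string, interrupt bool) error {
	msg := fmt.Sprintf("Tool call [%s] denied by operator. Reason: %s", tool, rationale)
	if err := s.SendMessage(msg); err != nil {
		return err
	}
	if interrupt && backend == Tmux {
		return s.SendEscape()
	}
	return nil
}

// fakeSession is a test double that logs events in order.
type fakeSession struct{ events []string }

func (f *fakeSession) SendMessage(body string) error {
	f.events = append(f.events, "message:"+body)
	return nil
}

func (f *fakeSession) SendEscape() error {
	f.events = append(f.events, "escape")
	return nil
}

func main() {
	f := &fakeSession{}
	if err := DenyToolCall(f, Tmux, "Bash", "no network scans", true); err != nil {
		fmt.Println("error:", err)
	}
	for _, e := range f.events {
		fmt.Println(e)
	}
}
```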


§12.2 Ignition Prompt — Polling Instruction

The current draft tells agents to poll check_gate for resolution. Given the polling reliability concerns above, the ignition prompt should instead instruct agents to await a message when a gate fires:

## HITL Policy (Active)

When an action requires approval, the MCP tool returns a `gate_id` with
status `pending_approval`. **Do not poll for the resolution.**
Continue working on other tasks. When the gate resolves, you will
receive a message with the outcome — treat it like any other inbound
message and act accordingly.

Summary of Requested Changes

| Section | Issue | Suggested Fix |
|---|---|---|
| §4.2 | No agent notification on gate resolution | Server sends message + nudge on resolve |
| §4.3 | Polling is unreliable primary path | Message + nudge as primary delivery; check_gate as optional read |
| §4.4 | Deny sends Escape, no context for agent | Send denial rationale as message before/instead of Escape |
| §12.2 | Ignition prompt teaches polling | Update to "await a message" pattern |

ciaranRoche added a commit to ciaranRoche/agent-boss that referenced this pull request Mar 26, 2026
Key changes based on reviewer feedback:

- Resolution delivery is now push (message + nudge), not pull (polling).
  check_gate remains as a fallback for restart recovery and explicit
  state queries. This aligns with the established nudge-on-message
  pattern that already works reliably across the system.

- Tool approval deny delivers rationale as a message rather than
  sending Escape by default. Optional "interrupt agent" checkbox
  sends Escape when explicitly requested. Ambient backend note added.

- Gate IDs use ULID format to avoid collision under concurrent requests.

- request_gate response includes poll_interval_sec hint.

- Gate timeout loop uses dedicated stop channel (not s.stopLiveness).

- Space deletion cascades to pending gates.

- Interrupt dedup: when a gate supersedes an interrupt, mark the
  interrupt resolved server-side.

- Ignition prompt updated to "await a message" pattern.

- Open questions Q1 and Q3 resolved per reviewer decisions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ciaranRoche and others added 2 commits March 26, 2026 05:26
Introduces a design document for a Human-in-the-Loop (HITL) gate system
that provides configurable safety guardrails for agent actions. The spec
covers lifecycle gates, task transition gates, structured decision gates,
and tool approval upgrades.

Design doc only — no code changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ciaranRoche force-pushed the docs/hitl-gates-spec branch from f243140 to e12690c on March 26, 2026 05:26