Part of #302 (Phase 1). This is the ratifying decision for the whole epic — do this first.
Decision to make
How does a policy engine (prompt decoration, response gating/amendment, drift scoring per CLLAMA_SPEC §4B–D) attach to a pod? Three candidate shapes:
1. Policy sidecar consulted by cllama via hooks (recommended): cllama gains an optional PolicyEvaluator; a separate policy service implements a versioned HTTP contract.
2. Second proxy image stacked in front of or behind cllama: conformant with the "swappable proxy image" framing, but it requires re-implementing dual ingress, key custody, tool mediation, feeds, history, and telemetry in the policy proxy, and making multi-proxy stacking real (it currently fails fast at runtime).
3. In-tree policy engine inside cllama: couples policy intelligence to the transport reference, contradicting the manifesto's swappable-governance stance.
Why (1) is recommended
The request lifecycle in cllama/internal/proxy/handler.go has five natural interception points: pre-flight gate (after identity + context load) → tool filter (after manifest load) → prompt decoration (before dispatch) → response gate (after upstream response) → drift/score log (after recording).
cllama keeps everything it is good at; the policy service stays language-agnostic behind HTTP; either side can be swapped independently.
Nil evaluator = bit-identical passthrough, preserving every existing deployment.
Deliverables
docs/decisions/025-policy-plane.md
New CLLAMA_SPEC section: the versioned policy contract — endpoints (e.g. POST /policy/decorate, /policy/gate-request, /policy/gate-response, /policy/score), inputs (agent identity, contract/rules references from the context mount, messages, tool manifest), outputs (message mutations, allow/deny/amend verdicts with reasons, telemetry annotations).
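As a starting point for discussion, the gate-response exchange could be typed roughly as below. This is a sketch, not the contract: every field name and JSON key (contract_version, rules_ref, annotations, and so on) is an assumption, as is the rule that a deny verdict must carry a reason and an amend verdict must carry a replacement message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Verdict mirrors the allow/deny/amend outcomes named in the issue.
type Verdict string

const (
	VerdictAllow Verdict = "allow"
	VerdictDeny  Verdict = "deny"
	VerdictAmend Verdict = "amend"
)

// Message is a hypothetical chat message shape.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// GateRequest is a hypothetical body for POST /policy/gate-response.
type GateRequest struct {
	ContractVersion string    `json:"contract_version"`
	AgentID         string    `json:"agent_id"`
	RulesRef        string    `json:"rules_ref"` // reference into the context mount
	Messages        []Message `json:"messages"`
	ToolManifest    []string  `json:"tool_manifest,omitempty"`
}

// GateResponse is a hypothetical reply from the policy service.
type GateResponse struct {
	Verdict     Verdict           `json:"verdict"`
	Reason      string            `json:"reason,omitempty"`
	Amended     *Message          `json:"amended,omitempty"`
	Annotations map[string]string `json:"annotations,omitempty"` // telemetry
}

// Validate rejects verdicts the proxy would not know how to apply.
func (r GateResponse) Validate() error {
	switch r.Verdict {
	case VerdictAllow:
		return nil
	case VerdictDeny:
		if r.Reason == "" {
			return fmt.Errorf("deny verdict requires a reason")
		}
		return nil
	case VerdictAmend:
		if r.Amended == nil {
			return fmt.Errorf("amend verdict requires an amended message")
		}
		return nil
	default:
		return fmt.Errorf("unknown verdict %q", r.Verdict)
	}
}

// DecodeGateResponse parses and validates a policy service reply.
func DecodeGateResponse(body []byte) (GateResponse, error) {
	var r GateResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return GateResponse{}, err
	}
	return r, r.Validate()
}

func main() {
	req := GateRequest{
		ContractVersion: "v1",
		AgentID:         "agent-1",
		RulesRef:        "rules.md",
		Messages:        []Message{{Role: "assistant", Content: "draft reply"}},
	}
	body, _ := json.Marshal(req)
	fmt.Println(string(body))

	resp, err := DecodeGateResponse([]byte(`{"verdict":"deny","reason":"off-contract"}`))
	fmt.Println(resp.Verdict, resp.Reason, err)
}
```

Validating the verdict on the cllama side keeps an unknown or malformed reply from being silently treated as an allow, whichever failure mode the ADR ends up choosing.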
Open questions the ADR must answer
Fail-open vs fail-closed per hook (gates likely fail-closed, decoration fail-open?) and operator override.
Latency budget per hook; total added-overhead cap; timeouts.
Streaming: response gating vs SSE — buffer-and-gate, gate-on-complete, or first-token policy? This is the hardest design point.
Caching of decoration results across turns.
Telemetry surface: policy verdicts as intervention values (policy_denied, policy_amended, policy_decorated)?
Does the policy service see managed-tool round results inside the mediation loop, or only ingress/egress?