ADR-025: policy plane architecture and policy service contract

Part of #302 (Phase 1). This is the ratifying decision for the whole epic — do this first.

## Decision to make

How does a policy engine (prompt decoration, response gating/amendment, drift scoring per CLLAMA_SPEC §4B–D) attach to a pod? Three candidate shapes:

1. **Policy sidecar consulted by cllama via hooks** (recommended) — cllama gains an optional `PolicyEvaluator`; a separate policy service implements a versioned HTTP contract.
2. **Second proxy image stacked in front of/behind cllama** — conformant with "swappable proxy image" framing, but requires re-implementing dual ingress, key custody, tool mediation, feeds, history, and telemetry in the policy proxy, AND making multi-proxy stacking real (currently fails fast at runtime).
3. **In-tree policy engine inside cllama** — couples policy intelligence to the transport reference, contradicting the manifesto's swappable-governance stance.

## Why (1) is recommended

- The request lifecycle in `cllama/internal/proxy/handler.go` has five natural interception points: pre-flight gate (after identity + context load) → tool filter (after manifest load) → prompt decoration (before dispatch) → response gate (after upstream response) → drift/score log (after recording).
- cllama keeps everything it is good at; the policy service stays language-agnostic behind HTTP; either side can be swapped independently.
- Nil evaluator = bit-identical passthrough, preserving every existing deployment.

## Deliverables

- `docs/decisions/025-policy-plane.md`
- New CLLAMA_SPEC section: the versioned **policy contract** — endpoints (e.g. `POST /policy/decorate`, `/policy/gate-request`, `/policy/gate-response`, `/policy/score`), inputs (agent identity, contract/rules references from the context mount, messages, tool manifest), outputs (message mutations, allow/deny/amend verdicts with reasons, telemetry annotations).

## Open questions the ADR must answer

- Fail-open vs fail-closed per hook (gates likely fail-closed, decoration fail-open?) and operator override.
- Latency budget per hook; total added-overhead cap; timeouts.
- **Streaming**: response gating vs SSE — buffer-and-gate, gate-on-complete, or first-token policy? This is the hardest design point.
- Caching of decoration results across turns.
- Telemetry surface: policy verdicts as `intervention` values (`policy_denied`, `policy_amended`, `policy_decorated`)?
- Does the policy service see managed-tool round results inside the mediation loop, or only ingress/egress?
- Where do rules come from (see the enforce/guide compilation issue under #302) and how do runtime rule changes propagate?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ADR-025: policy plane architecture and policy service contract #306

Decision to make

Why (1) is recommended

Deliverables

Open questions the ADR must answer

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

ADR-025: policy plane architecture and policy service contract #306

Description

Decision to make

Why (1) is recommended

Deliverables

Open questions the ADR must answer

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions