ADR-025: policy plane architecture and policy service contract #306

@mostlydev

Part of #302 (Phase 1). This is the ratifying decision for the whole epic — do this first.

Decision to make

How does a policy engine (prompt decoration, response gating/amendment, drift scoring per CLLAMA_SPEC §4B–D) attach to a pod? Three candidate shapes:

  1. Policy sidecar consulted by cllama via hooks (recommended) — cllama gains an optional PolicyEvaluator; a separate policy service implements a versioned HTTP contract.
  2. Second proxy image stacked in front of/behind cllama — conformant with "swappable proxy image" framing, but requires re-implementing dual ingress, key custody, tool mediation, feeds, history, and telemetry in the policy proxy, AND making multi-proxy stacking real (currently fails fast at runtime).
  3. In-tree policy engine inside cllama — couples policy intelligence to the transport reference, contradicting the manifesto's swappable-governance stance.

Why (1) is recommended

  • The request lifecycle in cllama/internal/proxy/handler.go has five natural interception points: pre-flight gate (after identity + context load) → tool filter (after manifest load) → prompt decoration (before dispatch) → response gate (after upstream response) → drift/score log (after recording).
  • cllama keeps everything it is good at; the policy service stays language-agnostic behind HTTP; either side can be swapped independently.
  • Nil evaluator = bit-identical passthrough, preserving every existing deployment.

Deliverables

  • docs/decisions/025-policy-plane.md
  • New CLLAMA_SPEC section: the versioned policy contract — endpoints (e.g. POST /policy/decorate, /policy/gate-request, /policy/gate-response, /policy/score), inputs (agent identity, contract/rules references from the context mount, messages, tool manifest), outputs (message mutations, allow/deny/amend verdicts with reasons, telemetry annotations).
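To make the contract discussion concrete, here is one possible shape for a single endpoint, sketched in Go against an in-process fake policy service. Only the path `/policy/gate-response` and the allow/deny/amend verdicts come from this issue; every JSON field name, the `contract_version` field, and the fake service's deny rule are assumptions for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// GateRequest is a hypothetical request body for POST /policy/gate-response.
type GateRequest struct {
	ContractVersion string   `json:"contract_version"` // assumed versioning mechanism
	AgentID         string   `json:"agent_id"`
	RulesRef        string   `json:"rules_ref"` // reference into the context mount
	Messages        []string `json:"messages"`
}

// GateResponse is a hypothetical verdict body.
type GateResponse struct {
	Verdict string `json:"verdict"` // "allow", "deny", or "amend"
	Reason  string `json:"reason,omitempty"`
	Amended string `json:"amended,omitempty"`
}

// callGate posts req to the policy service and decodes the verdict.
func callGate(client *http.Client, baseURL string, req GateRequest) (GateResponse, error) {
	var out GateResponse
	body, err := json.Marshal(req)
	if err != nil {
		return out, err
	}
	resp, err := client.Post(baseURL+"/policy/gate-response", "application/json", bytes.NewReader(body))
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return out, fmt.Errorf("policy service returned %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

// fakePolicyService is a toy service that denies requests with no rules reference.
func fakePolicyService() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/policy/gate-response", func(w http.ResponseWriter, r *http.Request) {
		var req GateRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		resp := GateResponse{Verdict: "allow"}
		if req.RulesRef == "" {
			resp = GateResponse{Verdict: "deny", Reason: "no rules reference supplied"}
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(resp)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := fakePolicyService()
	defer srv.Close()

	verdict, err := callGate(srv.Client(), srv.URL, GateRequest{
		ContractVersion: "v1",
		AgentID:         "agent-1",
		Messages:        []string{"draft reply"},
	})
	fmt.Println(verdict.Verdict, verdict.Reason, err)
}
```

Carrying the version in the body is only one option; a path prefix or a header would serve the same purpose, and the ADR would need to pick one.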

Open questions the ADR must answer

  • Fail-open vs fail-closed per hook (gates likely fail-closed, decoration fail-open?) and operator override.
  • Latency budget per hook; total added-overhead cap; timeouts.
  • Streaming: response gating vs SSE — buffer-and-gate, gate-on-complete, or first-token policy? This is the hardest design point.
  • Caching of decoration results across turns.
  • Telemetry surface: policy verdicts as intervention values (policy_denied, policy_amended, policy_decorated)?
  • Does the policy service see managed-tool round results inside the mediation loop, or only ingress/egress?
  • Where do rules come from (see the enforce/guide compilation issue under the epic, #302) and how do runtime rule changes propagate?
