Releases: mostlydev/cllama
v0.3.3 — native Google Gemini provider support
Highlights
- Native `google` provider — adds first-class Google Gemini provider support to the cllama provider registry. Operators can now route models through `google/<model>` refs (e.g. `google/gemini-2.5-flash`) using a native `GEMINI_API_KEY` (with `GOOGLE_API_KEY` accepted as a lower-priority alias) instead of going through OpenRouter. The default API format is OpenAI-compatible against the Gemini OpenAI endpoint.
- Gemini cost tracking — pricing entries for Gemini 2.5 Flash and Gemini 2.5 Pro are now wired into the cost telemetry path, so direct-Google routing carries accurate per-token accounting alongside the existing OpenRouter route.
Companion to mostlydev/clawdapus#119.
v0.3.2 — Anthropic prompt cache fix
Fixes
- Cache-friendly prefix ordering for feed injection — feeds and timestamps are now appended after the system prompt instead of prepended before it. Anthropic prompt caching is prefix-matched with a 5-minute TTL, so prepending dynamic content invalidated the cache on every request. A 3-agent pod was paying ~$60/week in unnecessary `cache_creation` costs; this fix eliminates that. Fixes mostlydev/clawdapus#122.
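The ordering fix can be illustrated with a minimal sketch (`buildSystem` is a hypothetical helper, not the proxy's real function): the stable system prompt stays first so the prefix-matched cache keeps hitting, and volatile content trails it.

```go
package main

import "fmt"

// buildSystem keeps the stable system prompt as the cacheable prefix and
// appends dynamic feed/timestamp content after it, instead of prepending.
func buildSystem(systemPrompt string, dynamic []string) []string {
	out := []string{systemPrompt} // unchanged across requests: cache-friendly
	out = append(out, dynamic...) // volatile suffix: feeds, timestamps
	return out
}

func main() {
	blocks := buildSystem("You are agent A.", []string{"[feed] headlines", "[now] 2025-01-01T00:00:00Z"})
	fmt.Println(blocks[0]) // the stable prompt is still the prefix
}
```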
v0.3.1 — managed tool manifest state observability
Highlights
- Log managed tool manifest state on every request — proxy telemetry now emits `manifest_present` (bool) and `tools_count` (int) so operators can verify at runtime whether a per-agent tool manifest was loaded and how many tools it contained.
This closes an observability gap that made it hard to diagnose cases where a compiled `tools.json` existed on disk but tools were not being injected into upstream requests — there was no runtime signal telling operators which agents had live tool manifests.
Artifacts
- container image: `ghcr.io/mostlydev/cllama:v0.3.1`
- rolling tag: `ghcr.io/mostlydev/cllama:latest`
Validation
- `go test ./...`
v0.3.0 — managed tool mediation + memory plane + scoped history API
Highlights
Managed tool mediation
- load and inject compiled `tools.json` manifests into upstream LLM requests
- execute managed tools via HTTP against declared services (OpenAI-compatible format)
- Anthropic-format tool mediation (parallel path to OpenAI)
- cross-turn continuity: replay hidden tool rounds into subsequent upstream requests so the LLM sees the transcript that produced each runner-visible reply
- re-stream final text as synthetic SSE after mediated loops complete; keepalive comments prevent runner timeouts during long loops
- budget limits: max rounds, per-tool timeout, total timeout, result size truncation with explicit `truncated: true` flag
- body_key execution: wrap tool arguments as `{body_key: args}` when declared in the tool descriptor
- sanitize managed tool names for provider compatibility
Memory plane
- pre-turn recall and post-turn best-effort retain hooks
- `memory_op` telemetry events with recall/retain outcome, latency, block count, injected bytes, policy-removal counts
- secret-shaped value scrubbing on both retain payloads and recalled blocks
- tightened memory recall auth and history auth handling
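Secret-shaped scrubbing might look like the sketch below; the pattern is purely illustrative (common key prefixes plus long hex runs), not the proxy's actual rule set, and `scrub` is a hypothetical name:

```go
package main

import (
	"fmt"
	"regexp"
)

// secretShaped is an illustrative pattern: sk-/ghp_-prefixed keys and long
// hex runs. The real proxy's detection rules may differ.
var secretShaped = regexp.MustCompile(`\b(?:sk-[A-Za-z0-9]{16,}|ghp_[A-Za-z0-9]{20,}|[A-Fa-f0-9]{32,})\b`)

// scrub redacts secret-shaped substrings before a block is retained and
// after one is recalled, per the memory-plane behavior described above.
func scrub(s string) string {
	return secretShaped.ReplaceAllString(s, "[REDACTED]")
}

func main() {
	fmt.Println(scrub("token sk-abcdefghijklmnop1234 in memory"))
}
```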
Session history API
- scoped history read API for agents querying their own transcripts
- dedicated replay auth tokens separate from agent bearer tokens
- stable per-entry IDs
- index replay for `after` queries (no full rescan)
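Indexed `after` replay can be sketched as follows (the type and method names are hypothetical): stable per-entry IDs map to positions once, so an `after=<id>` query seeks directly instead of rescanning the transcript.

```go
package main

import "fmt"

// historyIndex maps stable entry IDs to positions, built once per transcript.
type historyIndex struct {
	entries []string       // entry IDs in transcript order
	pos     map[string]int // ID -> index
}

func newHistoryIndex(ids []string) *historyIndex {
	idx := &historyIndex{entries: ids, pos: make(map[string]int, len(ids))}
	for i, id := range ids {
		idx.pos[id] = i
	}
	return idx
}

// After returns all entry IDs strictly after the given ID: an O(1) seek
// instead of a full rescan.
func (h *historyIndex) After(id string) []string {
	i, ok := h.pos[id]
	if !ok {
		return nil
	}
	return h.entries[i+1:]
}

func main() {
	idx := newHistoryIndex([]string{"e1", "e2", "e3"})
	fmt.Println(idx.After("e1")) // [e2 e3]
}
```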
Provider fixes
- xAI env seeding regression coverage
Artifacts
- container image: `ghcr.io/mostlydev/cllama:v0.3.0`
- rolling tag: `ghcr.io/mostlydev/cllama:latest`
Validation
- `go test ./...`
v0.2.5
Highlights
- enforce declared per-agent model policy in cllama
- normalize runner model requests against the compiled allowlist
- restrict provider failover to the pod-declared fallback chain
- add xAI routing/policy fixes needed for direct `xai/...` model refs
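Allowlist normalization might look like this sketch (function name and the coerce-to-default behavior are assumptions; the model refs are examples only):

```go
package main

import "fmt"

// normalizeModel checks a runner's requested model against the pod's
// compiled allowlist and pins off-policy requests to a declared default.
func normalizeModel(requested string, allow []string, fallback string) string {
	for _, m := range allow {
		if m == requested {
			return requested
		}
	}
	return fallback // off-policy request: coerce to the declared default
}

func main() {
	allow := []string{"xai/grok-4", "anthropic/claude-sonnet-4"}
	fmt.Println(normalizeModel("gpt-4o", allow, allow[0]))
}
```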
Artifacts
- container image: `ghcr.io/mostlydev/cllama:v0.2.5`
- rolling tag: `ghcr.io/mostlydev/cllama:latest`
Validation
- `go test ./...`
v0.2.3
Changes
- Unpriced request tracking: requests where the upstream provider returns no cost data are now counted separately as `unpriced_requests` in the cost API response and surfaced in the dashboard UI
- Reported cost passthrough: `CostInfo.CostUSD` is now `*float64` (nil = unpriced, not zero); provider-reported cost fields are propagated through the proxy
- Timezone context: `time_context.go` injects timezone-aware current time for agents that declare a `TZ` environment variable
- Dashboard: `total_requests` and `unpriced_requests` exposed in the costs API endpoint
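The pointer change matters because it distinguishes "provider reported $0.00" from "provider reported nothing". A minimal sketch (`classify` is a hypothetical helper):

```go
package main

import "fmt"

// CostInfo mirrors the change above: a *float64 makes "unpriced" (nil)
// distinguishable from a genuine zero cost.
type CostInfo struct {
	CostUSD *float64
}

func classify(c CostInfo) string {
	if c.CostUSD == nil {
		return "unpriced" // counted in unpriced_requests
	}
	return fmt.Sprintf("$%.6f", *c.CostUSD)
}

func main() {
	zero := 0.0
	fmt.Println(classify(CostInfo{CostUSD: nil}))   // unpriced
	fmt.Println(classify(CostInfo{CostUSD: &zero})) // priced, and free
}
```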
v0.2.2 — provider token pool + runtime provider add
What's new
- Provider token pool: multi-key pool per provider with states `ready`/`cooldown`/`dead`/`disabled`. The proxy retries across keys on 401/429/5xx with failure classification and `Retry-After` support.
- Runtime provider add: `POST /providers/add` UI route — add a new provider (name, base URL, auth type, API key) at runtime with no restart. Persists to `.claw-auth/providers.json` with `source: runtime`. `ProviderState.Source`: new field (seed/runtime) that survives JSON round-trips.
- UI bearer auth: all routes gated by `CLLAMA_UI_TOKEN` when configured.
- Key management routes: `POST /keys/add` and `POST /keys/delete`.
- Webhook alerts: `CLLAMA_ALERT_WEBHOOKS` and `CLLAMA_ALERT_MENTIONS` for pool events.
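The failure classification can be sketched as below; the state names come from this release, but the exact status-to-state mapping is an assumption:

```go
package main

import "fmt"

// classifyFailure maps an upstream HTTP status to a pool key state.
func classifyFailure(status int) string {
	switch {
	case status == 401:
		return "dead" // bad credentials: stop using this key
	case status == 429:
		return "cooldown" // rate-limited: honor Retry-After, then retry
	case status >= 500:
		return "cooldown" // transient upstream failure
	default:
		return "ready" // not a pool-level failure
	}
}

func main() {
	for _, s := range []int{401, 429, 503, 200} {
		fmt.Println(s, classifyFailure(s))
	}
}
```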
v0.2.1 — Feed Auth
What's new
- Feed authentication: `FeedEntry` now supports an `auth` field. When present, the feed fetcher sets an `Authorization: Bearer` header on the fetch request. This enables authenticated feeds from services like `claw-api` that require bearer token auth.
Backward compatible — existing `feeds.json` files without `auth` fields work unchanged.
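An authenticated entry might look like the fragment below. Only the `auth` field is documented by this release; the other field names (`feeds`, `name`, `url`, `ttl`) and the URL are illustrative assumptions:

```json
{
  "feeds": [
    {
      "name": "claw-api-status",
      "url": "https://claw-api.internal/status",
      "ttl": 60,
      "auth": "feed-bearer-token"
    }
  ]
}
```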
v0.2.0 — Feed Injection (ADR-013 Milestone 2)
What's Changed
Features
- Feed injection — the proxy now supports runtime feed injection into LLM requests. Feeds defined in agent context manifests are fetched with TTL-based caching and injected as system message content before forwarding to the upstream provider. Both OpenAI (`messages[]`) and Anthropic (top-level `system`) formats are supported.
  - `internal/feeds/manifest.go` — feed manifest parsing from agent context
  - `internal/feeds/fetcher.go` — HTTP fetcher with TTL-based caching
  - `internal/feeds/inject.go` — system message injection for OpenAI and Anthropic formats
- Agent context extensions — `AgentContext` now exposes `ContextDir` for feed manifest discovery and service auth loading.
- Proxy handler — new `WithFeeds` option wires feed injection into the proxy pipeline, gated by pod name.
- Cost logging improvements — better tracking in the logging layer.
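The two injection shapes can be sketched as below; the types and function names are hypothetical, not the `internal/feeds` API — OpenAI carries system content inside `messages[]`, while Anthropic puts it in a top-level `system` field:

```go
package main

import "fmt"

type openAIMessage struct {
	Role, Content string
}

// injectOpenAI adds feed content as a system entry in messages[].
func injectOpenAI(msgs []openAIMessage, feed string) []openAIMessage {
	sys := openAIMessage{Role: "system", Content: feed}
	return append([]openAIMessage{sys}, msgs...)
}

// injectAnthropic merges feed content into the top-level system string.
func injectAnthropic(system, feed string) string {
	if system == "" {
		return feed
	}
	return system + "\n\n" + feed
}

func main() {
	msgs := injectOpenAI([]openAIMessage{{Role: "user", Content: "hi"}}, "[feed] headlines")
	fmt.Println(len(msgs), msgs[0].Role)
	fmt.Println(injectAnthropic("base prompt", "[feed] headlines"))
}
```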
Docker Image
- `ghcr.io/mostlydev/cllama:v0.2.0` — multi-arch (linux/amd64 + linux/arm64)
- `ghcr.io/mostlydev/cllama:latest`
Test Coverage
- Feed fetcher, injection, and manifest parsing
- Proxy handler tests for both OpenAI and Anthropic feed injection paths
- Agent context service auth loading
v0.1.0
cllama v0.1.0 — First Release
OpenAI-compatible governance proxy for AI agent pods. Zero external dependencies, ~15 MB distroless image.
Features
- OpenAI-compatible proxy on `:8080` — `POST /v1/chat/completions` with streaming
- Anthropic Messages bridge — `POST /v1/messages` with native format translation
- Multi-provider registry — OpenAI, Anthropic, OpenRouter, Ollama with automatic routing
- Per-agent bearer token auth — agents never see real provider API keys
- Real-time operator dashboard on `:8081` — SSE-powered live view of all LLM calls with agent ID, model, tokens, cost, latency
- Cost tracking — per-agent, per-model, per-provider usage extraction from upstream responses
- Vendor-prefixed model fallback — routes `anthropic/claude-*` etc. through OpenRouter when a direct provider key is unavailable
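Vendor-prefixed fallback can be sketched as follows (`routeModel` is a hypothetical function; the default-provider choice for bare names is an assumption):

```go
package main

import (
	"fmt"
	"strings"
)

// routeModel sends a vendor-prefixed ref direct when that provider's key is
// available, otherwise falls back through OpenRouter with the prefix intact.
func routeModel(ref string, keys map[string]bool) (provider, model string) {
	vendor, rest, ok := strings.Cut(ref, "/")
	if !ok {
		return "openai", ref // bare model name: assume the default provider
	}
	if keys[vendor] {
		return vendor, rest // direct provider key available
	}
	return "openrouter", ref // no direct key: route via OpenRouter
}

func main() {
	p, m := routeModel("anthropic/claude-sonnet-4", map[string]bool{"openai": true})
	fmt.Println(p, m)
}
```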
Container Image
`docker pull ghcr.io/mostlydev/cllama:latest`

Published publicly at ghcr.io/mostlydev/cllama:latest.