Osaurus Router is the hosted inference path used by Osaurus accounts. It is implemented as an OpenAI-compatible remote provider with a few Router-only contracts for billing, request deduplication, and upstream compatibility.
This document captures the invariants that keep Router behavior reliable in the chat UI and agent loop.
Router availability depends on the local Osaurus identity. When identity is available, the Router provider is injected into the remote provider list and is eligible for the same model picker and chat paths as other remote providers.
Startup and recovery should stay automatic:
- App launch connects auto-connect providers, including Router.
- Identity changes trigger Router reinjection/reconnect instead of waiting for a manual Dashboard refresh.
- App activation retries discovery so a wake, sign-in, or delayed network state can recover without user action.
- Connect-phase transient failures can retry, but authentication, bad-request, and provider contract errors should surface as terminal errors.
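The split between retryable and terminal connect-phase failures can be sketched as below. This is a minimal illustration, not the actual Osaurus implementation: `ConnectFailure`, `RetryDecision`, `retryDecision`, the attempt limit, and the backoff values are all hypothetical names and numbers chosen for the example.

```swift
import Foundation

// Hypothetical failure categories; the real Osaurus error types may differ.
enum ConnectFailure {
    case timeout
    case networkUnavailable
    case authentication
    case badRequest
    case providerContract
}

enum RetryDecision: Equatable {
    case retry(afterSeconds: Double)
    case terminal
}

/// Transient connect-phase failures retry with capped exponential backoff.
/// Authentication, bad-request, and contract errors surface immediately.
/// The limit of 3 attempts and the 0.5s base delay are assumptions.
func retryDecision(for failure: ConnectFailure,
                   attempt: Int,
                   maxAttempts: Int = 3) -> RetryDecision {
    switch failure {
    case .timeout, .networkUnavailable:
        guard attempt < maxAttempts else { return .terminal }
        // 0.5s, 1s, 2s, ... capped at 8s
        let delay = min(0.5 * pow(2.0, Double(attempt - 1)), 8.0)
        return .retry(afterSeconds: delay)
    case .authentication, .badRequest, .providerContract:
        return .terminal
    }
}
```

Keeping the decision in one pure function makes it easy to test that a terminal category never slips into the retry path.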
Router requests use the OpenAI Chat Completions shape, then apply Router-only
normalization in `RemoteProviderService.buildChatRequest`.
Router-specific fields and transforms:
- `idempotency_key` is sent only to Router. Other OpenAI-compatible providers do not receive it because some reject unknown fields.
- `clamp_to_balance` is explicitly set to `false` for Router.
- User multimodal content parts are preserved.
- Assistant history is normalized to string `content` because several upstreams reject assistant content arrays or omitted assistant content on tool-call turns.
- A trailing plain assistant prefill is dropped for Router. Tool-call assistant turns are preserved because the following tool result must stay grounded in the prior `tool_calls`.
- If chat leaves `max_tokens` implicit, Router receives the chat engine default instead of relying on an upstream default. This prevents upstream 1024-token caps from turning long agent tasks into billed empty or truncated responses.
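The transforms above can be sketched against a simplified message model. This is only an illustration of the rules as described, not the code in `RemoteProviderService.buildChatRequest`: `SketchMessage`, `SketchRequest`, `normalize`, and the `engineDefaultMaxTokens` parameter are hypothetical, and content parts are reduced to plain strings.

```swift
// Hypothetical, simplified message model for illustration only.
struct SketchMessage: Equatable {
    enum Role: String { case system, user, assistant, tool }
    var role: Role
    var textParts: [String]        // stands in for string or array content
    var hasToolCalls: Bool = false
}

struct SketchRequest: Equatable {
    var messages: [SketchMessage]
    var maxTokens: Int?
    var idempotencyKey: String?    // nil means the field is not sent
    var clampToBalance: Bool?      // nil means the field is not sent
}

func normalize(messages: [SketchMessage],
               maxTokens: Int?,
               isRouter: Bool,
               engineDefaultMaxTokens: Int,
               idempotencyKey: String) -> SketchRequest {
    // Non-Router providers get the request untouched and no Router-only fields.
    guard isRouter else {
        return SketchRequest(messages: messages, maxTokens: maxTokens,
                             idempotencyKey: nil, clampToBalance: nil)
    }
    // Assistant content collapses to a single string, never omitted.
    var out = messages.map { message -> SketchMessage in
        guard message.role == .assistant else { return message }
        var copy = message
        copy.textParts = [message.textParts.joined()]
        return copy
    }
    // Drop a trailing plain assistant prefill, but keep tool-call turns.
    if let last = out.last, last.role == .assistant, !last.hasToolCalls {
        out.removeLast()
    }
    return SketchRequest(messages: out,
                         maxTokens: maxTokens ?? engineDefaultMaxTokens,
                         idempotencyKey: idempotencyKey,
                         clampToBalance: false)
}
```

User messages pass through unchanged in this sketch, which mirrors the rule that multimodal user parts are preserved.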
The request path should not add prompt coercion, fake model-family behavior, or provider-specific output filters. If an upstream model has an incompatible contract, fix the request shape or surface the provider error.
Router streaming goes through the shared OpenAI-compatible parser in
`OpenAICompatibleStreamParser.swift`. Provider-specific behavior stays outside
that parser; the shared layer owns framing, event decoding, and tool-call
accumulation.
Shared parser responsibilities:
- Tokenize SSE bytes on CR, LF, and CRLF only.
- Preserve JSON string content that contains other Unicode newline separators.
- Join compliant multi-line `data:` fields per the SSE spec.
- Optionally recover Router-compatible raw JSON bodies and proxy-split JSON payloads when policy allows it.
- Accumulate streaming tool calls by index, including continuation chunks that omit an index.
- Validate final tool-call arguments as JSON and classify truncated arguments as stream errors.
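The framing rules in this list can be sketched as follows. This is a minimal illustration of strict SSE framing, assuming the whole stream is available as one string; it is not the parser in `OpenAICompatibleStreamParser.swift`, and `splitSSELines` and `sseDataPayloads` are hypothetical names. It works on Unicode scalars because Swift treats CRLF as a single `Character`.

```swift
/// Splits text on CR, LF, and CRLF only. Other Unicode separators such as
/// U+2028 stay inside the line, so JSON string content is preserved.
func splitSSELines(_ text: String) -> [String] {
    var lines: [String] = []
    var current = ""
    var previousWasCR = false
    for scalar in text.unicodeScalars {
        if scalar == "\r" {
            lines.append(current)
            current = ""
            previousWasCR = true
        } else if scalar == "\n" {
            // The LF of a CRLF pair was already handled by the CR.
            if !previousWasCR {
                lines.append(current)
                current = ""
            }
            previousWasCR = false
        } else {
            current.unicodeScalars.append(scalar)
            previousWasCR = false
        }
    }
    if !current.isEmpty { lines.append(current) }
    return lines
}

/// Groups lines into events. An empty line dispatches the event, and
/// multiple data: lines are joined with "\n" as the SSE spec requires.
/// An event that is never terminated by an empty line is discarded.
func sseDataPayloads(_ text: String) -> [String] {
    var payloads: [String] = []
    var dataLines: [String] = []
    for line in splitSSELines(text) {
        if line.isEmpty {
            if !dataLines.isEmpty {
                payloads.append(dataLines.joined(separator: "\n"))
                dataLines.removeAll()
            }
        } else if line.hasPrefix("data:") {
            var value = String(line.dropFirst(5))
            // The spec strips one leading space after the colon.
            if value.hasPrefix(" ") { value.removeFirst() }
            dataLines.append(value)
        }
    }
    return payloads
}
```

The sketch ignores comment lines and other SSE fields, and it does not attempt the raw JSON fallback or split-data repair, which belong to the optional compatibility policy.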
Router enables the compatibility policy for raw JSON fallback and split-data repair. Other OpenAI-compatible providers should stay on the strict path unless they prove they need the same tolerance.
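Tool-call accumulation and final argument validation, as listed among the shared parser responsibilities, might look like the sketch below. The types `ToolCallDelta`, `AccumulatedToolCall`, `ToolCallAccumulator`, and `ToolCallStreamError` are hypothetical, and falling back to index 0 when the very first chunk has no index is an assumption made for the example.

```swift
import Foundation

// Hypothetical shape of one streamed tool-call fragment.
struct ToolCallDelta {
    var index: Int?
    var id: String?
    var name: String?
    var argumentsFragment: String
}

struct AccumulatedToolCall: Equatable {
    var id: String?
    var name: String?
    var arguments: String = ""
}

enum ToolCallStreamError: Error, Equatable {
    case truncatedArguments(index: Int)
}

struct ToolCallAccumulator {
    private var calls: [Int: AccumulatedToolCall] = [:]
    private var lastIndex: Int?

    init() {}

    mutating func append(_ delta: ToolCallDelta) {
        // A continuation chunk without an index extends the most recent call.
        let index = delta.index ?? lastIndex ?? 0
        lastIndex = index
        var call = calls[index] ?? AccumulatedToolCall()
        if let id = delta.id { call.id = id }
        if let name = delta.name { call.name = name }
        call.arguments += delta.argumentsFragment
        calls[index] = call
    }

    /// Returns calls ordered by index, or throws if any argument string
    /// is not a complete JSON object or array.
    func finish() throws -> [AccumulatedToolCall] {
        var result: [AccumulatedToolCall] = []
        for index in calls.keys.sorted() {
            guard let call = calls[index] else { continue }
            let data = Data(call.arguments.utf8)
            guard (try? JSONSerialization.jsonObject(with: data, options: [])) != nil else {
                throw ToolCallStreamError.truncatedArguments(index: index)
            }
            result.append(call)
        }
        return result
    }
}
```

Validating only at the end keeps partial fragments from being rejected mid-stream while still turning a cut-off argument string into a stream error.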
If a stream ends with `finish_reason=length` and no visible text, reasoning, or
tool call was emitted, the parser treats it as an error. That state usually
means the provider spent output tokens without producing usable assistant
content, so it must not silently look like a successful empty answer.
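A minimal sketch of that check, assuming the parser keeps simple counters for what it emitted. `StreamTotals`, `StreamOutcome`, and `classifyFinish` are hypothetical names used only for illustration.

```swift
// Hypothetical counters for what the stream actually produced.
struct StreamTotals {
    var visibleDeltas = 0
    var reasoningDeltas = 0
    var toolCalls = 0
}

enum StreamOutcome: Equatable {
    case completed
    case emptyLengthFinish   // surfaced as an error, not an empty answer
}

func classifyFinish(finishReason: String?, totals: StreamTotals) -> StreamOutcome {
    let producedNothing = totals.visibleDeltas == 0
        && totals.reasoningDeltas == 0
        && totals.toolCalls == 0
    if finishReason == "length" && producedNothing {
        return .emptyLengthFinish
    }
    return .completed
}
```

A `length` finish that did produce text, reasoning, or a tool call still completes normally in this sketch; only the fully empty case is escalated.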
Router billing metadata is carried through the stream separately from visible assistant text.
- Router summary frames become `RouterBillingSummary`.
- The stream yields a `StreamingBillingHint` sentinel prefixed with `U+FFFE`. The UI filters this sentinel out of visible output and token counting.
- The active assistant `ChatTurn` stores `routerBilling` so a billed turn survives chat reloads.
- If a billed turn finishes with no visible text, the chat renders an explicit empty-response notice instead of deleting the assistant bubble.
- Retry keys use a stable logical step key such as `<runId>:<attempt>`, so connect-phase retries can be deduped server-side. A user-initiated Retry starts a new logical run and can bill normally.
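The sentinel filtering can be sketched as below. Only the `U+FFFE` prefix comes from the description above; the payload that follows the sentinel is treated as opaque, and `StreamChunk`, `classifyChunk`, and `visibleText` are hypothetical names for illustration.

```swift
// U+FFFE is a Unicode noncharacter, so it should not occur in model output.
let billingSentinel: Unicode.Scalar = "\u{FFFE}"

enum StreamChunk: Equatable {
    case visible(String)
    case billingHint(String)   // opaque payload after the sentinel
}

func classifyChunk(_ chunk: String) -> StreamChunk {
    // Compare scalars rather than Characters so the sentinel cannot merge
    // into a grapheme cluster with whatever follows it.
    guard chunk.unicodeScalars.first == billingSentinel else {
        return .visible(chunk)
    }
    var payload = ""
    payload.unicodeScalars.append(contentsOf: chunk.unicodeScalars.dropFirst())
    return .billingHint(payload)
}

/// Joins only the chunks the user should see, so billing hints never reach
/// visible output or token counting.
func visibleText(from chunks: [String]) -> String {
    chunks.compactMap { chunk -> String? in
        if case .visible(let text) = classifyChunk(chunk) { return text }
        return nil
    }.joined()
}
```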
The billing path is metadata-only. Prompt text, response text, tool arguments, and tool results must not be written to Router billing records.
Router charges are also persisted to an encrypted local ledger so support can debug "I was charged but saw nothing" reports without storing transcripts on Osaurus servers.
Ledger properties:
- File: `~/.osaurus/billing/ledger.sqlite`
- Encryption: SQLCipher via the shared storage key
- Retention: newest 10,000 rows and at most 365 days
- Export: metadata-only diagnostics from the Dashboard
- Correlation: request id, session id, assistant turn id, model, token counts, cost, status, app version, and rendered outcome
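The retention rule can be illustrated with an in-memory sketch. The real ledger is a SQLCipher database, so this is not how pruning is implemented there; `LedgerRow` and `pruneLedger` are hypothetical, and only the 10,000-row and 365-day limits come from the list above.

```swift
import Foundation

// Hypothetical row with just the fields the retention rule needs.
struct LedgerRow {
    var id: Int
    var createdAt: Date
}

/// Keeps rows that are at most `maxAgeDays` old, newest first,
/// and then caps the result at `maxRows`.
func pruneLedger(_ rows: [LedgerRow],
                 now: Date,
                 maxRows: Int = 10_000,
                 maxAgeDays: Int = 365) -> [LedgerRow] {
    let cutoff = now.addingTimeInterval(-Double(maxAgeDays) * 86_400)
    let fresh = rows
        .filter { $0.createdAt >= cutoff }
        .sorted { $0.createdAt > $1.createdAt }
    return Array(fresh.prefix(maxRows))
}
```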
Outcomes are classified as `rendered`, `toolOnly`, `reasoningOnly`, `empty`,
`error`, or `cancelled`. This mirrors what the user saw in chat and lets support
distinguish a truly empty billed response from a tool-only or reasoning-only
turn.
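One way to express the classification is a single function over what the turn produced. The case names match the outcomes listed above, but `RenderedOutcome`, `classifyOutcome`, and especially the precedence order (cancelled, then error, then visible text, tool calls, reasoning) are assumptions made for this sketch.

```swift
enum RenderedOutcome: String {
    case rendered, toolOnly, reasoningOnly, empty, error, cancelled
}

/// The precedence here is assumed for illustration; the real classifier
/// may order these checks differently.
func classifyOutcome(hasVisibleText: Bool,
                     hasToolCalls: Bool,
                     hasReasoning: Bool,
                     failed: Bool,
                     cancelled: Bool) -> RenderedOutcome {
    if cancelled { return .cancelled }
    if failed { return .error }
    if hasVisibleText { return .rendered }
    if hasToolCalls { return .toolOnly }
    if hasReasoning { return .reasoningOnly }
    return .empty
}
```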
Router has a low-volume diagnostic path for terminal streams that produce no visible text, reasoning, or tool calls. These logs are intentionally sanitized and do not include request bodies or generated text.
The log prefix is:
[Osaurus][Router][EmptyStream]
Useful fields include:
- `kind`: terminal classification such as `raw-empty`, `summary-only`, `usage-only`, `unrecognized-events`, or `empty-after-events`
- `finish_reason`: provider finish reason, including `length`
- `inputTokens` / `outputTokens`: usage reported by the provider
- `visibleDeltas`, `reasoningDeltas`, `toolHints`, `billingHints`: what the UI actually received
- `idempotency_suffix`: last characters of the idempotency key for local correlation without printing the full key
- `routerTransforms`: whether Router-specific outbound transforms were applied
When investigating a billed empty response, pair the EmptyStream log with the
local billing diagnostics export. The log explains what the stream did; the
ledger explains what was charged and how the turn rendered.
Keep tests close to the contract:
- `RemoteChatRequestEncodingTests` covers Router-only request fields, message normalization, idempotency keys, and implicit `max_tokens`.
- `OpenAICompatibleStreamParserTests` covers shared SSE framing, raw JSON fallback, split-data repair, streaming tool-call accumulation, and `finish_reason=length` handling.
- `OsaurusRouterProviderTests` covers Router adapter behavior such as billing summary frames and empty-stream diagnostics.
- `RouterBillingDatabaseTests`, `RouterBillingLedgerTests`, and `RouterBillingOutcomeTests` cover local metadata persistence and outcome classification.
When a regression is shared by multiple OpenAI-compatible providers, add it to the shared parser tests first. Router-specific tests should only cover Router adapter behavior: billing, diagnostics, request transforms, and policy selection.