Token / cost visibility in /agents #79

Merged
TroyHernandez merged 2 commits into main from agents-token-cost
May 14, 2026
@TroyHernandez
Summary

`/agents` (and the underlying `subagent_list()` / `format_subagent_list()`) now show model, age, live context size, cumulative input/output tokens, and cumulative cost per subagent. Dense single-line output, one line per agent.

Before:
```
[1] demo task (15.0 min remaining) idle stub-12345678
```

After:
```
[1] demo task (moonshot-v1-8k · 2s · ctx 617/128.0K · 741 in / 5 out · ?) (30.0 min remaining) idle 6777ea48
```

What changed

  • `subagent_turn_prompt()` now returns `list(reply, usage)` instead of just the reply string. The two internal callers are updated (sync `subagent_query()`, async `subagent_collect()`); `saber::blast_radius` confirmed there are no external callers.
  • Each registry entry gains cumulative usage counters and a `query_count`. Both wait paths accumulate via the new `subagent_accumulate_usage()` helper. Cost is `NA` when the provider doesn't surface it (moonshot/ollama), which keeps a missing cost distinguishable from $0.
  • `subagent_list()` returns the new fields plus a best-effort live token count via `info$session$run()`. Busy agents (those with a pending async call) get `NA`, because callr can't stack queries on a session with an outstanding call.
  • `format_subagent_list()` renders the dense one-line layout. Cost shows `?` when it is `NA`; ctx shows `ctx ?` when live tokens are unavailable.
  • New helpers in `R/context-budget.R`: `format_age()` and `format_live_ctx()`.

Test plan

  • 39 new offline tests in `test_agents_visibility.R` cover the format helpers, `subagent_accumulate_usage()` (NULL-safe, partial usage, running cost), and the full formatter across idle / busy / with-cost shapes.
  • `tinytest::test_package("corteza")` — 1555/1555 OK.
  • End-to-end against moonshot: spawn → one sync + one async query → cumulative tokens grow correctly, live ctx updates, age advances.

Open follow-ups (not in this PR)

  • Cost computation for providers that don't return it natively (would need a per-model pricing table; out of scope here).
  • Per-tool-call latency breakdown if that proves useful later.

Each subagent's registry entry now tracks cumulative usage:
cumulative_input_tokens, cumulative_output_tokens,
cumulative_total_tokens, cumulative_cost, query_count. Both sync
subagent_query() and async subagent_collect() accumulate after
each successful turn. Cost is captured only when the provider
returns it (Anthropic typically does; moonshot/ollama don't), and
the cumulative starts as NA so a missing cost stays distinguishable
from $0.
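
A minimal sketch of the accumulation rule described above, not the package's actual `subagent_accumulate_usage()`. The names `new_usage_entry()` and `accumulate_usage()` are hypothetical, the `usage` field names (`input_tokens`, `output_tokens`, `cost`) are assumptions, and so are the choices to derive the total as input + output and to bump `query_count` even when `usage` is `NULL`.

```r
# Hypothetical registry entry: counters start at 0, cost starts at NA.
new_usage_entry <- function() {
  list(
    cumulative_input_tokens  = 0,
    cumulative_output_tokens = 0,
    cumulative_total_tokens  = 0,
    cumulative_cost          = NA_real_,  # NA means "never reported", not $0
    query_count              = 0L
  )
}

# Hypothetical accumulator; `usage` may be NULL or only partially filled.
accumulate_usage <- function(entry, usage) {
  entry$query_count <- entry$query_count + 1L  # assumption: counted per turn
  if (is.null(usage)) return(entry)

  # Treat a missing or NA field as "nothing to add".
  val <- function(x) if (is.null(x) || is.na(x)) 0 else x

  input  <- val(usage$input_tokens)
  output <- val(usage$output_tokens)
  entry$cumulative_input_tokens  <- entry$cumulative_input_tokens + input
  entry$cumulative_output_tokens <- entry$cumulative_output_tokens + output
  entry$cumulative_total_tokens  <- entry$cumulative_total_tokens + input + output

  # Cost leaves NA only once the provider actually reports a value.
  if (!is.null(usage$cost) && !is.na(usage$cost)) {
    prior <- if (is.na(entry$cumulative_cost)) 0 else entry$cumulative_cost
    entry$cumulative_cost <- prior + usage$cost
  }
  entry
}

e <- new_usage_entry()
e <- accumulate_usage(e, list(input_tokens = 741, output_tokens = 5))
e <- accumulate_usage(e, NULL)  # NULL usage leaves the token counters alone
```

After these two calls the cumulative cost is still `NA`, so a formatter can render it as `?` rather than `$0`.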

subagent_turn_prompt() now returns list(reply, usage) instead of
just the reply string so the parent can read both. There are only
two internal callers, as saber::blast_radius confirmed.

subagent_list() gains: model, age_seconds, live_tokens,
context_limit, cumulative_*, query_count. live_tokens is computed
per /agents call via info$session$run() against the child — best-
effort, NA for busy agents (callr can't query a session with a
pending call).
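
The best-effort lookup could be sketched roughly as follows. This is an illustration, not the package's code: `live_tokens_or_na()` and `count_fun` are hypothetical names, and checking `get_state()` is an assumption about how a busy child is detected (callr's `r_session` exposes `get_state()` and `run()`; the PR only says busy agents get NA). Stub sessions stand in for real callr sessions so the example runs without spawning a child process.

```r
# Hypothetical helper: ask the child for its token count only when it is
# idle, and fall back to NA on any failure instead of raising in the parent.
live_tokens_or_na <- function(session, count_fun) {
  # Assumption: a session with a pending async call does not report "idle".
  if (!identical(session$get_state(), "idle")) return(NA_integer_)
  tryCatch(
    as.integer(session$run(count_fun)),
    error = function(e) NA_integer_
  )
}

# Stub sessions that mimic the two r_session methods used above.
idle_stub <- list(
  get_state = function() "idle",
  run       = function(func) func()
)
busy_stub <- list(
  get_state = function() "busy",
  run       = function(func) stop("session has an outstanding call")
)

idle_count <- live_tokens_or_na(idle_stub, function() 617L)
busy_count <- live_tokens_or_na(busy_stub, function() 617L)
```

Returning NA instead of blocking keeps `/agents` responsive while a child is mid-turn, at the price of a `ctx ?` in the listing.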

format_subagent_list() renders a dense single line per agent:
  [1] task (model · age · ctx N/limit · X in / Y out · $Z)
      (T min remaining) idle <short-id>

Cost shows '?' when not provided. live ctx shows 'ctx ?' when the
child is busy. Two new helpers in R/context-budget.R: format_age()
and format_live_ctx().
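
To make the layout concrete, here is a rough sketch of how the parenthesised detail segment could be assembled. Every function name here is hypothetical (the `sketch_` prefix marks them as stand-ins for `format_age()`, `format_live_ctx()`, and part of `format_subagent_list()`), and the unit thresholds and the `$%.4f` cost format are assumptions; only the `Ns`, `N/128.0K`, `ctx ?`, and `?` shapes come from the output shown in this PR.

```r
# Hypothetical: compact token counts, e.g. 617 -> "617", 128000 -> "128.0K".
fmt_tokens <- function(n) {
  if (n >= 1000) sprintf("%.1fK", n / 1000) else as.character(n)
}

# Hypothetical: age in the largest whole unit (thresholds are assumptions).
sketch_format_age <- function(seconds) {
  if (seconds < 60) {
    sprintf("%ds", as.integer(seconds))
  } else if (seconds < 3600) {
    sprintf("%dm", as.integer(seconds %/% 60))
  } else {
    sprintf("%dh", as.integer(seconds %/% 3600))
  }
}

# Hypothetical: "ctx ?" whenever the live count or the limit is unknown.
sketch_format_live_ctx <- function(live, limit) {
  if (is.na(live) || is.na(limit)) return("ctx ?")
  paste0("ctx ", fmt_tokens(live), "/", fmt_tokens(limit))
}

# Hypothetical: "?" for an unreported cost, never "$0".
sketch_format_cost <- function(cost) {
  if (is.na(cost)) "?" else sprintf("$%.4f", cost)
}

# Joins the fields with a middle dot, as in the /agents output.
sketch_detail <- function(model, age, live, limit, tok_in, tok_out, cost) {
  parts <- c(
    model,
    sketch_format_age(age),
    sketch_format_live_ctx(live, limit),
    sprintf("%s in / %s out", fmt_tokens(tok_in), fmt_tokens(tok_out)),
    sketch_format_cost(cost)
  )
  paste0("(", paste(parts, collapse = " \u00b7 "), ")")
}

detail <- sketch_detail("moonshot-v1-8k", 2, 617, 128000, 741, 5, NA)
```

With those inputs, `detail` matches the detail segment of the moonshot line verified below.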

39 offline tests in test_agents_visibility.R cover the format
helpers, subagent_accumulate_usage() (NULL safe, partial usage,
running cost), and the full format_subagent_list() output across
idle/busy/with-cost shapes.

Verified end-to-end against moonshot:
  [1] demo (moonshot-v1-8k · 2s · ctx 617/128.0K · 741 in / 5 out · ?)
      (30.0 min remaining) idle <id>
Cumulative tokens grow across two queries (one sync, one async +
collect); both accumulate correctly.

Subagents spawned without an explicit model used to show the
provider name as the model field and 'ctx ?' as the live context
limit, because the limit lookup had no key. The child still ran
with the provider's real default model, so the display was
misleading.

Add default_provider_model() in R/context-budget.R as a single
source of truth, mirroring the CLI's default_provider_model() and
matching .resolve_model() in R/turn.R for moonshot. Use it in:

  - subagent_live_token_count(): when sess$model_map$cloud is
    NULL, fall back to the provider default before looking up the
    context limit. Returns the resolved model so the parent can
    display the same identity.
  - subagent_list(): show the resolved model name in the agents
    listing (info$model > live$model > default_provider_model() >
    provider name > '?').
  - maybe_compact_turn_session(): replace the inline switch with
    the shared helper. Same defaults; this also corrects an old
    drift where compaction used 'moonshot-v1-8k' while the rest of
    the package uses 'kimi-k2.6'.
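
The display precedence in the list above can be sketched as a first-non-empty lookup. This is an illustration rather than the package's implementation: `sketch_default_model()`, `first_non_empty()`, and `resolve_display_model()` are hypothetical names, and the defaults table holds only the one default this PR names (moonshot, 'kimi-k2.6'); entries for other providers are deliberately left out.

```r
# Hypothetical stand-in for default_provider_model(): one shared lookup.
sketch_default_model <- function(provider) {
  defaults <- c(moonshot = "kimi-k2.6")  # the only default stated in this PR
  if (is.null(provider) || !provider %in% names(defaults)) return(NULL)
  defaults[[provider]]
}

# Returns the first argument that is a single non-NA, non-empty string,
# or "?" when none qualifies.
first_non_empty <- function(...) {
  for (x in list(...)) {
    if (!is.null(x) && length(x) == 1L && !is.na(x) && nzchar(x)) return(x)
  }
  "?"
}

# Precedence: info$model > live$model > provider default > provider name > "?"
resolve_display_model <- function(info_model, live_model, provider) {
  first_non_empty(
    info_model,
    live_model,
    sketch_default_model(provider),
    provider
  )
}

shown <- resolve_display_model(NULL, NULL, "moonshot")
```

Keeping the defaults in one helper is what lets the listing, the context-limit lookup, and compaction agree on the same model name.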

Verified end-to-end: spawning a moonshot subagent with no explicit
model now shows '(kimi-k2.6 · 0s · ctx 565/128.0K · ...)' instead
of '(moonshot · 0s · ctx ?  · ...)'.

8 new tests cover default_provider_model() lookups and confirm
each resolved default has a context-limit entry. 1563/1563 OK.
@TroyHernandez TroyHernandez merged commit ca46c91 into main May 14, 2026
4 checks passed
@TroyHernandez TroyHernandez deleted the agents-token-cost branch May 14, 2026 02:57