Summary
The token usage tracking system shipped in #2734 via #2770, #2800, #2838, #2841, and #2845. This issue tracks the
remaining gaps found during a code audit.
Closes #2734
Bugs
1. model_name never persisted to RunRow
RunRow.model_name is read in aggregate_tokens_by_thread but never written: get_completion_data() omits it and
update_run_completion() doesn't set it. Result: the by_model API always returns {"unknown": {...}}.
Fix: extract from serialized["kwargs"]["model"] in on_chat_model_start, return in get_completion_data().
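A minimal sketch of that fix, assuming the callback receives the usual `serialized` dict. `RunJournalSketch` is a hypothetical stand-in for the real journal class, and the fallback to a `model_name` key and the keep-the-first-model rule are assumptions, not confirmed behavior.

```python
from typing import Any, Dict, List, Optional


class RunJournalSketch:
    """Hypothetical stand-in for the run journal; only the model-name path is shown."""

    def __init__(self) -> None:
        self._model_name: Optional[str] = None

    def on_chat_model_start(
        self, serialized: Optional[Dict[str, Any]], messages: List[Any], **kwargs: Any
    ) -> None:
        # Assumption: the model id lives under serialized["kwargs"]["model"].
        # The "model_name" fallback is a guess for providers that use that key.
        model_kwargs = (serialized or {}).get("kwargs") or {}
        name = model_kwargs.get("model") or model_kwargs.get("model_name")
        # Assumption: keep the first model seen, which is normally the lead model.
        if name and self._model_name is None:
            self._model_name = str(name)

    def get_completion_data(self) -> Dict[str, Any]:
        # The real method returns more fields; model_name is the missing one.
        return {"model_name": self._model_name}
```

update_run_completion() would then need to copy `model_name` from this dict onto RunRow.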
2. SubagentTokenCollector returns early on multi-generation
token_collector.py:59 has a return after the first valid generation, whereas the parent RunJournal.on_llm_end
iterates all messages. The two paths therefore count multi-generation responses differently.
Fix: remove the return, let dedup handle it.
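A rough sketch of the intended behavior: visit every generation and rely on dedup to skip repeats. Generations are modeled here as plain dicts, and the dedup key (message `id`) is an assumption about how the existing dedup works; `CollectorSketch` is a hypothetical name.

```python
from typing import Any, Dict, List, Set


class CollectorSketch:
    """Hypothetical stand-in for SubagentTokenCollector."""

    def __init__(self, caller: str) -> None:
        self.caller = caller
        self._seen_ids: Set[str] = set()
        self.records: List[Dict[str, Any]] = []

    def on_llm_end(self, generations: List[List[Dict[str, Any]]]) -> None:
        for batch in generations:
            for gen in batch:
                usage = gen.get("usage_metadata")
                if not usage:
                    continue  # nothing to record for this generation
                msg_id = gen.get("id")
                if msg_id is not None:
                    if msg_id in self._seen_ids:
                        continue  # dedup: already recorded this message
                    self._seen_ids.add(msg_id)
                self.records.append({"caller": self.caller, **usage})
                # No early return: later generations are still processed.
```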
3. llm_call_count excludes sub-agent calls
record_external_llm_usage_records adds to total_tokens but not to _llm_call_count, so the per-call average is
overstated: sub-agent tokens are in the numerator but their calls are missing from the denominator.
Fix: increment count in record_external_llm_usage_records, or add separate counter.
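The first option could look roughly like this. It assumes each external record corresponds to exactly one LLM call; `UsageTotalsSketch` and `record_llm_call` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Iterable


class UsageTotalsSketch:
    """Hypothetical stand-in for the counters kept by the run journal."""

    def __init__(self) -> None:
        self.total_tokens = 0
        self._llm_call_count = 0

    def record_llm_call(self, total_tokens: int) -> None:
        # Lead-agent path: already updates both counters.
        self.total_tokens += total_tokens
        self._llm_call_count += 1

    def record_external_llm_usage_records(self, records: Iterable[Dict[str, Any]]) -> None:
        for record in records:
            self.total_tokens += int(record.get("total_tokens", 0))
            # Assumption: one record equals one sub-agent LLM call.
            self._llm_call_count += 1

    def tokens_per_call(self) -> float:
        if self._llm_call_count == 0:
            return 0.0
        return self.total_tokens / self._llm_call_count
```

If one record can cover several calls, the separate-counter option is the safer choice.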
UX
4. Sub-agent tokens invisible during streaming
The header shows accumulateUsage(messages) during an active run. Sub-agent tokens aren't on any message's
usage_metadata, so they only become visible after the run completes and /token-usage is re-fetched.
Fix: emit periodic token usage SSE event during run.
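One possible shape for the server side, as a sketch only. The event name `token_usage`, the payload fields, and the one-second interval are all assumptions; only the `event:` / `data:` framing comes from the SSE format itself.

```python
import json
import time
from typing import Any, Callable, Dict, Optional


def format_token_usage_event(run_id: str, usage: Dict[str, Any]) -> str:
    """Serialize a usage snapshot as one SSE frame (hypothetical event name)."""
    payload = json.dumps({"run_id": run_id, **usage}, separators=(",", ":"))
    return f"event: token_usage\ndata: {payload}\n\n"


class UsageEventThrottle:
    """Allow at most one usage event per interval to keep the stream quiet."""

    def __init__(self, min_interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_emit: Optional[float] = None

    def should_emit(self) -> bool:
        now = self._clock()
        if self._last_emit is None or now - self._last_emit >= self._min_interval_s:
            self._last_emit = now
            return True
        return False
```

The frontend would add the latest snapshot to the header total instead of relying only on accumulateUsage(messages).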
Design
5. by_model granularity insufficient
A single run can use multiple models (lead, sub-agent, title), but the current aggregation groups by one
model_name per run, so sub-agent model cost is invisible.
6. Sub-agent collector doesn't record model name
SubagentTokenCollector records have caller but no model_name. Combined with item 5, this makes a per-model cost
breakdown impossible.
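For discussion, one direction for items 5 and 6 is to aggregate per-call records rather than per-run rows. The sketch below assumes every record (lead, sub-agent, title) carries `model_name` and `total_tokens`; the record shape and the `aggregate_by_model` name are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def aggregate_by_model(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Group per-call usage records by model name."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"total_tokens": 0, "calls": 0}
    )
    for record in records:
        # Records without a model name still land in the "unknown" bucket.
        key = record.get("model_name") or "unknown"
        totals[key]["total_tokens"] += int(record.get("total_tokens", 0))
        totals[key]["calls"] += 1
    return dict(totals)
```

With this shape, a run that mixes models contributes to several buckets instead of one.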
Priority
| P | Issue | Effort |
|---|-------|--------|
| 0 | #1 model_name | Small |
| 1 | #4 real-time display | Medium |
| 1 | #2 multi-gen fix | Small |
| 2 | #3 call count | Small |
| 2 | #5 #6 by_model redesign | Large |