token-usage: remaining accounting gaps after initial system #2874

## Summary                                                                                                            
                                                                                          
  The token usage tracking system from #2734 shipped via #2770, #2800, #2838, #2841, and #2845. This issue tracks the
  remaining gaps found during a code audit.
                                                                                                                        
  Closes #2734                                                                                                          
                                                                                                                        
  ---                                                                                                                   
                                                                  
  ## Bugs                                                                                                               
  
  ### 1. `model_name` never persisted to RunRow                                                                         
                                                                  
  `RunRow.model_name` is read by `aggregate_tokens_by_thread` but never written: `get_completion_data()` omits it and
  `update_run_completion()` doesn't set it. As a result, the `by_model` API always returns `{"unknown": {...}}`.
  
  Fix: extract the model name from `serialized["kwargs"]["model"]` in `on_chat_model_start` and return it from `get_completion_data()`.
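
A minimal sketch of that fix, assuming the `serialized` payload has the shape described above. `JournalSketch` and `extract_model_name` are hypothetical names used only for illustration; the real journal class and its other completion fields are not shown.

```python
from typing import Any, Optional


def extract_model_name(serialized: Optional[dict[str, Any]]) -> Optional[str]:
    """Read serialized["kwargs"]["model"], tolerating missing pieces."""
    kwargs = (serialized or {}).get("kwargs") or {}
    return kwargs.get("model") or None


class JournalSketch:
    """Hypothetical stand-in for the run journal; not the real class."""

    def __init__(self) -> None:
        self._model_name: Optional[str] = None

    def on_chat_model_start(self, serialized, messages, **kwargs) -> None:
        # Keep the first model name seen for the run (an assumption of
        # this sketch; a run that uses several models needs more than this).
        if self._model_name is None:
            self._model_name = extract_model_name(serialized)

    def get_completion_data(self) -> dict[str, Any]:
        # Only the field relevant to this bug is shown.
        return {"model_name": self._model_name or "unknown"}
```

Falling back to `"unknown"` only at read time keeps a later call with a usable payload from being masked by an earlier call that had none.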
                                                                  
  ### 2. SubagentTokenCollector returns early on multi-generation                                                       
                                                                  
  `token_collector.py:59` has a `return` after the first valid generation, while the parent `RunJournal.on_llm_end`
  iterates over all messages. The two therefore behave inconsistently on multi-generation responses.
                                                                                                                        
  Fix: remove the `return` and let dedup handle any repeated messages.
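
One way the loop could look without the early `return`. This is a sketch, not the code in `token_collector.py`: plain dicts stand in for messages, the nested list mirrors the shape of LangChain's `LLMResult.generations`, and deduplicating by message `id` is an assumption about how dedup works.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class CollectorSketch:
    """Hypothetical stand-in for SubagentTokenCollector."""

    caller: str
    records: list[dict[str, Any]] = field(default_factory=list)
    _seen_ids: set[str] = field(default_factory=set)

    def on_llm_end(self, generations: list[list[dict[str, Any]]]) -> None:
        for batch in generations:
            for message in batch:
                usage = message.get("usage_metadata")
                if not usage:
                    continue  # nothing to record for this generation
                message_id: Optional[str] = message.get("id")
                if message_id is not None:
                    if message_id in self._seen_ids:
                        continue  # dedup: this message was already counted
                    self._seen_ids.add(message_id)
                self.records.append({"caller": self.caller, **usage})
                # No `return` here: keep scanning the remaining generations.
```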
  
  ### 3. `llm_call_count` excludes sub-agent calls                                                                      
                                                                  
  `record_external_llm_usage_records` adds sub-agent usage to `total_tokens` but does not increment `_llm_call_count`,
  so any per-call average (tokens divided by calls) is inflated.
                                                                                                                        
  Fix: increment the count in `record_external_llm_usage_records`, or add a separate counter for sub-agent calls.
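
A small sketch of the first option. `UsageTotalsSketch`, `record_llm_call`, and `tokens_per_call` are hypothetical; the sketch also assumes each external record corresponds to exactly one sub-agent LLM call.

```python
from typing import Any, Iterable


class UsageTotalsSketch:
    """Hypothetical stand-in for the object that owns the run totals."""

    def __init__(self) -> None:
        self.total_tokens = 0
        self._llm_call_count = 0

    def record_llm_call(self, total_tokens: int) -> None:
        # Lead-agent call: tokens and call count move together.
        self.total_tokens += total_tokens
        self._llm_call_count += 1

    def record_external_llm_usage_records(
        self, records: Iterable[dict[str, Any]]
    ) -> None:
        for record in records:
            self.total_tokens += record.get("total_tokens", 0)
            # The missing step: count the sub-agent call as well.
            self._llm_call_count += 1

    def tokens_per_call(self) -> float:
        if self._llm_call_count == 0:
            return 0.0
        return self.total_tokens / self._llm_call_count
```

With one lead call of 100 tokens and two sub-agent records of 50 and 30 tokens, the average is 60 tokens per call. Without the increment it would be 180.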
  
  ---                                                                                                                   
                                                                  
  ## UX

  ### 4. Sub-agent tokens invisible during streaming                                                                    
  
  During an active run the header shows `accumulateUsage(messages)`. Sub-agent tokens aren't on any message's
  `usage_metadata`, so they only become visible after the run completes and `/token-usage` is re-fetched.
                                                                                                                        
  Fix: emit a periodic token-usage SSE event during the run.
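
For illustration, the server side of such an event could be framed as below. The event name `token_usage` and the payload fields are assumptions; the issue does not define them, and the client-side handling in the header is not shown.

```python
import json


def format_token_usage_event(input_tokens: int, output_tokens: int) -> str:
    """Build one Server-Sent Events frame carrying the running totals."""
    payload = {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }
    # SSE frame: an `event:` line, a `data:` line, then a blank line
    # that terminates the event.
    return f"event: token_usage\ndata: {json.dumps(payload)}\n\n"
```

Sending cumulative totals rather than deltas means a client that misses an event can still display the correct number on the next one.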
  
  ---                                                                                                                   
                                                                  
  ## Design

  ### 5. `by_model` granularity insufficient                                                                            
  
  A single run can use multiple models (lead, sub-agent, title). The current aggregation groups by one `model_name`
  per run, so sub-agent model cost is invisible.
  
  ### 6. Sub-agent collector doesn't record model name                                                                  
                                                                  
  `SubagentTokenCollector` records have `caller` but no `model_name`. Combined with item 5, this makes a per-model
  cost breakdown impossible.
                                                                                                                        
  ---                                                             

  ## Priority                                                                                                           
  
  | Priority | Item | Effort |
  |----------|------|--------|
  | P0 | 1. `model_name` persistence | Small |
  | P1 | 4. Real-time display | Medium |
  | P1 | 2. Multi-generation fix | Small |
  | P2 | 3. Call count | Small |
  | P2 | 5, 6. `by_model` redesign | Large |

