feat(ai): record total billed tokens per ai run #1222
Aggregate the real billed token usage (from call_llm's response.usage + litellm completion_cost) across every LLM call an ai task makes into a per-run total persisted on the runner context as context.ai_tokens (int, cumulative) and context.ai_cost (float). This is the AI analog of context.scan_hours: the cloud billing chore reads context.ai_tokens off the task doc to bill/quota AI usage.

- _account_usage() sums call_llm usage onto self.context; missing/None usage counts as 0 so accounting never crashes the run.
- Main loop call, intent-detection call, and history summarization call are all counted exactly once. ChatHistory.compact() accrues its own summarization usage which the task drains into context per iteration.
- Subagent/batch ai tasks are separate runners with their own task doc and their own context.ai_tokens, so the chore sums across docs without double-counting.
- Works identically in chat and attack modes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01P5vSjfkBuGAAHdKxHS3ySm
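The accumulation rule described in this commit message (sum each call's usage onto the context, treat missing or malformed usage as 0, never raise) can be sketched as a standalone function. This is a minimal illustration, not the code from the PR: `account_usage` is a hypothetical free function standing in for the `_account_usage` method, and a plain dict stands in for the runner context. The `"tokens"` / `"cost"` usage keys follow the dict shape visible in the reviewed `_drain_history_usage` code.

```python
from typing import Any, Dict, Optional


def account_usage(context: Dict[str, Any], usage: Optional[Dict[str, Any]]) -> None:
    """Add one LLM call's billed usage to the cumulative per-run totals.

    Hypothetical stand-in for the task's _account_usage method.
    Missing, None, or malformed usage contributes 0 and never raises.
    """
    usage = usage if isinstance(usage, dict) else {}
    try:
        tokens = int(usage.get("tokens") or 0)
    except (TypeError, ValueError):
        tokens = 0
    try:
        cost = float(usage.get("cost") or 0.0)
    except (TypeError, ValueError):
        cost = 0.0
    context["ai_tokens"] = int(context.get("ai_tokens") or 0) + tokens
    context["ai_cost"] = float(context.get("ai_cost") or 0.0) + cost


# Seeded totals, then four calls: normal, missing usage, tokens only, malformed.
context = {"ai_tokens": 0, "ai_cost": 0.0}
for usage in ({"tokens": 120, "cost": 0.25}, None, {"tokens": 30}, {"tokens": "bad"}):
    account_usage(context, usage)
```

Only the first and third calls contribute tokens, so the run total reflects billed usage without any call being able to abort the run.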
Walkthrough
This change adds billed token and cost tracking for AI history compaction and task execution.
Changes
AI billing accounting
Sequence Diagram(s)
sequenceDiagram
participant RunLoop as secator.tasks.ai._run_loop
participant History as secator.ai.history.ChatHistory.compact
participant LLM as call_llm
participant Account as secator.tasks.ai._account_usage
participant Drain as secator.tasks.ai._drain_history_usage
participant Detect as secator.tasks.ai._detect_mode
RunLoop->>History: compact() when history is summarized
History->>LLM: summarize messages
LLM-->>History: result with usage
History-->>RunLoop: billed_tokens and billed_cost updated
RunLoop->>Drain: move history billing into context
RunLoop->>Account: add response usage after each LLM call
Detect->>Account: add intent-detection usage
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@secator/tasks/ai.py`:
- Around line 800-814: The usage drain in _drain_history_usage only accounts and
clears history.billed_tokens and history.billed_cost when tokens is truthy,
which can drop cost-only usage. Update the condition so the drain runs when
either billed_tokens or billed_cost is present, and make sure both
history.billed_tokens and history.billed_cost are reset after calling
self._account_usage, even when tokens is zero.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b6b9a15e-08c4-474c-a52d-b9baf8237797
📒 Files selected for processing (3)
secator/ai/history.py
secator/tasks/ai.py
tests/unit/test_ai_tokens.py
def _drain_history_usage(self):
    """Roll billed usage accrued by history summarization into context.ai_tokens.

    `ChatHistory.compact` makes its own LLM calls and stashes their billed
    usage on the history object; drain it here so it is counted exactly once.
    """
    history = getattr(self, "history", None)
    if history is None:
        return
    tokens = getattr(history, "billed_tokens", 0) or 0
    cost = getattr(history, "billed_cost", 0.0) or 0.0
    if tokens:
        self._account_usage({"tokens": tokens, "cost": cost})
        history.billed_tokens = 0
        history.billed_cost = 0.0
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Cost can be silently dropped when billed_tokens is 0 but billed_cost is non-zero.
The drain only fires (and resets the counters) when tokens is truthy. If a summarization call reports a cost with zero/missing tokens, that cost is neither accounted nor reset on this iteration. It would only be picked up on a later drain that happens to have non-zero tokens, and is lost entirely if that never occurs. Gate on either value.
🛠️ Proposed fix
     tokens = getattr(history, "billed_tokens", 0) or 0
     cost = getattr(history, "billed_cost", 0.0) or 0.0
-    if tokens:
+    if tokens or cost:
         self._account_usage({"tokens": tokens, "cost": cost})
         history.billed_tokens = 0
         history.billed_cost = 0.0
📝 Committable suggestion
def _drain_history_usage(self):
    """Roll billed usage accrued by history summarization into context.ai_tokens.

    `ChatHistory.compact` makes its own LLM calls and stashes their billed
    usage on the history object; drain it here so it is counted exactly once.
    """
    history = getattr(self, "history", None)
    if history is None:
        return
    tokens = getattr(history, "billed_tokens", 0) or 0
    cost = getattr(history, "billed_cost", 0.0) or 0.0
    if tokens or cost:
        self._account_usage({"tokens": tokens, "cost": cost})
        history.billed_tokens = 0
        history.billed_cost = 0.0
Extend the single per-run billed-token accumulator so every call_llm contributes its prompt/completion breakdown, not just the total. call_llm now reads response.usage.prompt_tokens / completion_tokens (alongside total_tokens) and _account_usage rolls each into a dedicated cumulative context key: context.ai_prompt_tokens / context.ai_completion_tokens (context.ai_tokens remains the billed total).

The history-compaction path (ChatHistory.compact -> billed_* -> _drain_history_usage) threads the split through too, so summarization is split-billed alongside the main loop and intent-detection calls.

Accounting stays at the source: every successful call_llm (tool-only main turn, _detect_mode intent call, and history compaction) is counted exactly once regardless of whether the turn produced display content. Missing/None usage adds 0 and never raises. There is one accounting path; the old "sum ai_type==response findings" approach is not used.

Tests (tests/unit/test_ai_tokens.py): add a tool-only-turn test (proves a content-less turn is billed), a combined main+intent+compaction sum test, and prompt/completion-split accumulation tests. 13/13 pass; flake8 clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01P5vSjfkBuGAAHdKxHS3ySm
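One way to picture the prompt/completion split is a mapping from usage fields to cumulative context keys. The sketch below assumes a response object exposing `usage.prompt_tokens`, `usage.completion_tokens`, and `usage.total_tokens`, as the commit message describes. `account_split_usage`, the `SPLIT_KEYS` table, and the `SimpleNamespace` responses are hypothetical and only illustrate the accumulation, not how `call_llm` or `_account_usage` are actually wired.

```python
from types import SimpleNamespace

# Usage field on the response -> cumulative context key (names from the commit message).
SPLIT_KEYS = {
    "total_tokens": "ai_tokens",
    "prompt_tokens": "ai_prompt_tokens",
    "completion_tokens": "ai_completion_tokens",
}


def account_split_usage(context, response):
    """Hypothetical helper: roll one response's usage split into the context."""
    usage = getattr(response, "usage", None)
    for usage_field, context_key in SPLIT_KEYS.items():
        # A missing usage object or missing field adds 0 and never raises.
        value = getattr(usage, usage_field, None) or 0
        context[context_key] = context.get(context_key, 0) + int(value)


context = {}
responses = [
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=100, completion_tokens=20, total_tokens=120)),
    SimpleNamespace(usage=None),  # e.g. a call that reported no usage
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=40, completion_tokens=10, total_tokens=50)),
]
for response in responses:
    account_split_usage(context, response)
```

Because the accounting happens per call rather than per displayed finding, a tool-only turn with no display content still adds its tokens to all three keys.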
…tering

The platform metering chore prices a run's consumed tokens against a model registry (free vs paid, per-million in/out/cached rates), so it needs to know WHICH model produced the tokens. Record the resolved run model id on `context.ai_model` in `_init_options`, alongside the existing `context.ai_tokens` accounting seeds. This is the configured model for the run; a mid-session model switch is out of scope (the configured model is recorded).

Tests: TestAiModelRecording drives _init_options with collaborators stubbed and asserts context.ai_model == the resolved model (paid + free ids). 15/15 pass; flake8 clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01P5vSjfkBuGAAHdKxHS3ySm
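To show why the metering side needs `context.ai_model`, here is a rough sketch of pricing a run against a per-million-token registry. Everything about the registry is assumed for illustration: the `MODEL_REGISTRY` shape, the model ids, and the rates are invented, cached-token rates are left out, and `price_run` is a hypothetical function, not the platform chore.

```python
# Hypothetical registry: ids and per-million rates are made up for illustration.
MODEL_REGISTRY = {
    "example/paid-model": {"free": False, "input_per_million": 3.0, "output_per_million": 15.0},
    "example/free-model": {"free": True, "input_per_million": 0.0, "output_per_million": 0.0},
}


def price_run(context):
    """Price a run's prompt/completion tokens using the recorded model id."""
    entry = MODEL_REGISTRY[context["ai_model"]]  # raises KeyError for an unknown model
    if entry["free"]:
        return 0.0
    prompt = context.get("ai_prompt_tokens", 0) or 0
    completion = context.get("ai_completion_tokens", 0) or 0
    return (
        prompt * entry["input_per_million"]
        + completion * entry["output_per_million"]
    ) / 1_000_000


paid_run = {"ai_model": "example/paid-model", "ai_prompt_tokens": 1_000_000, "ai_completion_tokens": 200_000}
free_run = {"ai_model": "example/free-model", "ai_prompt_tokens": 1_000_000, "ai_completion_tokens": 200_000}
paid_price = price_run(paid_run)
free_price = price_run(free_run)
```

The same token counts price differently depending on the model, which is the reason the run records the resolved model id next to the token totals.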
Goal
Add per-run AI-token accounting to the `ai` task so the cloud platform can bill/quota it. The `ai` task already gets real billed token usage per LLM call (`call_llm` → `response.usage.total_tokens` + litellm `completion_cost`) but never aggregated it. This persists the per-run total on the runner so the billing chore can read it, the AI analog of how `context.scan_hours` is used for run-hours billing.

What changed
- `secator/tasks/ai.py`
  - `_init_options` seeds `context["ai_tokens"]` (int) and `context["ai_cost"]` (float) so the field lands on the task doc even with zero LLM calls.
  - `_account_usage(usage)` sums a single `call_llm` usage dict onto `self.context`. Missing/None/malformed usage counts as 0 and never raises.
  - Usage is accounted after the main-loop `call_llm` (before any empty-response `continue`, so each billed call is counted once) and after the intent-detection `call_llm`.
  - `_drain_history_usage()` rolls billed usage accrued by history summarization into `context.ai_tokens` once per iteration.
- `secator/ai/history.py`: `ChatHistory.compact()` now records its summarization call's billed usage on `billed_tokens` / `billed_cost` (the task drains it). Missing usage = 0.
- `self.context` is copied onto every item's `_context` and persisted to MongoDB, so `context.ai_tokens` lands on the task doc. Subagent/batch `ai` tasks are separate runners with their own task doc + own `context.ai_tokens`, so the chore sums across docs without double-counting. Consistent in chat and attack modes.

The field the billing chore reads
`context.ai_tokens` (cumulative billed tokens, int). `context.ai_cost` (float) is also persisted.

Tests
New `tests/unit/test_ai_tokens.py` (9 tests, all green): N-call sum, missing/None usage = 0, malformed usage doesn't crash, exact persisted key, summarization drain-once, `compact()` records usage / handles missing usage, and two end-to-end `_run_loop` runs asserting `context.ai_tokens` == sum (and 0 with no usage). flake8 clean.

🤖 Generated with Claude Code
https://claude.ai/code/session_01P5vSjfkBuGAAHdKxHS3ySm
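Since each subagent or batch `ai` task writes its own task doc, the consumer side reduces to a plain sum over docs. The sketch below assumes task docs shaped like `{"context": {"ai_tokens": ...}}`, which is an inference from "reads `context.ai_tokens` off the task doc"; the function name, the `_id` values, and the sample docs are hypothetical, and no MongoDB query is shown.

```python
def total_billed_tokens(task_docs):
    """Sum context.ai_tokens across task docs; docs without the field add 0."""
    total = 0
    for doc in task_docs:
        context = doc.get("context") or {}
        total += int(context.get("ai_tokens") or 0)
    return total


# Hypothetical task docs: a parent ai task, one subagent, and a non-ai task.
docs = [
    {"_id": "parent", "context": {"ai_tokens": 1200, "ai_cost": 0.02}},
    {"_id": "subagent-1", "context": {"ai_tokens": 300}},
    {"_id": "non-ai", "context": {}},
]
billed = total_billed_tokens(docs)
```

Because a parent runner does not fold subagent usage into its own total, summing per-doc values counts every billed token once.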