
feat(ai): record total billed tokens per ai run #1222

Open
ocervell wants to merge 3 commits into main from feat/ai-token-quota
@ocervell ocervell commented Jun 25, 2026

Contributor

Goal

Add per-run AI-token accounting to the ai task so the cloud platform can bill and enforce quotas on it. The ai task already receives real billed token usage for each LLM call (call_llm's response.usage.total_tokens + litellm completion_cost) but never aggregated it. This change persists the per-run total on the runner so the billing chore can read it, the AI analog of how context.scan_hours is used for run-hours billing.

What changed

  • secator/tasks/ai.py
    • _init_options seeds context["ai_tokens"] (int) and context["ai_cost"] (float) so the field lands on the task doc even with zero LLM calls.
    • _account_usage(usage) sums a single call_llm usage dict onto self.context. Missing/None/malformed usage counts as 0 and never raises.
    • Wired in after the main loop call_llm (before any empty-response continue, so each billed call is counted once) and after the intent-detection call_llm.
    • _drain_history_usage() rolls billed usage accrued by history summarization into context.ai_tokens once per iteration.
  • secator/ai/history.py
    • ChatHistory.compact() now records its summarization call's billed usage on billed_tokens/billed_cost (the task drains it). Missing usage = 0.
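The accumulation rule described above can be sketched as follows. This is a minimal standalone illustration, not the code from secator/tasks/ai.py: the function name `account_usage` and the plain dict standing in for `self.context` are assumptions, and the `"tokens"`/`"cost"` keys follow the usage dict shown later in `_drain_history_usage`.

```python
def account_usage(context, usage):
    """Add one call's billed usage to the running totals in `context`."""
    if not isinstance(usage, dict):
        return  # missing/None/malformed usage counts as 0
    try:
        tokens = int(usage.get("tokens") or 0)
        cost = float(usage.get("cost") or 0.0)
    except (TypeError, ValueError):
        return  # unparseable values also count as 0 and never raise
    context["ai_tokens"] = context.get("ai_tokens", 0) + tokens
    context["ai_cost"] = context.get("ai_cost", 0.0) + cost


# Seeded up front (as _init_options does) so the keys exist with zero calls.
context = {"ai_tokens": 0, "ai_cost": 0.0}
calls = [{"tokens": 120, "cost": 0.5}, None, {"tokens": "oops"}, {"tokens": 30, "cost": 0.25}]
for usage in calls:
    account_usage(context, usage)
```

Only the two well-formed usage dicts contribute here; the `None` and malformed entries are skipped without raising.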

self.context is copied onto every item's _context and persisted to MongoDB, so context.ai_tokens lands on the task doc. Subagent/batch ai tasks are separate runners, each with its own task doc and its own context.ai_tokens, so the chore can sum across docs without double-counting. Behavior is the same in chat and attack modes.
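As a rough illustration of why per-doc totals sum cleanly, the sketch below adds up `context.ai_tokens` over a few in-memory task docs. The document shapes, ids, and the helper name `total_ai_tokens` are hypothetical; the real billing chore and its MongoDB query are not part of this PR.

```python
# Hypothetical task docs as they might look once persisted.
task_docs = [
    {"_id": "parent", "context": {"ai_tokens": 1200, "ai_cost": 0.5}},
    {"_id": "subagent-1", "context": {"ai_tokens": 300, "ai_cost": 0.25}},
    {"_id": "scan-only", "context": {}},  # non-AI task: no ai_tokens key
]


def total_ai_tokens(docs):
    """Sum context.ai_tokens across docs, treating a missing key as 0."""
    return sum(int(doc.get("context", {}).get("ai_tokens") or 0) for doc in docs)


total = total_ai_tokens(task_docs)
```

Because each runner only ever writes its own usage to its own doc, a plain sum over docs does not count any call twice.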

The field the billing chore reads

context.ai_tokens (cumulative billed tokens, int). context.ai_cost (float) is also persisted.

Tests

New tests/unit/test_ai_tokens.py (9 tests, all green): N-call sum, missing/None usage = 0, malformed usage doesn't crash, exact persisted key, summarization drain-once, compact() records usage / handles missing usage, and two end-to-end _run_loop runs asserting context.ai_tokens == sum (and 0 with no usage). flake8 clean.

🤖 Generated with Claude Code

https://claude.ai/code/session_01P5vSjfkBuGAAHdKxHS3ySm

Summary by CodeRabbit

  • New Features

    • Added tracking of billed AI token and cost usage during runs.
    • Run summaries now include accumulated usage from both direct model calls and history compaction.
  • Bug Fixes

    • Improved handling of missing or invalid usage data so accounting stays accurate.
    • Ensured history-related usage is counted only once and then cleared from temporary storage.
  • Tests

    • Added coverage for usage tracking, invalid usage values, and end-to-end run accounting.

Aggregate the real billed token usage (from call_llm's response.usage +
litellm completion_cost) across every LLM call an ai task makes into a
per-run total persisted on the runner context as context.ai_tokens (int,
cumulative) and context.ai_cost (float). This is the AI analog of
context.scan_hours: the cloud billing chore reads context.ai_tokens off
the task doc to bill/quota AI usage.

- _account_usage() sums call_llm usage onto self.context; missing/None
  usage counts as 0 so accounting never crashes the run.
- Main loop call, intent-detection call, and history summarization call
  are all counted exactly once. ChatHistory.compact() accrues its own
  summarization usage which the task drains into context per iteration.
- Subagent/batch ai tasks are separate runners with their own task doc
  and their own context.ai_tokens, so the chore sums across docs without
  double-counting.
- Works identically in chat and attack modes.
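The drain-once behavior for history summarization can be sketched with a stand-in history object. `StubHistory` and `drain_history` are hypothetical names, and the stub's `compact()` takes the usage as an argument instead of calling an LLM; whether the real `compact()` adds to or overwrites `billed_tokens` is an assumption here.

```python
class StubHistory:
    """Stand-in for ChatHistory: only the billing fields are modeled."""

    def __init__(self):
        self.billed_tokens = 0
        self.billed_cost = 0.0

    def compact(self, usage):
        # The real compact() makes an LLM call; here its usage is passed in.
        usage = usage or {}  # missing usage counts as 0
        self.billed_tokens += usage.get("tokens") or 0
        self.billed_cost += usage.get("cost") or 0.0


def drain_history(history, context):
    """Move accrued history usage into context, then reset the counters."""
    context["ai_tokens"] += history.billed_tokens
    context["ai_cost"] += history.billed_cost
    history.billed_tokens = 0
    history.billed_cost = 0.0


context = {"ai_tokens": 0, "ai_cost": 0.0}
history = StubHistory()
history.compact({"tokens": 200, "cost": 0.125})
history.compact(None)            # summarization call with no usage reported
drain_history(history, context)
drain_history(history, context)  # second drain adds nothing
```

Resetting the counters inside the drain is what keeps a later iteration from re-adding the same summarization usage.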

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01P5vSjfkBuGAAHdKxHS3ySm

coderabbitai Bot commented Jun 25, 2026

Contributor


Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 79307a56-f1f3-4a5b-aa40-56e178227c60

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

This change adds billed token and cost tracking for AI history compaction and task execution. ChatHistory stores summarization usage, and AI task runs accumulate per-call and drained history usage into context. New tests cover helper and loop accounting.

Changes

AI billing accounting

| Layer / File(s) | Summary |
| --- | --- |
| History billing fields and compaction (secator/ai/history.py) | ChatHistory gains billed token/cost fields, and compact() stores summarization usage totals on the instance. |
| Run-loop billing accumulation (secator/tasks/ai.py) | _init_options seeds context totals, _run_loop drains history usage and accounts each LLM response, _detect_mode records intent-detection usage, and the new helpers accumulate or reset billing counters. |
| Billing accounting tests (tests/unit/test_ai_tokens.py) | Unit and loop tests cover helper accumulation, history drainage, compact() usage recording, and end-to-end totals. |

Sequence Diagram(s)

sequenceDiagram
  participant RunLoop as secator.tasks.ai._run_loop
  participant History as secator.ai.history.ChatHistory.compact
  participant LLM as call_llm
  participant Account as secator.tasks.ai._account_usage
  participant Drain as secator.tasks.ai._drain_history_usage
  participant Detect as secator.tasks.ai._detect_mode

  RunLoop->>History: compact() when history is summarized
  History->>LLM: summarize messages
  LLM-->>History: result with usage
  History-->>RunLoop: billed_tokens and billed_cost updated
  RunLoop->>Drain: move history billing into context
  RunLoop->>Account: add response usage after each LLM call
  Detect->>Account: add intent-detection usage

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A bunny counted tokens by moonlight glow,
Then tucked cost crumbs where the tallies grow.
Hop, hop — the history keeps its score,
And the loop adds up a little more.
With whiskers twitching, I cheer and say:
“Carrot-ledgers make a bright AI day!”

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: per-run AI billing accounting for billed tokens. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@secator/tasks/ai.py`:
- Around line 800-814: The usage drain in _drain_history_usage only accounts and
clears history.billed_tokens and history.billed_cost when tokens is truthy,
which can drop cost-only usage. Update the condition so the drain runs when
either billed_tokens or billed_cost is present, and make sure both
history.billed_tokens and history.billed_cost are reset after calling
self._account_usage, even when tokens is zero.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b6b9a15e-08c4-474c-a52d-b9baf8237797

📥 Commits

Reviewing files that changed from the base of the PR and between 8d8ec83 and 258fe14.

📒 Files selected for processing (3)
  • secator/ai/history.py
  • secator/tasks/ai.py
  • tests/unit/test_ai_tokens.py

Comment thread secator/tasks/ai.py
Comment on lines +800 to +814
	def _drain_history_usage(self):
		"""Roll billed usage accrued by history summarization into context.ai_tokens.

		`ChatHistory.compact` makes its own LLM calls and stashes their billed
		usage on the history object; drain it here so it is counted exactly once.
		"""
		history = getattr(self, "history", None)
		if history is None:
			return
		tokens = getattr(history, "billed_tokens", 0) or 0
		cost = getattr(history, "billed_cost", 0.0) or 0.0
		if tokens:
			self._account_usage({"tokens": tokens, "cost": cost})
			history.billed_tokens = 0
			history.billed_cost = 0.0

Contributor


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Cost can be silently dropped when billed_tokens is 0 but billed_cost is non-zero.

The drain only fires (and resets the counters) when tokens is truthy. If a summarization call reports a cost with zero or missing tokens, that cost is neither accounted nor reset on this iteration. It is only picked up by a later drain that happens to see non-zero tokens, and it is lost entirely if that never occurs. Gate on either value.

🛠️ Proposed fix
 		tokens = getattr(history, "billed_tokens", 0) or 0
 		cost = getattr(history, "billed_cost", 0.0) or 0.0
-		if tokens:
+		if tokens or cost:
 			self._account_usage({"tokens": tokens, "cost": cost})
 			history.billed_tokens = 0
 			history.billed_cost = 0.0
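To make the finding concrete, here is a small standalone reproduction of the cost-only case. It is a sketch, not the project's code: `drain`, `run_once`, and the `gate_on_cost` switch are hypothetical, and a `SimpleNamespace` stands in for the history object.

```python
from types import SimpleNamespace


def drain(history, context, gate_on_cost):
    """Drain history usage; gate_on_cost toggles the proposed condition."""
    tokens = getattr(history, "billed_tokens", 0) or 0
    cost = getattr(history, "billed_cost", 0.0) or 0.0
    if tokens or (gate_on_cost and cost):
        context["ai_tokens"] += tokens
        context["ai_cost"] += cost
        history.billed_tokens = 0
        history.billed_cost = 0.0


def run_once(gate_on_cost):
    # A summarization call that reported a cost but zero tokens.
    history = SimpleNamespace(billed_tokens=0, billed_cost=0.25)
    context = {"ai_tokens": 0, "ai_cost": 0.0}
    drain(history, context, gate_on_cost)
    return context["ai_cost"], history.billed_cost


original = run_once(gate_on_cost=False)  # cost stays stuck on the history object
proposed = run_once(gate_on_cost=True)   # cost is accounted and the counter reset
```

With the original gate the 0.25 stays on the history object and never reaches the context; with the proposed gate it is moved into `ai_cost` and the counter is cleared.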

ocervell and others added 2 commits June 25, 2026 20:12
Extend the single per-run billed-token accumulator so every call_llm
contributes its prompt/completion breakdown, not just the total. call_llm
now reads response.usage.prompt_tokens / completion_tokens (alongside
total_tokens) and _account_usage rolls each into a dedicated cumulative
context key: context.ai_prompt_tokens / context.ai_completion_tokens
(context.ai_tokens remains the billed total). The history-compaction path
(ChatHistory.compact -> billed_* -> _drain_history_usage) threads the split
through too, so summarization is split-billed alongside the main loop and
intent-detection calls.
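A minimal sketch of the split accumulation is shown below. The context keys come from this commit message; the usage-dict key names (`prompt_tokens`, `completion_tokens`), the `SPLIT_KEYS` table, and the helper `account_split` are assumptions made for illustration.

```python
# Maps assumed usage-dict keys to the cumulative context keys named above.
SPLIT_KEYS = {
    "tokens": "ai_tokens",
    "prompt_tokens": "ai_prompt_tokens",
    "completion_tokens": "ai_completion_tokens",
}


def account_split(context, usage):
    """Roll one call's total/prompt/completion counts into context."""
    usage = usage if isinstance(usage, dict) else {}  # None/malformed adds 0
    for usage_key, context_key in SPLIT_KEYS.items():
        value = usage.get(usage_key)
        is_count = isinstance(value, int) and not isinstance(value, bool)
        context[context_key] = context.get(context_key, 0) + (value if is_count else 0)


context = {}
account_split(context, {"tokens": 150, "prompt_tokens": 100, "completion_tokens": 50})
account_split(context, None)  # a call with no usage adds nothing
account_split(context, {"tokens": 40, "prompt_tokens": 30, "completion_tokens": 10})
```

Keeping all three counters in one helper means every call site that already accounts the total gets the split for free.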

Accounting stays at the source: every successful call_llm (tool-only main
turn, _detect_mode intent call, and history compaction) is counted exactly
once regardless of whether the turn produced display content. Missing/None
usage adds 0 and never raises. There is one accounting path — the old
"sum ai_type==response findings" approach is not used.

Tests (tests/unit/test_ai_tokens.py): add a tool-only-turn test (proves a
content-less turn is billed), a combined main+intent+compaction sum test,
and prompt/completion-split accumulation tests. 13/13 pass; flake8 clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01P5vSjfkBuGAAHdKxHS3ySm
…tering

The platform metering chore prices a run's consumed tokens against a model
registry (free vs paid, per-million in/out/cached rates), so it needs to know
WHICH model produced the tokens. Record the resolved run model id on
`context.ai_model` in `_init_options`, alongside the existing `context.ai_tokens`
accounting seeds. This is the configured model for the run; a mid-session model
switch is out of scope (the configured model is recorded).
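For orientation, the kind of pricing the metering chore could do with `context.ai_model` is sketched below. The registry contents, model ids, rate field names, and `price_run` are all hypothetical (the platform's registry is not shown in this PR), and cached-token rates are left out.

```python
# Hypothetical registry: rates are per million tokens, in arbitrary currency units.
MODEL_REGISTRY = {
    "example/paid-model": {"input_per_million": 2.0, "output_per_million": 8.0},
    "example/free-model": {"input_per_million": 0.0, "output_per_million": 0.0},
}


def price_run(context, registry=MODEL_REGISTRY):
    """Price a run from its recorded model id and prompt/completion totals."""
    rates = registry[context["ai_model"]]
    prompt = context.get("ai_prompt_tokens", 0)
    completion = context.get("ai_completion_tokens", 0)
    total = prompt * rates["input_per_million"] + completion * rates["output_per_million"]
    return total / 1_000_000


paid = price_run({
    "ai_model": "example/paid-model",
    "ai_prompt_tokens": 500_000,
    "ai_completion_tokens": 250_000,
})
free = price_run({
    "ai_model": "example/free-model",
    "ai_prompt_tokens": 500_000,
    "ai_completion_tokens": 250_000,
})
```

Without the model id on the task doc, the same token counts could not be distinguished between the paid and free cases.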

Tests: TestAiModelRecording drives _init_options with collaborators stubbed and
asserts context.ai_model == the resolved model (paid + free ids). 15/15 pass;
flake8 clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01P5vSjfkBuGAAHdKxHS3ySm