
fix: use backend thread token usage for header total #2800

Open
Layau-code wants to merge 3 commits into bytedance:main from Layau-code:fix/thread-token-usage-header-total


@Layau-code
Contributor

Refs #2734

Summary

  • use backend persisted thread/run token usage for the header total
  • keep the header live during in-flight runs by adding current streamed-message usage as a pending delta
  • keep per-turn and debug token usage based on currently visible messages
  • fall back to visible-message aggregation when backend thread usage is unavailable or empty
  • add backend and frontend tests for the thread-level accounting path

Details

The header token total now requests GET /api/threads/{thread_id}/token-usage and uses the persisted thread-level run aggregation when available.

To avoid freezing the header during long responses, the UI records the current live message baseline before sending a new run. While the run is streaming, the header displays:

persisted backend total + current in-flight message usage
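A minimal sketch of that selection rule, assuming hypothetical `TokenUsage` and message shapes (the PR's real types and the actual signature of `selectHeaderTokenUsage` are not shown here). The baseline is modeled as the set of message IDs that existed before the run was submitted:

```typescript
// Hypothetical shapes for illustration; not the PR's actual types.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

interface UsageMessage {
  id: string;
  usage?: TokenUsage;
}

const EMPTY_USAGE: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };

function addUsage(a: TokenUsage, b: TokenUsage): TokenUsage {
  return {
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
    totalTokens: a.totalTokens + b.totalTokens,
  };
}

// Sum usage over messages, skipping messages that carry no usage metadata.
function sumUsage(messages: UsageMessage[]): TokenUsage {
  return messages.reduce<TokenUsage>(
    (acc, m) => (m.usage ? addUsage(acc, m.usage) : acc),
    EMPTY_USAGE,
  );
}

// Messages that were not present when the baseline was recorded.
function messagesAfterBaseline(
  messages: UsageMessage[],
  baselineIds: Set<string>,
): UsageMessage[] {
  return messages.filter((m) => !baselineIds.has(m.id));
}

// Prefer the persisted backend total plus the in-flight delta;
// fall back to visible-message aggregation when backend usage is missing or empty.
function headerUsage(
  backend: TokenUsage | null,
  visible: UsageMessage[],
  pending: UsageMessage[],
): TokenUsage {
  if (!backend || backend.totalTokens === 0) {
    return sumUsage(visible);
  }
  return addUsage(backend, sumUsage(pending));
}
```

With a backend total of 150 tokens and one streamed message reporting 7 tokens after the baseline, this sketch yields 157; with no backend data it returns the sum over the visible messages.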

Contributor

Copilot AI left a comment


Pull request overview

This PR updates the token-usage header total to prefer backend-persisted, thread-level token accounting (via a new GET /api/threads/{thread_id}/token-usage endpoint), while keeping the header responsive during in-flight runs by adding a streamed “pending” delta derived from currently visible messages. It also adds backend/frontend tests for the new thread-level accounting path.

Changes:

  • Backend: add a typed /api/threads/{thread_id}/token-usage response model + endpoint backed by RunStore.aggregate_tokens_by_thread.
  • Frontend: add a thread token-usage query + mapping helper, and update the header indicator to prefer backend totals with an in-flight delta.
  • Tests: add unit coverage for the backend response shape, repository aggregation behavior, and frontend selection logic.
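To illustrate the frontend half of these changes, here is a rough sketch of a query-key helper and a response-to-UI mapping. The payload field names (`total_input_tokens`, `total_output_tokens`, `total_tokens`) and both function names are assumptions made for this example; the PR's actual `ThreadTokenUsageResponse` fields are not shown on this page.

```typescript
// Hypothetical payload shape; the real response model may differ.
interface ThreadTokenUsagePayload {
  thread_id: string;
  total_input_tokens?: number;
  total_output_tokens?: number;
  total_tokens?: number;
}

interface UiTokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Stable cache key so the query can be invalidated per thread.
function threadTokenUsageKey(threadId: string): readonly [string, string, string] {
  return ["threads", threadId, "token-usage"];
}

// Map the backend payload to the UI shape; null signals "no backend data",
// which lets the caller fall back to visible-message aggregation.
function toUiTokenUsage(
  payload: ThreadTokenUsagePayload | null | undefined,
): UiTokenUsage | null {
  if (!payload) {
    return null;
  }
  const inputTokens = payload.total_input_tokens ?? 0;
  const outputTokens = payload.total_output_tokens ?? 0;
  // If the backend omits the total, derive it from the two parts.
  const totalTokens = payload.total_tokens ?? inputTokens + outputTokens;
  return { inputTokens, outputTokens, totalTokens };
}
```

Returning `null` rather than a zeroed object keeps "backend unavailable" distinguishable from "backend reports zero", which matters for the fallback path described in the summary.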

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| frontend/tests/unit/core/threads/token-usage.test.ts | Adds unit tests for mapping backend thread token usage into UI TokenUsage. |
| frontend/tests/unit/core/messages/usage.test.ts | Adds unit tests for selecting header totals using backend usage + pending delta, with fallback behavior. |
| frontend/src/core/threads/types.ts | Introduces ThreadTokenUsageResponse type for the new backend API payload. |
| frontend/src/core/threads/token-usage.ts | Adds query key helper + response-to-UI mapping function. |
| frontend/src/core/threads/hooks.ts | Adds pending-usage baseline tracking and a new useThreadTokenUsage query; exposes pendingUsageMessages from useThreadStream. |
| frontend/src/core/messages/usage.ts | Adds selectHeaderTokenUsage to prefer backend totals and optionally add pending in-flight usage. |
| frontend/src/components/workspace/token-usage-indicator.tsx | Switches header usage calculation to selectHeaderTokenUsage with backend + pending inputs. |
| frontend/src/app/workspace/chats/[thread_id]/page.tsx | Wires useThreadTokenUsage + pending usage messages into the header indicator. |
| frontend/src/app/workspace/agents/[agent_name]/chats/[thread_id]/page.tsx | Same wiring for agent chat route (including mock handling). |
| frontend/src/core/i18n/locales/en-US.ts | Updates token usage note text to reflect new accounting sources. |
| frontend/src/core/i18n/locales/zh-CN.ts | Updates token usage note text to reflect new accounting sources. |
| backend/app/gateway/routers/thread_runs.py | Adds response models and the /token-usage endpoint with response_model=ThreadTokenUsageResponse. |
| backend/tests/test_thread_token_usage.py | Adds API test asserting stable response shape for /token-usage. |
| backend/tests/test_run_repository.py | Adds repository test ensuring aggregation counts only completed runs and produces expected breakdowns. |

Comment on lines +573 to +577
const pendingUsageMessages = thread.isLoading
  ? getMessagesAfterBaseline(
      thread.messages,
      pendingUsageBaselineMessageIdsRef.current,
    )
Comment thread frontend/src/core/threads/hooks.ts Outdated
Comment on lines +770 to +775
    if (!response.ok) {
      throw new Error("Failed to load thread token usage.");
    }
    return (await response.json()) as ThreadTokenUsageResponse;
  },
  enabled: enabled && Boolean(threadId),
@Layau-code force-pushed the fix/thread-token-usage-header-total branch 2 times, most recently from 5755d68 to 3850c8d on May 8, 2026 15:58
@Layau-code force-pushed the fix/thread-token-usage-header-total branch from 3850c8d to 219804f on May 8, 2026 16:03
@WillemJiang
Collaborator

@Layau-code, thanks for your contribution. Here are some other review comments on your PR.

  1. isMock destructured from useThreadChat but not shown in diff

The diff adds isMock to the destructuring:
const { threadId, setThreadId, isNewThread, setIsNewThread, isMock } = useThreadChat();
But the diff doesn't show useThreadChat being updated to return isMock. If this property doesn't exist yet, it will be undefined at runtime (not a crash, but the mock-mode disabling won't work). Please verify that useThreadChat already exposes it.

  2. useThreadTokenUsage uses raw fetch instead of shared API client

  const response = await fetch(`${getBackendBaseURL()}/api/threads/...`);

Other hooks in the same file use getAPIClient() (e.g., useRunDetail). If the shared client handles auth headers, request interceptors, or error normalization, this hook bypasses all of that. It is worth aligning with the existing pattern unless there's a reason to use a raw fetch.

  3. Content-Type: application/json on a GET request is unnecessary

headers: { "Content-Type": "application/json" },

GET requests have no body, so Content-Type has no effect. Harmless but slightly misleading.
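As a rough illustration of the third point, a bodiless GET can be described without a `Content-Type` header; `Accept` is the header that tells the server which response format is wanted. The helper name `buildThreadTokenUsageRequest` and the descriptor shape below are hypothetical, and this sketch does not show the project's shared API client:

```typescript
// Hypothetical request descriptor; not part of the PR.
interface GetRequestDescriptor {
  url: string;
  init: { method: "GET"; headers: Record<string, string> };
}

function buildThreadTokenUsageRequest(
  baseUrl: string,
  threadId: string,
): GetRequestDescriptor {
  // Drop trailing slashes so the path is not doubled up.
  const trimmedBase = baseUrl.replace(/\/+$/, "");
  return {
    // Encode the ID so unusual characters cannot alter the path.
    url: `${trimmedBase}/api/threads/${encodeURIComponent(threadId)}/token-usage`,
    // No Content-Type: the request has no body to describe.
    init: { method: "GET", headers: { Accept: "application/json" } },
  };
}
```

The resulting `url` and `init` could be passed to whichever fetch wrapper the codebase standardizes on, so auth headers and error normalization stay in one place.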

@WillemJiang added this to the 2.0-m1 milestone on May 9, 2026
@WillemJiang added the "reviewing" label (The PR is in reviewing status) on May 9, 2026
@Layau-code
Contributor Author

useThreadChat already returns isMock in frontend/src/components/workspace/chats/use-thread-chat.ts, so it won't be undefined at runtime. I think the PR diff just didn't show that context clearly. I also updated the token usage request to use the shared auth fetch path and removed the unnecessary `Content-Type` header from the GET request.
