fix: use backend thread token usage for header total #2800
Layau-code wants to merge 3 commits into bytedance:main.
Pull request overview
This PR updates the token-usage header total to prefer backend-persisted, thread-level token accounting (via a new GET /api/threads/{thread_id}/token-usage endpoint), while keeping the header responsive during in-flight runs by adding a streamed “pending” delta derived from currently visible messages. It also adds backend/frontend tests for the new thread-level accounting path.
Changes:
- Backend: add a typed /api/threads/{thread_id}/token-usage response model and an endpoint backed by RunStore.aggregate_tokens_by_thread.
- Frontend: add a thread token-usage query and a mapping helper, and update the header indicator to prefer backend totals plus an in-flight delta.
- Tests: add unit coverage for the backend response shape, repository aggregation behavior, and frontend selection logic.
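As a rough illustration of the "mapping helper" bullet, the sketch below converts a thread-level payload into the shape the header renders. The field names (input_tokens, output_tokens, total_tokens), the TokenUsage shape, and the function name mapThreadTokenUsage are assumptions made for this example. The PR's actual ThreadTokenUsageResponse type is not shown on this page.

```typescript
// Hypothetical payload shape for GET /api/threads/{thread_id}/token-usage.
// Field names are illustrative assumptions, not the PR's actual schema.
interface ThreadTokenUsageResponse {
  thread_id: string;
  input_tokens: number | null;
  output_tokens: number | null;
  total_tokens: number | null;
}

// Hypothetical UI-side shape consumed by the header indicator.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Treat missing or non-finite counts as zero so the header never renders NaN.
function safeCount(value: number | null | undefined): number {
  return typeof value === "number" && Number.isFinite(value) ? value : 0;
}

// Map the backend aggregate into the UI shape. If the backend omits a total,
// fall back to input + output.
function mapThreadTokenUsage(response: ThreadTokenUsageResponse): TokenUsage {
  const inputTokens = safeCount(response.input_tokens);
  const outputTokens = safeCount(response.output_tokens);
  const totalTokens =
    response.total_tokens == null
      ? inputTokens + outputTokens
      : safeCount(response.total_tokens);
  return { inputTokens, outputTokens, totalTokens };
}
```

Normalizing at the mapping boundary keeps the selection and rendering code free of null checks.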
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| frontend/tests/unit/core/threads/token-usage.test.ts | Adds unit tests for mapping backend thread token usage into UI TokenUsage. |
| frontend/tests/unit/core/messages/usage.test.ts | Adds unit tests for selecting header totals using backend usage + pending delta, with fallback behavior. |
| frontend/src/core/threads/types.ts | Introduces ThreadTokenUsageResponse type for the new backend API payload. |
| frontend/src/core/threads/token-usage.ts | Adds query key helper + response-to-UI mapping function. |
| frontend/src/core/threads/hooks.ts | Adds pending-usage baseline tracking and a new useThreadTokenUsage query; exposes pendingUsageMessages from useThreadStream. |
| frontend/src/core/messages/usage.ts | Adds selectHeaderTokenUsage to prefer backend totals and optionally add pending in-flight usage. |
| frontend/src/components/workspace/token-usage-indicator.tsx | Switches header usage calculation to selectHeaderTokenUsage with backend + pending inputs. |
| frontend/src/app/workspace/chats/[thread_id]/page.tsx | Wires useThreadTokenUsage + pending usage messages into the header indicator. |
| frontend/src/app/workspace/agents/[agent_name]/chats/[thread_id]/page.tsx | Same wiring for agent chat route (including mock handling). |
| frontend/src/core/i18n/locales/en-US.ts | Updates token usage note text to reflect new accounting sources. |
| frontend/src/core/i18n/locales/zh-CN.ts | Updates token usage note text to reflect new accounting sources. |
| backend/app/gateway/routers/thread_runs.py | Adds response models and the /token-usage endpoint with response_model=ThreadTokenUsageResponse. |
| backend/tests/test_thread_token_usage.py | Adds API test asserting stable response shape for /token-usage. |
| backend/tests/test_run_repository.py | Adds repository test ensuring aggregation counts only completed runs and produces expected breakdowns. |
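The usage.ts and usage.test.ts rows describe three behaviors: prefer the backend total, add the pending in-flight delta, and fall back when no backend total is available. The following is a minimal sketch of that selection logic. The options-object signature, the TokenUsage and MessageLike shapes, and the choice of "sum of all visible message usage" as the fallback are assumptions for illustration, not the PR's code.

```typescript
// Hypothetical shapes; the PR's real types may differ.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

interface MessageLike {
  id: string;
  usage?: TokenUsage; // assumed: per-message usage attached by the stream
}

function addUsage(a: TokenUsage, b: TokenUsage): TokenUsage {
  return {
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
    totalTokens: a.totalTokens + b.totalTokens,
  };
}

// Sum usage over messages, skipping messages that carry no usage.
function sumUsage(messages: readonly MessageLike[]): TokenUsage {
  return messages.reduce<TokenUsage>(
    (acc, message) => (message.usage ? addUsage(acc, message.usage) : acc),
    { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
  );
}

// Hypothetical signature for the header selector.
function selectHeaderTokenUsage(options: {
  backendUsage: TokenUsage | undefined;
  pendingMessages: readonly MessageLike[];
  allMessages: readonly MessageLike[];
}): TokenUsage {
  const { backendUsage, pendingMessages, allMessages } = options;
  if (!backendUsage) {
    // No backend total (still loading, request failed, or a mock thread):
    // fall back to summing usage from the messages currently on screen.
    return sumUsage(allMessages);
  }
  // The backend total covers persisted runs; add the in-flight delta so the
  // header keeps moving while a run streams.
  return addUsage(backendUsage, sumUsage(pendingMessages));
}
```

One thing to watch with this design: the pending delta has to be dropped once the backend total is refetched after the run completes. Otherwise the just-finished run would be counted twice.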
```ts
const pendingUsageMessages = thread.isLoading
  ? getMessagesAfterBaseline(
      thread.messages,
      pendingUsageBaselineMessageIdsRef.current,
    )
```
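The excerpt calls getMessagesAfterBaseline with a ref that holds the message ids captured before the run was sent. The sketch below shows one plausible implementation of that helper. It assumes the baseline is a set of message ids and that messages without an id count as pending. captureBaseline is a hypothetical name for the snapshot step.

```typescript
// Assumed minimal message shape; streamed messages may lack an id early on.
interface MessageLike {
  id?: string;
}

// Snapshot the ids of all messages visible before a new run is submitted.
function captureBaseline(messages: readonly MessageLike[]): Set<string> {
  const ids = new Set<string>();
  for (const message of messages) {
    if (message.id !== undefined) ids.add(message.id);
  }
  return ids;
}

// Return messages that were not visible at baseline time, preserving order.
// Assumption: id-less messages are treated as new (pending) rather than dropped.
function getMessagesAfterBaseline<T extends MessageLike>(
  messages: readonly T[],
  baselineIds: ReadonlySet<string>,
): T[] {
  return messages.filter(
    (message) => message.id === undefined || !baselineIds.has(message.id),
  );
}
```

Keying on ids rather than on array length keeps the delta correct even if earlier messages are re-rendered or reordered during streaming.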
```ts
    if (!response.ok) {
      throw new Error("Failed to load thread token usage.");
    }
    return (await response.json()) as ThreadTokenUsageResponse;
  },
  enabled: enabled && Boolean(threadId),
```
The branch was force-pushed from 5755d68 to 3850c8d, then from 3850c8d to 219804f.
@Layau-code, thanks for your contribution. Here are some other review comments on your PR.
The diff adds isMock to the destructuring:
Other hooks in the same file use getAPIClient() (e.g., useRunDetail), but useThreadTokenUsage calls fetch directly. If the shared client handles auth headers, request interceptors, or error normalization, this hook bypasses all of that. It is worth aligning with the existing pattern unless there is a reason to use a raw fetch.
The request also sets a Content-Type header, but GET requests have no body, so that header has no effect. Harmless, but slightly misleading.
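Taken together, the excerpt and the two review points suggest a query function along the lines of the sketch below. It keeps the excerpt's enabled guard and error check but sends no Content-Type on the GET. A fetch-like function is injected so the sketch runs without a browser. threadTokenUsageQueryKey, threadTokenUsageQueryOptions, and the key layout are hypothetical. The returned object is shaped like the options accepted by TanStack Query's useQuery (queryKey, queryFn, enabled); the excerpt's enabled: field suggests that library, but this is an assumption. The getAPIClient() interface is deliberately not modeled because it is not shown here.

```typescript
// Minimal response type for this sketch; the real payload has more fields.
interface ThreadTokenUsageResponse {
  total_tokens: number;
}

// Injected fetch-like function so the sketch runs without network or DOM typings.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical query-key helper: one cache entry per thread.
function threadTokenUsageQueryKey(threadId: string) {
  return ["threads", threadId, "token-usage"] as const;
}

function threadTokenUsageQueryOptions(
  threadId: string | undefined,
  enabled: boolean,
  fetchFn: FetchLike,
  baseUrl = "",
) {
  return {
    queryKey: threadTokenUsageQueryKey(threadId ?? ""),
    // Do not issue the request until a thread id exists.
    enabled: enabled && Boolean(threadId),
    queryFn: async (): Promise<ThreadTokenUsageResponse> => {
      const response = await fetchFn(
        `${baseUrl}/api/threads/${encodeURIComponent(threadId ?? "")}/token-usage`,
        // A GET has no body, so only Accept is sent; no Content-Type.
        { method: "GET", headers: { Accept: "application/json" } },
      );
      if (!response.ok) {
        throw new Error(
          `Failed to load thread token usage (HTTP ${response.status}).`,
        );
      }
      return (await response.json()) as ThreadTokenUsageResponse;
    },
  };
}
```

If the hook is moved onto the shared client, only the body of queryFn should need to change. The key and the enabled guard stay the same.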
Refs #2734
Summary
Details
The header token total now requests GET /api/threads/{thread_id}/token-usage and uses the persisted thread-level run aggregation when available.

To avoid freezing the header during long responses, the UI records the current live message baseline before sending a new run. While the run is streaming, the header displays: