fix(token-estimation): refine logic #921
base: main
Conversation
- Removed redundant estimation of tokens when promptTokens and completionTokens are provided by the API.
- Now only estimate prompt tokens if promptTokens is missing and messages are available.
- Only estimate completion tokens if completionTokens is missing and content is available.
- Improved code clarity by separating conditions for prompt and completion token estimation.
- Maintained fallback logic with error logging for encoding failures.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
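For orientation, here is a minimal, self-contained sketch of the separated guards this change describes, assuming gpt-tokenizer's encode/encodeChat API. The function name, option shape, and the "gpt-4" model constant (standing in for the project's DEFAULT_TOKENIZER_MODEL) are illustrative; the exact guard conditions of the real file are shown, and debated, in the review below.

import { encode, encodeChat } from "gpt-tokenizer";

interface SketchMessage {
  role: "user" | "assistant" | "system";
  content: string;
  name?: string;
}

export function estimateTokensSketch(opts: {
  promptTokens?: number | null;
  completionTokens?: number | null;
  messages?: SketchMessage[];
  content?: string;
}) {
  let promptTokens = opts.promptTokens ?? undefined;
  let completionTokens = opts.completionTokens ?? undefined;

  // Estimate prompt tokens only when the API did not provide them.
  if (promptTokens == null && opts.messages?.length) {
    try {
      promptTokens = encodeChat(opts.messages, "gpt-4").length;
    } catch {
      // Fallback heuristic: roughly 4 characters per token.
      promptTokens = Math.round(
        opts.messages.reduce((acc, m) => acc + m.content.length, 0) / 4,
      );
    }
  }

  // Estimate completion tokens only when the API did not provide them.
  if (completionTokens == null && opts.content != null) {
    try {
      completionTokens = encode(opts.content).length;
    } catch {
      completionTokens = Math.round(opts.content.length / 4);
    }
  }

  return { promptTokens, completionTokens };
}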
❌ Preview Environment deleted from Bunnyshell
Walkthrough
Separated prompt and completion token estimation into independent guarded paths and switched chat flows to prefer provided token fields; also changed base64 size calculation to round to an integer. Logging and per-path error fallbacks were adjusted accordingly.

Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant Chat as ChatService
participant Estimator as estimateTokens()
participant Encoder as TokenEncoder
participant Logger
Client->>Chat: send request (may include promptTokens, completionTokens)
alt both tokens provided
Chat-->>Client: proceed using provided promptTokens & completionTokens
else any token missing
Chat->>Estimator: estimateTokens(input)
alt promptTokens missing
Estimator->>Encoder: encodeChat(messages)
alt encode succeeds
Encoder-->>Estimator: promptCount
else encode fails
Estimator->>Logger: log prompt encode error
Estimator-->>Estimator: promptCount = Math.round(fallback)
end
end
alt completionTokens missing
Estimator->>Encoder: encode(JSON.stringify(content))
alt encode succeeds
Encoder-->>Estimator: completionCount
else encode fails
Estimator->>Logger: log completion encode error
Estimator-->>Estimator: completionCount = Math.round(fallback)
end
end
Estimator-->>Chat: return promptTokens, completionTokens (only those calculated)
Chat-->>Client: continue using provided + calculated token fields
end
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/gateway/src/chat/tools/estimate-tokens.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
{apps/api,apps/gateway,apps/ui,apps/docs,packages}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Always use top-level import; never use require() or dynamic imports (e.g., import(), next/dynamic)
Files:
apps/gateway/src/chat/tools/estimate-tokens.ts
{apps/api,apps/gateway,packages/db}/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
{apps/api,apps/gateway,packages/db}/**/*.ts: Use Drizzle ORM with the latest object syntax for database access
For reads, use db().query.<table>.findMany() or db().query.<table>.findFirst()
Files:
apps/gateway/src/chat/tools/estimate-tokens.ts

**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Never use any or as any in this TypeScript project unless absolutely necessary
Always use top-level import; never use require or dynamic imports
Files:
apps/gateway/src/chat/tools/estimate-tokens.ts

{apps/api,apps/gateway}/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
For reads, use db().query.<table>.findMany() or db().query.<table>.findFirst()
Files:
apps/gateway/src/chat/tools/estimate-tokens.ts

🧬 Code graph analysis (1)
apps/gateway/src/chat/tools/estimate-tokens.ts (2)
apps/gateway/src/chat/tools/types.ts (2)
ChatMessage (4-8)
DEFAULT_TOKENIZER_MODEL (1-1)
packages/logger/src/index.ts (2)
error (147-154)
logger (175-175)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: e2e-shards (5)
- GitHub Check: e2e-shards (2)
- GitHub Check: e2e-shards (4)
- GitHub Check: e2e-shards (1)
- GitHub Check: e2e-shards (3)
- GitHub Check: autofix
- GitHub Check: test / run
- GitHub Check: build / run
- GitHub Check: generate / run
- GitHub Check: lint / run
// Estimate prompt tokens only if not provided by the API
if (!promptTokens && messages && messages.length > 0) {
  try {
    // Convert messages to the format expected by gpt-tokenizer
    const chatMessages: ChatMessage[] = messages.map((m) => ({
      role: m.role,
      content:
        typeof m.content === "string" ? m.content : JSON.stringify(m.content),
      name: m.name,
    }));
    calculatedPromptTokens = encodeChat(
      chatMessages,
      DEFAULT_TOKENIZER_MODEL,
    ).length;
  } catch (error) {
    // Fallback to simple estimation if encoding fails
    logger.error(
      "Failed to encode chat messages in estimate tokens",
      error instanceof Error ? error : new Error(String(error)),
    );
    calculatedPromptTokens =
      messages.reduce((acc, m) => acc + (m.content?.length || 0), 0) / 4;
  }
}
Guard for “missing” must not treat 0 as missing; fix fallback undercount and return integers

- Line 23: Using !promptTokens re-estimates when the API legitimately returns 0. Use a null/undefined check.
- Lines 42-44: Fallback undercounts when content is non-string (objects/arrays). Mirror the encoding path and stringify per-message before taking the length. Also round to an integer token count.
Apply this diff:
- // Estimate prompt tokens only if not provided by the API
- if (!promptTokens && messages && messages.length > 0) {
+ // Estimate prompt tokens only if not provided by the API
+ if (promptTokens == null && messages?.length > 0) {
try {
@@
- calculatedPromptTokens =
- messages.reduce((acc, m) => acc + (m.content?.length || 0), 0) / 4;
+ calculatedPromptTokens = Math.round(
+ messages.reduce((acc, m) => {
+ const text =
+ typeof m.content === "string" ? m.content : JSON.stringify(m.content);
+ return acc + text.length;
+ }, 0) / 4
+ );
}
}

Additionally: the parameter messages: any[] violates the project guideline to avoid any. Consider typing it (e.g., a ChatMessageLike with content: unknown) and narrowing in place.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Estimate prompt tokens only if not provided by the API
if (promptTokens == null && messages?.length > 0) {
  try {
    // Convert messages to the format expected by gpt-tokenizer
    const chatMessages: ChatMessage[] = messages.map((m) => ({
      role: m.role,
      content:
        typeof m.content === "string" ? m.content : JSON.stringify(m.content),
      name: m.name,
    }));
    calculatedPromptTokens = encodeChat(
      chatMessages,
      DEFAULT_TOKENIZER_MODEL,
    ).length;
  } catch (error) {
    // Fallback to simple estimation if encoding fails
    logger.error(
      "Failed to encode chat messages in estimate tokens",
      error instanceof Error ? error : new Error(String(error)),
    );
    calculatedPromptTokens = Math.round(
      messages.reduce((acc, m) => {
        const text =
          typeof m.content === "string" ? m.content : JSON.stringify(m.content);
        return acc + text.length;
      }, 0) / 4
    );
  }
}
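On the reviewer's note above about typing messages instead of any[], a hedged sketch of one possible shape; ChatMessageLike and the helper names are illustrative, not part of the codebase:

// Illustrative only: one way to avoid any[] for the messages parameter.
interface ChatMessageLike {
  role?: "user" | "assistant" | "system";
  content: unknown; // narrowed at the point of use
  name?: string;
}

function messageText(m: ChatMessageLike): string {
  // Narrow the unknown content in place: strings pass through, everything else is stringified.
  return typeof m.content === "string" ? m.content : JSON.stringify(m.content ?? "");
}

// Example: the rough fallback estimate expressed against the typed parameter.
function roughPromptTokens(messages: ChatMessageLike[]): number {
  return Math.round(messages.reduce((acc, m) => acc + messageText(m).length, 0) / 4);
}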
// Estimate completion tokens only if not provided by the API
if (!completionTokens && content) {
  try {
    calculatedCompletionTokens = encode(JSON.stringify(content)).length;
  } catch (error) {
    // Fallback to simple estimation if encoding fails
    logger.error(
      "Failed to encode completion text",
      error instanceof Error ? error : new Error(String(error)),
    );
    calculatedCompletionTokens = content.length / 4;
  }
}
Fix completion-path guard and encoding; avoid JSON.stringify and round fallback

- Line 48: !completionTokens incorrectly treats 0 as missing and skips empty-string content. Use a null/undefined check and allow empty strings.
- Line 50: encode(JSON.stringify(content)) overcounts by adding quotes/escapes; pass the string directly.
- Line 57: Round the heuristic to an integer.
Apply this diff:
- // Estimate completion tokens only if not provided by the API
- if (!completionTokens && content) {
+ // Estimate completion tokens only if not provided by the API
+ if (completionTokens == null && content != null) {
try {
- calculatedCompletionTokens = encode(JSON.stringify(content)).length;
+ calculatedCompletionTokens = encode(content).length;
} catch (error) {
@@
- calculatedCompletionTokens = content.length / 4;
+ calculatedCompletionTokens = Math.round(content.length / 4);
}
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Estimate completion tokens only if not provided by the API
if (completionTokens == null && content != null) {
  try {
    calculatedCompletionTokens = encode(content).length;
  } catch (error) {
    // Fallback to simple estimation if encoding fails
    logger.error(
      "Failed to encode completion text",
      error instanceof Error ? error : new Error(String(error)),
    );
    calculatedCompletionTokens = Math.round(content.length / 4);
  }
}
🤖 Prompt for AI Agents
In apps/gateway/src/chat/tools/estimate-tokens.ts around lines 47 to 59, the
guard `!completionTokens` treats 0 as missing and the encoder call uses
JSON.stringify which inflates token count and the fallback heuristic isn't
rounded; change the guard to check for null/undefined (e.g., completionTokens ==
null) so zero is allowed, only run the block when content is not null/undefined
(allow empty strings), call encode on the string content directly (e.g.,
encode(String(content))) instead of JSON.stringify(content), and round the
fallback heuristic to an integer (e.g., Math.round(content.length / 4)).
Rounded token estimation values in `estimateTokens` to ensure accurate integer predictions for both prompt and completion tokens. Enhanced size estimation for base64 data in `process-image-url` by rounding results.
Removed redundant token estimation when `promptTokens` and `completionTokens` are provided. Simplified token-related calculations and ensured consistent handling across all relevant functions.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/models/src/process-image-url.ts (1)
51-69: Harden against SSRF (block private/link-local IPs even over HTTPS).
Fetching user-supplied URLs can still hit internal services (e.g., 169.254.169.254) via HTTPS. Add DNS/IP resolution guards and allow-list public IP ranges or domains.
I can provide a helper to resolve hostnames and reject private/reserved CIDRs if you want.
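For reference, a minimal sketch of such a guard; the helper name, the blocked ranges shown, and where it would be called from are assumptions, not something this PR includes:

// Sketch: resolve the hostname and reject loopback/private/link-local targets before fetching.
// Not exhaustive — the listed ranges are examples, and IPv6 handling here is simplified.
import { lookup } from "node:dns/promises";
import net from "node:net";

function isPrivateIPv4(ip: string): boolean {
  const [a, b] = ip.split(".").map(Number);
  return (
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 169 && b === 254) || // link-local, e.g. 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}

export async function assertPublicHost(url: string): Promise<void> {
  const { hostname } = new URL(url);
  // Literal IPs and DNS-resolved addresses are both checked.
  const addresses = net.isIP(hostname)
    ? [{ address: hostname, family: net.isIP(hostname) }]
    : await lookup(hostname, { all: true });
  for (const { address, family } of addresses) {
    const isPrivate =
      family === 4
        ? isPrivateIPv4(address)
        : address === "::1" || /^(fe80|fc|fd)/i.test(address);
    if (isPrivate) {
      throw new Error(`Refusing to fetch non-public address: ${address}`);
    }
  }
}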
apps/gateway/src/chat/chat.ts (1)
2525-2533: Avoid any[]; use ChatMessage[] per project TS guidelines.
You already import ChatMessage; use it here.
Apply this diff:
-      const chatMessages: any[] = messages.map((m) => ({
+      const chatMessages: ChatMessage[] = messages.map((m) => ({
         role: m.role as "user" | "assistant" | "system" | undefined,
         content: m.content || "",
         name: m.name,
       }));
🧹 Nitpick comments (1)
apps/gateway/src/chat/chat.ts (1)
2548-2551: Don’t JSON.stringify the completion before encoding.
JSON.stringify(fullContent) inflates tokens. Encode the raw text.
Apply this diff:
-          calculatedCompletionTokens = encode(
-            JSON.stringify(fullContent),
-          ).length;
+          calculatedCompletionTokens = encode(fullContent).length;
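A quick illustration of the inflation the reviewer mentions (exact counts depend on the tokenizer; the point is only that JSON.stringify adds surrounding quotes and escape sequences, which cost extra tokens):

import { encode } from "gpt-tokenizer";

const text = 'He said: "hi"\nsecond line';
console.log(encode(text).length);                 // tokens for the raw completion text
console.log(encode(JSON.stringify(text)).length); // typically more: wrapping quotes plus \" and \n escapes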
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
apps/gateway/src/chat/chat.ts (3 hunks)
apps/gateway/src/chat/tools/estimate-tokens.ts (1 hunks)
packages/models/src/process-image-url.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/gateway/src/chat/tools/estimate-tokens.ts
🧰 Additional context used
📓 Path-based instructions (4)
{apps/api,apps/gateway,apps/ui,apps/docs,packages}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Always use top-level import; never use require() or dynamic imports (e.g., import(), next/dynamic)
Files:
apps/gateway/src/chat/chat.ts
packages/models/src/process-image-url.ts
{apps/api,apps/gateway,packages/db}/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
{apps/api,apps/gateway,packages/db}/**/*.ts: Use Drizzle ORM with the latest object syntax for database access
For reads, use db().query.<table>.findMany() or db().query.<table>.findFirst()
Files:
apps/gateway/src/chat/chat.ts

**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Never use any or as any in this TypeScript project unless absolutely necessary
Always use top-level import; never use require or dynamic imports
Files:
apps/gateway/src/chat/chat.ts
packages/models/src/process-image-url.ts

{apps/api,apps/gateway}/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
For reads, use db().query.<table>.findMany() or db().query.<table>.findFirst()
Files:
apps/gateway/src/chat/chat.ts

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: e2e-shards (2)
- GitHub Check: e2e-shards (5)
- GitHub Check: e2e-shards (1)
- GitHub Check: e2e-shards (3)
- GitHub Check: e2e-shards (4)
- GitHub Check: build / run
- GitHub Check: test / run
- GitHub Check: generate / run
- GitHub Check: lint / run
- GitHub Check: autofix
🔇 Additional comments (1)
apps/gateway/src/chat/chat.ts (1)
65-67: LGTM: round length-based token estimate.
Keeps integer semantics and avoids zeros with Math.max(1, …).
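The shape of that heuristic, as described; a sketch only, since the exact helper body is not shown in this thread:

// Sketch of a length-based fallback: ~4 characters per token, never reporting zero tokens.
function estimateTokensFromContent(text: string): number {
  return Math.max(1, Math.round(text.length / 4));
}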
promptTokens,
completionTokens,
cachedTokens,
Ensure costs/response use tokens even when providers omit them.
When prompt/completion tokens are null, derive them once and pass consistent numbers into calculateCosts and transformResponseToOpenai.
Apply this diff to use safe values in the two calls:
const costs = calculateCosts(
usedModel,
usedProvider,
- promptTokens,
- completionTokens,
+ safePromptTokens,
+ safeCompletionTokens,
    cachedTokens,

const transformedResponse = transformResponseToOpenai(
usedProvider,
usedModel,
json,
content,
reasoningContent,
finishReason,
- promptTokens,
- completionTokens,
- (promptTokens || 0) + (completionTokens || 0) + (reasoningTokens || 0),
+ safePromptTokens ?? 0,
+ safeCompletionTokens ?? 0,
+   (safePromptTokens ?? 0) + (safeCompletionTokens ?? 0) + (reasoningTokens || 0),

Add these helpers right before the calculateCosts call:
// Compute safe token values only if missing
let safePromptTokens = promptTokens ?? null;
let safeCompletionTokens = completionTokens ?? null;
if (safePromptTokens === null && messages?.length) {
try {
const chatMsgs: ChatMessage[] = messages.map((m) => ({
role: m.role as "user" | "assistant" | "system" | undefined,
content: typeof m.content === "string" ? m.content : JSON.stringify(m.content ?? ""),
name: m.name,
}));
safePromptTokens = encodeChat(chatMsgs, DEFAULT_TOKENIZER_MODEL).length;
} catch (e) {
logger.error("Failed to encode messages (non-streaming)", e instanceof Error ? e : new Error(String(e)));
safePromptTokens = estimateTokensFromContent(messages.map((m) => String(m.content ?? "")).join("\n"));
}
}
if (safeCompletionTokens === null && content) {
try {
safeCompletionTokens = encode(content).length;
} catch (e) {
logger.error("Failed to encode completion (non-streaming)", e instanceof Error ? e : new Error(String(e)));
safeCompletionTokens = estimateTokensFromContent(content);
}
}

Also applies to: 3035-3038
🤖 Prompt for AI Agents
In apps/gateway/src/chat/chat.ts around lines 3017-3019 (and similarly
3035-3038), promptTokens/completionTokens can be null from providers so derive
safe values once and reuse them: add helpers just before the calculateCosts call
to set let safePromptTokens = promptTokens ?? null and let safeCompletionTokens
= completionTokens ?? null, then if safePromptTokens is null and messages exist
compute it by encoding messages via encodeChat with DEFAULT_TOKENIZER_MODEL
falling back to estimateTokensFromContent on error (logging via logger.error);
if safeCompletionTokens is null and content exists compute it via encode with
fallback to estimateTokensFromContent (also logging on error). Finally replace
direct promptTokens/completionTokens uses in calculateCosts and
transformResponseToOpenai with safePromptTokens and safeCompletionTokens so both
functions receive consistent, non-null token counts.
  promptTokens: promptTokens?.toString() || null,
  completionTokens: completionTokens?.toString() || null,
  totalTokens:
-   totalTokens ||
-   (
-     (calculatedPromptTokens || 0) + (calculatedCompletionTokens || 0)
-   ).toString(),
+   totalTokens || ((promptTokens || 0) + (completionTokens || 0)).toString(),
  reasoningTokens: reasoningTokens,
Fix totalTokens type/consistency and include reasoning fallback.
Currently totalTokens may be a number (not string) and ignores reasoning in fallback; align with other fields.
Apply this diff:
- promptTokens: promptTokens?.toString() || null,
- completionTokens: completionTokens?.toString() || null,
- totalTokens:
- totalTokens || ((promptTokens || 0) + (completionTokens || 0)).toString(),
+ promptTokens: (safePromptTokens ?? promptTokens)?.toString() || null,
+ completionTokens: (safeCompletionTokens ?? completionTokens)?.toString() || null,
+ totalTokens: (
+ (totalTokens ?? ((safePromptTokens ?? promptTokens ?? 0) + (safeCompletionTokens ?? completionTokens ?? 0) + (reasoningTokens ?? 0)))
+   ).toString(),

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In apps/gateway/src/chat/chat.ts around lines 3084 to 3088, totalTokens can be a
number and currently ignores reasoningTokens when falling back; change
totalTokens to follow the same string-or-null pattern as
promptTokens/completionTokens by ensuring it is computed and stored as a string
(or null) and include reasoningTokens in the fallback sum (i.e., sum
promptTokens, completionTokens and reasoningTokens treating missing values as 0,
then call toString() or set null appropriately) so types and formatting match
the other token fields.
  // Validate size (estimate: base64 adds ~33% overhead)
- const estimatedSize = (base64Data.length * 3) / 4;
+ const estimatedSize = Math.round((base64Data.length * 3) / 4);
Compute base64 byte size precisely (handle padding/whitespace).
Math.round(len*3/4) can over/under-estimate and cause false 20MB limit rejections. Account for '=' padding and possible whitespace without decoding the payload.
Apply this diff:
- const estimatedSize = Math.round((base64Data.length * 3) / 4);
+ const sanitized = base64Data.replace(/\s/g, "");
+ const padding = sanitized.endsWith("==") ? 2 : sanitized.endsWith("=") ? 1 : 0;
+ const estimatedSize = Math.floor((sanitized.length * 3) / 4) - padding;

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const sanitized = base64Data.replace(/\s/g, "");
const padding = sanitized.endsWith("==") ? 2 : sanitized.endsWith("=") ? 1 : 0;
const estimatedSize = Math.floor((sanitized.length * 3) / 4) - padding;
🤖 Prompt for AI Agents
In packages/models/src/process-image-url.ts around line 31, the current size
estimate uses Math.round(base64.length*3/4) which miscounts when padding or
whitespace are present; instead strip any data URL prefix and all whitespace
from the base64 string, count trailing '=' padding characters (0,1,2) and
compute exact byte length as (cleanLen * 3) / 4 - paddingCount (use integer
math, no decoding), then use that value to enforce the 20MB limit; update the
code to perform these steps so padding/whitespace are handled precisely.
Summary
- Refined token estimation logic in the estimateTokens function

Changes
Token Estimation Logic
- Prompt tokens are estimated only if promptTokens is missing and messages are present
- Completion tokens are estimated only if completionTokens is missing and content is present
- Uses encodeChat for prompt token estimation and encode for completion token estimation

Test plan
🌿 Generated by Terry
ℹ️ Tag @terragon-labs to ask questions and address PR feedback
📎 Task: https://www.terragonlabs.com/task/79437a36-00fa-4c42-b027-987164208a0a
Summary by CodeRabbit
Bug Fixes
Bug Fixes / Data Validation