
Conversation


@steebchen steebchen commented Sep 23, 2025

Summary

  • Refines token estimation logic in estimateTokens function
  • Estimates prompt tokens only if not provided by the API
  • Estimates completion tokens only if not provided by the API
  • Removes redundant estimation when tokens are already available
  • Adds error handling with fallback for encoding failures

Changes

Token Estimation Logic

  • Updated to estimate prompt tokens only when promptTokens is missing and messages are present
  • Updated to estimate completion tokens only when completionTokens is missing and content is present
  • Uses encodeChat for prompt token estimation and encode for completion token estimation
  • Logs errors and falls back to simple length-based estimation if encoding fails
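
To make the guarded flow concrete, here is a minimal TypeScript sketch of the logic described above. It uses encode/encodeChat from gpt-tokenizer as the PR does, but the function name, the input shape, and the DEFAULT_TOKENIZER_MODEL value are illustrative assumptions, not the gateway's actual code.

import { encode, encodeChat } from "gpt-tokenizer";

// Stand-in for the project's DEFAULT_TOKENIZER_MODEL constant (assumed value).
const DEFAULT_TOKENIZER_MODEL = "gpt-4o";

interface EstimateInput {
  promptTokens?: number | null; // counts reported by the provider, if any
  completionTokens?: number | null;
  messages?: { role: "user" | "assistant" | "system"; content: string }[];
  content?: string | null;
}

function estimateMissingTokens(input: EstimateInput) {
  let prompt = input.promptTokens ?? null;
  let completion = input.completionTokens ?? null;

  // Estimate prompt tokens only when the API did not provide them.
  if (prompt == null && input.messages?.length) {
    try {
      prompt = encodeChat(input.messages, DEFAULT_TOKENIZER_MODEL).length;
    } catch {
      // Fallback: ~4 characters per token, rounded to an integer.
      prompt = Math.round(
        input.messages.reduce((acc, m) => acc + m.content.length, 0) / 4,
      );
    }
  }

  // Estimate completion tokens only when the API did not provide them.
  if (completion == null && input.content != null) {
    try {
      completion = encode(input.content).length;
    } catch {
      completion = Math.round(input.content.length / 4);
    }
  }

  return { promptTokens: prompt, completionTokens: completion };
}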

Test plan

  • Verify prompt tokens are estimated only when missing
  • Verify completion tokens are estimated only when missing
  • Confirm error handling and fallback logic works by simulating encoding failures
  • Ensure no redundant token estimation occurs when tokens are provided by the API
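
As a concrete version of this plan, a vitest-style sketch against the hypothetical estimateMissingTokens helper sketched above; the test runner and the helper are assumptions, not existing project code:

import { describe, expect, it } from "vitest";

describe("token estimation guards", () => {
  it("keeps provider-supplied counts untouched", () => {
    const result = estimateMissingTokens({
      promptTokens: 12,
      completionTokens: 3,
      messages: [{ role: "user", content: "hello" }],
      content: "hi",
    });
    expect(result).toEqual({ promptTokens: 12, completionTokens: 3 });
  });

  it("estimates only the missing side", () => {
    const result = estimateMissingTokens({
      promptTokens: null,
      completionTokens: 7,
      messages: [{ role: "user", content: "hello world" }],
      content: "ignored",
    });
    expect(result.promptTokens).toBeGreaterThan(0); // estimated from messages
    expect(result.completionTokens).toBe(7); // provided value untouched
  });
});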

🌿 Generated by Terry


ℹ️ Tag @terragon-labs to ask questions and address PR feedback

📎 Task: https://www.terragonlabs.com/task/79437a36-00fa-4c42-b027-987164208a0a

Summary by CodeRabbit

  • Bug Fixes

    • Token usage is now estimated only when the API omits counts, improving accuracy of reported usage and quotas.
    • Fallback estimates for both prompt and completion tokens use rounded values for more consistent results.
    • Chat totals now rely on provided token counts to reduce misreporting.
  • Bug Fixes / Data Validation

    • Base64 image size estimation now rounds to an integer before enforcing the size limit.

- Removed redundant estimation of tokens when promptTokens and completionTokens are provided by the API.
- Now only estimate prompt tokens if promptTokens is missing and messages are available.
- Only estimate completion tokens if completionTokens is missing and content is available.
- Improved code clarity by separating conditions for prompt and completion token estimation.
- Maintained fallback logic with error logging for encoding failures.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>

bunnyshell bot commented Sep 23, 2025

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment


coderabbitai bot commented Sep 23, 2025

Walkthrough

Separated prompt and completion token estimation into independent guarded paths and switched chat flows to prefer provided token fields; also changed base64 size calculation to round to an integer. Logging and per-path error fallbacks were adjusted accordingly.

Changes

  • Token estimation helper (apps/gateway/src/chat/tools/estimate-tokens.ts): Split combined estimation into two isolated guards: estimate promptTokens only when missing (and messages exist) and estimate completionTokens only when missing (and content exists). Encode-based calculations retained; failure fallbacks now use Math.round for length-based estimates. Comments updated.
  • Chat request/flow (apps/gateway/src/chat/chat.ts): Replaced use of calculatedPromptTokens/calculatedCompletionTokens with direct use of provided promptTokens and completionTokens across streaming and non-streaming paths. Removed some debug logs and removed calls to the combined estimation paths; simplified total token computation to sum provided fields (plus reasoningTokens).
  • Image size calculation (packages/models/src/process-image-url.ts): Changed base64 data URL estimated size calculation to use Math.round(base64Data.length * 3 / 4) (integer rounding) while preserving existing validation and 20 MB limit checks (see the sketch below).
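
As a rough illustration of the image-size check in the last row, under assumed names (MAX_IMAGE_BYTES, assertBase64WithinLimit) and the 20 MB limit mentioned above:

// Illustrative only; the rounding line mirrors the walkthrough, the rest is scaffolding.
const MAX_IMAGE_BYTES = 20 * 1024 * 1024;

function assertBase64WithinLimit(base64Data: string): void {
  // Base64 encodes 3 bytes as 4 characters, so decoded size is roughly length * 3 / 4.
  const estimatedSize = Math.round((base64Data.length * 3) / 4);
  if (estimatedSize > MAX_IMAGE_BYTES) {
    throw new Error(`Estimated image size ${estimatedSize} exceeds ${MAX_IMAGE_BYTES} bytes`);
  }
}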

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Chat as ChatService
  participant Estimator as estimateTokens()
  participant Encoder as TokenEncoder
  participant Logger

  Client->>Chat: send request (may include promptTokens, completionTokens)
  alt both tokens provided
    Chat-->>Client: proceed using provided promptTokens & completionTokens
  else any token missing
    Chat->>Estimator: estimateTokens(input)
    alt promptTokens missing
      Estimator->>Encoder: encodeChat(messages)
      alt encode succeeds
        Encoder-->>Estimator: promptCount
      else encode fails
        Estimator->>Logger: log prompt encode error
        Estimator-->>Estimator: promptCount = Math.round(fallback)
      end
    end
    alt completionTokens missing
      Estimator->>Encoder: encode(JSON.stringify(content))
      alt encode succeeds
        Encoder-->>Estimator: completionCount
      else encode fails
        Estimator->>Logger: log completion encode error
        Estimator-->>Estimator: completionCount = Math.round(fallback)
      end
    end
    Estimator-->>Chat: return promptTokens, completionTokens (only those calculated)
    Chat-->>Client: continue using provided + calculated token fields
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title "fix(token-estimation): refine logic" is concise and accurately summarizes the primary change (refinements to the token-estimation logic) using a conventional-commit prefix and a clear scope. It directly reflects the estimateTokens-focused updates described in the PR summary and avoids vague terms or noisy details, making the intent easy to understand for teammates scanning history.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch terragon/fix-token-estimation-logic

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot changed the title from "Fix token estimation logic to only estimate missing tokens" to "fix(token-estimation): refine logic" on Sep 23, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df7dffa and c2f7be5.

📒 Files selected for processing (1)
  • apps/gateway/src/chat/tools/estimate-tokens.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
{apps/api,apps/gateway,apps/ui,apps/docs,packages}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Always use top-level import; never use require() or dynamic imports (e.g., import(), next/dynamic)

Files:

  • apps/gateway/src/chat/tools/estimate-tokens.ts
{apps/api,apps/gateway,packages/db}/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

{apps/api,apps/gateway,packages/db}/**/*.ts: Use Drizzle ORM with the latest object syntax for database access
For reads, use db().query.<table>.findMany() or db().query.<table>.findFirst()
Files:

  • apps/gateway/src/chat/tools/estimate-tokens.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Never use any or as any in this TypeScript project unless absolutely necessary
Always use top-level import; never use require or dynamic imports

Files:

  • apps/gateway/src/chat/tools/estimate-tokens.ts
{apps/api,apps/gateway}/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

For reads, use db().query.<table>.findMany() or db().query.<table>.findFirst()

Files:

  • apps/gateway/src/chat/tools/estimate-tokens.ts
🧬 Code graph analysis (1)
apps/gateway/src/chat/tools/estimate-tokens.ts (2)
apps/gateway/src/chat/tools/types.ts (2)
  • ChatMessage (4-8)
  • DEFAULT_TOKENIZER_MODEL (1-1)
packages/logger/src/index.ts (2)
  • error (147-154)
  • logger (175-175)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: e2e-shards (5)
  • GitHub Check: e2e-shards (2)
  • GitHub Check: e2e-shards (4)
  • GitHub Check: e2e-shards (1)
  • GitHub Check: e2e-shards (3)
  • GitHub Check: autofix
  • GitHub Check: test / run
  • GitHub Check: build / run
  • GitHub Check: generate / run
  • GitHub Check: lint / run

Comment on lines 22 to 45
// Estimate prompt tokens only if not provided by the API
if (!promptTokens && messages && messages.length > 0) {
  try {
    // Convert messages to the format expected by gpt-tokenizer
    const chatMessages: ChatMessage[] = messages.map((m) => ({
      role: m.role,
      content:
        typeof m.content === "string" ? m.content : JSON.stringify(m.content),
      name: m.name,
    }));
    calculatedPromptTokens = encodeChat(
      chatMessages,
      DEFAULT_TOKENIZER_MODEL,
    ).length;
  } catch (error) {
    // Fallback to simple estimation if encoding fails
    logger.error(
      "Failed to encode chat messages in estimate tokens",
      error instanceof Error ? error : new Error(String(error)),
    );
    calculatedPromptTokens =
      messages.reduce((acc, m) => acc + (m.content?.length || 0), 0) / 4;
  }
}

⚠️ Potential issue

Guard for “missing” must not treat 0 as missing; fix fallback undercount and return integers

  • Line 23: Using !promptTokens re-estimates when the API legitimately returns 0. Use a null/undefined check.
  • Lines 42-44: Fallback undercounts when content is non-string (objects/arrays). Mirror the encoding path and stringify per‑message before length. Also round to an integer token count.

Apply this diff:

- // Estimate prompt tokens only if not provided by the API
- if (!promptTokens && messages && messages.length > 0) {
+ // Estimate prompt tokens only if not provided by the API
+ if (promptTokens == null && messages?.length > 0) {
   try {
@@
-      calculatedPromptTokens =
-        messages.reduce((acc, m) => acc + (m.content?.length || 0), 0) / 4;
+      calculatedPromptTokens = Math.round(
+        messages.reduce((acc, m) => {
+          const text =
+            typeof m.content === "string" ? m.content : JSON.stringify(m.content);
+          return acc + text.length;
+        }, 0) / 4
+      );
   }
 }

Additionally: the parameter messages: any[] violates the project guideline to avoid any. Consider typing it (e.g., a ChatMessageLike with content: unknown) and narrowing in place.
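
One possible shape for that suggestion; ChatMessageLike, toText, and fallbackCharCount are illustrative names, not existing project identifiers:

interface ChatMessageLike {
  role?: "user" | "assistant" | "system";
  content: unknown;
  name?: string;
}

function toText(content: unknown): string {
  // Narrow in place: strings pass through, anything else is serialized the same
  // way the encoding path does.
  return typeof content === "string" ? content : JSON.stringify(content ?? "");
}

function fallbackCharCount(messages: ChatMessageLike[]): number {
  return messages.reduce((acc, m) => acc + toText(m.content).length, 0);
}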

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Estimate prompt tokens only if not provided by the API
if (!promptTokens && messages && messages.length > 0) {
  try {
    // Convert messages to the format expected by gpt-tokenizer
    const chatMessages: ChatMessage[] = messages.map((m) => ({
      role: m.role,
      content:
        typeof m.content === "string" ? m.content : JSON.stringify(m.content),
      name: m.name,
    }));
    calculatedPromptTokens = encodeChat(
      chatMessages,
      DEFAULT_TOKENIZER_MODEL,
    ).length;
  } catch (error) {
    // Fallback to simple estimation if encoding fails
    logger.error(
      "Failed to encode chat messages in estimate tokens",
      error instanceof Error ? error : new Error(String(error)),
    );
    calculatedPromptTokens =
      messages.reduce((acc, m) => acc + (m.content?.length || 0), 0) / 4;
  }
}
// Estimate prompt tokens only if not provided by the API
if (promptTokens == null && messages?.length > 0) {
  try {
    // Convert messages to the format expected by gpt-tokenizer
    const chatMessages: ChatMessage[] = messages.map((m) => ({
      role: m.role,
      content:
        typeof m.content === "string" ? m.content : JSON.stringify(m.content),
      name: m.name,
    }));
    calculatedPromptTokens = encodeChat(
      chatMessages,
      DEFAULT_TOKENIZER_MODEL,
    ).length;
  } catch (error) {
    // Fallback to simple estimation if encoding fails
    logger.error(
      "Failed to encode chat messages in estimate tokens",
      error instanceof Error ? error : new Error(String(error)),
    );
    calculatedPromptTokens = Math.round(
      messages.reduce((acc, m) => {
        const text =
          typeof m.content === "string" ? m.content : JSON.stringify(m.content);
        return acc + text.length;
      }, 0) / 4
    );
  }
}

Comment on lines 47 to 59
// Estimate completion tokens only if not provided by the API
if (!completionTokens && content) {
  try {
    calculatedCompletionTokens = encode(JSON.stringify(content)).length;
  } catch (error) {
    // Fallback to simple estimation if encoding fails
    logger.error(
      "Failed to encode completion text",
      error instanceof Error ? error : new Error(String(error)),
    );
    calculatedCompletionTokens = content.length / 4;
  }
}

⚠️ Potential issue

Fix completion-path guard and encoding; avoid JSON.stringify and round fallback

  • Line 48: !completionTokens incorrectly treats 0 as missing and skips empty-string content. Use a null/undefined check and allow empty strings.
  • Line 50: encode(JSON.stringify(content)) overcounts by adding quotes/escapes; pass the string directly.
  • Line 57: Round the heuristic to an integer.

Apply this diff:

- // Estimate completion tokens only if not provided by the API
- if (!completionTokens && content) {
+ // Estimate completion tokens only if not provided by the API
+ if (completionTokens == null && content != null) {
   try {
-    calculatedCompletionTokens = encode(JSON.stringify(content)).length;
+    calculatedCompletionTokens = encode(content).length;
   } catch (error) {
@@
-    calculatedCompletionTokens = content.length / 4;
+    calculatedCompletionTokens = Math.round(content.length / 4);
   }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Estimate completion tokens only if not provided by the API
if (!completionTokens && content) {
  try {
    calculatedCompletionTokens = encode(JSON.stringify(content)).length;
  } catch (error) {
    // Fallback to simple estimation if encoding fails
    logger.error(
      "Failed to encode completion text",
      error instanceof Error ? error : new Error(String(error)),
    );
    calculatedCompletionTokens = content.length / 4;
  }
}
// Estimate completion tokens only if not provided by the API
if (completionTokens == null && content != null) {
  try {
    calculatedCompletionTokens = encode(content).length;
  } catch (error) {
    // Fallback to simple estimation if encoding fails
    logger.error(
      "Failed to encode completion text",
      error instanceof Error ? error : new Error(String(error)),
    );
    calculatedCompletionTokens = Math.round(content.length / 4);
  }
}
🤖 Prompt for AI Agents
In apps/gateway/src/chat/tools/estimate-tokens.ts around lines 47 to 59, the
guard `!completionTokens` treats 0 as missing and the encoder call uses
JSON.stringify which inflates token count and the fallback heuristic isn't
rounded; change the guard to check for null/undefined (e.g., completionTokens ==
null) so zero is allowed, only run the block when content is not null/undefined
(allow empty strings), call encode on the string content directly (e.g.,
encode(String(content))) instead of JSON.stringify(content), and round the
fallback heuristic to an integer (e.g., Math.round(content.length / 4)).

Rounded token estimation values in `estimateTokens` to ensure accurate integer predictions for both prompt and completion tokens. Enhanced size estimation for base64 data in `process-image-url` by rounding results.
Removed redundant token estimation when `promptTokens` and `completionTokens` are provided. Simplified token-related calculations and ensured consistent handling across all relevant functions.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/models/src/process-image-url.ts (1)

51-69: Harden against SSRF (block private/link-local IPs even over HTTPS).

Fetching user-supplied URLs can still hit internal services (e.g., 169.254.169.254) via HTTPS. Add DNS/IP resolution guards and allow-list public IP ranges or domains.

I can provide a helper to resolve hostnames and reject private/reserved CIDRs if you want.
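
For reference, a rough sketch of such a guard assuming a Node runtime; assertPublicHost and isBlockedAddress are hypothetical names, the prefix list is abbreviated, and a production version would also need full CIDR handling and protection against DNS rebinding between the check and the fetch:

import { lookup } from "node:dns/promises";
import { isIP } from "node:net";

const BLOCKED_PREFIXES = ["127.", "10.", "169.254.", "192.168.", "0.", "::1", "fc", "fd", "fe80"];

function isBlockedAddress(address: string): boolean {
  const a = address.toLowerCase();
  // Loopback, link-local (including the 169.254.169.254 metadata endpoint),
  // RFC 1918 ranges, and IPv6 ULA/link-local.
  return (
    BLOCKED_PREFIXES.some((p) => a.startsWith(p)) ||
    /^172\.(1[6-9]|2\d|3[01])\./.test(a)
  );
}

async function assertPublicHost(url: string): Promise<void> {
  const { hostname } = new URL(url);
  // Resolve the hostname, or use it directly if it is already an IP literal.
  const address = isIP(hostname) ? hostname : (await lookup(hostname)).address;
  if (isBlockedAddress(address)) {
    throw new Error(`Refusing to fetch private or reserved address: ${address}`);
  }
}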

apps/gateway/src/chat/chat.ts (1)

2525-2533: Avoid any[]; use ChatMessage[] per project TS guidelines.

You already import ChatMessage; use it here.

Apply this diff:

-              const chatMessages: any[] = messages.map((m) => ({
+              const chatMessages: ChatMessage[] = messages.map((m) => ({
                 role: m.role as "user" | "assistant" | "system" | undefined,
                 content: m.content || "",
                 name: m.name,
               }));
🧹 Nitpick comments (1)
apps/gateway/src/chat/chat.ts (1)

2548-2551: Don’t JSON.stringify completion before encoding.

JSON.stringify(fullContent) inflates tokens. Encode the raw text.

Apply this diff:

-              calculatedCompletionTokens = encode(
-                JSON.stringify(fullContent),
-              ).length;
+              calculatedCompletionTokens = encode(fullContent).length;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2f7be5 and 2008509.

📒 Files selected for processing (3)
  • apps/gateway/src/chat/chat.ts (3 hunks)
  • apps/gateway/src/chat/tools/estimate-tokens.ts (1 hunks)
  • packages/models/src/process-image-url.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/gateway/src/chat/tools/estimate-tokens.ts
🧰 Additional context used
📓 Path-based instructions (4)
{apps/api,apps/gateway,apps/ui,apps/docs,packages}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Always use top-level import; never use require() or dynamic imports (e.g., import(), next/dynamic)

Files:

  • apps/gateway/src/chat/chat.ts
  • packages/models/src/process-image-url.ts
{apps/api,apps/gateway,packages/db}/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

{apps/api,apps/gateway,packages/db}/**/*.ts: Use Drizzle ORM with the latest object syntax for database access
For reads, use db().query.<table>.findMany() or db().query.<table>.findFirst()

Files:

  • apps/gateway/src/chat/chat.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Never use any or as any in this TypeScript project unless absolutely necessary
Always use top-level import; never use require or dynamic imports

Files:

  • apps/gateway/src/chat/chat.ts
  • packages/models/src/process-image-url.ts
{apps/api,apps/gateway}/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

For reads, use db().query.<table>.findMany() or db().query.<table>.findFirst()

Files:

  • apps/gateway/src/chat/chat.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: e2e-shards (2)
  • GitHub Check: e2e-shards (5)
  • GitHub Check: e2e-shards (1)
  • GitHub Check: e2e-shards (3)
  • GitHub Check: e2e-shards (4)
  • GitHub Check: build / run
  • GitHub Check: test / run
  • GitHub Check: generate / run
  • GitHub Check: lint / run
  • GitHub Check: autofix
🔇 Additional comments (1)
apps/gateway/src/chat/chat.ts (1)

65-67: LGTM: round length-based token estimate.

Keeps integer semantics and avoids zeros with Math.max(1, …).
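
For context, the reviewed fallback presumably looks roughly like this; the name comes from the thread and the body is inferred from the comment, so it may not match the file exactly:

function estimateTokensFromContent(text: string): number {
  // ~4 characters per token, rounded, and never reported as zero.
  return Math.max(1, Math.round(text.length / 4));
}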

Comment on lines +3017 to 3019
promptTokens,
completionTokens,
cachedTokens,

⚠️ Potential issue

Ensure costs/response use tokens even when providers omit them.

When prompt/completion tokens are null, derive them once and pass consistent numbers into calculateCosts and transformResponseToOpenai.

Apply this diff to use safe values in the two calls:

   const costs = calculateCosts(
     usedModel,
     usedProvider,
-    promptTokens,
-    completionTokens,
+    safePromptTokens,
+    safeCompletionTokens,
     cachedTokens,
   const transformedResponse = transformResponseToOpenai(
     usedProvider,
     usedModel,
     json,
     content,
     reasoningContent,
     finishReason,
-    promptTokens,
-    completionTokens,
-    (promptTokens || 0) + (completionTokens || 0) + (reasoningTokens || 0),
+    safePromptTokens ?? 0,
+    safeCompletionTokens ?? 0,
+    (safePromptTokens ?? 0) + (safeCompletionTokens ?? 0) + (reasoningTokens || 0),

Add these helpers right before the calculateCosts call:

// Compute safe token values only if missing
let safePromptTokens = promptTokens ?? null;
let safeCompletionTokens = completionTokens ?? null;

if (safePromptTokens === null && messages?.length) {
  try {
    const chatMsgs: ChatMessage[] = messages.map((m) => ({
      role: m.role as "user" | "assistant" | "system" | undefined,
      content: typeof m.content === "string" ? m.content : JSON.stringify(m.content ?? ""),
      name: m.name,
    }));
    safePromptTokens = encodeChat(chatMsgs, DEFAULT_TOKENIZER_MODEL).length;
  } catch (e) {
    logger.error("Failed to encode messages (non-streaming)", e instanceof Error ? e : new Error(String(e)));
    safePromptTokens = estimateTokensFromContent(messages.map((m) => String(m.content ?? "")).join("\n"));
  }
}

if (safeCompletionTokens === null && content) {
  try {
    safeCompletionTokens = encode(content).length;
  } catch (e) {
    logger.error("Failed to encode completion (non-streaming)", e instanceof Error ? e : new Error(String(e)));
    safeCompletionTokens = estimateTokensFromContent(content);
  }
}

Also applies to: 3035-3038

🤖 Prompt for AI Agents
In apps/gateway/src/chat/chat.ts around lines 3017-3019 (and similarly
3035-3038), promptTokens/completionTokens can be null from providers so derive
safe values once and reuse them: add helpers just before the calculateCosts call
to set let safePromptTokens = promptTokens ?? null and let safeCompletionTokens
= completionTokens ?? null, then if safePromptTokens is null and messages exist
compute it by encoding messages via encodeChat with DEFAULT_TOKENIZER_MODEL
falling back to estimateTokensFromContent on error (logging via logger.error);
if safeCompletionTokens is null and content exists compute it via encode with
fallback to estimateTokensFromContent (also logging on error). Finally replace
direct promptTokens/completionTokens uses in calculateCosts and
transformResponseToOpenai with safePromptTokens and safeCompletionTokens so both
functions receive consistent, non-null token counts.

Comment on lines +3084 to 3088
promptTokens: promptTokens?.toString() || null,
completionTokens: completionTokens?.toString() || null,
totalTokens:
totalTokens ||
(
(calculatedPromptTokens || 0) + (calculatedCompletionTokens || 0)
).toString(),
totalTokens || ((promptTokens || 0) + (completionTokens || 0)).toString(),
reasoningTokens: reasoningTokens,

⚠️ Potential issue

Fix totalTokens type/consistency and include reasoning fallback.

Currently totalTokens may be a number (not string) and ignores reasoning in fallback; align with other fields.

Apply this diff:

-    promptTokens: promptTokens?.toString() || null,
-    completionTokens: completionTokens?.toString() || null,
-    totalTokens:
-      totalTokens || ((promptTokens || 0) + (completionTokens || 0)).toString(),
+    promptTokens: (safePromptTokens ?? promptTokens)?.toString() || null,
+    completionTokens: (safeCompletionTokens ?? completionTokens)?.toString() || null,
+    totalTokens: (
+      (totalTokens ?? ((safePromptTokens ?? promptTokens ?? 0) + (safeCompletionTokens ?? completionTokens ?? 0) + (reasoningTokens ?? 0)))
+    ).toString(),

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In apps/gateway/src/chat/chat.ts around lines 3084 to 3088, totalTokens can be a
number and currently ignores reasoningTokens when falling back; change
totalTokens to follow the same string-or-null pattern as
promptTokens/completionTokens by ensuring it is computed and stored as a string
(or null) and include reasoningTokens in the fallback sum (i.e., sum
promptTokens, completionTokens and reasoningTokens treating missing values as 0,
then call toString() or set null appropriately) so types and formatting match
the other token fields.


// Validate size (estimate: base64 adds ~33% overhead)
const estimatedSize = (base64Data.length * 3) / 4;
const estimatedSize = Math.round((base64Data.length * 3) / 4);

⚠️ Potential issue

Compute base64 byte size precisely (handle padding/whitespace).

Math.round(len*3/4) can over/under-estimate and cause false 20MB limit rejections. Account for '=' padding and possible whitespace without decoding the payload.

Apply this diff:

-    const estimatedSize = Math.round((base64Data.length * 3) / 4);
+    const sanitized = base64Data.replace(/\s/g, "");
+    const padding = sanitized.endsWith("==") ? 2 : sanitized.endsWith("=") ? 1 : 0;
+    const estimatedSize = Math.floor((sanitized.length * 3) / 4) - padding;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const estimatedSize = Math.round((base64Data.length * 3) / 4);
const sanitized = base64Data.replace(/\s/g, "");
const padding = sanitized.endsWith("==") ? 2 : sanitized.endsWith("=") ? 1 : 0;
const estimatedSize = Math.floor((sanitized.length * 3) / 4) - padding;
🤖 Prompt for AI Agents
In packages/models/src/process-image-url.ts around line 31, the current size
estimate uses Math.round(base64.length*3/4) which miscounts when padding or
whitespace are present; instead strip any data URL prefix and all whitespace
from the base64 string, count trailing '=' padding characters (0,1,2) and
compute exact byte length as (cleanLen * 3) / 4 - paddingCount (use integer
math, no decoding), then use that value to enforce the 20MB limit; update the
code to perform these steps so padding/whitespace are handled precisely.

@steebchen steebchen self-assigned this Sep 23, 2025