35 changes: 8 additions & 27 deletions apps/gateway/src/chat/chat.ts
@@ -3011,25 +3011,11 @@ chat.openapi(completions, async (c) => {
images,
} = parseProviderResponse(usedProvider, json, messages);

// Debug: Log images found in response
logger.debug("Gateway - parseProviderResponse extracted images", { images });
logger.debug("Gateway - Used provider", { usedProvider });
logger.debug("Gateway - Used model", { usedModel });

// Estimate tokens if not provided by the API
const { calculatedPromptTokens, calculatedCompletionTokens } = estimateTokens(
usedProvider,
messages,
content,
promptTokens,
completionTokens,
);

const costs = calculateCosts(
usedModel,
usedProvider,
calculatedPromptTokens,
calculatedCompletionTokens,
promptTokens,
completionTokens,
cachedTokens,
Comment on lines +3017 to 3019

⚠️ Potential issue

Ensure costs/response use tokens even when providers omit them.

When prompt/completion tokens are null, derive them once and pass consistent numbers into calculateCosts and transformResponseToOpenai.

Apply this diff to use safe values in the two calls:

   const costs = calculateCosts(
     usedModel,
     usedProvider,
-    promptTokens,
-    completionTokens,
+    safePromptTokens,
+    safeCompletionTokens,
     cachedTokens,
   const transformedResponse = transformResponseToOpenai(
     usedProvider,
     usedModel,
     json,
     content,
     reasoningContent,
     finishReason,
-    promptTokens,
-    completionTokens,
-    (promptTokens || 0) + (completionTokens || 0) + (reasoningTokens || 0),
+    safePromptTokens ?? 0,
+    safeCompletionTokens ?? 0,
+    (safePromptTokens ?? 0) + (safeCompletionTokens ?? 0) + (reasoningTokens || 0),

Add these helpers right before the calculateCosts call:

// Compute safe token values only if missing
let safePromptTokens = promptTokens ?? null;
let safeCompletionTokens = completionTokens ?? null;

if (safePromptTokens === null && messages?.length) {
  try {
    const chatMsgs: ChatMessage[] = messages.map((m) => ({
      role: m.role as "user" | "assistant" | "system" | undefined,
      content: typeof m.content === "string" ? m.content : JSON.stringify(m.content ?? ""),
      name: m.name,
    }));
    safePromptTokens = encodeChat(chatMsgs, DEFAULT_TOKENIZER_MODEL).length;
  } catch (e) {
    logger.error("Failed to encode messages (non-streaming)", e instanceof Error ? e : new Error(String(e)));
    safePromptTokens = estimateTokensFromContent(messages.map((m) => String(m.content ?? "")).join("\n"));
  }
}

if (safeCompletionTokens === null && content) {
  try {
    safeCompletionTokens = encode(content).length;
  } catch (e) {
    logger.error("Failed to encode completion (non-streaming)", e instanceof Error ? e : new Error(String(e)));
    safeCompletionTokens = estimateTokensFromContent(content);
  }
}

Also applies to: 3035-3038

🤖 Prompt for AI Agents
In apps/gateway/src/chat/chat.ts around lines 3017-3019 (and similarly
3035-3038), promptTokens/completionTokens can be null from providers so derive
safe values once and reuse them: add helpers just before the calculateCosts call
to set let safePromptTokens = promptTokens ?? null and let safeCompletionTokens
= completionTokens ?? null, then if safePromptTokens is null and messages exist
compute it by encoding messages via encodeChat with DEFAULT_TOKENIZER_MODEL
falling back to estimateTokensFromContent on error (logging via logger.error);
if safeCompletionTokens is null and content exists compute it via encode with
fallback to estimateTokensFromContent (also logging on error). Finally replace
direct promptTokens/completionTokens uses in calculateCosts and
transformResponseToOpenai with safePromptTokens and safeCompletionTokens so both
functions receive consistent, non-null token counts.
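
As a rough standalone sketch of the derivation described above (not the gateway's actual helpers): it assumes the gpt-tokenizer package, uses "gpt-4o" in place of DEFAULT_TOKENIZER_MODEL, and estimateTokensFromContent is a hypothetical characters/4 stand-in for the gateway's fallback.

```ts
import { encode, encodeChat } from "gpt-tokenizer";

// Hypothetical stand-in for the gateway's fallback: roughly 4 characters per token.
function estimateTokensFromContent(text: string): number {
  return Math.round(text.length / 4);
}

type Msg = { role?: "user" | "assistant" | "system"; content: string; name?: string };

// Fill in prompt tokens only when the provider omitted them.
function derivePromptTokens(promptTokens: number | null, messages: Msg[]): number {
  if (promptTokens !== null) return promptTokens;
  try {
    // encodeChat counts tokens for the whole chat, including per-message overhead.
    return encodeChat(messages, "gpt-4o").length;
  } catch {
    return estimateTokensFromContent(messages.map((m) => m.content).join("\n"));
  }
}

// Fill in completion tokens the same way, using plain encode() on the text.
function deriveCompletionTokens(completionTokens: number | null, content: string): number {
  if (completionTokens !== null) return completionTokens;
  try {
    return encode(content).length;
  } catch {
    return estimateTokensFromContent(content);
  }
}
```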

{
prompt: messages.map((m) => m.content).join("\n"),
@@ -3046,11 +3032,9 @@ chat.openapi(completions, async (c) => {
content,
reasoningContent,
finishReason,
calculatedPromptTokens,
calculatedCompletionTokens,
(calculatedPromptTokens || 0) +
(calculatedCompletionTokens || 0) +
(reasoningTokens || 0),
promptTokens,
completionTokens,
(promptTokens || 0) + (completionTokens || 0) + (reasoningTokens || 0),
reasoningTokens,
cachedTokens,
toolResults,
@@ -3097,13 +3081,10 @@ chat.openapi(completions, async (c) => {
content: content,
reasoningContent: reasoningContent,
finishReason: finishReason,
promptTokens: calculatedPromptTokens?.toString() || null,
completionTokens: calculatedCompletionTokens?.toString() || null,
promptTokens: promptTokens?.toString() || null,
completionTokens: completionTokens?.toString() || null,
totalTokens:
totalTokens ||
(
(calculatedPromptTokens || 0) + (calculatedCompletionTokens || 0)
).toString(),
totalTokens || ((promptTokens || 0) + (completionTokens || 0)).toString(),
reasoningTokens: reasoningTokens,
Comment on lines +3084 to 3088

⚠️ Potential issue

Fix totalTokens type/consistency and include reasoning tokens in the fallback.

Currently totalTokens may be stored as a number (not a string), and the fallback sum ignores reasoning tokens; align it with the other token fields.

Apply this diff:

-    promptTokens: promptTokens?.toString() || null,
-    completionTokens: completionTokens?.toString() || null,
-    totalTokens:
-      totalTokens || ((promptTokens || 0) + (completionTokens || 0)).toString(),
+    promptTokens: (safePromptTokens ?? promptTokens)?.toString() || null,
+    completionTokens: (safeCompletionTokens ?? completionTokens)?.toString() || null,
+    totalTokens: (
+      (totalTokens ?? ((safePromptTokens ?? promptTokens ?? 0) + (safeCompletionTokens ?? completionTokens ?? 0) + (reasoningTokens ?? 0)))
+    ).toString(),

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In apps/gateway/src/chat/chat.ts around lines 3084 to 3088, totalTokens can be a
number and currently ignores reasoningTokens when falling back; change
totalTokens to follow the same string-or-null pattern as
promptTokens/completionTokens by ensuring it is computed and stored as a string
(or null) and include reasoningTokens in the fallback sum (i.e., sum
promptTokens, completionTokens and reasoningTokens treating missing values as 0,
then call toString() or set null appropriately) so types and formatting match
the other token fields.
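
A minimal sketch of the string-or-null convention the suggestion describes, with reasoning tokens folded into the fallback sum (names are illustrative, not the gateway's):

```ts
// Illustrative only: totalTokens stored as a string, like the other token fields,
// with reasoning tokens counted when the provider omits a total.
function formatTotalTokens(
  totalTokens: number | null,
  promptTokens: number | null,
  completionTokens: number | null,
  reasoningTokens: number | null,
): string {
  const fallback =
    (promptTokens ?? 0) + (completionTokens ?? 0) + (reasoningTokens ?? 0);
  return (totalTokens ?? fallback).toString();
}
```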

cachedTokens: cachedTokens?.toString() || null,
hasError: false,
74 changes: 35 additions & 39 deletions apps/gateway/src/chat/tools/estimate-tokens.ts
@@ -19,47 +19,43 @@ export function estimateTokens(
let calculatedPromptTokens = promptTokens;
let calculatedCompletionTokens = completionTokens;

// Always estimate missing tokens for any provider
if (!promptTokens || !completionTokens) {
// Estimate prompt tokens using encodeChat for better accuracy
if (!promptTokens && messages && messages.length > 0) {
try {
// Convert messages to the format expected by gpt-tokenizer
const chatMessages: ChatMessage[] = messages.map((m) => ({
role: m.role,
content:
typeof m.content === "string"
? m.content
: JSON.stringify(m.content),
name: m.name,
}));
calculatedPromptTokens = encodeChat(
chatMessages,
DEFAULT_TOKENIZER_MODEL,
).length;
} catch (error) {
// Fallback to simple estimation if encoding fails
logger.error(
"Failed to encode chat messages in estimate tokens",
error instanceof Error ? error : new Error(String(error)),
);
calculatedPromptTokens =
messages.reduce((acc, m) => acc + (m.content?.length || 0), 0) / 4;
}
// Estimate prompt tokens only if not provided by the API
if (!promptTokens && messages && messages.length > 0) {
try {
// Convert messages to the format expected by gpt-tokenizer
const chatMessages: ChatMessage[] = messages.map((m) => ({
role: m.role,
content:
typeof m.content === "string" ? m.content : JSON.stringify(m.content),
name: m.name,
}));
calculatedPromptTokens = encodeChat(
chatMessages,
DEFAULT_TOKENIZER_MODEL,
).length;
} catch (error) {
// Fallback to simple estimation if encoding fails
logger.error(
"Failed to encode chat messages in estimate tokens",
error instanceof Error ? error : new Error(String(error)),
);
calculatedPromptTokens = Math.round(
messages.reduce((acc, m) => acc + (m.content?.length || 0), 0) / 4,
);
}
}

// Estimate completion tokens using encode for better accuracy
if (!completionTokens && content) {
try {
calculatedCompletionTokens = encode(JSON.stringify(content)).length;
} catch (error) {
// Fallback to simple estimation if encoding fails
logger.error(
"Failed to encode completion text",
error instanceof Error ? error : new Error(String(error)),
);
calculatedCompletionTokens = content.length / 4;
}
// Estimate completion tokens only if not provided by the API
if (!completionTokens && content) {
try {
calculatedCompletionTokens = encode(JSON.stringify(content)).length;
} catch (error) {
// Fallback to simple estimation if encoding fails
logger.error(
"Failed to encode completion text",
error instanceof Error ? error : new Error(String(error)),
);
calculatedCompletionTokens = Math.round(content.length / 4);
}
}
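
A hedged usage sketch of the revised estimator, assuming the call shape seen in chat.ts above (provider, messages, content, promptTokens, completionTokens), that null is accepted for missing counts, and a sibling import path:

```ts
import { estimateTokens } from "./estimate-tokens"; // assumed relative path

// Provider returned no usage block, so both counts are estimated.
const { calculatedPromptTokens, calculatedCompletionTokens } = estimateTokens(
  "openai",                                    // usedProvider
  [{ role: "user", content: "Hello there!" }], // messages
  "Hi! How can I help you today?",             // completion content
  null,                                        // promptTokens missing from the API
  null,                                        // completionTokens missing from the API
);

// If the tokenizer throws, the characters/4 fallback now returns whole numbers:
// e.g. 103 characters -> Math.round(103 / 4) = 26 rather than 25.75.
```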

2 changes: 1 addition & 1 deletion packages/models/src/process-image-url.ts
@@ -28,7 +28,7 @@ export async function processImageUrl(
const base64Data = isBase64 ? data : btoa(data);

// Validate size (estimate: base64 adds ~33% overhead)
const estimatedSize = (base64Data.length * 3) / 4;
const estimatedSize = Math.round((base64Data.length * 3) / 4);

⚠️ Potential issue

Compute the base64 byte size precisely (handle padding and whitespace).

Math.round(len * 3 / 4) can over- or under-estimate the decoded size and cause false 20MB limit rejections. Account for '=' padding and possible whitespace without decoding the payload.

Apply this diff:

-    const estimatedSize = Math.round((base64Data.length * 3) / 4);
+    const sanitized = base64Data.replace(/\s/g, "");
+    const padding = sanitized.endsWith("==") ? 2 : sanitized.endsWith("=") ? 1 : 0;
+    const estimatedSize = Math.floor((sanitized.length * 3) / 4) - padding;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const estimatedSize = Math.round((base64Data.length * 3) / 4);
const sanitized = base64Data.replace(/\s/g, "");
const padding = sanitized.endsWith("==") ? 2 : sanitized.endsWith("=") ? 1 : 0;
const estimatedSize = Math.floor((sanitized.length * 3) / 4) - padding;
🤖 Prompt for AI Agents
In packages/models/src/process-image-url.ts around line 31, the current size
estimate uses Math.round(base64.length*3/4) which miscounts when padding or
whitespace are present; instead strip any data URL prefix and all whitespace
from the base64 string, count trailing '=' padding characters (0,1,2) and
compute exact byte length as (cleanLen * 3) / 4 - paddingCount (use integer
math, no decoding), then use that value to enforce the 20MB limit; update the
code to perform these steps so padding/whitespace are handled precisely.
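
A quick worked check of the padded-length formula from the suggestion (helper name is hypothetical):

```ts
// Exact decoded size of a base64 payload without decoding it:
// every 4 characters encode 3 bytes, minus one byte per trailing '='.
const exactBase64Bytes = (b64: string): number => {
  const s = b64.replace(/\s/g, "");
  const padding = s.endsWith("==") ? 2 : s.endsWith("=") ? 1 : 0;
  return Math.floor((s.length * 3) / 4) - padding;
};

exactBase64Bytes("Zm9v"); // "foo" -> 4 * 3 / 4 - 0 = 3 bytes
exactBase64Bytes("Zm8="); // "fo"  -> 4 * 3 / 4 - 1 = 2 bytes
exactBase64Bytes("Zg=="); // "f"   -> 4 * 3 / 4 - 2 = 1 byte
```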

if (estimatedSize > 20 * 1024 * 1024) {
logger.warn("Data URL image size exceeds limit", { estimatedSize });
throw new Error("Image size exceeds 20MB limit");