[Bugfix]: missing partial content if openai tool calling is enabled #28122
dr75 wants to merge 3 commits into vllm-project:main
Conversation
Code Review
This pull request addresses a bug where partial content was dropped when OpenAI tool calling is enabled and the token limit is reached. The fix correctly captures the parser's current_content for partial final messages, aligning its behavior with the non-tool-calling path. The addition of specific test cases for partial responses, both with and without tool calls, ensures the fix is well verified. The code change is logical and directly solves the described issue.
Do you know if this is also an issue in the streaming tool calling case? Or whether we need a potentially separate fix there?

This is properly handled in the streaming case, as tokens from current content are sent while streaming and when

@bbrowning, here is a streaming issue fix that is related but different:

Great, thanks for the clarification!

@heheda12345, @yeqcharlotte, would be great if someone could review.
yeqcharlotte left a comment
hey @dr75 thanks for the fixes. wonder if you could also double-confirm the behavior with the official OAI implementation? will they return partial final messages when the limit is reached? also please fix the pre-commit.
1174eb2 to 738ffd4
@yeqcharlotte the CI failure was unrelated. I rebased, seems fine now. Will try with the OpenAI API to confirm the behaviour.
Actually, the OpenAI API doesn't specify the behaviour of responses cut off due to the token limit.

Given that a user has to pay for generating such an incomplete response but doesn't get the generated tokens, it seems incorrect. Considering very long responses, such as long summaries of documents, this makes the behaviour even more problematic.

As the problem only appears for reasoning models in the non-streaming case, I think it is an issue in the OpenAI implementation and that we should provide a more consistent one. Also, the problem does not occur when tool calling is disabled, making it even more inconsistent. @yeqcharlotte, wdyt?
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
738ffd4 to 006b024
This pull request has merge conflicts that must be resolved before it can be merged.
Purpose
The OpenAIToolParser overrides the regular response parsing when enabled. This breaks partial responses when the token limit is reached and harmony does not generate a final message; only current_content is left, which is ignored.

To solve this, return the parser's current_content, as is also done in the non-tool-calling case.

The code should probably be refactored so that the logic from parse_chat_output() is not duplicated here. Actually, most of the parsing is already done there and repeated here. Leaving this refactoring for a separate PR.
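The fallback described above can be sketched as follows. This is a minimal illustration, not the actual vLLM code: the `Parser` stand-in and the `extract_final_content` helper are hypothetical, and only the `current_content` attribute is taken from the PR description.

```python
class Parser:
    """Hypothetical stand-in for the harmony tool-parser state."""

    def __init__(self, messages, current_content):
        self.messages = messages                # fully parsed final messages
        self.current_content = current_content  # partial text when generation is cut off


def extract_final_content(parser):
    """Return final content, falling back to partial content.

    Before the fix: only completed final messages were returned, so a
    response cut off by the token limit yielded empty content.
    After the fix: fall back to the partial current_content, matching
    the non-tool-calling path.
    """
    final = "".join(parser.messages)
    if not final and parser.current_content:
        return parser.current_content
    return final
```

With this fallback, a response that hits the token limit mid-message returns the partial text instead of an empty string.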
Test Plan
Test Result

- max_completion_tokens=5, stopping in reasoning
- max_completion_tokens=50, stopping in final message
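The two test cases above can be sketched as chat-completions payloads. This is a hypothetical reproduction sketch: the model name, prompt, and tool definition are assumptions, not taken from the PR.

```python
def build_request(max_completion_tokens):
    """Build a tool-calling chat-completions payload that forces an early cutoff."""
    return {
        "model": "gpt-oss",  # assumed; any harmony-format model applies
        "messages": [
            {"role": "user", "content": "Summarize the attached document."}
        ],
        "tools": [
            {
                "type": "function",
                "function": {"name": "lookup", "parameters": {"type": "object"}},
            }
        ],
        "max_completion_tokens": max_completion_tokens,
    }


# Case 1: limit hit while still reasoning (no final message started).
case_reasoning = build_request(5)

# Case 2: limit hit inside the final message (partial content exists).
case_final = build_request(50)
```

With the fix, both requests should return whatever partial text was generated rather than an empty content field.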