[Frontend][Bug] allow tool calls in analysis channel #28139
chaunceyjiang merged 9 commits into vllm-project:main
Conversation
Code Review
This pull request addresses a bug where tool calls in the analysis channel were not being handled by the gpt-oss streaming parser. The fix correctly extends the tool call handling logic to include the analysis channel, in addition to the commentary channel. This is a robust solution that aligns with the OpenAI Harmony documentation. The addition of a complex streaming test case that reproduces the issue is a great way to ensure the fix is effective. However, I have one concern regarding the resource requirements of the new test, which I've detailed in a specific comment.
Code Review
This pull request addresses a bug where tool calls in the analysis channel for gpt-oss models were not being handled correctly by the streaming parser. The proposed solution correctly extends the existing tool call parsing logic for the commentary channel to also include the analysis channel. The change in vllm/entrypoints/openai/serving_chat.py is well-targeted and correctly re-prioritizes the logic to check for tool calls in the analysis channel before treating it as reasoning content. A new complex streaming test case has been added in tests/entrypoints/openai/test_serving_chat.py, which effectively validates the fix. The changes are sound and I have not identified any issues of high or critical severity.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Hrm I feel like the issue here is actually that we are putting these tool calls on the analysis channel in the first place. Do you know why that happens, versus it always being on the commentary channel? I couldn't figure it out from a quick look at the code.
See the OpenAI docs and the limited docs and comments in the harmony repo. The problem is that you cannot force where the model puts the recipient, unless you use some guided decoding. For that reason, harmony allows for both. While it usually appears on the commentary channel, rendering previous tool calls to the analysis channel (which is the choice the harmony lib made; see its comments) causes the model to switch as well. But in any case that wouldn't be a solution for every case, as internal tools seem to be emitted on the analysis channel (see the OpenAI docs). As long as you don't get a different answer from OpenAI on the respective Harmony PR, this fix here seems to be the right solution to me. And I guess other frameworks are doing it the same way, but I didn't check.
BTW I completely agree this is a problem in general, I just want to make sure we solve it the right way.
Hrm I'm a bit confused: on the input path, the channel a tool call is rendered onto is decided by our code, which should always be commentary ATM. The more nefarious thing here is that any message to the

I think for now the most reasonable way to do it is that on the input path, if the tool name starts with

I think if recipient is set on the harmony message on the output path, it is fair to treat it as a tool call no matter what; doesn't matter which channel it is on for sure though.
You are right, and my description was not really correct in mixing the rendering with the output. The rendering "issue" (not sure if it's really an issue) is that harmony renders the recipient before the channel token. So when we use a commentary message, it will first render the recipient and then the channel token, as sketched below. This seems to be what leads to the model switching to output tools in the analysis channel (well, I am speculating here, so maybe that's not right) and what this PR is trying to solve.

However, even if we could change that, tools may still appear in any channel and we would want to handle them in any case. So it doesn't affect this PR, I would say; I am only attempting to explain what might be happening here.

Also, I don't know if such a change to the rendering is a good idea, as it may affect model performance (assuming harmony was used during training; maybe negligible impact, maybe not) and prefix caching (if the model constantly outputs x and harmony renders y, then the cache is gone from that location). So a decision whether to change it is more complex and should take that into account.
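For illustration, the two header layouts under discussion might look roughly as follows (the tool name and arguments are made up for this sketch, and the exact spacing may differ from what harmony actually emits):

```
recipient rendered before the channel token:
<|start|>assistant to=functions.get_weather<|channel|>commentary <|constrain|>json<|message|>{"city": "Berlin"}<|call|>

recipient rendered after the channel token:
<|start|>assistant<|channel|>commentary to=functions.get_weather <|constrain|>json<|message|>{"city": "Berlin"}<|call|>
```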
Didn't know that, but I think
Sounds reasonable to me.
Makes sense! Do you mean changing the solution here or keeping it as is? Also see the code for non-streaming in OpenAIToolParser, which is what you describe.
Based on my observations from various tests (although I'm not certain), I believe that built-in tool calls and function tool calls are distinguished by whether the tool invocation is written before or after the channel token. For example, the structure '<|start|>assistant to=' calls built-in tools, while the structure '<|channel|>commentary to=' proceeds with function tool calls. It appears that confusion occurs when function tools are provided to the gpt-oss model using the built-in tool call structure.
Ah, thanks @levunet, that might explain things more: the built-in tools are in the analysis channel (as per the docs) and are usually invoked with the recipient before the channel token, as you found out. If harmony now renders all tool calls that way without distinguishing, then the model may also switch to placing the invocation before the channel token and, because of that, also switch to emitting tool calls in the analysis channel. That would mean the ideal tool rendering (for the current model) would also distinguish by channel and place the recipient accordingly, keeping the output from the model as is (avoiding the prefix cache cut-off for that message) and avoiding the model confusion. Probably not a very important fix, but it could be useful. Given that it only affects tool calls, the impact on model performance might be very small or not there at all. In vLLM we might then also make this distinction and generate analysis messages for the internal tools (all being commentary atm, I assume); see the sketch below.
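Hypothetically, that rendering-side distinction could look like the sketch below, assuming the harmony convention that function tools are namespaced under functions. while built-ins such as browser.* or python are not; pick_channel is an invented helper here, not an existing harmony or vLLM API:

```python
# Namespaces of built-in tools; per the OpenAI docs these live on the
# analysis channel. (The exact set is an assumption for this sketch.)
BUILTIN_NAMESPACES = {"browser", "python"}


def pick_channel(recipient: str) -> str:
    """Pick the channel a historical tool call is rendered onto."""
    namespace = recipient.split(".", 1)[0]
    if namespace in BUILTIN_NAMESPACES:
        return "analysis"    # built-in tool invocation
    return "commentary"      # developer-defined function tool
```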
This pull request has merge conflicts that must be resolved before it can be merged.
@dr75 There seem to be some merge conflicts! Also, is this ready for review?
7f68de3 to 73f5121
73f5121 to 4540364
Hey @bbartels! I resolved the conflicts and made some changes:
So it's ready for review.
Great, thanks! I can't review myself, but I asked in the vLLM Slack for a maintainer review!
70ca2ab to ef82054
/cc @yeqcharlotte @qandrew PTAL.
@dr75 could you fix the CI issues? Thanks!
CI failures seem to be unrelated; they all look like this:
ef82054 to 4fcb6d2
This pull request has merge conflicts that must be resolved before it can be merged.
4fcb6d2 to 4c6daaf
This pull request has merge conflicts that must be resolved before it can be merged.
4c6daaf to 6a061f5
The recent test failure is in entrypoints/openai/test_response_api_with_harmony.py::test_function_calling. Locally the test passes, so the function call parsing failure is due to unstable model behaviour.
@qandrew, @chaunceyjiang, @yeqcharlotte, can you please help with merging this PR?
Thanks @dr75 |
Purpose
The gpt-oss streaming parser does not handle tool calls in the analysis channel. However, those calls can happen as explained here.
This especially happens when harmony renders the recipient in previous tool calls before the channel token, "confusing" the model, which then returns tool calls in the analysis channel. However, the harmony parser is built to handle both cases, such that the vLLM parser should also support both.
The fix proposed in the respective harmony PR aims at changing the message rendering in harmony (when converting the history to tokens) to avoid confusing the model. While this fixes the issue, it seems to not address the root cause: even with such a change, the model may emit tool calls in the analysis channel, as described in the OpenAI docs.
To solve this, I propose to also handle tool calls in the analysis channel; a rough sketch of the resulting rule is shown below.
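For illustration only, a minimal sketch of that rule, not the actual vLLM code (HarmonyMessage and classify are invented names for this example):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HarmonyMessage:
    # Invented stand-in for a parsed harmony message header.
    channel: str              # "analysis", "commentary", or "final"
    recipient: Optional[str]  # e.g. "functions.get_weather"; None if unset
    content: str


def classify(msg: HarmonyMessage) -> str:
    """Decide how a parsed harmony message should be surfaced."""
    # A set recipient marks a tool call regardless of the channel it
    # arrived on; checking this before the channel check is the
    # re-prioritization this PR applies to the analysis channel.
    if msg.recipient is not None:
        return "tool_call"
    # Without a recipient, analysis content is reasoning content ...
    if msg.channel == "analysis":
        return "reasoning"
    # ... and everything else is regular assistant content.
    return "content"
```

With this ordering, a tool call streamed on the analysis channel surfaces as a tool call instead of being misfiled as reasoning text, while plain analysis messages still come back as reasoning.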
To review the changes, you may start at the first commit, which is the actual fix. The following commits are refactoring and testing changes, as described below.
Test Plan
Moved the streaming harmony handling from serving_chat.py to a new file serving_chat_stream_harmony.py and added a unit test exercising this part in isolation, with a test case for tools in the analysis channel.
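As a rough sketch of what that isolated test case could look like, again using the hypothetical classify helper from above rather than the real parser API:

```python
def test_tool_call_in_analysis_channel():
    # The regression: a tool call emitted on the analysis channel must
    # surface as a tool call, not be swallowed as reasoning content.
    msg = HarmonyMessage(
        channel="analysis",
        recipient="functions.get_weather",
        content='{"city": "Berlin"}',
    )
    assert classify(msg) == "tool_call"


def test_plain_analysis_message_is_reasoning():
    # Analysis content without a recipient is still reasoning content.
    msg = HarmonyMessage(channel="analysis", recipient=None,
                         content="thinking it through...")
    assert classify(msg) == "reasoning"
```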