[Frontend][Bug] allow tool calls in analysis channel #28139

Merged
chaunceyjiang merged 9 commits into vllm-project:main from dr75:gpt-oss-tools-stream
Dec 19, 2025

Conversation

@dr75
Contributor

@dr75 dr75 commented Nov 5, 2025

Purpose

The gpt-oss streaming parser does not handle tool calls in the analysis channel. However, those calls can happen as explained here.

This especially happens when harmony renders the recipient of previous tool calls before the channel token, confusing the model, which then returns tool calls in the analysis channel. However, the harmony parser is built to handle both cases, so the vLLM parser should support both as well.

A fix proposed for harmony aims at changing the message rendering (used when converting the history to tokens) to avoid confusing the model. While this fixes the issue, it does not seem to address the root cause: even with such a change, the model may emit tool calls in the analysis channel, as described in the OpenAI docs.

To solve this, I propose to also handle tool calls in the analysis channel.
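A minimal sketch of this handling (illustrative names only; not vLLM's actual parser types): a harmony message carrying a recipient is treated as a tool call no matter whether it arrives on the commentary or the analysis channel, and only recipient-less analysis messages remain reasoning content.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HarmonyMessage:
    """Simplified stand-in for a parsed harmony message (hypothetical type)."""
    channel: str                # "analysis", "commentary", or "final"
    recipient: Optional[str]    # e.g. "functions.lookup_weather", or None
    content: str


def classify(msg: HarmonyMessage) -> str:
    # A message with a recipient is a tool call on either channel.
    if msg.recipient and msg.channel in ("analysis", "commentary"):
        return "tool_call"
    # Recipient-less analysis messages are reasoning content.
    if msg.channel == "analysis":
        return "reasoning"
    # Everything else (e.g. the final channel) is regular content.
    return "content"
```

The point of the sketch is the ordering: the recipient check runs before the analysis-channel check, so an analysis message with a recipient is no longer swallowed as reasoning content.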

To review the changes, you may start at the first commit, which is the actual fix. The following commits are refactoring and testing changes as described below.

Test Plan

  • The previous state of the code did not allow for unit testing. I added a temporary automated test using an actual request that reliably fails without the fix. The test requires a large context window, and reducing the request size does not reproduce the issue. As this does not work for CI, I only used it to validate the fix and removed it in the last commit; see the respective commit for the test, and check it out and run it for validation.
  • Extracted the stream parsing from serving_chat.py into a new file, serving_chat_stream_harmony.py, and added a unit test that exercises this part in isolation, including a case for tool calls in the analysis channel.
  • Manually tested with repeated tool calls, which were causing issues.
  • Also tested with the script provided in [Bug] Fix gpt-oss missing tool content #24954.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug where tool calls in the analysis channel were not being handled by the gpt-oss streaming parser. The fix correctly extends the tool call handling logic to include the analysis channel, in addition to the commentary channel. This is a robust solution that aligns with the OpenAI Harmony documentation. The addition of a complex streaming test case that reproduces the issue is a great way to ensure the fix is effective. However, I have one concern regarding the resource requirements of the new test, which I've detailed in a specific comment.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug where tool calls in the analysis channel for gpt-oss models were not being handled correctly by the streaming parser. The proposed solution correctly extends the existing tool call parsing logic for the commentary channel to also include the analysis channel. The change in vllm/entrypoints/openai/serving_chat.py is well-targeted and correctly re-prioritizes the logic to check for tool calls in the analysis channel before treating it as reasoning content. A new complex streaming test case has been added in tests/entrypoints/openai/test_serving_chat.py, which effectively validates the fix. The changes are sound and I have not identified any issues of high or critical severity.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@alecsolder
Contributor

Hrm, I feel like the issue here is actually that we are putting these tool calls on the analysis channel in the first place. Do you know why that happens vs. it always being on the commentary channel? I couldn't figure it out from a quick look at the code.

@dr75
Contributor Author

dr75 commented Nov 5, 2025

See the OpenAI docs and the limited docs and comments in the harmony repo. The problem is that you cannot control where the model puts the recipient, unless you use some guided decoding. For that reason, harmony allows for both.

While it usually appears on the commentary channel, rendering previous tool calls to the analysis channel (which is the choice the harmony lib made; see its comments) causes the model to switch as well. But in any case that wouldn't cover every situation, as internal tools seem to be emitted on the analysis channel (see the OpenAI docs).

As long as you don't get a different answer from OpenAI on the respective harmony PR, this fix seems to be the right solution to me. I guess other frameworks are doing it the same way, but I didn't check.

@alecsolder
Contributor

BTW I completely agree this is a problem in general, I just want to make sure we solve it the right way.

While usually it appears on the commentary channel, rendering previous tool calls to the analysis channel (which is the choice the harmony lib made - see comments) causes the model to switch as well.

Hrm I'm a bit confused, on the input path, the channel a tool call is rendered onto is decided by our code, which should always be commentary ATM.

The more nefarious thing here is that any message to the analysis channel that comes before a message to the final channel is actually completely dropped by the harmony renderer.

I think for now the most reasonable way to do it is that on the input path, if the tool name starts with functions then it should be on the commentary channel, if it doesn't then it should go onto the analysis channel.

I think if recipient is set on the harmony message on the output path, it is fair to treat it as a tool call no matter what, doesn't matter which channel it is on for sure though
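As a sketch, the input-path rule described above could look like this (hypothetical helper name; not actual vLLM code):

```python
def channel_for_tool_call(recipient: str) -> str:
    """Pick the harmony channel for rendering a historic tool call.

    Function tools ("functions.*") go on the commentary channel;
    anything else (e.g. built-in tools like "python" or "browser")
    goes on the analysis channel.
    """
    return "commentary" if recipient.startswith("functions.") else "analysis"
```

Built-in tools would then render on the analysis channel, matching where the model emits them, while function tools stay on commentary.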

@dr75
Contributor Author

dr75 commented Nov 6, 2025

BTW I completely agree this is a problem in general, I just want to make sure we solve it the right way.

While usually it appears on the commentary channel, rendering previous tool calls to the analysis channel (which is the choice the harmony lib made - see comments) causes the model to switch as well.

Hrm I'm a bit confused, on the input path, the channel a tool call is rendered onto is decided by our code, which should always be commentary ATM.

You are right, and my description was not quite correct: it mixed up the rendering with the output. The rendering "issue" (not sure it's really an issue) is that harmony renders the recipient before the channel token. So when we use a commentary message, it will first render the recipient and then the channel token:

<|start|>assistant to=functions.lookup_weather<|channel|>commentary

This seems to be what leads the model to switch to outputting tools in the analysis channel (well, I am speculating here, so maybe that's not right) and what this PR is trying to address.

However, even if we could change that, tools may still appear in any channel, and we would want to handle them in any case. So it doesn't affect this PR, I would say; I am only attempting to explain what might be happening here.

Also, I don't know if such a change to the rendering is a good idea, as it may affect model performance (assuming harmony was used during training; the impact may or may not be negligible) and prefix caching (if the model constantly outputs x and harmony renders y, then the cache is gone from that location onward). So a decision whether to change it is more complex and should take that into account.

The more nefarious thing here is that any message to the analysis channel that comes before a message to the final channel is actually completely dropped by the harmony renderer.

Didn't know that, but I think should_drop_analysis is important here. I would have to take a closer look.

I think for now the most reasonable way to do it is that on the input path, if the tool name starts with functions then it should be on the commentary channel, if it doesn't then it should go onto the analysis channel.

Sounds reasonable to me.

I think if recipient is set on the harmony message on the output path, it is fair to treat it as a tool call no matter what, doesn't matter which channel it is on for sure though

Makes sense! Do you mean changing the solution here or keeping it as is? Also see the code for non-streaming in OpenAIToolParser, which does what you describe.

@levunet
Contributor

levunet commented Nov 6, 2025

Based on my observations from various tests, although I'm not certain, I believe that built-in tool calls and function tool calls are distinguished by whether the tool invocation is written before or after the channel token.

For example, the structure '<|start|>assistant to=' calls built-in tools, while '<|channel|>commentary to=' proceeds with function tool calls. It appears that confusion occurs when function tools are provided to the gpt-oss model using the built-in tool call structure.
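The two header shapes can be told apart mechanically. The sketch below is a deliberately simplified illustration based on the examples in this thread (not harmony's actual parser): it accepts the recipient either before or after the channel token and reports which form was used.

```python
import re

# Simplified, assumed header layout; real harmony headers carry more fields.
_HEADER = re.compile(
    r"<\|start\|>assistant"
    r"(?: to=(?P<pre>\S+))?"          # recipient before the channel token
    r"<\|channel\|>(?P<channel>\w+)"
    r"(?: to=(?P<post>\S+))?"         # recipient after the channel token
)


def parse_header(header: str) -> dict:
    """Extract channel and recipient from either header shape."""
    m = _HEADER.match(header)
    if m is None:
        raise ValueError(f"not an assistant header: {header!r}")
    return {
        "channel": m.group("channel"),
        "recipient": m.group("pre") or m.group("post"),
        "recipient_before_channel": m.group("pre") is not None,
    }
```

Under the hypothesis above, `recipient_before_channel` being true would correspond to the built-in tool style, and false to the function tool style.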

@dr75
Contributor Author

dr75 commented Nov 6, 2025

Ah, thanks @levunet, that might explain things further: the built-in tools are in the analysis channel (as per the docs) and usually invoked with the recipient before the channel token, as you found out. If harmony now renders all tool calls that way without distinguishing, then the model may also switch to recipient-before-channel and, because of that, to emitting tool calls in the analysis channel.

That would mean the ideal tool rendering (for the current model) would also distinguish between the channels and place the recipient accordingly, thereby keeping the model output as is (avoiding the prefix cache cut-off for that message) and avoiding the model confusion.

Probably not a very important fix, but it could be useful. Given that it only affects tool calls, the impact on model performance might be very small or nonexistent.

In vLLM we might then also make this distinction and generate analysis messages for the internal tools (all of which are commentary atm, I assume).

@mergify

mergify bot commented Nov 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @dr75.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 12, 2025
@bbartels
Contributor

@dr75 There seem to be some merge conflicts! Also is this ready for review?

@dr75 dr75 force-pushed the gpt-oss-tools-stream branch from 7f68de3 to 73f5121 on November 28, 2025 08:44
@mergify mergify bot removed the needs-rebase label Nov 28, 2025
@dr75 dr75 force-pushed the gpt-oss-tools-stream branch from 73f5121 to 4540364 on November 28, 2025 10:20
@dr75
Contributor Author

dr75 commented Nov 28, 2025

@dr75 There seem to be some merge conflicts! Also is this ready for review?

Hey @bbartels! I resolved the conflicts and made some changes:

  • refactor stream parsing: extract harmony stream parsing into its own file
  • added a unit test for it
  • removed the long test that would require a large context window and doesn't add too much value; I kept it in the commit history for the review.

So it's ready for review.

@bbartels
Contributor

Great, thanks! I can't review myself, but I asked in the vLLM Slack for a maintainer review!

@dr75 dr75 force-pushed the gpt-oss-tools-stream branch from 70ca2ab to ef82054 on November 28, 2025 12:13
@chaunceyjiang
Collaborator

/cc @yeqcharlotte @qandrew PTAL.

@qandrew
Contributor

qandrew commented Dec 9, 2025

@dr75 could you fix the CI issues? thanks!

@dr75
Contributor Author

dr75 commented Dec 10, 2025

CI failures seem to be unrelated; for non-gpt-oss models, they all look like this:

RuntimeError: _C::***_marlin_repack() is missing value for argument 'is_a_8bit'

@dr75 dr75 force-pushed the gpt-oss-tools-stream branch from ef82054 to 4fcb6d2 on December 10, 2025 13:00
@mergify

mergify bot commented Dec 10, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @dr75.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 10, 2025
@dr75 dr75 force-pushed the gpt-oss-tools-stream branch from 4fcb6d2 to 4c6daaf on December 11, 2025 07:45
@mergify mergify bot removed the needs-rebase label Dec 11, 2025
@mergify

mergify bot commented Dec 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @dr75.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 12, 2025
dr75 added 6 commits December 12, 2025 15:08
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
@dr75 dr75 force-pushed the gpt-oss-tools-stream branch from 4c6daaf to 6a061f5 on December 12, 2025 15:34
@mergify mergify bot removed the needs-rebase label Dec 12, 2025
@dr75
Contributor Author

dr75 commented Dec 16, 2025

Recent test failure is in entrypoints/openai/test_response_api_with_harmony.py::test_function_calling
which is the responses API, not affected by this PR.

openai.BadRequestError: Error code: 400 - {'error': {'message': 'unexpected tokens remaining in message header: ["to=python<|channel|>commentary"]', 'type': 'BadRequestError', 'param': None, 'code': 400}}

Locally the test is passing, so this is a function call parsing issue due to unstable model behaviour.

@dr75
Contributor Author

dr75 commented Dec 16, 2025

@qandrew, @chaunceyjiang, @yeqcharlotte can you please help with merging this PR!

@chaunceyjiang chaunceyjiang enabled auto-merge (squash) December 18, 2025 01:16
@chaunceyjiang
Collaborator

Thanks @dr75

@chaunceyjiang chaunceyjiang merged commit 4559496 into vllm-project:main Dec 19, 2025
47 checks passed
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Dec 22, 2025

Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025

Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026

Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed)

7 participants