[Bug] Fix gpt-oss missing tool content #24954
levunet wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request aims to fix a bug in gpt-oss model tool calls by adding 'analysis' content and with_recipient. The changes are logical and align with the stated purpose. However, I've identified a high-severity issue where the handling of 'analysis' content is not robust for multimodal inputs, which could lead to a runtime crash. I have provided a code suggestion to address this.
vllm/entrypoints/harmony_utils.py
The logic to extract content is not robust. The content of an assistant message can be a list of parts (e.g., for multimodal inputs), not just a string. The current implementation content = chat_msg.get("content") or "" will cause a runtime error if content is a non-empty list, as Message.from_role_and_content expects a string. The code should handle the case where content is a list by extracting the text parts, similar to how it's handled for other message types in this file.
content = chat_msg.get("content")
if isinstance(content, list):
    # Extract text from multimodal content
    content = "\n".join(
        p.get("text", "") for p in content
        if isinstance(p, dict) and p.get("type") == "text")
elif not isinstance(content, str):
    content = ""
analysis_msg = Message.from_role_and_content(Role.ASSISTANT, content)
analysis_msg = analysis_msg.with_channel("analysis")
msgs.append(analysis_msg)

Force-pushed from c3780bf to eed8815.
@alecsolder
Force-pushed from eed8815 to 39d6e8e.
In your test script, I see you're using the streaming completions endpoint, which I don't think uses the harmony_utils method you modified? I just want to double-check that I'm reading it right.

Also, thanks for mentioning that Hugging Face tokenizer change, I hadn't seen it and my snapshot was out of date!

Thank you for checking. I've double-checked the part you mentioned.

When will this PR be merged, guys?
Force-pushed from 39d6e8e to e8e7579.
The changes include adding 'with_recipient' and the assistant's 'analysis' content. Without adding this content, there was an issue where the gpt-oss model had a higher probability of outputting abnormal tokens when calling tools.

Signed-off-by: kyt <[email protected]>
Force-pushed from e8e7579 to f13f45b.
I was delayed today due to some issues while resolving git conflicts. The issue was caused by FlashInfer 0.4.0, which was added in recent commits and caused responses to repeat infinitely in the gpt-oss model. It took quite some time to find a workaround: VLLM_USE_FLASHINFER_SAMPLER=0. The problem occurs in the sampling process, and generation works normally when the FlashInfer sampler is disabled.
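For reference, a minimal sketch of applying the workaround from Python before vLLM initializes (offline LLM usage here is just one way to do it; the model name is only an example):

import os

# Disable the FlashInfer sampler before vLLM is initialized; this is the
# VLLM_USE_FLASHINFER_SAMPLER=0 workaround mentioned above.
os.environ["VLLM_USE_FLASHINFER_SAMPLER"] = "0"

from vllm import LLM, SamplingParams  # import after setting the env var

llm = LLM(model="openai/gpt-oss-20b")  # example gpt-oss checkpoint
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)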
Will the fix need both VLLM_USE_FLASHINFER_SAMPLER=0 and this PR, or just the FlashInfer setting?

I think this setting resolves an issue that occurs in a completely different place from the current PR. After testing multiple times, the same problem seems to exist even without the current PR, so this setting can be used separately.
Any progress here?

Unfortunately, the current changes alone are insufficient to fully resolve the bug; the changes from the harmony repo are essential. Since OpenAI hasn't reviewed that PR yet, should I propose pip-installing from my fork branch in the meantime?
This PR fixes a bug that occurred when using tool calling with gpt-oss models. However, it requires additional changes from the openai/harmony repo, which unfortunately hasn't been reviewed for a month now. Would it be acceptable to temporarily modify the requirements to install from my forked harmony branch until the upstream PR is merged?
@levunet

@levunet, could you give an example where tool calling does not work in v0.11.1? I couldn't find any, so I wonder whether this fix and the harmony fix are actually needed. Basically, #24768 makes tool calling work for me for chat completions in streaming mode. However, there are two remaining issues:
OK, I could reproduce the cases now with your script above and confirm that the issue still happens on v0.11.1 and is fixed with this PR and the harmony change.
This PR isn't actually dependent on openai/harmony#76, is it? The fixes here, to include the reasoning text from the previous turn as an input message on the analysis channel and to include the assistant recipient on the tool response, seem reasonable and necessary even without the changes to the harmony library.
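For context, a rough sketch of the two reconstructed Harmony messages being discussed, using the openai_harmony Message helpers the way harmony_utils.py uses them elsewhere (the literal values are illustrative only, borrowed from the test below):

from openai_harmony import Author, Message, Role

# Prior-turn assistant reasoning goes back in on the analysis channel.
analysis_msg = Message.from_role_and_content(
    Role.ASSISTANT, "We'll call get_weather."
).with_channel("analysis")

# The tool result sits on the commentary channel and is addressed back to the assistant.
tool_msg = (
    Message.from_author_and_content(
        Author.new(Role.TOOL, "functions.get_weather"), "20 degrees Celsius"
    )
    .with_channel("commentary")
    .with_recipient("assistant")
)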
I think you are right; that information seems to be just missing at the moment. This is problematic for the prefix cache and probably for the model's reasoning. But I see the problem that we are missing tests that verify the loop model_output -> parse -> encode -> model_output. Without such tests it is hard to say where we are in terms of correctness. Maybe for this PR it's good enough to verify that it works as expected even without the harmony changes. Together with PR #28139, the test script above should work, and so should the temporary test in PR #28139. Both are not working on main. I don't know if there are any tests that verify complex tool calls.
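As a starting point short of that full loop, a much narrower unit-level check of the change in this PR might look something like this (just a sketch; it assumes parse_input_to_harmony_message keeps the signature shown in the diff further down and appends the analysis message first):

from vllm.entrypoints.harmony_utils import parse_input_to_harmony_message


def test_assistant_tool_call_keeps_prior_reasoning():
    chat_msg = {
        "role": "assistant",
        "content": "We'll call get_weather.",
        "tool_calls": [
            {
                "id": "call_123",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": '{"location": "Paris"}',
                },
            },
        ],
    }
    msgs = parse_input_to_harmony_message(chat_msg)
    # With this PR, the prior reasoning should be re-injected on the analysis
    # channel ahead of the reconstructed tool-call message.
    assert msgs[0].channel == "analysis"
    assert msgs[0].content[0].text == "We'll call get_weather."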
There's another related problem here as well that I identified by stubbing in a test for this. Our tool response ends up looking like this as a Harmony message:

Note the

Here's the example test I'm using that verifies both the changes in this PR as well as uncovered that additional issue:

diff --git a/tests/entrypoints/openai/test_serving_chat.py b/tests/entrypoints/openai/test_serving_chat.py
index dd10384a7..6ac94160c 100644
--- a/tests/entrypoints/openai/test_serving_chat.py
+++ b/tests/entrypoints/openai/test_serving_chat.py
@@ -728,3 +728,112 @@ async def test_serving_chat_data_parallel_rank_extraction():
     # Verify that data_parallel_rank defaults to None
     assert "data_parallel_rank" in mock_engine.generate.call_args.kwargs
     assert mock_engine.generate.call_args.kwargs["data_parallel_rank"] is None
+
+
+class TestServingChatMakeRequestWithHarmony:
+    @pytest.fixture()
+    def mock_engine(self):
+        mock_engine = MagicMock(spec=AsyncLLM)
+        mock_engine.get_tokenizer.return_value = get_tokenizer(MODEL_NAME)
+        mock_engine.errored = False
+        mock_engine.model_config = MockModelConfig()
+        mock_engine.processor = MagicMock()
+        mock_engine.io_processor = MagicMock()
+        return mock_engine
+
+    @pytest.fixture()
+    def serving_chat(self, mock_engine):
+        return _build_serving_chat(mock_engine)
+
+    @pytest.mark.asyncio
+    async def test_simple_chat(self, serving_chat):
+        messages = [{"role": "user", "content": "what is 1+1?"}]
+        req = ChatCompletionRequest(model=MODEL_NAME, messages=messages)
+        output_messages, _, _ = serving_chat._make_request_with_harmony(req)
+        assert len(output_messages) == 3
+        assert output_messages[0].author.role == "system"
+        assert output_messages[1].author.role == "developer"
+        assert output_messages[2].author.role == "user"
+        assert output_messages[2].content[0].text == messages[0]["content"]
+
+    @pytest.mark.asyncio
+    async def test_tool_call_response_with_content(self, serving_chat):
+        tools = [
+            {
+                "type": "function",
+                "function": {
+                    "name": "get_weather",
+                    "description": "Get the weather in a given location",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "location": {"type": "string"},
+                        },
+                        "required": ["location"],
+                    },
+                },
+            },
+        ]
+        messages = [
+            {
+                "role": "user",
+                "content": "What's the weather like in Paris today?",
+            },
+            {
+                "role": "assistant",
+                "content": "We'll call get_weather.",
+                "tool_calls": [
+                    {
+                        "id": "call_123",
+                        "type": "function",
+                        "function": {
+                            "name": "get_weather",
+                            "arguments": '{"location": "Paris"}',
+                        },
+                    },
+                ]
+            },
+            {
+                "role": "tool",
+                "tool_call_id": "call_123",
+                "content": "20 degrees Celsius",
+            },
+        ]
+        req = ChatCompletionRequest(model=MODEL_NAME, messages=messages, tools=tools)
+        output_messages, _, _ = serving_chat._make_request_with_harmony(req)
+
+        # temporary debugging output
+        print("Output messages:\n")
+        for m in output_messages:
+            print(m.to_dict())
+
+        assert len(output_messages) == 6
+
+        assert output_messages[0].author.role == "system"
+
+        developer_msg = output_messages[1]
+        assert developer_msg.author.role == "developer"
+        developer_tools = developer_msg.content[0].tools
+        assert developer_tools["functions"].tools[0].name == "get_weather"
+
+        user_msg = output_messages[2]
+        assert user_msg.author.role == "user"
+        assert user_msg.content[0].text == messages[0]["content"]
+
+        analysis_msg = output_messages[3]
+        assert analysis_msg.author.role == "assistant"
+        assert analysis_msg.channel == "analysis"
+        assert analysis_msg.content[0].text == messages[1]["content"]
+
+        assistant_msg = output_messages[4]
+        assert assistant_msg.author.role == "assistant"
+        assert assistant_msg.channel == "commentary"
+        assert assistant_msg.recipient == "functions.get_weather"
+        assert assistant_msg.content[0].text == messages[1]["tool_calls"][0]["function"]["arguments"]
+
+        tool_msg = output_messages[5]
+        assert tool_msg.author.role == "tool"
+        assert tool_msg.author.name == "functions.get_weather"
+        assert tool_msg.channel == "commentary"
+        assert tool_msg.recipient == "assistant"
+        assert tool_msg.content[0].text == messages[2]["content"]

That gives us output like:
@levunet If you want to add a test for this, feel free to borrow mine above. You can comment out the one line checking the

If that's unclear, here's what I mean:

diff --git a/vllm/entrypoints/harmony_utils.py b/vllm/entrypoints/harmony_utils.py
index 7958d0317..3308eb03c 100644
--- a/vllm/entrypoints/harmony_utils.py
+++ b/vllm/entrypoints/harmony_utils.py
@@ -246,6 +246,17 @@ def parse_input_to_harmony_message(chat_msg) -> list[Message]:
     tool_calls = chat_msg.get("tool_calls")
     if role == "assistant" and tool_calls:
         msgs: list[Message] = []
+        content = chat_msg.get("content") or ""
+        if isinstance(content, list):
+            content = "".join(
+                item.get("text", "")
+                for item in content
+                if isinstance(item, dict) and item.get("type") == "text"
+            )
+        if content:
+            analysis_msg = Message.from_role_and_content(Role.ASSISTANT, content)
+            analysis_msg = analysis_msg.with_channel("analysis")
+            msgs.append(analysis_msg)

Then there's the issue of addressing the function name problem, which could be added to this but also deferred to a separate PR just to keep the scope down here. Either way, I'd like to get your fixes in as they definitely clean up some areas where we're not properly reconstructing the input as we prompt the gpt-oss models with tool call results. This should improve things for all tool calling usage of these models over the Chat Completions API.
Thanks for diving deep into this and sharing your thoughts along with the test, @bbrowning! Regarding the first issue:
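Presumably the format in question is a tool result that carries the function name explicitly, something like the following (a hypothetical payload reusing the values from the test above; the name field is an extra, non-spec field):

# Hypothetical tool result that includes the function name directly, so the
# Harmony author name can be set without resolving tool_call_id.
tool_result_with_name = {
    "role": "tool",
    "tool_call_id": "call_123",
    "name": "get_weather",           # extra field, not in the Chat Completions spec
    "content": "20 degrees Celsius",
}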
When sending data in this format, you'll see that function.name works correctly. That said, I agree that implementing tool_call_id resolution would be a better approach for OpenAI API compatibility. (I recall considering this issue previously but opted for the current implementation due to time constraints.) Regarding the second issue:
I'm not aware of any clients that would send tool results in that format? vLLM itself explicitly expects and uses the OpenAI types here to define this:

Because we allow extra fields, sending in the name wouldn't technically throw an error, but that also makes our API inconsistent with the Chat Completions API spec and would require every client and client framework to adapt to this instead of passing in tool output as they normally would using tool_call_id.
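For reference, a sketch of the spec-compliant shape (field names as in openai.types.chat.ChatCompletionToolMessageParam from the openai Python package, to the best of my knowledge):

# A Chat Completions tool message only carries role, tool_call_id, and content;
# the function name has to be resolved from the earlier assistant tool_calls entry.
spec_tool_result = {
    "role": "tool",
    "tool_call_id": "call_123",
    "content": "20 degrees Celsius",
}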
Given that this PR isn't actually changing our wrong handling of the tool

/cc @yeqcharlotte PTAL.

I incorporated a version of these changes in #28729 because I needed to be able to do an end-to-end test with the BFCL suite to really evaluate before/after gpt-oss performance on multi-turn tool calling. That PR also fixes a number of other important accuracy issues, and I'll keep an eye on this one and rebase the other if this one goes in.
Purpose
This PR requires the following PRs to be merged first: #24768 and the harmony lib PR (openai/harmony#76).
The changes include adding 'with_recipient' and the Assistant's 'analysis' content.
Without adding this content, there was an issue where the gpt-oss model had a higher probability of outputting abnormal tokens when calling tools.
Test Plan
gpt-oss_test.py
messages.txt
Run python3 gpt-oss_test.py about 10 times.
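Since the failure is probabilistic, one convenient way to repeat the run is a small helper like this (a sketch; gpt-oss_test.py is the script attached above):

# Repeat the attached test script several times; the abnormal-token issue does
# not reproduce on every run.
import subprocess

for i in range(10):
    print(f"--- run {i + 1} ---")
    subprocess.run(["python3", "gpt-oss_test.py"], check=True)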
Test Result
(before)

(after applying the harmony lib changes)
