
Feature/function calling update #2700

Merged
zhaochenyang20 merged 37 commits into sgl-project:main from YAMY1234:feature/function-calling-update on Jan 26, 2025

Conversation

@YAMY1234 (Contributor) commented Jan 2, 2025

Pull Request Description

Summary

This pull request introduces streaming modes for function calling within the OpenAI API integration and updates the non-streaming framework for better extensibility. The changes include:

  1. New Features:

    • Implemented a FunctionCallParser for robust, efficient parsing of function calls in both streaming and non-streaming contexts.
    • Added support for incremental streaming responses via the parse_streaming_increment method (a usage sketch follows this list).
    • Enhanced tooling support with structured parsing of tool calls, enabling seamless function integration and improved parameter handling.
  2. Refactoring:

    • Refactored openai_api/adapter.py to integrate streaming tool call parsing logic.
    • Updated openai_api/protocol.py with additional models (ToolCallItem, DeltaMessage) to support streaming functionalities.
  3. Documentation:

    • Added detailed comments and docstrings for new classes and methods to enhance readability and maintainability.
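For orientation, here is a minimal sketch of how the streaming parser might be driven. The names FunctionCallParser and parse_streaming_increment come from this PR's description; the constructor arguments and return shape are illustrative assumptions, not the merged API.

from sglang.srt.function_call_parser import FunctionCallParser

# Hedged sketch: the signature and return types below are assumptions.
# `tools` and `generated_text_chunks` are assumed to be defined elsewhere.
parser = FunctionCallParser(tools=tools, tool_call_parser="llama3")

for chunk in generated_text_chunks:  # decoded text deltas from the model
    # Feed each increment; the parser buffers partial JSON internally and
    # returns plain text plus any tool-call deltas completed so far.
    normal_text, tool_call_items = parser.parse_streaming_increment(chunk)
    for item in tool_call_items:
        print(item.name, item.parameters)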

Detailed Changes

  • docs/backend/function_calling_streaming.py:

    • Added functionality to demonstrate streaming and non-streaming API calls with mock tool integrations.
    • Included an example for handling tool calls and parsing streamed arguments incrementally.
  • python/sglang/srt/function_call_parser.py:

    • Introduced FunctionCallParser, StreamingJSONParser, and related utility functions to handle function calls during streaming responses.
    • Implemented logic for detecting and parsing incremental JSON inputs with robust error handling.
  • python/sglang/srt/openai_api/adapter.py:

    • Integrated FunctionCallParser to enable real-time function call parsing during streaming response generation.
    • Adjusted tool-related logic to align with the new structured tool parsing approach.
  • python/sglang/srt/openai_api/protocol.py:

    • Modified the FunctionResponse and ToolCall models to use Optional fields for compatibility with the new parser.
    • Added ToolCallItem and DeltaMessage models to streamline the representation of parsed tool calls and response deltas (a sketch of these models follows this list).
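As a rough illustration of the protocol additions above, the new models might look like the following. The names ToolCallItem and DeltaMessage are from this PR, but the exact fields shown are assumptions rather than the merged definitions.

from typing import List, Optional

from pydantic import BaseModel


class ToolCallItem(BaseModel):
    tool_index: int  # position of this call within the response
    name: Optional[str] = None  # function name, once parsed
    parameters: str = ""  # JSON-encoded arguments, possibly partial


class DeltaMessage(BaseModel):
    role: Optional[str] = None
    content: Optional[str] = None
    tool_calls: Optional[List[ToolCallItem]] = None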

Testing

  • Verified the functionality of streaming and non-streaming API calls using mock scenarios.
  • Validated the correctness of tool call parsing through tests and real-time simulations.

@YAMY1234 YAMY1234 changed the title Feature/function calling update WIP: Feature/function calling update Jan 2, 2025
@merrymercy (Contributor)

@HaoyuWang4188 @Tushar-ml @YAMY1234 Can you review each other's code? #2576

@YAMY1234 YAMY1234 requested a review from HaiShaw as a code owner January 2, 2025 19:59
@YAMY1234 YAMY1234 force-pushed the feature/function-calling-update branch 3 times, most recently from 25a03b0 to 63c3d4e Compare January 2, 2025 20:10
@YAMY1234 (Contributor, Author) commented Jan 3, 2025

@HaoyuWang4188 @Tushar-ml @YAMY1234 Can you review each other's code? #2576

Sure! We’ll review each other’s code and collaborate to work out a great solution. 🚀

@HaoyuWang4188 (Contributor) commented Jan 3, 2025

Hi! After a general review, I'd like to start some discussions to help us determine the best solution:

1. Support for parallel_tool_calls

The OpenAI API supports the parallel_tool_calls option (default true), which determines whether the LLM may output multiple tool calls at once.
In vLLM, this option is accepted but effectively ignored, always behaving as parallel_tool_calls=true (details).
Our current implementation does not consider this option in either #2544 or #2700. The actual behaviour is summarized as follows:

Static API in #2544 (details)

  • parallel_tool_calls=true for qwen2.5 (can output multiple tool calls at once)
  • parallel_tool_calls=false for internlm2, llama3.1, llama3.2 (only the first parsed tool call is output)

I tried to align these behaviours by setting parallel_tool_calls=false by default and forcing qwen2.5 to output only the first tool call at the API level in link.
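For reference, this is how the option appears in an OpenAI-style request. This is a hedged sketch: the port and model are illustrative, and tools is assumed to be defined elsewhere.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is the weather in SF and in NYC?"}],
    tools=tools,  # tool schemas, assumed defined
    parallel_tool_calls=False,  # aligned behaviour: only the first parsed tool call
)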

IMO, parallel_tool_calls should be supported in our function calling API (both static and streaming), and I suggest supporting it in two steps:

2. Aligning terms for model names

In #2544, we use these names (link):

Name       Special Token (i.e. bot_token)
Llama 3.2  <|python_tag|>
Llama 3.1  <function=
Qwen 2.5   <tool_call>
InternLM   <|plugin|>

I prefer to change "Llama 3.1/3.2" into "Llama 3.1+" (since Llama 3.3 also shares the same pattern) and to use the terms JSON-based and User-defined from Meta's doc for clarity: Llama 3.2 adds no new function calling support in its training phase, and both <|python_tag|> and <function= have been supported since 3.1.
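For context, the two Llama 3.1+ output styles referred to here look roughly like this (paraphrased from Meta's docs; the exact formats may differ):

JSON-based:   {"name": "get_weather", "parameters": {"city": "Boston"}}
User-defined: <function=get_weather>{"city": "Boston"}</function>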

@HaoyuWang4188 (Contributor)

Please let me know your thoughts on the above discussion. @YAMY1234 @Thunderbeee @Tushar-ml @merrymercy
If you guys agree with my suggestions, I will proceed with the proposed plan. Thank you! 🚀

@YAMY1234 YAMY1234 force-pushed the feature/function-calling-update branch from 8ad4c1f to 882b77a Compare January 8, 2025 06:00
@YAMY1234 (Contributor, Author) commented Jan 8, 2025

Please let me know your thoughts on the above discussion. @YAMY1234 @Thunderbeee @Tushar-ml @merrymercy If you guys agree with my suggestions, I will proceed with the proposed plan. Thank you! 🚀

Hi @HaoyuWang4188, thank you for the detailed review and suggestions!
1. We fully support the idea of creating a new PR to add support for parallel_tool_calls as described.
2. We also agree with the plan to change [Feature] Add partial support for parallel_tool_calls in Function Calling API #2576 to a support PR that always sets parallel_tool_calls to false.
3. For changing Llama 3.1/3.2 into Llama 3.1+, we totally agree and have changed this in our recent commits.

Currently, our implementation is nearly complete. Could you consider adding this new parameter based on our existing changes (perhaps cherry-picking them)? Alternatively, I have another suggestion: we could combine the addition of the parallel_tool_calls parameter and its full implementation into the next PR to streamline the process.

Let me know your thoughts!

@HaoyuWang4188 (Contributor) left a comment

Add some suggested changes for parallel_tool_calls.

Contributor

Suggested change:
-            calls.append(tool_call_item)
+            calls.append(tool_call_item)
+            break

Contributor

Suggested change:
-        _, action = text.split("<|python_tag|>")
+        _, action = text.split("<|python_tag|>")
+        # split multiple actions and only select the first one
+        # e.g. {"name": "A", "parameters": {"arg": "x"}}; {"name": "B", "parameters": {"arg": "y"}}
+        if "}};" in action:
+            action = action.split("}};")[0] + "}}"

@HaoyuWang4188 (Contributor)

> (quoting the exchange above)

I'm glad we agreed 😄. I commented with the related suggestions (no cherry-pick needed now). Next, we just need to merge #2700 and #2576 to finish the main support for the function calling API. Can't wait to use it in SGLang 🚀.

Collaborator

Refer to the docs readme:

# 4) Clean notebook outputs
# nbstripout removes notebook outputs so your PR stays clean
pip install nbstripout
find . -name '*.ipynb' -exec nbstripout {} \;

Comment on lines 9 to 19
Collaborator

Make this concise.

This guide demonstrates how to use SGLang’s ToolCalling functionality with a get_current_weather function. You can replace or add any tool function depending on your use case.

Comment on lines 26 to 56
Collaborator

Make it concise, and do not use os.environ["CUDA_VISIBLE_DEVICES"] = "7". We only have 1 GPU for docs CI.


Launch the Server

import os
from openai import OpenAI
import json

from sglang.utils import execute_shell_command, wait_for_server, terminate_process

server_process = execute_shell_command(
    "python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --tool-call-parser llama3 --port 30222 --host 0.0.0.0" 
)
wait_for_server("http://localhost:30222")

Note that --tool-call-parser defines the parser used to interpret responses. Currently supported parsers include:

  • llama3: Llama 3.1 / 3.2 (e.g. meta-llama/Llama-3.1-8B-Instruct, meta-llama/Llama-3.2-1B-Instruct).
  • mistral: Mistral (e.g. mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-Nemo-Instruct-2407, mistralai/Mistral-7B-v0.3).
  • qwen25: Qwen 2.5 (e.g. Qwen/Qwen2.5-1.5B-Instruct, Qwen/Qwen2.5-7B-Instruct).
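A minimal follow-up request against this server might look like the sketch below; this is a hedged example, with tools assumed to be defined as elsewhere in the notebook (OpenAI is imported in the launch snippet above).

client = OpenAI(base_url="http://localhost:30222/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What's the weather like in Boston today?"}],
    tools=tools,  # tool schemas, assumed defined elsewhere in the notebook
)
print(response.choices[0].message.tool_calls)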

Comment on lines 137 to 58
Collaborator

Below is a Python snippet that shows how to define a tool as a dictionary. The dictionary includes the tool name, a description, and its parameters, defined via properties.
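A hedged sketch of such a dictionary, modeled on the OpenAI tool format (the get_current_weather schema below is illustrative, not the exact one from the notebook):

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. Boston"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]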

Collaborator

delete this line.

Collaborator

This should always be true, and should even be asserted to be true. Do not add this if-else.

Collaborator

This should also be asserted to be true.

Collaborator

print_highlight

Collaborator

print_highlight.

Comment on lines 456 to 338
Collaborator

Just add one block at the end:

terminate_process(server_process)

No need to explain or add a title.

Comment on lines 16 to 58
Collaborator

This is verbose. Please use this:

Launch the Server

import os
from openai import OpenAI
import json

from sglang.utils import execute_shell_command, wait_for_server, terminate_process

server_process = execute_shell_command(
    "python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --tool-call-parser llama3 --port 30222 --host 0.0.0.0" 
)
wait_for_server("http://localhost:30222")

Note that --tool-call-parser defines the parser used to interpret responses. Currently supported parsers include:

  • llama3: Llama 3.1 / 3.2 (e.g. meta-llama/Llama-3.1-8B-Instruct, meta-llama/Llama-3.2-1B-Instruct).
  • mistral: Mistral (e.g. mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-Nemo-Instruct-2407, mistralai/Mistral-7B-v0.3).
  • qwen25: Qwen 2.5 (e.g. Qwen/Qwen2.5-1.5B-Instruct, Qwen/Qwen2.5-7B-Instruct).

Collaborator

Delete this line. No explanation is needed since the code is clear.

Collaborator

When the server -> When the engine

Collaborator

Note, non-streaming mode also supports function calling. ->

Give it an example like the streaming one.

Collaborator

Delete this explanation, since the code already has this line:

# This is a demonstration, define real function according to your usage.

Collaborator

Delete this line; it's redundant with # This is a demonstration, define real function according to your usage.

Collaborator

Delete this line; I don't see the point of leaving it here.

Collaborator

delete this.

Collaborator

delete this.

Comment on lines 306 to 327
Collaborator

There is one blank block. Delete it (screenshot attached).

@Thunderbeee Thunderbeee force-pushed the feature/function-calling-update branch from 9f9ecec to cce2859 Compare January 11, 2025 08:42
Collaborator

Make it concise; name it: Tool and Function Calling

Collaborator

Make it concise:

This guide demonstrates how to use SGLang’s Tool Calling functionality.

Collaborator

delete.

Collaborator

Do not use more than ### titles. Use **Non-Streaming Request** instead.

Collaborator

Do not use more than ### titles. Use **Streaming Request** instead.

Collaborator

Do not use ( and ) here.

Just:

arguments_non_stream = response_non_stream.choices[0].message.tool_calls[0].function.arguments
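For the streaming counterpart, the arguments arrive as incremental deltas; below is a hedged sketch of accumulating them (field names follow the OpenAI streaming schema; response_stream is assumed to be created with stream=True):

arguments_stream = ""
for chunk in response_stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        # each delta carries a fragment of the JSON-encoded arguments
        arguments_stream += delta.tool_calls[0].function.arguments or ""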

@YAMY1234 YAMY1234 force-pushed the feature/function-calling-update branch from 6d71133 to 94a1338 Compare January 14, 2025 17:09
@zhaochenyang20 (Collaborator) left a comment

The docs look good to me. @shuaills will review the other parts.

@YAMY1234 YAMY1234 force-pushed the feature/function-calling-update branch from 8f4ac89 to 9cf3086 Compare January 20, 2025 19:23
@zhaochenyang20 (Collaborator)

@YAMY1234 @shuaills @Thunderbeee Tom is rebasing these days, so it's urgent to merge and review 😂

@Thunderbeee Thunderbeee force-pushed the feature/function-calling-update branch from 5442889 to cc147ba Compare January 22, 2025 00:08
@zhaochenyang20 (Collaborator)

Nice work!

@YAMY1234 YAMY1234 force-pushed the feature/function-calling-update branch from 522ab47 to 5717965 Compare January 25, 2025 17:41
@YAMY1234 YAMY1234 changed the title WIP: Feature/function calling update Feature/function calling update Jan 25, 2025
Collaborator

As a matter of fact, if this content is redundant with function_calling.ipynb, please just link to the corresponding content in function_calling.ipynb. Using a URL is okay, like https://docs.sglang.ai/backend/openai_api_completions.html#Launch-A-Server

Collaborator

As a matter of fact, if this content is redundant with function_calling.ipynb, please just link to the corresponding content in function_calling.ipynb. Using a URL is okay, like https://docs.sglang.ai/backend/openai_api_completions.html#Launch-A-Server

@zhaochenyang20 (Collaborator) left a comment

LGTM

@zhaochenyang20 zhaochenyang20 merged commit b045841 into sgl-project:main Jan 26, 2025
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>