Feature/function calling update#2700
Conversation
@HaoyuWang4188 @Tushar-ml @YAMY1234 Can you review each other's code? #2576
25a03b0 to
63c3d4e
Compare
Sure! We’ll review each other’s code and collaborate to work out a great solution. 🚀
Hi! After a general review, I would like to initiate some discussions to help us determine the best solution:
1. Support for parallel_tool_calls. The OpenAI API supports parallel_tool_calls. I tried to align these behaviours. IMO, parallel_tool_calls should be supported in our function calling API (both static and streaming).
2. Aligning terms of model names
I prefer to change
Please let me know your thoughts on the above discussion. @YAMY1234 @Thunderbeee @Tushar-ml @merrymercy
8ad4c1f to
882b77a
Compare
Hi @HaoyuWang4188 Thank you for the detailed review and suggestions! Currently, our implementation is nearly complete. Could you consider adding this new parameter based on our existing changes (perhaps cherry-picking them)? Alternatively, I have another suggestion: we could combine the addition of the parallel_tool_calls parameter and its full implementation into the next PR to streamline the process. Let me know your thoughts!
HaoyuWang4188
left a comment
Add some suggested changes for parallel_tool_calls
calls.append(tool_call_item)
break
_, action = text.split("<|python_tag|>")
# split multiple actions and only select the first one
# e.g. {"name": "A", "parameters": {"arg": "x"}}; {"name": "B", "parameters": {"arg": "y"}}
if "}};" in action:
    action = action.split("}};")[0] + "}}"
I'm glad we agreed 😄. I commented with the related suggestion (no cherry-pick needed now). Next, we just need to merge #2700 and #2576 to finish the main support for the function calling API. Can't wait to use it in SGLang 🚀 .
Refer to the docs readme:
# 4) Clean notebook outputs
# nbstripout removes notebook outputs so your PR stays clean
pip install nbstripout
find . -name '*.ipynb' -exec nbstripout {} \;
Make this concise.
This guide demonstrates how to use SGLang’s Tool Calling functionality with a get_current_weather function. You can replace or add any tool function depending on your use case.
Make it concise, and do not use os.environ["CUDA_VISIBLE_DEVICES"] = "7". We only have 1 GPU for docs CI.
Launch the Server
import os
from openai import OpenAI
import json
from sglang.utils import execute_shell_command, wait_for_server, terminate_process
server_process = execute_shell_command(
"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --tool-call-parser llama3 --port 30222 --host 0.0.0.0"
)
wait_for_server("http://localhost:30222")Note that --tool-call-parser defines the parser used to interpret responses. Currently supported parsers include:
- llama3: Llama 3.1 / 3.2 (e.g. meta-llama/Llama-3.1-8B-Instruct, meta-llama/Llama-3.2-1B-Instruct).
- mistral: Mistral (e.g. mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-Nemo-Instruct-2407, mistralai/Mistral-7B-v0.3).
- qwen25: Qwen 2.5 (e.g. Qwen/Qwen2.5-1.5B-Instruct, Qwen/Qwen2.5-7B-Instruct).
Below is a Python snippet that shows how to define a tool as a dictionary. The dictionary includes a tool name, a description, and parameters defined as properties.
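A minimal sketch of such a dictionary, following the OpenAI function-calling schema and mirroring the get_current_weather example this guide uses (the exact property names and descriptions here are illustrative):

```python
# A tool definition in the OpenAI function-calling format:
# a name, a description, and parameters described as JSON-schema properties.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city, e.g. 'Paris'",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city"],
            },
        },
    }
]
```

This list is what gets passed as the `tools` argument of a chat completion request.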
This should always be true; you can even assert it. Do not add this if-else.
This should also be asserted to be true.
Just add one block at the end:
terminate_process(server_process)
No need to explain or add a title.
This is verbose. Use this plz:
Launch the Server
import os
from openai import OpenAI
import json
from sglang.utils import execute_shell_command, wait_for_server, terminate_process
server_process = execute_shell_command(
"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --tool-call-parser llama3 --port 30222 --host 0.0.0.0"
)
wait_for_server("http://localhost:30222")Note that --tool-call-parser defines the parser used to interpret responses. Currently supported parsers include:
- llama3: Llama 3.1 / 3.2 (e.g. meta-llama/Llama-3.1-8B-Instruct, meta-llama/Llama-3.2-1B-Instruct).
- mistral: Mistral (e.g. mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-Nemo-Instruct-2407, mistralai/Mistral-7B-v0.3).
- qwen25: Qwen 2.5 (e.g. Qwen/Qwen2.5-1.5B-Instruct, Qwen/Qwen2.5-7B-Instruct).
Delete this line. No explanation is needed since the code is clear.
When the server -> When the engine
Note, non-streaming mode also supports function calling. ->
Give it an example like the streaming one.
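A sketch of what such a non-streaming example might look like. To keep it self-contained, `response` here is a stand-in dict shaped like the OpenAI ChatCompletion payload; in the real docs the object would come from `client.chat.completions.create(..., stream=False)` against the server launched above:

```python
import json

# Stand-in for a non-streaming ChatCompletion response; a real one would
# be returned by client.chat.completions.create(..., stream=False).
response = {
    "choices": [
        {
            "message": {
                "tool_calls": [
                    {
                        "function": {
                            "name": "get_current_weather",
                            "arguments": '{"city": "Boston", "unit": "celsius"}',
                        }
                    }
                ]
            }
        }
    ]
}

# In non-streaming mode the full tool call arrives in one piece:
# read the function name and JSON-decode its arguments.
call = response["choices"][0]["message"]["tool_calls"][0]["function"]
args = json.loads(call["arguments"])
print(call["name"], args["city"])  # get_current_weather Boston
```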
Delete this explanation, since in the code you already have this line:
# This is a demonstration, define real function according to your usage.

Delete this line; it's redundant with # This is a demonstration, define real function according to your usage.
Delete this line; I don't see the point of leaving it here.
9f9ecec to
cce2859
Compare
Make it concise; name it: Tool and Function Calling
Make it concise:
This guide demonstrates how to use SGLang’s Tool Calling functionality.
Do not use headings deeper than ###. Use **Non-Streaming Request** instead.
Do not use headings deeper than ###. Use **Streaming Request** instead.
Do not use ( and ) here.
Just:
arguments_non_stream = response_non_stream.choices[0].message.tool_calls[0].function.arguments
6d71133 to
94a1338
Compare
zhaochenyang20
left a comment
The docs look good to me. @shuaills will review other parts.
8f4ac89 to
9cf3086
Compare
@YAMY1234 @shuaills @Thunderbeee Tom is rebasing these days, so it's urgent to merge and review 😂
5442889 to
cc147ba
Compare
Nice work!
522ab47 to
5717965
Compare
As a matter of fact, if these contents are redundant with function_calling.ipynb, please just use a link to redirect to the contents in function_calling.ipynb. Using a URL is okay, like https://docs.sglang.ai/backend/openai_api_completions.html#Launch-A-Server
As a matter of fact, if these contents are redundant with function_calling.ipynb, please just use a link to redirect to the contents in function_calling.ipynb. Using a URL is okay, like https://docs.sglang.ai/backend/openai_api_completions.html#Launch-A-Server
Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com>

Pull Request Description
Summary
This pull request introduces streaming modes for function calling within the OpenAI API integration, and updates the non-streaming framework for better extensibility. The changes include:
New Features:
- FunctionCallParser for robust and efficient parsing of function calls in both streaming and non-streaming contexts.
- parse_streaming_increment method.
Refactoring:
- openai_api/adapter.py to integrate streaming tool call parsing logic.
- openai_api/protocol.py with additional models (ToolCallItem, DeltaMessage) to support streaming functionalities.
Documentation:
Detailed Changes
- docs/backend/function_calling_streaming.py:
- python/sglang/srt/function_call_parser.py: FunctionCallParser, StreamingJSONParser, and related utility functions to handle function calls during streaming responses.
- python/sglang/srt/openai_api/adapter.py: FunctionCallParser to enable real-time function call parsing during streaming response generation.
- python/sglang/srt/openai_api/protocol.py: FunctionResponse and ToolCall models to use Optional fields for compatibility with the new parser; ToolCallItem and DeltaMessage models to streamline the representation of parsed tool calls and response deltas.
Testing
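The streaming path described in this PR, accumulating text deltas until a complete JSON tool call can be parsed, can be illustrated with a toy sketch. This is not the actual FunctionCallParser/StreamingJSONParser implementation, only a minimal stand-in showing the idea behind a parse_streaming_increment-style method:

```python
import json

class ToyStreamingToolCallParser:
    """Toy incremental parser: buffers streamed text and emits a tool
    call once the buffered JSON becomes complete. Illustrative only;
    NOT the real sglang FunctionCallParser API."""

    def __init__(self):
        self.buffer = ""

    def parse_streaming_increment(self, chunk):
        # Accumulate the new delta and try to parse the whole buffer.
        self.buffer += chunk
        try:
            obj = json.loads(self.buffer)
        except json.JSONDecodeError:
            return None  # incomplete so far; wait for more chunks
        self.buffer = ""
        return obj  # a complete tool call dict

parser = ToyStreamingToolCallParser()
calls = []
# Simulated streamed deltas of one tool call, split mid-JSON.
for delta in ['{"name": "get_current_weather", ', '"parameters": {"city": "Boston"}}']:
    result = parser.parse_streaming_increment(delta)
    if result is not None:
        calls.append(result)
print(calls[0]["name"])  # get_current_weather
```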