-
Notifications
You must be signed in to change notification settings - Fork 30.8k
Chat schemas #39609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chat schemas #39609
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Thanks a lot! Could you push to the Hub an example of a model that would have this regex in its config/chat template so we can see what the result would look like? |
Will do! I'll add that once the utils are integrated with tokenizers. |
🙏 Since tool use is probably the biggest issue this PR addresses, would it make sense to add a |
cc @yonigottesman as well - the parsers for this could be modified to automatically indicate regions that come from assistant/user messages, which would mean no more need for manually writing |
[emerging from a dungeon marked "GPT-OSS launch"] hopefully should have a mostly-ready draft for this in the next week or two so I can start getting feedback |
Speaking of the gpt-oss launch, the new OpenAI Harmony format would be a good test of this proposal. On the surface, I worry it and any future similar formats may be too complex for the parsing as defined here, and OpenAI ships their own Rust / Python libraries to parse the streaming and non-streaming variants of their format properly. With that said, I do think this idea generally is pointing in a positive direction. The challenge will be whether a context-free grammar, regex, or something similar is enough to properly construct things like tool call parsers for every popular model. And, the new free-form function calling and context-free grammars added in GPT-5 will add extra wrinkles to tool call parsing if we start to see open models implement those same capabilities. |
@bbrowning yep! I'm quietly working on writing chat schemas for ~5 models that I know have complex tool templates, of which |
elif parser == "python_type": | ||
# TODO eval is obviously enormously insecure and only used for prototyping here | ||
# make a safer parser before merging | ||
node_content = _parse_type_hint(eval(node_content)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would ast.literal_eval
work here? (docs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the plan! While prototyping I was just using eval
because I already had existing code to parse a Python type object into a JSON type. Obviously all of the eval stuff will have to go before merging
78cb942
to
1fd2ce6
Compare
bbe4483
to
19b8938
Compare
After a lot of iteration, I'm closing this PR in favour of a simpler approach, focusing on just parsing model output to start, rather than the entire history. The main reason is simply that the schema to parse a whole chat was too complex, and after a lot of trying I wasn't able to find a simpler approach. As a result, users probably wouldn't be able to write them! |
Output schema PR at #40894 |
This is an experimental PR to get feedback on a new potential feature called Chat Schemas.
What problem does this fix?
Since the arrival of chat templates, I've gotten a lot of requests for the same two features:
Right now, people handle these in hacky ways. For example, some code searches the template for references to "tools" to decide if it supports tools or not. Other frameworks use hardcoded functions to infer some common tool call formats and parse them.
What's the solution?
Models can have a chat schema alongside the chat template. This is a pure JSON file, containing a JSON schema representing the model's input format. For example, a simple chat schema for a model that only supports messages, not tools, might look like this:
There's a twist, though: We allow an extra field in the schema:
x-regex
. This specifies the regex used to extract this schema node, optionally with named groups that indicate how to extract child nodes as well. For example, for a simple model with ChatML formatting, the regex could be:Using this schema and the regex(es), we can walk the schema and formatted output recursively and reconstruct the original model inputs.
What do we get?
This resolves both of the long-standing demands: We now have a way to parse formatted chats back to lists of messages. We can also parse tool calls! This means that if models have chat schemas, they can be used in a universal API that doesn't require any model-specific tool parsing. This has been a major weakness in chat templates since I made them!
What are the downsides?
The main downside is that, like with chat templates, someone has to actually add these schemas to models! The base schema is easy enough to write, but the regexes may be harder for complex tool calling models. However, in testing, they weren't too bad. I think writing a chat schema is a lot less work than writing a chat template, especially since you can usually copy the entire schema from another model and just tweak the regexes a little.
Work still to do in this PR
eval
callsFixes #40776