Conversation

@Rocketknight1 (Member) commented Jul 23, 2025

This is an experimental PR to get feedback on a new potential feature called Chat Schemas.

What problem does this fix?

Since the arrival of chat templates, I've gotten a lot of requests for the same two features:

  1. People want a way to detect which inputs a template supports. For example, does it support system messages or tools? Do the tools need special formatting, or is the default okay?
  2. People want a way to parse model outputs, especially when the model calls a tool or has thinking blocks. Ideally, people want a way to turn an entire formatted conversation back into a list of messages, tool defs, tool calls, etc.

Right now, people handle these in hacky ways. For example, some code searches the template for references to "tools" to decide whether it supports tools. Other frameworks use hardcoded functions to detect and parse a few common tool-call formats.

What's the solution?

Models can have a chat schema alongside the chat template. This is a plain JSON file containing a JSON schema that describes the model's input format. For example, a simple chat schema for a model that only supports messages, not tools, might look like this:

{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "role": {"type": "string"},
            "content": {"type": "string"}
        },
        "required": ["role", "content"]
    }
}

There's a twist, though: we allow an extra field in the schema, x-regex. This specifies the regex used to extract that schema node from the formatted text, optionally with named groups that indicate how to extract child nodes as well. For example, for a simple model with ChatML formatting, the regex could be:

r"<\|im_start\|>(?P<role>.*?)\n(?P<content>.*?)<\|im_end\|>\n"

Using this schema and the regex(es), we can walk the schema and formatted output recursively and reconstruct the original model inputs.
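
To make that concrete, here is a minimal sketch of the parsing direction, not the code in this PR: the schema, the parse_chat helper, and the example chat below are all illustrative, and a real implementation would recurse through nested schema nodes rather than relying on a single flat regex pass.

import re

# Illustrative only: a ChatML-style schema whose array node carries an x-regex;
# the regex's named groups line up with the item object's properties.
CHATML_SCHEMA = {
    "type": "array",
    "x-regex": r"<\|im_start\|>(?P<role>.*?)\n(?P<content>.*?)<\|im_end\|>\n",
    "items": {
        "type": "object",
        "properties": {
            "role": {"type": "string"},
            "content": {"type": "string"},
        },
        "required": ["role", "content"],
    },
}

def parse_chat(formatted_chat: str, schema: dict) -> list[dict]:
    # Each regex match becomes one array item; groupdict() fills its properties.
    pattern = re.compile(schema["x-regex"], re.DOTALL)
    return [match.groupdict() for match in pattern.finditer(formatted_chat)]

formatted = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHi!<|im_end|>\n"
)
print(parse_chat(formatted, CHATML_SCHEMA))
# [{'role': 'system', 'content': 'You are a helpful assistant.'},
#  {'role': 'user', 'content': 'Hi!'}]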

What do we get?

This resolves both of the long-standing requests: the schema itself declares which inputs a template supports, and the regexes give us a way to parse formatted chats back into lists of messages. We can also parse tool calls! This means that if models have chat schemas, they can be used in a universal API that doesn't require any model-specific tool parsing, which has been a major weakness of chat templates since I made them!
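
As an illustration of the tool-call side (again, not code from this PR): if a model wraps tool calls in some marker, a schema node's x-regex can isolate the JSON blob and a standard JSON parse can recover the name and arguments. The <tool_call> markers and helper below are a made-up example of a common style, not any specific model's format.

import json
import re

# Hypothetical tool-call markers; each model would declare its own via x-regex.
TOOL_CALL_REGEX = r"<tool_call>\s*(?P<tool_call>.*?)\s*</tool_call>"

def extract_tool_calls(assistant_text: str) -> list[dict]:
    # Each match is a JSON blob holding the tool name and its arguments.
    return [
        json.loads(match.group("tool_call"))
        for match in re.finditer(TOOL_CALL_REGEX, assistant_text, re.DOTALL)
    ]

text = '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
print(extract_tool_calls(text))
# [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]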

What are the downsides?

The main downside is that, like with chat templates, someone has to actually add these schemas to models! The base schema is easy enough to write, but the regexes may be harder for complex tool calling models. However, in testing, they weren't too bad. I think writing a chat schema is a lot less work than writing a chat template, especially since you can usually copy the entire schema from another model and just tweak the regexes a little.

Work still to do in this PR

  • Add a lot more test coverage (50% done)
    • Make sure we can parse tool defs as well as tool calls, and add more formats!
  • Add JSON parser
  • Add Python type / tool def parser
  • Add code to tokenizers to load / save chat schemas
  • Add code to pipelines for output / tool call parsing?
  • Overhaul chat template docs to include chat schemas too
  • Add code for offset extraction
  • Use offsets for assistant turn masking
  • Clean up code TODOs and remove the very insecure eval calls

Fixes #40776

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@LysandreJik (Member)

Thanks a lot! Could you push to the Hub an example of a model that would have this regex in its config/chat template so we can see what the result would look like?

@Rocketknight1 (Member Author)

Will do! I'll add that once the utils are integrated with tokenizers.

@gante (Member) commented Jul 23, 2025

🙏

Since tool use is probably the biggest issue this PR addresses, would it make sense to add a chat round -> tool name, tool arguments util at a later point in this PR?

@Rocketknight1 (Member Author)

cc @yonigottesman as well - the parsers for this could be modified to automatically indicate regions that come from assistant/user messages, which would mean no more need for manually writing {% generation %} tags. That assumes it works, though - I'm still doing a lot of testing of edge cases!
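
As a sketch of what that could look like (a hypothetical helper, not code from this PR), the named-group matches already carry character offsets for assistant content, which is the information assistant-turn masking needs:

import re

CHATML_REGEX = r"<\|im_start\|>(?P<role>.*?)\n(?P<content>.*?)<\|im_end\|>\n"

def assistant_content_spans(formatted_chat: str) -> list[tuple[int, int]]:
    # Hypothetical helper: (start, end) character offsets of assistant content,
    # which could replace hand-written {% generation %} tags.
    return [
        match.span("content")
        for match in re.finditer(CHATML_REGEX, formatted_chat, re.DOTALL)
        if match.group("role") == "assistant"
    ]

Those character spans could then be mapped onto token indices (e.g. via the tokenizer's offset mapping) to build an assistant-only loss mask.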

@Rocketknight1 marked this pull request as ready for review July 24, 2025 16:21
@Rocketknight1 marked this pull request as draft July 24, 2025 16:21
@Rocketknight1 (Member Author)

[emerging from a dungeon marked "GPT-OSS launch"] I should hopefully have a mostly-ready draft of this in the next week or two so I can start getting feedback.

@bbrowning

Speaking of the gpt-oss launch, the new OpenAI Harmony format would be a good test of this proposal. On the surface, I worry it and any future similar formats may be too complex for the parsing as defined here, and OpenAI ships their own Rust / Python libraries to parse the streaming and non-streaming variants of their format properly.

With that said, I do think this idea generally is pointing in a positive direction. The challenge will be whether a context-free grammar, regex, or something similar is enough to properly construct things like tool call parsers for every popular model. And, the new free-form function calling and context-free grammars added in GPT-5 will add extra wrinkles to tool call parsing if we start to see open models implement those same capabilities.

@Rocketknight1 (Member Author)

@bbrowning yep! I'm quietly working on writing chat schemas for ~5 models that I know have complex tool templates, of which gpt-oss is one. I'm using that to shake out the problems in the implementation and see if any features are missing. Right now I'm reasonably confident that JSON schema with a couple of extensions will work, but we'll see if it breaks down somewhere!

@Rocketknight1 marked this pull request as ready for review August 28, 2025 14:20
elif parser == "python_type":
# TODO eval is obviously enormously insecure and only used for prototyping here
# make a safer parser before merging
node_content = _parse_type_hint(eval(node_content))
Contributor

would ast.literal_eval work here? (docs)

@Rocketknight1 (Member Author)

Yes, that's the plan! While prototyping I was just using eval because I already had existing code to parse a Python type object into a JSON type. Obviously, all of the eval stuff will have to go before merging.
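
For reference, one possible eval-free direction is to parse the type-hint string with ast and map it straight to JSON-schema-style types. This is only a sketch under the assumption of simple hints like "int" or "list[str]" on Python 3.9+; parse_type_hint_string is a stand-in for, not a copy of, the PR's _parse_type_hint.

import ast

_BASIC_TYPES = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}

def _node_to_json_type(node: ast.expr) -> dict:
    # Map an AST node for a type hint onto a JSON-schema-style type.
    if isinstance(node, ast.Name) and node.id in _BASIC_TYPES:
        return {"type": _BASIC_TYPES[node.id]}
    if (
        isinstance(node, ast.Subscript)
        and isinstance(node.value, ast.Name)
        and node.value.id == "list"
    ):
        return {"type": "array", "items": _node_to_json_type(node.slice)}
    raise ValueError("Unsupported type hint")

def parse_type_hint_string(hint: str) -> dict:
    # Parse the hint as an expression; nothing is executed.
    return _node_to_json_type(ast.parse(hint, mode="eval").body)

print(parse_type_hint_string("list[int]"))
# {'type': 'array', 'items': {'type': 'integer'}}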

@Rocketknight1 (Member Author)

After a lot of iteration, I'm closing this PR in favour of a simpler approach that focuses on just parsing model output to start, rather than the entire history. The main reason is simply that the schema needed to parse a whole chat was too complex, and after a lot of trying I wasn't able to simplify it. As a result, users probably wouldn't be able to write these schemas themselves!

@Rocketknight1 mentioned this pull request Sep 15, 2025
@Rocketknight1 (Member Author)

Output schema PR at #40894
