Conversation

@Rocketknight1 (Member) commented Jul 23, 2025

This is an experimental PR to get feedback on a new potential feature called Chat Schemas.

What problem does this fix?

Since the arrival of chat templates, I've gotten a lot of requests for the same two features:

  1. People want a way to detect which inputs a template supports. For example, does it support system messages or tools? Do the tools need special formatting, or is the default okay?
  2. People want a way to parse model outputs, especially when the model calls a tool or has thinking blocks. Ideally, people want a way to turn an entire formatted conversation back into a list of messages, tool defs, tool calls, etc.

Right now, people handle these in hacky ways. For example, some code searches the template for references to "tools" to decide whether it supports tools. Other frameworks use hardcoded functions to detect and parse a few common tool-call formats.

What's the solution?

Models can have a chat schema alongside the chat template. This is a plain JSON file containing a JSON schema that describes the model's input format. For example, a simple chat schema for a model that only supports messages, not tools, might look like this:

{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "role": {"type": "string"},
            "content": {"type": "string"}
        },
        "required": ["role", "content"]
    }
}

There's a twist, though: we allow an extra field in the schema, x-regex. This specifies the regex used to extract that schema node from the formatted text, optionally with named groups that indicate how to extract child nodes as well. For example, for a simple model with ChatML formatting, the regex could be:

r"<\|im_start\|>(?P<role>.*?)\n(?P<content>.*?)<\|im_end\|>\n"

Using this schema and the regex(es), we can walk the schema and formatted output recursively and reconstruct the original model inputs.
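
To make that concrete, here is a minimal sketch of the parsing direction, not the code in this PR: the schema, the parse_chat helper, and the example chat below are all illustrative, and a real implementation would recurse through nested schema nodes rather than relying on a single flat regex pass.

import re

# Illustrative only: a ChatML-style schema whose array node carries an x-regex;
# the regex's named groups line up with the item object's properties.
CHATML_SCHEMA = {
    "type": "array",
    "x-regex": r"<\|im_start\|>(?P<role>.*?)\n(?P<content>.*?)<\|im_end\|>\n",
    "items": {
        "type": "object",
        "properties": {
            "role": {"type": "string"},
            "content": {"type": "string"},
        },
        "required": ["role", "content"],
    },
}

def parse_chat(formatted_chat: str, schema: dict) -> list[dict]:
    # Each regex match becomes one array item; groupdict() fills its properties.
    pattern = re.compile(schema["x-regex"], re.DOTALL)
    return [match.groupdict() for match in pattern.finditer(formatted_chat)]

formatted = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHi!<|im_end|>\n"
)
print(parse_chat(formatted, CHATML_SCHEMA))
# [{'role': 'system', 'content': 'You are a helpful assistant.'},
#  {'role': 'user', 'content': 'Hi!'}]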

What do we get?

This resolves both of the long-standing requests: the schema itself declares which inputs a template supports, and the regexes give us a way to parse formatted chats back into lists of messages. We can also parse tool calls! This means that if models have chat schemas, they can be used in a universal API that doesn't require any model-specific tool parsing, which has been a major weakness of chat templates since I made them!
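
As an illustration of the tool-call side (again, not code from this PR): if a model wraps tool calls in some marker, a schema node's x-regex can isolate the JSON blob and a standard JSON parse can recover the name and arguments. The <tool_call> markers and helper below are a made-up example of a common style, not any specific model's format.

import json
import re

# Hypothetical tool-call markers; each model would declare its own via x-regex.
TOOL_CALL_REGEX = r"<tool_call>\s*(?P<tool_call>.*?)\s*</tool_call>"

def extract_tool_calls(assistant_text: str) -> list[dict]:
    # Each match is a JSON blob holding the tool name and its arguments.
    return [
        json.loads(match.group("tool_call"))
        for match in re.finditer(TOOL_CALL_REGEX, assistant_text, re.DOTALL)
    ]

text = '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
print(extract_tool_calls(text))
# [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]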

What are the downsides?

The main downside is that, like with chat templates, someone has to actually add these schemas to models! The base schema is easy enough to write, but the regexes may be harder for complex tool calling models. However, in testing, they weren't too bad. I think writing a chat schema is a lot less work than writing a chat template, especially since you can usually copy the entire schema from another model and just tweak the regexes a little.

Work still to do in this PR

  • Add a lot more test coverage (50% done)
    • Make sure we can parse tool defs as well as tool calls, and add more formats!
  • Add JSON parser
  • Add Python type / tool def parser
  • Add code to tokenizers to load / save chat schemas
  • Add code to pipelines for output / tool call parsing?
  • Overhaul chat template docs to include chat schemas too
  • Add code for offset extraction
  • Use offsets for assistant turn masking
  • Clean up code TODOs and remove the very insecure eval calls

Fixes #40776

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@LysandreJik (Member)

Thanks a lot! Could you push to the Hub an example of a model that would have this regex in its config/chat template so we can see what the result would look like?

@Rocketknight1 (Member Author)

Will do! I'll add that once the utils are integrated with tokenizers.

@gante (Member) commented Jul 23, 2025

🙏

Since tool use is probably the biggest issue this PR addresses, would it make sense to add a chat round -> tool name, tool arguments util at a later point in this PR?

@Rocketknight1 (Member Author)

cc @yonigottesman as well - the parsers for this could be modified to automatically indicate regions that come from assistant/user messages, which would mean no more need for manually writing {% generation %} tags. That assumes it works, though - I'm still doing a lot of testing of edge cases!
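
As a sketch of what that could look like (a hypothetical helper, not code from this PR), the named-group matches already carry character offsets for assistant content, which is the information assistant-turn masking needs:

import re

CHATML_REGEX = r"<\|im_start\|>(?P<role>.*?)\n(?P<content>.*?)<\|im_end\|>\n"

def assistant_content_spans(formatted_chat: str) -> list[tuple[int, int]]:
    # Hypothetical helper: (start, end) character offsets of assistant content,
    # which could replace hand-written {% generation %} tags.
    return [
        match.span("content")
        for match in re.finditer(CHATML_REGEX, formatted_chat, re.DOTALL)
        if match.group("role") == "assistant"
    ]

Those character spans could then be mapped onto token indices (e.g. via the tokenizer's offset mapping) to build an assistant-only loss mask.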

@Rocketknight1 marked this pull request as ready for review July 24, 2025 16:21
@Rocketknight1 marked this pull request as draft July 24, 2025 16:21
@Rocketknight1 (Member Author)

[emerging from a dungeon marked "GPT-OSS launch"] I should hopefully have a mostly-ready draft of this in the next week or two so I can start getting feedback.

@bbrowning

Speaking of the gpt-oss launch, the new OpenAI Harmony format would be a good test of this proposal. On the surface, I worry it and any future similar formats may be too complex for the parsing as defined here, and OpenAI ships their own Rust / Python libraries to parse the streaming and non-streaming variants of their format properly.

With that said, I do think this idea generally is pointing in a positive direction. The challenge will be whether a context-free grammar, regex, or something similar is enough to properly construct things like tool call parsers for every popular model. And, the new free-form function calling and context-free grammars added in GPT-5 will add extra wrinkles to tool call parsing if we start to see open models implement those same capabilities.

@Rocketknight1 (Member Author)

@bbrowning yep! I'm quietly working on writing chat schemas for ~5 models that I know have complex tool templates, of which gpt-oss is one. I'm using that to shake out the problems in the implementation and see if any features are missing. Right now I'm reasonably confident that JSON schema with a couple of extensions will work, but we'll see if it breaks down somewhere!

@Rocketknight1 marked this pull request as ready for review August 28, 2025 14:20
elif parser == "python_type":
# TODO eval is obviously enormously insecure and only used for prototyping here
# make a safer parser before merging
node_content = _parse_type_hint(eval(node_content))
Contributor

would ast.literal_eval work here? (docs)

@Rocketknight1 (Member Author)

Yes, that's the plan! While prototyping I was just using eval because I already had existing code to parse a Python type object into a JSON type. Obviously, all of the eval stuff will have to go before merging.
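
For reference, one possible eval-free direction is to parse the type-hint string with ast and map it straight to JSON-schema-style types. This is only a sketch under the assumption of simple hints like "int" or "list[str]" on Python 3.9+; parse_type_hint_string is a stand-in for, not a copy of, the PR's _parse_type_hint.

import ast

_BASIC_TYPES = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}

def _node_to_json_type(node: ast.expr) -> dict:
    # Map an AST node for a type hint onto a JSON-schema-style type.
    if isinstance(node, ast.Name) and node.id in _BASIC_TYPES:
        return {"type": _BASIC_TYPES[node.id]}
    if (
        isinstance(node, ast.Subscript)
        and isinstance(node.value, ast.Name)
        and node.value.id == "list"
    ):
        return {"type": "array", "items": _node_to_json_type(node.slice)}
    raise ValueError("Unsupported type hint")

def parse_type_hint_string(hint: str) -> dict:
    # Parse the hint as an expression; nothing is executed.
    return _node_to_json_type(ast.parse(hint, mode="eval").body)

print(parse_type_hint_string("list[int]"))
# {'type': 'array', 'items': {'type': 'integer'}}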

@Rocketknight1 (Member Author)

After a lot of iteration, I'm closing this PR in favour of a simpler approach that focuses on just parsing model output to start, rather than the entire history. The main reason is simply that the schema needed to parse a whole chat was too complex, and after a lot of trying I wasn't able to simplify it. As a result, users probably wouldn't be able to write these schemas themselves!

@Rocketknight1 mentioned this pull request Sep 15, 2025
@Rocketknight1 (Member Author)

Output schema PR at #40894
