[WIP] Structured model state #929
base: main
Conversation
The existing implementation wouldn't properly pop the selected role off the opened_blocks dict, so roles could be closed twice. We'd need a stateful implementation that could modify the Model object more directly... Removing it for now
```diff
@@ -966,10 +966,6 @@ def _re_with_temperature(grammar, temperature, visited_set):
        # return ModelVariable(name)


def active_role_end() -> ModelVariable:
```
The current implementation of `active_role_end` will actually produce the closing tag twice, since it is essentially a stateless grammar and can't pop `ContextBlock`s off of a model. Furthermore, generating `active_role_end` requires the model to produce special tags which will be hidden from future prompts... A better implementation of this may exist, but I'm not quite sure yet. Deleting it for now.
```python
# Import in function to guard against circular import
from ..library._role import RoleBlock
```
I have a feeling that some may object to the import inside of a function... Is anyone else better at resolving issues with circular imports?
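For reference, the deferred-import workaround looks roughly like this: the import runs at call time rather than at module load time, so the two modules can reference each other without a cycle at import. A minimal sketch, with the stdlib `json` module standing in for the circularly-imported module (`..library._role` in the real code):

```python
# Deferred import: the top-level module body never touches the other module,
# so mutually-importing modules do not cycle at load time.
def render_role_block(data: dict) -> str:
    # Import inside the function body; `json` here is a stand-in for the
    # circularly-imported module (e.g. ..library._role).
    import json

    return json.dumps(data)


print(render_role_block({"role": "user"}))
```

The usual cost is that the import check happens on every first call path and the dependency is hidden from static analysis, which is why linters sometimes flag it.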
Well, we haven't tried turning on `ruff` or `flake8` yet...
```python
def _html(self) -> str:
    raise NotImplementedError

def __str__(self) -> str:
    raise NotImplementedError
```
Currently undecided as to whether these should be methods or if these should just be data containers with external "Handler" classes to produce HTML, strings for prompts, pretty ANSI codes for terminal outputs, etc...
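One way to sketch the "data containers plus external handlers" option described above. All names here are hypothetical, not part of this PR; the point is only that rendering logic lives outside the block classes, so new output targets (HTML, plain prompts, ANSI) can be added without touching the data:

```python
from dataclasses import dataclass


# Pure data containers: no rendering logic attached.
@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class Image:
    url: str


class HtmlHandler:
    """External handler that renders each block type as HTML."""

    def render(self, block) -> str:
        if isinstance(block, Text):
            return block.value
        if isinstance(block, Image):
            return f'<img src="{block.url}"/>'
        raise NotImplementedError(type(block).__name__)


class PromptHandler:
    """A second handler producing plain prompt text from the same blocks."""

    def render(self, block) -> str:
        if isinstance(block, Text):
            return block.value
        if isinstance(block, Image):
            return ""  # images contribute nothing to a text-only prompt
        raise NotImplementedError(type(block).__name__)


blocks = [Text("hello "), Image("cat.png"), Text("world")]
html = "".join(HtmlHandler().render(b) for b in blocks)
prompt = "".join(PromptHandler().render(b) for b in blocks)
print(html)
print(prompt)
```

The tradeoff versus methods on the blocks themselves is the classic expression problem: handlers make it easy to add new renderers but require touching every handler when a new block type is added.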
I'd definitely separate model state from how it gets displayed.
I think the basic idea of this is great.
```python
@dataclass(frozen=True, slots=True)
class Object:
```
We might want a better name....
Definitely -- please consider this a sketch 😆
```python
__slots__ = ["objects"]

def __init__(self) -> None:
    self.objects: list[Object] = []
```
I imagine there are constraints on this list - for example, a `RoleCloser` must be followed by a `RoleOpener`?
Closers may come arbitrarily far in the future, but there is logic inside the model class to ensure that they get closed. Currently we just have run-time checks for balanced openers/closers inside the chat models, and I'm not sure if we can do much better... But we may at least be able to add the constraint that you can't open a role if one is already open.
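The run-time check described above could be sketched roughly like this. `RoleOpener`/`RoleCloser` are stand-ins for whatever block types this PR settles on; the check rejects nested openers, mismatched closers, and roles left open:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleOpener:
    role: str


@dataclass(frozen=True)
class RoleCloser:
    role: str


def check_balanced(objects: list) -> None:
    """Raise ValueError if roles are nested, closed twice, or left open."""
    open_role = None
    for obj in objects:
        if isinstance(obj, RoleOpener):
            if open_role is not None:
                raise ValueError(
                    f"cannot open {obj.role!r}: {open_role!r} is still open"
                )
            open_role = obj.role
        elif isinstance(obj, RoleCloser):
            if open_role != obj.role:
                raise ValueError(f"unexpected closer for {obj.role!r}")
            open_role = None
    if open_role is not None:
        raise ValueError(f"role {open_role!r} was never closed")


check_balanced([RoleOpener("user"), RoleCloser("user")])  # passes silently
```

Because closers can come "arbitrarily far in the future", this would run at prompt-build time rather than at append time.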
I was wondering if there should be a nested structure - a list of `Role` blocks, each of which could contain an arbitrary list of `Text`, `Image` etc. blocks. Or would that be overly complicated?
I had the same thought actually. On my first attempt, that felt pretty complicated, but it might not be so bad now that this skeleton is here. I'll experiment with it a bit :)
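A rough sketch of what that nesting could look like (all names hypothetical). Each `Role` block owns the content generated inside it, so openers and closers never need to exist as separate list entries and balance is guaranteed by construction:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class Image:
    url: str


@dataclass
class Role:
    # The role owns its content blocks; no separate opener/closer objects.
    name: str
    blocks: list = field(default_factory=list)


state = [
    Role("user", [Text("What is in this picture? "), Image("cat.png")]),
    Role("assistant", [Text("A cat.")]),
]
print([r.name for r in state])
```

The complication is exactly the one discussed above: content appended while no role is open needs somewhere to live, so completion-style (role-less) models would need either a top-level block list or an implicit default role.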
Hi @hudson-ai, the one concern I have with moving away from purely text based state is that I think there may be useful scenarios where the guidance developer would want to use the grammar to switch roles. For example, if the prompt is currently in an assistant role, the guidance program might switch to the user role, include some user role text, then switch back to the assistant role and then force the assistant to write some initial text in the response. Is this still a possible scenario to implement under this PR? I do recognize there are benefits to disallowing the user from creating illegal inputs, but I'm curious to explore the tradeoff here, if any.
Thanks for taking a look @paulbkoch :) I have to think about this a bit, but my first impression is that there is nothing about structured model state that would prohibit special control sequences from the models themselves. Just thinking out loud...
The idea is just to make a logical distinction between the way that we do internal bookkeeping and how the contents of that bookkeeping are displayed to models, to ipython for formatting, etc. Is this reasonable?
I had the same concern as @paulbkoch, but I think your reply makes sense here Hudson :). In thinking about this, I wanted to brainstorm broadly about what types of metadata/bookkeeping we may want to track -- even beyond what we have implemented today -- and what the atomic "unit" the metadata ties to is (e.g. each token, each char in the model state, each byte, etc.) A rough laundry list off the top of my head (we certainly don't need to implement all of these, but just want to brainstorm broadly):
If we get the design right here, it should get much easier to expose rich metrics to users and build alternative UIs. We could also shift over the (potentially buggy) token counting we're doing in the …
@Harsha-Nori love these ideas! I feel that all of these "logging" use cases, multimodality, chat vs. completion APIs, ... suggest that some kind of additional structure is needed (whether or not it ends up looking anything like what I drafted here). Whatever the right design is, it should probably be:
When it comes to performance, do bear in mind that we already have things like this hanging around: guidance/guidance/models/_openai.py, line 89 (commit 4f7bf9e). (Other models have similar, if not quite identical, code.) If we make the internal model state richer, then we'll be able to generate conversation turns directly, rather than having to parse them out of a single state string.
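For example, with structured state the chat payload could be folded directly out of the block list instead of regex-parsed from one string. A sketch under the assumption of flat opener/text/closer blocks; none of these names exist in guidance today:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleOpener:
    role: str


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class RoleCloser:
    role: str


def to_messages(objects: list) -> list:
    """Fold a flat block list into OpenAI-style chat messages."""
    messages = []
    for obj in objects:
        if isinstance(obj, RoleOpener):
            messages.append({"role": obj.role, "content": ""})
        elif isinstance(obj, Text):
            if not messages:
                raise ValueError("text outside of any role block")
            messages[-1]["content"] += obj.value
        # RoleClosers carry no content; nothing to do for them here.
    return messages


state = [RoleOpener("user"), Text("Hi!"), RoleCloser("user")]
print(to_messages(state))
```

No string parsing, no special tags a model could accidentally emit: the boundary between turns is structural rather than textual.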
This PR primarily replaces the implementation of model state (`Model._state`) with a list of python objects. Currently, state is a `str` with embedded "tags" that represent special embedded objects, e.g. html for formatting, images for multimodal models, etc.

On the one hand, the existing implementation is "nice" because it allows adding these special objects as strings inside of normal (and even stateless) guidance functions without having to do anything "dirty" like reach into private methods on `Model` -- everything is string concatenation.

On the other hand, this means that this string has to be "parsed" to produce actual prompts (usually pretty straightforwardly via `re.replace`), to introspect on token probabilities, or to extract image data from tags. This feels fragile, as a model could easily produce strings that correspond to our tags and blow everything up. Furthermore, if we ever have a guidance function that produces "special" strings with formatting, etc. inside of a select block, we're actually asking the model to produce this special structure instead of just the actual textual content...

A few TODOs:
- `silent` and `monospace`
- a method on `ModelState` that turns it into a list of `{"role": "user", "message": ...}` objects

Thank you @nking-1 for the help and feedback on this so far!
Note that this train of thought originated from the discussion on PR #905 for adding terminal colors. This isn't technically a blocker for that PR, but adding structured state will make that PR trivial :)