[WIP] Structured model state #929
base: main
Conversation
The existing implementation wouldn't properly pop the selected role off the opened_blocks dict, so roles could be closed twice. We'd need a stateful implementation that could modify the Model object more directly... Removing it for now
```diff
@@ -966,10 +966,6 @@ def _re_with_temperature(grammar, temperature, visited_set):
        # return ModelVariable(name)


def active_role_end() -> ModelVariable:
```
The current implementation of `active_role_end` will actually produce the closing tag twice, since it is essentially a stateless grammar and can't pop `ContextBlock`s off of a model. Furthermore, generating `active_role_end` requires the model to produce special tags which will be hidden from future prompts... A better implementation of this may exist, but I'm not quite sure yet. Deleting it for now.
```python
# Import in function to guard against circular import
from ..library._role import RoleBlock
```
I have a feeling that some may object to the import inside of a function... Is anyone else better at resolving issues with circular imports?
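For reference, the deferred-import workaround looks roughly like this: the import runs at call time rather than at module load time, so the two modules can reference each other without a cycle at import. A minimal sketch, with the stdlib `json` module standing in for the circularly-imported module (`..library._role` in the real code):

```python
# Deferred import: the top-level module body never touches the other module,
# so mutually-importing modules do not cycle at load time.
def render_role_block(data: dict) -> str:
    # Import inside the function body; `json` here is a stand-in for the
    # circularly-imported module (e.g. ..library._role).
    import json

    return json.dumps(data)


print(render_role_block({"role": "user"}))
```

The usual cost is that the import check happens on every first call path and the dependency is hidden from static analysis, which is why linters sometimes flag it.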
Well, we haven't tried turning on `ruff` or `flake8` yet...
```python
def _html(self) -> str:
    raise NotImplementedError

def __str__(self) -> str:
    raise NotImplementedError
```
Currently undecided as to whether these should be methods or if these should just be data containers with external "Handler" classes to produce HTML, strings for prompts, pretty ANSI codes for terminal outputs, etc...
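One way to sketch the "data containers plus external handlers" option described above. All names here are hypothetical, not part of this PR; the point is only that rendering logic lives outside the block classes, so new output targets (HTML, plain prompts, ANSI) can be added without touching the data:

```python
from dataclasses import dataclass


# Pure data containers: no rendering logic attached.
@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class Image:
    url: str


class HtmlHandler:
    """External handler that renders each block type as HTML."""

    def render(self, block) -> str:
        if isinstance(block, Text):
            return block.value
        if isinstance(block, Image):
            return f'<img src="{block.url}"/>'
        raise NotImplementedError(type(block).__name__)


class PromptHandler:
    """A second handler producing plain prompt text from the same blocks."""

    def render(self, block) -> str:
        if isinstance(block, Text):
            return block.value
        if isinstance(block, Image):
            return ""  # images contribute nothing to a text-only prompt
        raise NotImplementedError(type(block).__name__)


blocks = [Text("hello "), Image("cat.png"), Text("world")]
html = "".join(HtmlHandler().render(b) for b in blocks)
prompt = "".join(PromptHandler().render(b) for b in blocks)
print(html)
print(prompt)
```

The tradeoff versus methods on the blocks themselves is the classic expression problem: handlers make it easy to add new renderers but require touching every handler when a new block type is added.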
I'd definitely separate model state from how it gets displayed.
I think the basic idea of this is great.
```python
@dataclass(frozen=True, slots=True)
class Object:
```
We might want a better name....
Definitely -- please consider this a sketch 😆
```python
__slots__ = ["objects"]

def __init__(self) -> None:
    self.objects: list[Object] = []
```
I imagine there are constraints on this list - for example, a `RoleCloser` must be followed by a `RoleOpener`?
Closers may come arbitrarily far in the future, but there is logic inside the model class to ensure that they get closed. Currently we just have run-time checks for balanced openers/closers inside the chat models, and I'm not sure if we can do much better... But we may at least be able to add the constraint that you can't open a role if one is already open.
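The run-time check described above could be sketched roughly like this. `RoleOpener`/`RoleCloser` are stand-ins for whatever block types this PR settles on; the check rejects nested openers, mismatched closers, and roles left open:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleOpener:
    role: str


@dataclass(frozen=True)
class RoleCloser:
    role: str


def check_balanced(objects: list) -> None:
    """Raise ValueError if roles are nested, closed twice, or left open."""
    open_role = None
    for obj in objects:
        if isinstance(obj, RoleOpener):
            if open_role is not None:
                raise ValueError(
                    f"cannot open {obj.role!r}: {open_role!r} is still open"
                )
            open_role = obj.role
        elif isinstance(obj, RoleCloser):
            if open_role != obj.role:
                raise ValueError(f"unexpected closer for {obj.role!r}")
            open_role = None
    if open_role is not None:
        raise ValueError(f"role {open_role!r} was never closed")


check_balanced([RoleOpener("user"), RoleCloser("user")])  # passes silently
```

Because closers can come "arbitrarily far in the future", this would run at prompt-build time rather than at append time.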
I was wondering if there should be a nested structure - a list of `Role` blocks, each of which could contain an arbitrary list of `Text`, `Image` etc. blocks. Or would that be overly complicated?
I had the same thought actually. On my first attempt, that felt pretty complicated, but it might not be so bad now that this skeleton is here. I'll experiment with it a bit :)
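A rough sketch of what that nesting could look like (all names hypothetical). Each `Role` block owns the content generated inside it, so openers and closers never need to exist as separate list entries and balance is guaranteed by construction:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class Image:
    url: str


@dataclass
class Role:
    # The role owns its content blocks; no separate opener/closer objects.
    name: str
    blocks: list = field(default_factory=list)


state = [
    Role("user", [Text("What is in this picture? "), Image("cat.png")]),
    Role("assistant", [Text("A cat.")]),
]
print([r.name for r in state])
```

The complication is exactly the one discussed above: content appended while no role is open needs somewhere to live, so completion-style (role-less) models would need either a top-level block list or an implicit default role.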
Hi @hudson-ai, the one concern I have with moving away from purely text based state is that I think there may be useful scenarios where the guidance developer would want to use the grammar to switch roles. For example, if the prompt is currently in an assistant role, the guidance program might switch to the user role, include some user role text, then switch back to the assistant role and then force the assistant to write some initial text in the response. Is this still a possible scenario to implement under this PR? I do recognize there are benefits to disallowing the user from creating illegal inputs, but I'm curious to explore the tradeoff here, if any.
Thanks for taking a look @paulbkoch :) I have to think about this a bit, but my first impression is that there is nothing about structured model state that would prohibit special control sequences from the models themselves. Just thinking out loud...
The idea is just to make a logical distinction between the way that we do internal bookkeeping and how the contents of that bookkeeping are displayed to models, to ipython for formatting, etc. Is this reasonable?
I had the same concern as @paulbkoch, but I think your reply makes sense here Hudson :). In thinking about this, I wanted to brainstorm broadly about what types of metadata/bookkeeping we may want to track -- even beyond what we have implemented today -- and what the atomic "unit" the metadata ties to is (e.g. each token, each char in the model state, each byte, etc.) A rough laundry list off the top of my head (we certainly don't need to implement all of these, but just want to brainstorm broadly):
If we get the design right here, it should get much easier to expose rich metrics to users and build alternative UIs. We could also shift over the (potentially buggy) token counting we're doing in the …
@Harsha-Nori love these ideas! I feel that all of these "logging" use cases, multimodality, chat vs. completion APIs, ... suggest that some kind of additional structure is needed (whether or not it ends up looking anything like what I drafted here). Whatever the right design is, it should probably be:
When it comes to performance, do bear in mind that we already have things like this hanging around: guidance/guidance/models/_openai.py, line 89 (commit 4f7bf9e). (Other models have similar, if not quite identical, code.) If we make the internal model state richer, then we'll be able to generate conversation turns directly, rather than having to parse them out of a single state string.
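For example, with structured state the chat payload could be folded directly out of the block list instead of regex-parsed from one string. A sketch under the assumption of flat opener/text/closer blocks; none of these names exist in guidance today:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleOpener:
    role: str


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class RoleCloser:
    role: str


def to_messages(objects: list) -> list:
    """Fold a flat block list into OpenAI-style chat messages."""
    messages = []
    for obj in objects:
        if isinstance(obj, RoleOpener):
            messages.append({"role": obj.role, "content": ""})
        elif isinstance(obj, Text):
            if not messages:
                raise ValueError("text outside of any role block")
            messages[-1]["content"] += obj.value
        # RoleClosers carry no content; nothing to do for them here.
    return messages


state = [RoleOpener("user"), Text("Hi!"), RoleCloser("user")]
print(to_messages(state))
```

No string parsing, no special tags a model could accidentally emit: the boundary between turns is structural rather than textual.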
This PR primarily replaces the implementation of model state (`Model._state`) with a list of python objects. Currently, state is a `str` with embedded "tags" that represent special embedded objects, e.g. html for formatting, images for multimodal models, etc.

On the one hand, the existing implementation is "nice" because it allows adding these special objects as strings inside of normal (and even stateless) guidance functions without having to do anything "dirty" like reach into private methods on `Model` -- everything is string concatenation.

On the other hand, this means that this string has to be "parsed" to produce actual prompts (usually pretty straightforwardly via `re.replace`), to introspect on token probabilities, or to extract image data from tags. This feels fragile, as a model could easily produce strings that correspond to our tags and blow everything up. Furthermore, if we ever have a guidance function that produces "special" strings with formatting, etc. inside of a select block, we're actually asking the model to produce this special structure instead of just the actual textual content...

A few TODOs:
- `silent` and `monospace`
- a method on `ModelState` that turns it into a list of `{"role": "user", "message": ...}` objects

Thank you @nking-1 for the help and feedback on this so far!
Note that this train of thought originated from the discussion on PR #905 for adding terminal colors. This isn't technically a blocker for that PR, but adding structured state will make that PR trivial :)