-
Notifications
You must be signed in to change notification settings - Fork 787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify chat message streaming in chat template #6120
base: main
Are you sure you want to change the base?
Conversation
Thanks @MackinnonBuck! This is certainly an improvement to clarity in some respects, however if I'm understanding it correctly, it also appears to come at a significant perf cost. Perhaps it's possible to resolve that while retaining some of the clarity improvements? The previous approach used an admittedly unusual pub-sub mechanism so that, for each streaming chunk, only the single I experimented by starting with the message "what's in the kit" and then clicking the first suggestion until I produced a conversation with 10 user messages. I think this is a realistic length of conversation, producing 1000-2000 streaming chunks.
Maybe it would be possible to get some of the benefits of the new approach without the drawbacks by inlining |
To be honest, even the previous approach's 1284 renders (for a 10-user-message conversation) is pushing at the boundaries, given that
Of course, this has to be balanced against the need for this template to work for newcomers, and hence limit the amount of code and sophistication in the rendering mechanism. Not saying a more general redesign should happen in this PR, just tracking thoughts about it. |
Thanks for the insight, @SteveSandersonMS!
Yeah, this was something that crossed my mind when making these changes, but it was getting a little late by the time I put out the PR and didn't get around to thinking about it further. Perhaps I should have called that out. I appreciate that you took some measurements to quantify the perf difference, and I agree this should be addressed before we take this change (if we decide to do so). The JS interop approach sounds interesting - I might try out something like that, and if it turns out to be too sophisticated for template code, I'll add back the pub/sub mechanism that existed previously. However, I probably won't get to that for a couple days while addressing other high-priority items. |
8929016
to
f71526d
Compare
This PR is a proposal for improved handling of handle chat message streaming in the chat template.
Recently, the
IChatClient
interface was updated to discourage chat client implementations from manipulating the provided chat history. This created two problems for the chat template:FunctionInvocationContent
being automatically added to the chat history byFunctionInvokingChatClient
. This no longer happens, so the chat template now has to manually augment its chat history with those messages while reading the streaming response. Discussion here.Problem 2 goes away if the chat template stores in-progress messages separately from the "committed" chat history. It just renders two message lists in the UI: the in-progress list and the committed list.
Problem 1 then goes away if you change the in-progress list to store
ChatResponseUpdate
s directly rather than mapping them toChatMessage
s. When streaming ends, the second list gets added to the first list in the form of chat messages. Separate UI logic can decide how to display content fromChatResponseUpdate
s. This is the approach that this PR demonstrates.Microsoft Reviewers: Open in CodeFlow