Simplify chat message streaming in chat template #6120

Open · wants to merge 2 commits into main from mbuck/simplify-chat-streaming

Conversation

@MackinnonBuck (Member) commented Mar 15, 2025

This PR is a proposal for improved handling of chat message streaming in the chat template.

Recently, the IChatClient interface was updated to discourage chat client implementations from manipulating the provided chat history. This created two problems for the chat template:

  1. The chat template was relying on messages containing FunctionInvocationContent being automatically added to the chat history by FunctionInvokingChatClient. This no longer happens, so the chat template now has to manually augment its chat history with those messages while reading the streaming response. Discussion here.
  2. However, chat clients may not expect the caller to manipulate the message list while producing a streaming response. This requires the chat template to clone the message list before passing it to the chat client. Discussion here.

Problem 2 goes away if the chat template stores in-progress messages separately from the "committed" chat history. It just renders two message lists in the UI: the in-progress list and the committed list.

Problem 1 then goes away if you change the in-progress list to store ChatResponseUpdates directly rather than mapping them to ChatMessages as they arrive. When streaming ends, the in-progress updates are converted into chat messages and appended to the committed history. Separate UI logic can decide how to display content from ChatResponseUpdates. This is the approach that this PR demonstrates, sketched below.
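
To make the idea concrete, here is a minimal sketch (not the PR's literal code) of the two-list approach inside a Blazor component, assuming the Microsoft.Extensions.AI streaming surface of this era (GetStreamingResponseAsync, ChatResponseUpdate, and the ToChatResponse() helper for folding updates back into messages); the field and method names, and whether ToChatResponse() exposes a Messages list in this library version, are assumptions:

```csharp
// Minimal sketch of the two-list approach (illustrative names, not the
// template's actual code), assuming Microsoft.Extensions.AI's
// GetStreamingResponseAsync / ChatResponseUpdate / ToChatResponse() APIs.
private readonly List<ChatMessage> messages = [];                       // committed history
private readonly List<ChatResponseUpdate> currentResponseUpdates = [];  // in-progress

private async Task SendMessageAsync(IChatClient chatClient, string userText)
{
    messages.Add(new ChatMessage(ChatRole.User, userText));
    currentResponseUpdates.Clear();

    // The committed list is never mutated while streaming, so it can be
    // passed to the chat client without cloning it first (problem 2).
    await foreach (var update in chatClient.GetStreamingResponseAsync(messages))
    {
        currentResponseUpdates.Add(update); // rendered by separate UI logic
        StateHasChanged();
    }

    // On completion, fold the accumulated updates (including any
    // function-invocation content) into chat messages and commit them
    // to the history (problem 1). ChatResponse.Messages is assumed here.
    messages.AddRange(currentResponseUpdates.ToChatResponse().Messages);
    currentResponseUpdates.Clear();
}
```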


@MackinnonBuck self-assigned this Mar 15, 2025
@MackinnonBuck requested a review from a team as a code owner on March 15, 2025 at 02:03
@github-actions (bot) added the area-ai-templates (Microsoft.Extensions.AI.Templates) label Mar 15, 2025
@SteveSandersonMS (Member) commented Mar 17, 2025

Thanks @MackinnonBuck!

This is certainly an improvement to clarity in some respects; however, if I'm understanding it correctly, it also appears to come at a significant perf cost. Perhaps it's possible to resolve that while retaining some of the clarity improvements?

The previous approach used an admittedly unusual pub-sub mechanism so that, for each streaming chunk, only the single ChatMessageItem displaying that chunk would re-render. Everything else was left alone. But in the approach in this PR, every incoming chunk causes the whole existing conversation to re-render in full, producing an O(N^2) effect on the number of renders.

I experimented by starting with the message "what's in the kit" and then clicking the first suggestion until I produced a conversation with 10 user messages. I think this is a realistic length of conversation, producing 1000-2000 streaming chunks.

  • With the previous approach, ChatMessageItem rendered 1284 times
  • With the proposed approach, ChatMessageItem rendered 25,696 times (and this grows quadratically as the conversation grows)

Maybe it would be possible to get some of the benefits of the new approach without the drawbacks: inline ChatAssistantContentItem back into ChatMessageItem (so that there's no need to differentiate the Content and Text parameters there) and restore the pub-sub signalling from Chat to ChatMessageItem, while keeping the new mechanism of tracking currentResponseUpdates separately from the committed history.
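
For reference, the pub-sub signalling could look roughly like this (a hypothetical sketch; the type and member names are assumptions, not the template's actual code). The key property is that only the one subscribing component re-renders per chunk:

```csharp
using Microsoft.AspNetCore.Components;

// Hypothetical pub-sub plumbing: Chat publishes one notification per chunk,
// and only the ChatMessageItem displaying the in-progress message re-renders.
public sealed class MessageSubscription
{
    public event Action? Updated;
    public void Publish() => Updated?.Invoke();
}

public class ChatMessageItemBase : ComponentBase, IDisposable
{
    [Parameter] public MessageSubscription? Subscription { get; set; }

    protected override void OnInitialized()
    {
        if (Subscription is not null)
        {
            Subscription.Updated += OnUpdated; // subscribe this one component
        }
    }

    // Re-render just this component; the rest of the conversation is untouched.
    private void OnUpdated() => _ = InvokeAsync(StateHasChanged);

    public void Dispose()
    {
        if (Subscription is not null)
        {
            Subscription.Updated -= OnUpdated;
        }
    }
}
```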

@SteveSandersonMS (Member) commented Mar 17, 2025

To be honest, even the previous approach's 1284 renders (for a 10-user-message conversation) is pushing at the boundaries, given that ChatMessageItem isn't completely trivial to render and diff. We may want to at least think through what approach we'd use if we were determined to minimize this expense. For example:

  • We could make the <assistant-message> part an independent component that acts as the subscriber in the pub-sub system, so we literally only re-render that single element per chunk. The rendering and diffing overhead then drops to roughly zero, and the whole cost is more or less the same as sending a websocket message from server to client on each chunk.
  • Or, we could bypass Blazor rendering entirely on a per-chunk basis and use JS interop to trigger an update to the text inside the <assistant-message> (roughly as sketched below). I think eShopSupport does that, or did at some point.
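
A rough sketch of the JS-interop option, reusing the fields from the earlier sketch; every name here is an assumption, including the JS function, which would need to be registered separately on the page:

```csharp
using Microsoft.JSInterop;

// Hypothetical JS-interop streaming: per-chunk updates bypass Blazor's
// renderer and append text straight to the <assistant-message> element.
// The JS side would register something like:
//   window.appendAssistantText = (id, text) =>
//       document.getElementById(id)?.append(text);
private async Task StreamViaJsInteropAsync(
    IJSRuntime js, IChatClient chatClient, string assistantElementId)
{
    await foreach (var update in chatClient.GetStreamingResponseAsync(messages))
    {
        currentResponseUpdates.Add(update);

        // No Blazor render or diff per chunk: push just the new text to the DOM.
        await js.InvokeVoidAsync("appendAssistantText", assistantElementId, update.Text);
    }
}
```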

Of course, this has to be balanced against the need for this template to work for newcomers, and hence limit the amount of code and sophistication in the rendering mechanism. Not saying a more general redesign should happen in this PR, just tracking thoughts about it.

@MackinnonBuck (Member, Author) commented:

Thanks for the insight, @SteveSandersonMS!

> This is certainly an improvement to clarity in some respects; however, if I'm understanding it correctly, it also appears to come at a significant perf cost. Perhaps it's possible to resolve that while retaining some of the clarity improvements?
>
> The previous approach used an admittedly unusual pub-sub mechanism so that, for each streaming chunk, only the single ChatMessageItem displaying that chunk would re-render. Everything else was left alone. But in the approach in this PR, every incoming chunk causes the whole existing conversation to re-render in full, producing an O(N^2) effect on the number of renders.

Yeah, this was something that crossed my mind when making these changes, but it was getting a little late by the time I put out the PR, and I didn't get around to thinking about it further. Perhaps I should have called that out. I appreciate that you took some measurements to quantify the perf difference, and I agree this should be addressed before we take this change (if we decide to do so).

The JS interop approach sounds interesting - I might try out something like that, and if it turns out to be too sophisticated for template code, I'll add back the pub/sub mechanism that existed previously. However, I probably won't get to that for a couple days while addressing other high-priority items.

@MackinnonBuck force-pushed the mbuck/simplify-chat-streaming branch from 8929016 to f71526d on March 17, 2025 at 20:28