fix(ai): strip reasoning blocks before sending history to LLM #282
Open
sk5268 wants to merge 1 commit into
Conversation
Sending reasoning/thinking blocks back in the message history was causing API failures, particularly noticeable with all Cerebras models. It's generally not standard practice to send reasoning blocks back anyway, since most providers expect them to be omitted from subsequent turns. Also moved the filter to run *before* the message compaction step. Previously, reasoning tokens were eating up the context budget and causing older messages to drop prematurely, only for the reasoning blocks to get stripped out right after.
What
Strips reasoning blocks from the message history before sending it back to the LLM, and moves this filter to run prior to message compaction.
Why
Echoing reasoning/thinking blocks back in the history was causing API failures, particularly with Cerebras models, as most providers do not accept reasoning blocks in inbound requests. Furthermore, because these blocks were previously filtered after compaction, reasoning tokens were unnecessarily eating up the context budget and causing older, valid messages to be dropped prematurely.
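A minimal sketch of the kind of filter described here, using hypothetical message and content-part types (the real shapes in the codebase may differ):

```typescript
// Hypothetical shapes, for illustration only.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string };

type ModelMessage = {
  role: "user" | "assistant";
  content: ContentPart[];
};

// Remove reasoning parts from assistant messages so they are
// neither counted by compaction nor echoed back to the provider.
function stripReasoning(history: ModelMessage[]): ModelMessage[] {
  return history.map((msg) =>
    msg.role === "assistant"
      ? {
          ...msg,
          content: msg.content.filter((part) => part.type !== "reasoning"),
        }
      : msg,
  );
}
```

Running this pass before compaction means the token budget is measured against the history that will actually be sent.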
How
- Added a .map() + .filter() pass over the converted history that removes all content parts with type: "reasoning" from assistant messages before they reach the LLM.
- Moved this filter to run before compactModelMessagesDetailed, so compaction only counts the actual tokens being sent.

Testing
- pnpm exec tsc --noEmit clean
- cargo check clean (src-tauri/)
- pnpm tauri dev

Screenshots / GIFs