Add support for reasoning in the UI #4559
base: main
Conversation
```diff
@@ -174,7 +174,8 @@ function autodetectTemplateType(model: string): TemplateType | undefined {
     lower.includes("pplx") ||
     lower.includes("gemini") ||
     lower.includes("grok") ||
-    lower.includes("moonshot")
+    lower.includes("moonshot") ||
+    lower.includes("deepseek-reasoner")
```
This is to avoid `deepseek-reasoner` using `_streamComplete` in `core/llm/index.ts`, so that the `ChatMessage` with `content` and `reasoning_content` are both preserved.
```diff
@@ -373,11 +374,45 @@ function autodetectPromptTemplates(
   return templates;
 }
 
+const PROVIDER_SUPPORTS_THINKING: string[] = ["anthropic", "openai", "deepseek"];
+
+const MODEL_SUPPORTS_THINKING: string[] = [
```
I think support for other proxy providers like OpenRouter could be added as well. I haven't looked into it.
```ts
  title: string | undefined,
  capabilities: ModelCapability | undefined,
): boolean {
  if (capabilities?.thinking !== undefined) {
```
I'm not sure if the `capabilities: thinking` option is necessary. Thinking support needs to be hardcoded in some places anyway, so I don't know if there's a reasonable way to try to force it to be enabled.
Yeah I think your intuition here is accurate. It's not quite as simple as image uploads.
If you're reasonably confident, I would be on board to remove it.
```ts
    return (await encoding.encode(part.thinking ?? "")).length;
  } else if (part.type === "redacted_thinking") {
    // For redacted thinking, don't count any tokens
    return 0;
```
"All extended thinking tokens (including redacted thinking tokens) are billed as output tokens and count toward your rate limits."
But they would have to be counted from the API's response:
"usage": {
"input_tokens": 2095,
"output_tokens": 503
}
I'm a little confused here - mind linking to the docs you're referencing?
Is your point that `redacted_thinking` is actually billed, but we're unable to properly count it here since this is for input tokens, not output?
```diff
@@ -124,7 +125,7 @@ class FreeTrial extends BaseLLM {
   }
   return {
     type: "text",
-    text: part.text,
+    text: (part as TextMessagePart).text,
```
The new Message API has `thinking` and `redacted_thinking` types now, so wherever the types were causing errors, I just assumed they'd be `TextMessagePart`s, as they've previously been.
```ts
import { ChatMessage, CompletionOptions, TextMessagePart } from "..";

// Extend OpenAI API types to support DeepSeek reasoning_content field
interface DeepSeekDelta {
```
The types for OpenAI's messages are imported from an external library, so to support DeepSeek's `reasoning_content` I needed to create those interfaces elsewhere. I'm not sure if this is the best place for them, but it works.
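For illustration, the widened types could look roughly like this (a sketch; only `reasoning_content` is DeepSeek-specific, and the other fields mirror OpenAI's streaming delta shape):

```ts
// Sketch: DeepSeek streams an extra reasoning_content field alongside the
// standard OpenAI-style delta, so local interfaces widen the library's types.
interface DeepSeekDelta {
  role?: "assistant";
  content?: string | null;
  reasoning_content?: string | null; // DeepSeek-specific chain-of-thought tokens
}

interface DeepSeekChunkChoice {
  index: number;
  delta: DeepSeekDelta;
  finish_reason: string | null;
}
```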
```diff
@@ -17,12 +17,19 @@ export function stripImages(messageContent: MessageContent): string {
     .join("\n");
 }
 
+export function stripThinking(content: string): string {
```
`<think>` tags are handled differently now. They're always included in the message, but stripped from the UI.
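A minimal sketch of what the helper could look like (assuming a simple regex is enough; the PR's actual implementation may handle unclosed or streaming tags differently):

```ts
// Sketch: remove <think>...</think> blocks before rendering in the UI, while
// the full content (tags included) stays in the stored message history.
export function stripThinking(content: string): string {
  return content.replace(/<think>[\s\S]*?<\/think>/g, "").trimStart();
}
```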
docs/docs/json-reference.md
```diff
@@ -36,6 +36,8 @@ Each model has specific configuration options tailored to its provider and funct
 - `engine`: Engine for Azure OpenAI requests.
 - `capabilities`: Override auto-detected capabilities:
   - `uploadImage`: Boolean indicating if the model supports image uploads.
+  - `tools`: Boolean indicating if the model supports tool use.
```
`tools` was missing, so I added it along with the new `thinking` capability.
```diff
@@ -59,6 +61,19 @@ Example:
   "title": "GPT-4o",
   "provider": "openai",
   "apiKey": "<YOUR_OPENAI_API_KEY>"
 },
 {
```
A new example showcasing a model with thinking capabilities
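For reference, the example could look roughly like this (a sketch assembled from options visible elsewhere in this PR - the model name from the testing instructions and the `thinking` completion option from the diff - not necessarily the exact snippet added):

```json
{
  "title": "Claude 3.7 Sonnet",
  "provider": "anthropic",
  "model": "claude-3-7-sonnet-20250219",
  "apiKey": "<YOUR_ANTHROPIC_API_KEY>",
  "capabilities": {
    "thinking": true
  },
  "completionOptions": {
    "thinking": {
      "type": "enabled",
      "budget_tokens": 4096
    }
  }
}
```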
Appreciate the docs update 👌
```diff
@@ -284,6 +285,8 @@ const Layout = () => {
   />
 
 <GridDiv className="">
+  {/* Initialize model-specific settings when model changes */}
+  <ModelSettingsInitializer />
```
Kind of a hack. I needed the UI to fetch the `reasoning_effort` and `thinking` options on the initial load, so that they get set in the UI based on the user's config, but in a way that still lets the user change them without changing the config. AI generated this and put it here. It worked, but it might be a silly place to do something like this.
Thanks for the explanation!
I think what we should do is just call `useModelThinkingSettings();` from `Chat.tsx`. I think it should achieve the same outcome without adding an unnecessary UI component here.
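Roughly like this, as a sketch (assuming the hook takes no arguments; the import path is an assumption):

```tsx
import { useModelThinkingSettings } from "../hooks/useModelThinkingSettings"; // path assumed

// Chat.tsx (sketch): run the settings hook directly instead of mounting a
// dedicated initializer component in the layout.
export function Chat() {
  // Loads reasoning_effort / thinking defaults whenever the selected model changes.
  useModelThinkingSettings();

  // ...rest of the existing component
}
```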
```diff
 <StyledMarkdownPreview
   isRenderingInStepContainer
-  source={stripImages(props.item.message.content)}
+  source={renderChatMessage(props.item.message)}
```
`renderChatMessage` calls `stripImages`, but now also strips think tags.
```diff
@@ -106,123 +119,123 @@ function InputToolbar(props: InputToolbarProps) {
 <StyledDiv
   isHidden={props.hidden}
   onClick={props.onClick}
-  className="find-widget-skip flex"
+  className="find-widget-skip flex flex-col"
```
Model selection / enter button are now on their own row, to make room for the rest of the buttons.
This is a good idea, the toolbar was getting quite cluttered already - but @sestinj has an incoming PR that adds a new "notch" above the input, similar to what we have in Edit mode, to address this. Amongst other changes, it moves the "Tool usage" button up there. So let's keep it as is, without the `flex-col`.
```tsx
  <ToggleThinkingButton disabled={!thinkingSupported} />
</div>
<div className="-mb-1 flex w-full items-center gap-2 whitespace-nowrap">
  <ModelSelect />
```
The model select now takes the full width of the remaining space in the container, meaning that there's no need to try to set a reasonable max width.
I think this is also unnecessary based on my comment above
```tsx
return (
  <Transition
    show={show}
    enter="transition duration-100 ease-out"
```
This is a copy of the `PopoverTransition` component. I made the popovers on the thinking and tool use toggle buttons relative to the Chat box instead of the buttons, so that they can be easily positioned and fit better on small view sizes. But the scale transform can't calculate the position while scaling, causing them to jump, so I removed the scaling animation from those buttons.
Basically a copy of `ToggleToolsButton`.
```ts
// Get provider from default model
const provider = defaultModel?.provider || "";
const hasThinkingOptions = provider !== "deepseek";
```
There is some provider/model-specific logic in here that could probably be placed somewhere else. Basically, some models can't be toggled off, whilst some models don't have configuration options, so certain elements/interactions are disabled based on the provider/model.
For models that can't turn off reasoning, I think the current implementation feels a bit noisy. I'm not sure what the solution here is though - maybe with @sestinj's new notch UI we have some more options.
```diff
@@ -83,93 +83,93 @@ export default function ToolDropdown(props: ToolDropdownProps) {
 {useTools && !isDisabled && (
   <>
     <span className="hidden align-top sm:flex">Tools</span>
```
Mostly formatting changes, seemingly from Prettier. There were a few real changes: the hover events are now bound on the parent container and not the icon/content; before, you could hover the edges and not see the background change.
```diff
 <StyledListboxButton
   data-testid="model-select-button"
   ref={buttonRef}
   className="h-[18px] overflow-hidden"
   style={{ padding: 0 }}
   onClick={calculatePosition}
 >
-  <div className="flex max-w-[33vw] items-center gap-0.5 text-gray-400 transition-colors duration-200">
-    <span className="truncate">
+  <div className="flex w-fit min-w-0 items-center gap-0.5 text-gray-400 transition-colors duration-200">
```
These changes allow the model select to scale to the parent container
```diff
@@ -433,54 +433,60 @@ export function Chat() {
   contextItems={item.contextItems}
   toolCallId={item.message.toolCallId}
 />
```
This change makes sure that if the API returns a message along with tool use, both are shown.
```ts
for (const message of action.payload) {
  const lastItem = state.history[state.history.length - 1];
  const lastMessage = lastItem.message;
  // Simplified condition to keep thinking blocks and tool calls together in the same message
```
This is the major change. The session slice should now handle all Message API types and keep collecting different parts into the same assistant message, so that thinking and tool use work together properly, since Anthropic requires that you send the thinking block back along with tool use.
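Conceptually, the collecting step could reduce to something like this sketch (the part type names follow the Messages API; the helper is a hypothetical simplification of the reducer logic):

```ts
// Sketch: append a streamed part to the current assistant message, so that
// thinking, redacted_thinking, text, and tool-use parts stay in one message.
type MessagePart =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "redacted_thinking"; data: string };

function appendPart(parts: MessagePart[], incoming: MessagePart): void {
  const last = parts[parts.length - 1];
  if (last?.type === "text" && incoming.type === "text") {
    // Deltas of the same part type are concatenated into the existing part...
    last.text += incoming.text;
  } else if (last?.type === "thinking" && incoming.type === "thinking") {
    last.thinking += incoming.thinking;
  } else {
    // ...while a new part type starts a new entry in the same assistant message.
    parts.push(incoming);
  }
}
```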
```ts
if (messageContent.includes("<think>")) {
  // Check if the message content is an array with parts
  if (
    Array.isArray(message.content) &&
```
This part basically handles the content as parts, i.e. the Messages API format.
This all looks solid, but there is just so much logic here that I think we really need to break this out into some utils that are smaller/easier to read/more testable. Again, the surrounding code isn't the cleanest, but it would be really helpful for maintenance/debugging.
```ts
}

// For other content types, use renderChatMessage
const messageContent = renderChatMessage(message);
```
This part handles what's more typical of the OpenAI / DeepSeek APIs.
```ts
const fullContent = lastMessage.content as string;

// If we find <think> tags, extract the content for the reasoning field
if (
```
Lastly, we handle `<think>` tags.
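As a sketch of that extraction (the regex and return shape are assumptions consistent with the `stripThinking` helper above; the real reducer tracks more state):

```ts
// Sketch: split <think>...</think> content out of the raw text so the UI can
// show it in the reasoning section while keeping the visible answer clean.
function extractThinking(fullContent: string): {
  reasoning: string | null;
  visible: string;
} {
  const match = fullContent.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) {
    return { reasoning: null, visible: fullContent };
  }
  return {
    reasoning: match[1].trim(),
    visible: fullContent.replace(match[0], "").trimStart(),
  };
}
```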
```ts
};

// Handle the special case for anthropic-beta
this.setBetaHeaders(headers, shouldCacheSystemMessage);
```
This change allows the config to add headers to the request and intelligently merge them with the beta headers that Continue adds for caching.
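For intuition, the merge could look roughly like this (a hypothetical body for `setBetaHeaders`, written as a free function; the prompt-caching flag is Anthropic's documented beta value):

```ts
// Sketch: merge user-configured anthropic-beta values with the beta flags
// Continue needs (e.g. prompt caching), de-duplicating entries.
function mergeBetaHeaders(
  headers: Record<string, string>,
  shouldCacheSystemMessage: boolean,
): void {
  const betas = new Set(
    (headers["anthropic-beta"] ?? "")
      .split(",")
      .map((b) => b.trim())
      .filter(Boolean),
  );
  if (shouldCacheSystemMessage) {
    betas.add("prompt-caching-2024-07-31");
  }
  if (betas.size > 0) {
    headers["anthropic-beta"] = [...betas].join(",");
  }
}
```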
Resolves #4339
Force-pushed from 20dfb8a to 09c32b7:
- Add UI settings to set reasoning effort / token budget
- Add UI to show reasoning tokens from Anthropic Claude 3.7 Sonnet and DeepSeek R1
- Fix thinking icon color not switching back to gray
Force-pushed from 09c32b7 to a7a153c
Hey @FallDownTheSystem, apologies for the slow review here.
This is an awesome contribution! I've been curious to play around with different thinking efforts but haven't been able to since we're missing this feature, so it was cool to finally get to try it out while testing.
Things look solid overall, but my main concern with merging this is that the logic it touches is quite sensitive, and there is a lot of new logic being introduced here in quite large functions. I mentioned this in a few comments - the existing codebase in these areas wasn't that clean to begin with, but given that nobody on the team will be familiar with this new logic, it's important to make sure it's especially maintainable.
Lastly, I'd recommend just ignoring the merge conflicts and any UI changes for now. More conflicts will occur because of @sestinj's upcoming PR, and I think we'll want to update the UI in this PR based on those changes.
Feel free to DM me/Nate on Discord if you'd prefer to go back and forth more easily 👍
```diff
@@ -360,6 +375,7 @@ export interface UserChatMessage {
 export interface AssistantChatMessage {
   role: "assistant";
   content: MessageContent;
+  reasoning_content?: string;
```
Suggested change:
```diff
-reasoning_content?: string;
+reasoningContent?: string;
```
After going through this PR I realized that these properties are `snake_case` to match what the Anthropic API is expecting - however, I'd still prefer to keep them `camelCase` in our TS interfaces, both here and elsewhere in the PR.
type: "enabled" | "disabled"; | ||
budget_tokens?: number; | ||
}; | ||
reasoning_effort?: "high" | "medium" | "low"; |
Could we make this into an enum instead? Bit of a nitpick but will make any future refactors easier.
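For example (a sketch; the enum name and casing are suggestions, not what the PR uses):

```ts
// Sketch: an enum keeps the accepted values in one place for future refactors.
export enum ReasoningEffort {
  Low = "low",
  Medium = "medium",
  High = "high",
}

// The option then becomes:
// reasoning_effort?: ReasoningEffort;
```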
```diff
@@ -373,11 +374,45 @@ function autodetectPromptTemplates(
   return templates;
 }
 
+const PROVIDER_SUPPORTS_THINKING: string[] = ["anthropic", "openai", "deepseek"];
```
Could we pull this logic out into a separate file, e.g. `core/llm/thinking.ts`? Bit of a nitpick/we should have done this with other parts of this file, but trying to tidy up the codebase more recently.
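As a sketch of what `core/llm/thinking.ts` could export (the provider list comes from the diff above; the model list is truncated there, so the entries and the lookup helper here are assumptions based on the models this PR tests):

```ts
// core/llm/thinking.ts (sketch)
const PROVIDER_SUPPORTS_THINKING: string[] = ["anthropic", "openai", "deepseek"];

const MODEL_SUPPORTS_THINKING: string[] = [
  "claude-3-7-sonnet",
  "o1",
  "o3-mini",
  "deepseek-reasoner",
];

export function modelSupportsThinking(provider: string, model: string): boolean {
  return (
    PROVIDER_SUPPORTS_THINKING.includes(provider) &&
    MODEL_SUPPORTS_THINKING.some((m) => model.toLowerCase().includes(m))
  );
}
```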
```tsx
  }}
>
  <div
    className={`h-2 w-2 rounded-full ${level === "low" ? "bg-red-400" : level === "medium" ? "bg-yellow-400" : "bg-green-400"} mr-2`}
```
Similarly here, let's just use `text-lightgray`.
```ts
 * It runs at a high level in the component tree to ensure model-specific
 * settings are loaded only once per model selection.
 */
export const useModelThinkingSettings = () => {
```
All of this logic should probably live in a thunk that responds to changes in the `config.defaultModelTitle` property. Then we can get rid of both this hook and the UI component that triggers it.
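Roughly, with Redux Toolkit's listener middleware this could look like the following (a sketch; the thunk name, state shape, and wiring are assumptions):

```ts
import { createAsyncThunk, createListenerMiddleware } from "@reduxjs/toolkit";

// Hypothetical thunk: load reasoning_effort / thinking defaults for the
// newly selected model from the user's config.
const loadModelThinkingSettings = createAsyncThunk(
  "ui/loadModelThinkingSettings",
  async () => {
    // read config, dispatch UI settings...
  },
);

export const modelSettingsListener = createListenerMiddleware();

modelSettingsListener.startListening({
  // Fire whenever config.defaultModelTitle changes.
  predicate: (_action, currentState: any, previousState: any) =>
    currentState.config.defaultModelTitle !==
    previousState.config.defaultModelTitle,
  effect: async (_action, listenerApi) => {
    await listenerApi.dispatch(loadModelThinkingSettings());
  },
});
```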
```ts
  budgetTokens: number; // Min 1024, max is below maxTokens
};
openai: {
  reasoningEffort: "low" | "medium" | "high";
```
We should probably re-use the types from `core` here.
Additionally, I don't think we want to be too provider-specific here. Is there a way we could structure this instead to just have both `budgetTokens` and `reasoningEffort` as options, or some other approach that isn't opinionated/structured according to particular providers?
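For instance, a flat, provider-agnostic shape could look like this (a sketch, not a committed design):

```ts
// Sketch: one flat options object; each provider reads the fields it supports
// and ignores the rest, instead of nesting options per provider.
interface ReasoningOptions {
  thinking?: boolean; // toggle reasoning where the model allows it
  budgetTokens?: number; // Anthropic-style budget (min 1024, below maxTokens)
  reasoningEffort?: "low" | "medium" | "high"; // OpenAI-style effort level
}
```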
Description
This PR adds support for both showing thinking tokens in the chat as well as controlling the reasoning effort for supported models. I'm opening this PR to get feedback from the Continue team, I'm sure there are some design decisions here that you may disagree with and want to change. I'm also okay with the Continue team taking over this branch and developing on top of it.
I'm adding comments in the PR to explain some of the changes.
- `thinking` and `redacted_thinking` message types
- With `"anthropic-beta": "output-128k-2025-02-19"`, 128k maxOutput can be enabled.

Checklist
Screenshots
This shows most of the changes.
Recording.2025-03-09.135230.mp4
Testing instructions
Test the following models:
"provider": "deepseek", "model": "deepseek-reasoner"
"provider": "openai", "model": "o3-mini"
"provider": "openai", "model": "o1"
"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"
"provider": "openai", "model": "gpt-4o"
completionOptions
should have schema'd definitions for reasoning_effort for the OpenAI models andthinking
for AnthropicTests:
ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB
stream
is set tofalse
for Sonnet 3.7. (Tool use is not supported in Continue when not streaming afaik)xs
breakpointsessionSlice
FreeTrial
,Gemini
, andWatsonX
core/llm's, so those should be testedstring
only, meaning bothcontent
andreasoning_content
couldn't be passed down to the UI.