Add support for reasoning in the UI #4559
base: main
Conversation
```diff
@@ -174,7 +174,8 @@ function autodetectTemplateType(model: string): TemplateType | undefined {
     lower.includes("pplx") ||
     lower.includes("gemini") ||
     lower.includes("grok") ||
-    lower.includes("moonshot")
+    lower.includes("moonshot") ||
+    lower.includes("deepseek-reasoner")
```
This is to avoid `deepseek-reasoner` using `_streamComplete` in `core/llm/index.ts`, so that the `ChatMessage` with `content` and `reasoning_content` are both preserved.
```diff
@@ -373,11 +374,45 @@ function autodetectPromptTemplates(
   return templates;
 }
 
+const PROVIDER_SUPPORTS_THINKING: string[] = ["anthropic", "openai", "deepseek"];
+
+const MODEL_SUPPORTS_THINKING: string[] = [
```
I think support for other proxy providers like OpenRouter could be added as well. I haven't looked into it.
```ts
  title: string | undefined,
  capabilities: ModelCapability | undefined,
): boolean {
  if (capabilities?.thinking !== undefined) {
```
I'm not sure if the `capabilities: thinking` option is necessary. Thinking support needs to be hardcoded in some places anyway, so I don't know if there's a reasonable way to try to force it to be enabled.
Yeah I think your intuition here is accurate. It's not quite as simple as image uploads.
If you're reasonably confident, I would be on board to remove it.
```ts
    return (await encoding.encode(part.thinking ?? "")).length;
  } else if (part.type === "redacted_thinking") {
    // For redacted thinking, don't count any tokens
    return 0;
```
"All extended thinking tokens (including redacted thinking tokens) are billed as output tokens and count toward your rate limits."
But they would have to be counted from the API's response:
"usage": {
"input_tokens": 2095,
"output_tokens": 503
}
I'm a little confused here - mind linking to the docs you're referencing?
Is your point that `redacted_thinking` is actually billed, but we're unable to properly count it here since this is for input tokens, not output?
```diff
@@ -124,7 +125,7 @@ class FreeTrial extends BaseLLM {
   }
   return {
     type: "text",
-    text: part.text,
+    text: (part as TextMessagePart).text,
```
The new Message API has `thinking` and `redacted_thinking` types now, so wherever the types were causing errors, I just assumed they'd be `TextMessagePart`s, as they've previously been.
```ts
import { ChatMessage, CompletionOptions, TextMessagePart } from "..";

// Extend OpenAI API types to support DeepSeek reasoning_content field
interface DeepSeekDelta {
```
The types for OpenAI's messages are imported from an external library, so to support DeepSeek's `reasoning_content` I needed to create those interfaces elsewhere. I'm not sure if this is the best place for them, but it works.
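For illustration, the widened types could look roughly like this (a sketch; only `reasoning_content` is DeepSeek-specific, and the other fields mirror OpenAI's streaming delta shape):

```ts
// Sketch: DeepSeek streams an extra reasoning_content field alongside the
// standard OpenAI-style delta, so local interfaces widen the library's types.
interface DeepSeekDelta {
  role?: "assistant";
  content?: string | null;
  reasoning_content?: string | null; // DeepSeek-specific chain-of-thought tokens
}

interface DeepSeekChunkChoice {
  index: number;
  delta: DeepSeekDelta;
  finish_reason: string | null;
}
```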
```diff
@@ -17,12 +17,19 @@ export function stripImages(messageContent: MessageContent): string {
     .join("\n");
 }
 
+export function stripThinking(content: string): string {
```
`<think>` tags are handled differently now. They're always included in the message, but stripped from the UI.
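A minimal sketch of what the helper could look like (assuming a simple regex is enough; the PR's actual implementation may handle unclosed or streaming tags differently):

```ts
// Sketch: remove <think>...</think> blocks before rendering in the UI, while
// the full content (tags included) stays in the stored message history.
export function stripThinking(content: string): string {
  return content.replace(/<think>[\s\S]*?<\/think>/g, "").trimStart();
}
```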
docs/docs/json-reference.md
```diff
@@ -36,6 +36,8 @@ Each model has specific configuration options tailored to its provider and funct
 - `engine`: Engine for Azure OpenAI requests.
 - `capabilities`: Override auto-detected capabilities:
   - `uploadImage`: Boolean indicating if the model supports image uploads.
+  - `tools`: Boolean indicating if the model supports tool use.
```
`tools` was missing, so I added it along with the new `thinking` capability.
```diff
@@ -59,6 +61,19 @@ Example:
   "title": "GPT-4o",
   "provider": "openai",
   "apiKey": "<YOUR_OPENAI_API_KEY>"
 },
 {
```
A new example showcasing a model with thinking capabilities
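For reference, the example could look roughly like this (a sketch assembled from options visible elsewhere in this PR - the model name from the testing instructions and the `thinking` completion option from the diff - not necessarily the exact snippet added):

```json
{
  "title": "Claude 3.7 Sonnet",
  "provider": "anthropic",
  "model": "claude-3-7-sonnet-20250219",
  "apiKey": "<YOUR_ANTHROPIC_API_KEY>",
  "capabilities": {
    "thinking": true
  },
  "completionOptions": {
    "thinking": {
      "type": "enabled",
      "budget_tokens": 4096
    }
  }
}
```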
Appreciate the docs update 👌
```diff
@@ -284,6 +285,8 @@ const Layout = () => {
   />
 
 <GridDiv className="">
+  {/* Initialize model-specific settings when model changes */}
+  <ModelSettingsInitializer />
```
Kind of a hack. I needed the UI to fetch the `reasoning_effort` and `thinking` options on the initial load, so that they get set in the UI based on the user's config, but in a way that still lets the user change them without changing the config. AI generated this and put it here. It worked, but it might be a silly place to do something like this.
Thanks for the explanation!
I think what we should do is just call `useModelThinkingSettings();` from `Chat.tsx`. I think it should achieve the same outcome without adding an unnecessary UI component here.
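Roughly like this, as a sketch (assuming the hook takes no arguments; the import path is an assumption):

```tsx
import { useModelThinkingSettings } from "../hooks/useModelThinkingSettings"; // path assumed

// Chat.tsx (sketch): run the settings hook directly instead of mounting a
// dedicated initializer component in the layout.
export function Chat() {
  // Loads reasoning_effort / thinking defaults whenever the selected model changes.
  useModelThinkingSettings();

  // ...rest of the existing component
}
```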
```diff
 <StyledMarkdownPreview
   isRenderingInStepContainer
-  source={stripImages(props.item.message.content)}
+  source={renderChatMessage(props.item.message)}
```
`renderChatMessage` calls `stripImages`, but now also strips think tags.
```diff
@@ -106,123 +119,123 @@ function InputToolbar(props: InputToolbarProps) {
 <StyledDiv
   isHidden={props.hidden}
   onClick={props.onClick}
-  className="find-widget-skip flex"
+  className="find-widget-skip flex flex-col"
```
Model selection / enter button are now on their own row, to make room for the rest of the buttons.
This is a good idea, the toolbar was getting quite cluttered already - but @sestinj has an incoming PR that adds a new "notch" above the input, similar to what we have in Edit mode, to address this. Amongst other changes, it moves the "Tool usage" button up there. So let's keep it as is, without the `flex-col`.
```tsx
  <ToggleThinkingButton disabled={!thinkingSupported} />
</div>
<div className="-mb-1 flex w-full items-center gap-2 whitespace-nowrap">
  <ModelSelect />
```
The model select now takes the full width of the remaining space in the container, meaning that there's no need to try to set a reasonable max width.
I think this is also unnecessary based on my comment above
```tsx
return (
  <Transition
    show={show}
    enter="transition duration-100 ease-out"
```
This is a copy of the `PopoverTransition` component. I made the popovers on the thinking and tool use toggle buttons relative to the Chat box instead of the buttons, so that they can be easily positioned and fit better on small view sizes. But the scale transform can't calculate the position while scaling, causing them to jump, so I removed the scaling animation from those buttons.
Basically a copy of `ToggleToolsButton`.
```ts
// Get provider from default model
const provider = defaultModel?.provider || "";
const hasThinkingOptions = provider !== "deepseek";
```
There is some provider/model-specific logic in here that could probably be placed somewhere else. Basically, some models can't be toggled off, whilst some models don't have configuration options, so certain elements/interactions are disabled based on the provider/model.
For models that can't turn off reasoning, I think the current implementation feels a bit noisy. I'm not sure what the solution here is though - maybe with @sestinj's new notch UI we have some more options.
```diff
@@ -83,93 +83,93 @@ export default function ToolDropdown(props: ToolDropdownProps) {
 {useTools && !isDisabled && (
   <>
     <span className="hidden align-top sm:flex">Tools</span>
```
Mostly formatting changes, seemingly from Prettier. There were a few real changes: the hover events are now bound on the parent container and not the icon/content; before, you could hover the edges and not see the background change.
```diff
 <StyledListboxButton
   data-testid="model-select-button"
   ref={buttonRef}
   className="h-[18px] overflow-hidden"
   style={{ padding: 0 }}
   onClick={calculatePosition}
 >
-  <div className="flex max-w-[33vw] items-center gap-0.5 text-gray-400 transition-colors duration-200">
-    <span className="truncate">
+  <div className="flex w-fit min-w-0 items-center gap-0.5 text-gray-400 transition-colors duration-200">
```
These changes allow the model select to scale to the parent container
```diff
@@ -433,54 +433,60 @@ export function Chat() {
   contextItems={item.contextItems}
   toolCallId={item.message.toolCallId}
 />
```
This change makes sure that if the API returns a message along with tool use, both are shown.
```ts
for (const message of action.payload) {
  const lastItem = state.history[state.history.length - 1];
  const lastMessage = lastItem.message;
  // Simplified condition to keep thinking blocks and tool calls together in the same message
```
This is the major change. The session slice should now handle all Message API types and keep collecting different parts into the same assistant message, so that thinking and tool use work together properly, since Anthropic requires that you send the thinking block back along with tool use.
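Conceptually, the collecting step could reduce to something like this sketch (the part type names follow the Messages API; the helper is a hypothetical simplification of the reducer logic):

```ts
// Sketch: append a streamed part to the current assistant message, so that
// thinking, redacted_thinking, text, and tool-use parts stay in one message.
type MessagePart =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "redacted_thinking"; data: string };

function appendPart(parts: MessagePart[], incoming: MessagePart): void {
  const last = parts[parts.length - 1];
  if (last?.type === "text" && incoming.type === "text") {
    // Deltas of the same part type are concatenated into the existing part...
    last.text += incoming.text;
  } else if (last?.type === "thinking" && incoming.type === "thinking") {
    last.thinking += incoming.thinking;
  } else {
    // ...while a new part type starts a new entry in the same assistant message.
    parts.push(incoming);
  }
}
```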
```ts
if (messageContent.includes("<think>")) {
  // Check if the message content is an array with parts
  if (
    Array.isArray(message.content) &&
```
This part basically handles the content as parts, i.e. the Messages API format.
This all looks solid, but there is just so much logic here that I think we really need to break this out into some utils that are smaller/easier to read/more testable. Again, the surrounding code isn't the cleanest, but it would be really helpful for maintenance/debugging.
```ts
}

// For other content types, use renderChatMessage
const messageContent = renderChatMessage(message);
```
This part handles what's more typical of the OpenAI / DeepSeek APIs.
```ts
const fullContent = lastMessage.content as string;

// If we find <think> tags, extract the content for the reasoning field
if (
```
Lastly, we handle `<think>` tags.
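As a sketch of that extraction (the regex and return shape are assumptions consistent with the `stripThinking` helper above; the real reducer tracks more state):

```ts
// Sketch: split <think>...</think> content out of the raw text so the UI can
// show it in the reasoning section while keeping the visible answer clean.
function extractThinking(fullContent: string): {
  reasoning: string | null;
  visible: string;
} {
  const match = fullContent.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) {
    return { reasoning: null, visible: fullContent };
  }
  return {
    reasoning: match[1].trim(),
    visible: fullContent.replace(match[0], "").trimStart(),
  };
}
```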
```ts
};

// Handle the special case for anthropic-beta
this.setBetaHeaders(headers, shouldCacheSystemMessage);
```
This change allows the config to add headers to the request and intelligently merge them with the beta headers that Continue adds for caching.
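For intuition, the merge could look roughly like this (a hypothetical body for `setBetaHeaders`, written as a free function; the prompt-caching flag is Anthropic's documented beta value):

```ts
// Sketch: merge user-configured anthropic-beta values with the beta flags
// Continue needs (e.g. prompt caching), de-duplicating entries.
function mergeBetaHeaders(
  headers: Record<string, string>,
  shouldCacheSystemMessage: boolean,
): void {
  const betas = new Set(
    (headers["anthropic-beta"] ?? "")
      .split(",")
      .map((b) => b.trim())
      .filter(Boolean),
  );
  if (shouldCacheSystemMessage) {
    betas.add("prompt-caching-2024-07-31");
  }
  if (betas.size > 0) {
    headers["anthropic-beta"] = [...betas].join(",");
  }
}
```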
Resolves #4339
Force-pushed from 20dfb8a to 09c32b7:
- Add UI settings to set reasoning effort / token budget
- Add UI to show reasoning tokens from Anthropic Claude 3.7 Sonnet and DeepSeek R1
- Fix thinking icon color not switching back to gray
Force-pushed from 09c32b7 to a7a153c
Hey @FallDownTheSystem, apologies for the slow review here.
This is an awesome contribution! I've been curious to play around with different thinking efforts but haven't been able to since we're missing this feature, so it was cool to finally get to try it out while testing.
Things look solid overall, but my main concern with merging this is that the logic it touches is quite sensitive, and there is a lot of new logic being introduced here in quite large functions. I mentioned this in a few comments - the existing codebase in these areas wasn't that clean to begin with, but given that nobody on the team will be familiar with this new logic, it's important to make sure it's especially maintainable.
Lastly, I'd recommend just ignoring the merge conflicts and any UI changes for now. More conflicts will occur because of @sestinj's upcoming PR, and I think we'll want to update the UI in this PR based on those changes.
Feel free to DM me/Nate on Discord if you'd prefer to go back and forth more easily 👍
```diff
@@ -360,6 +375,7 @@ export interface UserChatMessage {
 export interface AssistantChatMessage {
   role: "assistant";
   content: MessageContent;
+  reasoning_content?: string;
```
Suggested change:
```diff
-reasoning_content?: string;
+reasoningContent?: string;
```
After going through this PR I realized that these properties are `snake_case` to match what the Anthropic API is expecting - however, I'd still prefer to keep them `camelCase` in our TS interfaces, both here and elsewhere in the PR.
type: "enabled" | "disabled"; | ||
budget_tokens?: number; | ||
}; | ||
reasoning_effort?: "high" | "medium" | "low"; |
Could we make this into an enum instead? Bit of a nitpick but will make any future refactors easier.
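For example (a sketch; the enum name and casing are suggestions, not what the PR uses):

```ts
// Sketch: an enum keeps the accepted values in one place for future refactors.
export enum ReasoningEffort {
  Low = "low",
  Medium = "medium",
  High = "high",
}

// The option then becomes:
// reasoning_effort?: ReasoningEffort;
```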
```diff
@@ -373,11 +374,45 @@ function autodetectPromptTemplates(
   return templates;
 }
 
+const PROVIDER_SUPPORTS_THINKING: string[] = ["anthropic", "openai", "deepseek"];
```
Could we pull this logic out into a separate file, e.g. `core/llm/thinking.ts`? Bit of a nitpick/we should have done this with other parts of this file, but trying to tidy up the codebase more recently.
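As a sketch of what `core/llm/thinking.ts` could export (the provider list comes from the diff above; the model list is truncated there, so the entries and the lookup helper here are assumptions based on the models this PR tests):

```ts
// core/llm/thinking.ts (sketch)
const PROVIDER_SUPPORTS_THINKING: string[] = ["anthropic", "openai", "deepseek"];

const MODEL_SUPPORTS_THINKING: string[] = [
  "claude-3-7-sonnet",
  "o1",
  "o3-mini",
  "deepseek-reasoner",
];

export function modelSupportsThinking(provider: string, model: string): boolean {
  return (
    PROVIDER_SUPPORTS_THINKING.includes(provider) &&
    MODEL_SUPPORTS_THINKING.some((m) => model.toLowerCase().includes(m))
  );
}
```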
```tsx
  }}
>
  <div
    className={`h-2 w-2 rounded-full ${level === "low" ? "bg-red-400" : level === "medium" ? "bg-yellow-400" : "bg-green-400"} mr-2`}
```
Similarly here, let's just use `text-lightgray`.
```ts
 * It runs at a high level in the component tree to ensure model-specific
 * settings are loaded only once per model selection.
 */
export const useModelThinkingSettings = () => {
```
All of this logic should probably live in a thunk that responds to changes in the `config.defaultModelTitle` property. Then we can get rid of both this hook and the UI component that triggers it.
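Roughly, with Redux Toolkit's listener middleware this could look like the following (a sketch; the thunk name, state shape, and wiring are assumptions):

```ts
import { createAsyncThunk, createListenerMiddleware } from "@reduxjs/toolkit";

// Hypothetical thunk: load reasoning_effort / thinking defaults for the
// newly selected model from the user's config.
const loadModelThinkingSettings = createAsyncThunk(
  "ui/loadModelThinkingSettings",
  async () => {
    // read config, dispatch UI settings...
  },
);

export const modelSettingsListener = createListenerMiddleware();

modelSettingsListener.startListening({
  // Fire whenever config.defaultModelTitle changes.
  predicate: (_action, currentState: any, previousState: any) =>
    currentState.config.defaultModelTitle !==
    previousState.config.defaultModelTitle,
  effect: async (_action, listenerApi) => {
    await listenerApi.dispatch(loadModelThinkingSettings());
  },
});
```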
```ts
  budgetTokens: number; // Min 1024, max is below maxTokens
};
openai: {
  reasoningEffort: "low" | "medium" | "high";
```
We should probably re-use the types from `core` here.
Additionally, I don't think we want to be too provider-specific here. Is there a way we could structure this instead to just have both `budgetTokens` and `reasoningEffort` as options, or some other approach that isn't opinionated/structured according to particular providers?
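For instance, a flat, provider-agnostic shape could look like this (a sketch, not a committed design):

```ts
// Sketch: one flat options object; each provider reads the fields it supports
// and ignores the rest, instead of nesting options per provider.
interface ReasoningOptions {
  thinking?: boolean; // toggle reasoning where the model allows it
  budgetTokens?: number; // Anthropic-style budget (min 1024, below maxTokens)
  reasoningEffort?: "low" | "medium" | "high"; // OpenAI-style effort level
}
```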
Description
This PR adds support for both showing thinking tokens in the chat as well as controlling the reasoning effort for supported models. I'm opening this PR to get feedback from the Continue team, I'm sure there are some design decisions here that you may disagree with and want to change. I'm also okay with the Continue team taking over this branch and developing on top of it.
I'm adding comments in the PR to explain some of the changes.
- `thinking` and `redacted_thinking` message types
- With `"anthropic-beta": "output-128k-2025-02-19"`, 128k maxOutput can be enabled.

Checklist
Screenshots
This shows most of the changes.
Recording.2025-03-09.135230.mp4
Testing instructions
Test the following models:
"provider": "deepseek", "model": "deepseek-reasoner"
"provider": "openai", "model": "o3-mini"
"provider": "openai", "model": "o1"
"provider": "anthropic", "model": "claude-3-7-sonnet-20250219"
"provider": "openai", "model": "gpt-4o"
completionOptions
should have schema'd definitions for reasoning_effort for the OpenAI models andthinking
for AnthropicTests:
ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB
stream
is set tofalse
for Sonnet 3.7. (Tool use is not supported in Continue when not streaming afaik)xs
breakpointsessionSlice
FreeTrial
,Gemini
, andWatsonX
core/llm's, so those should be testedstring
only, meaning bothcontent
andreasoning_content
couldn't be passed down to the UI.