chat: surface reasoning content and finish_reason #9
Merged
Reasoning models (DeepSeek-R1, Moonshot Kimi, Anthropic extended thinking, OpenRouter) put their chain-of-thought in a separate field that was previously being dropped on the floor. When the model burned its budget on hidden reasoning, callers saw content="" with no indication anything went wrong.

* Add $thinking and $finish_reason to the chat() return list. Normalized across providers: reasoning_content (DeepSeek/Moonshot/vLLM/SGLang) and reasoning (OpenRouter) for OAI-compatible endpoints; thinking blocks scanned out of content[] for Anthropic. stop_reason is mapped to OpenAI vocabulary (end_turn -> stop, max_tokens -> length).
* Warn on the silent-truncation case: finish_reason == "length" with empty content but populated thinking. The actionable signal is "raise max_tokens".
* Streaming path captures reasoning deltas and the trailing finish_reason on a best-effort basis.
* Unit tests for both helpers (.normalize_anthropic_stop_reason, .warn_if_truncated).
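The provider normalization described in the commit message can be sketched as two small extractors. This is a minimal illustration, not the package's code: extract_thinking_oai(), extract_thinking_anthropic(), and has_text() are hypothetical names, and the inputs are assumed to be response bodies already parsed into nested R lists (for example with jsonlite::fromJSON(body, simplifyVector = FALSE)). Anthropic thinking blocks carry their text in a thinking field.

```r
# Hypothetical helpers for illustration only; not the package's internals.
# Inputs are response bodies parsed into nested R lists, e.g. via
# jsonlite::fromJSON(body, simplifyVector = FALSE).

# TRUE for a single, non-NA, non-empty string.
has_text <- function(x) is.character(x) && length(x) == 1L && !is.na(x) && nzchar(x)

# OAI-compatible message: prefer reasoning_content (DeepSeek/Moonshot/vLLM/SGLang),
# fall back to reasoning (OpenRouter). Returns NULL when neither is populated.
extract_thinking_oai <- function(msg) {
  for (field in c("reasoning_content", "reasoning")) {
    if (has_text(msg[[field]])) return(msg[[field]])
  }
  NULL
}

# Anthropic response: join the text of every content block of type "thinking".
extract_thinking_anthropic <- function(resp) {
  blocks <- Filter(function(b) identical(b[["type"]], "thinking"), resp[["content"]])
  if (length(blocks) == 0L) return(NULL)
  paste(vapply(blocks, function(b) b[["thinking"]], character(1)), collapse = "\n")
}

# Usage with hand-built responses.
oai_msg <- list(role = "assistant", content = "", reasoning = "Let me think...")
extract_thinking_oai(oai_msg)  # returns "Let me think..."

anthropic_resp <- list(content = list(
  list(type = "thinking", thinking = "Step 1", signature = "sig"),
  list(type = "text", text = "Answer")
))
extract_thinking_anthropic(anthropic_resp)  # returns "Step 1"
```

Checking reasoning_content before reasoning mirrors the "falling back to" order in the PR description, so a provider that populates both is read from the more specific field.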
CI was failing on a stale claude-3-5-sonnet-latest assertion left over from before the default was bumped to claude-sonnet-4-6.
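The streaming bullet in the commit message leaves the mechanics implicit. A best-effort accumulator over OpenAI-compatible chunks might look like the following sketch. It assumes each SSE data payload has already been parsed into a nested R list (the [DONE] sentinel excluded) and that deltas live at choices[[1]]$delta; accumulate_stream() is a hypothetical name.

```r
# Hypothetical best-effort accumulator for OpenAI-compatible streaming.
# `chunks` is a list of SSE data payloads already parsed into nested R lists
# (the "[DONE]" sentinel excluded). Not the package's actual streaming code.
accumulate_stream <- function(chunks) {
  content <- character(0)
  thinking <- character(0)
  finish_reason <- NA_character_  # stays NA if the stream ends early
  for (chunk in chunks) {
    # Some providers end with a usage-only chunk whose choices array is empty.
    if (length(chunk[["choices"]]) == 0L) next
    choice <- chunk[["choices"]][[1]]
    delta <- choice[["delta"]]
    # Same field precedence as the non-streaming path.
    r <- delta[["reasoning_content"]]
    if (is.null(r)) r <- delta[["reasoning"]]
    if (is.character(r)) thinking <- c(thinking, r)
    if (is.character(delta[["content"]])) content <- c(content, delta[["content"]])
    # JSON null parses to NULL, so only a real string overwrites finish_reason.
    if (is.character(choice[["finish_reason"]])) finish_reason <- choice[["finish_reason"]]
  }
  list(
    content = paste(content, collapse = ""),
    thinking = paste(thinking, collapse = ""),
    finish_reason = finish_reason
  )
}

# Usage: a stream that spends its whole budget on reasoning.
chunks <- list(
  list(choices = list(list(delta = list(reasoning_content = "Think"), finish_reason = NULL))),
  list(choices = list(list(delta = list(reasoning_content = "ing..."), finish_reason = NULL))),
  list(choices = list(list(delta = list(), finish_reason = "length")))
)
res <- accumulate_stream(chunks)
# res$content is "", res$thinking is "Thinking...", res$finish_reason is "length"
```

If the connection drops before the final chunk, finish_reason stays NA, which is one concrete reason the streaming capture can only be best-effort.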
Summary
Reasoning models (DeepSeek-R1, Moonshot Kimi k2.5/k2.6, Anthropic extended thinking, OpenRouter) return their chain-of-thought in a separate field that chat() was silently dropping. When the model burned its entire max_tokens budget on hidden reasoning, callers saw content == "" with no indication anything went wrong.

This was hit live: tiny's morning briefing started returning empty Matrix messages on 2026-05-07 because kimi-k2.6 with max_tokens: 800 ran out of tokens mid-reasoning and never emitted a user-facing answer.

Changes
* chat() now returns $thinking and $finish_reason in addition to the existing fields. The change is additive; existing callers reading $content, $model, $usage, and $history are unchanged.
* $thinking is normalized across providers: on OAI-compatible endpoints it is read from message$reasoning_content (DeepSeek/Moonshot/vLLM/SGLang), falling back to message$reasoning (OpenRouter); for Anthropic it is read from content[] blocks where type == "thinking".
* $finish_reason is normalized to OpenAI vocabulary across providers: Anthropic's end_turn -> stop and max_tokens -> length. Other values (tool_use, stop_sequence, etc.) pass through.
* A warning() is raised when finish_reason == "length" and content is empty but thinking is populated. The actionable signal is "raise max_tokens"; previously this case was silent.
* The streaming path captures reasoning_content deltas and the trailing finish_reason on a best-effort basis.

Verified live
11 new unit tests cover .normalize_anthropic_stop_reason() and .warn_if_truncated() (pure functions, no API calls).

Pre-existing issues not addressed here
inst/tinytest/test_config.R still expects claude-3-5-sonnet-latest; needs updating to claude-sonnet-4-6 since d07b4b6. Failing before this PR, still failing on this branch. Separate fix.

Test plan
tinytest::run_test_file("inst/tinytest/test_reasoning.R") passes (11/11)
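The test plan exercises two pure helpers. As a standalone sketch of what such tests might check, the following defines illustrative versions of .normalize_anthropic_stop_reason() and .warn_if_truncated() and asserts on them with base R's stopifnot() in place of tinytest's expect_* functions. The signatures (a single stop-reason string; content, thinking, and finish_reason arguments), the NULL handling, and the warning text are assumptions, not the package's actual code.

```r
# Hypothetical sketches of the two helpers the PR names; the real
# signatures, NULL handling, and warning text may differ.

# Map an Anthropic stop_reason onto OpenAI finish_reason vocabulary.
.normalize_anthropic_stop_reason <- function(stop_reason) {
  if (is.null(stop_reason) || is.na(stop_reason)) return(NA_character_)
  switch(stop_reason,
    end_turn = "stop",
    max_tokens = "length",
    stop_reason  # tool_use, stop_sequence, etc. pass through unchanged
  )
}

# Warn when the budget ran out during reasoning: finish_reason "length",
# empty content, non-empty thinking. Returns invisibly whether it warned.
.warn_if_truncated <- function(content, thinking, finish_reason) {
  has_text <- function(x) is.character(x) && length(x) == 1L && !is.na(x) && nzchar(x)
  truncated <- identical(finish_reason, "length") && !has_text(content) && has_text(thinking)
  if (truncated) {
    warning("finish_reason is \"length\" with empty content but non-empty thinking; ",
            "consider raising max_tokens.", call. = FALSE)
  }
  invisible(truncated)
}

# Checks in the spirit of test_reasoning.R, using base stopifnot().
stopifnot(
  identical(.normalize_anthropic_stop_reason("end_turn"), "stop"),
  identical(.normalize_anthropic_stop_reason("max_tokens"), "length"),
  identical(.normalize_anthropic_stop_reason("tool_use"), "tool_use"),
  identical(.normalize_anthropic_stop_reason(NULL), NA_character_)
)

# The truncation case signals a warning; tryCatch() turns it into TRUE.
warned <- tryCatch(.warn_if_truncated("", "long reasoning...", "length"),
                   warning = function(w) TRUE)
stopifnot(isTRUE(warned))

# No warning when the model answered, or when it stopped normally.
stopifnot(
  isFALSE(.warn_if_truncated("Answer", "reasoning", "length")),
  isFALSE(.warn_if_truncated("", "reasoning", "stop"))
)
```

Keeping both helpers pure is what lets checks like these run without API keys, matching the note that the unit tests make no API calls.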