
chat: surface reasoning content and finish_reason #9

Merged: TroyHernandez merged 2 commits into main from reasoning-content on May 8, 2026

@TroyHernandez
Contributor

Summary

Reasoning models and providers (DeepSeek-R1, Moonshot Kimi k2.5/k2.6, Anthropic extended thinking, OpenRouter) return their chain-of-thought in a separate field that chat() was silently dropping. When a model spent its entire max_tokens budget on hidden reasoning, callers saw content == "" with no indication that anything had gone wrong.

This was hit live: tiny's morning briefing started returning empty Matrix messages on 2026-05-07 because kimi-k2.6 with max_tokens: 800 ran out of tokens mid-reasoning and never emitted a user-facing answer.
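
A caller-side sketch of acting on this failure mode, using the $content and $finish_reason fields described under Changes. Everything here is hypothetical: chat_with_retry() is not part of the package, and chat_fn stands in for any function that takes (prompt, max_tokens) and returns a list shaped like chat()'s result. The real chat() signature may differ.

```r
# Hypothetical wrapper: retry with a larger budget when the model ran out of
# tokens before emitting any user-facing content.
chat_with_retry <- function(chat_fn, prompt, max_tokens = 800, max_attempts = 3) {
  result <- NULL
  for (attempt in seq_len(max_attempts)) {
    result <- chat_fn(prompt, max_tokens = max_tokens)
    # Truncated means: stopped on the token limit AND content is empty/missing.
    truncated <- identical(result$finish_reason, "length") &&
      !isTRUE(nzchar(result$content))
    if (!truncated) {
      return(result)
    }
    max_tokens <- max_tokens * 2  # assumed policy: double the budget each time
  }
  result  # still truncated after max_attempts; caller decides what to do
}

# Fake backend for illustration only: "reasons" through any budget under 2000.
fake_chat <- function(prompt, max_tokens) {
  if (max_tokens < 2000) {
    list(content = "", thinking = "Still reasoning...", finish_reason = "length")
  } else {
    list(content = "Good morning.", thinking = "Done.", finish_reason = "stop")
  }
}

result <- chat_with_retry(fake_chat, "Morning briefing", max_tokens = 800)
```

Doubling the budget is only one possible policy; a fixed higher max_tokens for reasoning models may be simpler in practice.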

Changes

  • chat() now returns $thinking and $finish_reason in addition to existing fields. Additive; existing callers reading $content, $model, $usage, $history are unchanged.
  • $thinking is normalized across providers: message$reasoning_content (DeepSeek/Moonshot/vLLM/SGLang), falling back to message$reasoning (OpenRouter) on OAI-compatible endpoints; content[] blocks where type == "thinking" for Anthropic.
  • $finish_reason is normalized to OpenAI vocabulary across providers. Anthropic's end_turn -> stop, max_tokens -> length. Other values (tool_use, stop_sequence, etc.) pass through.
  • New warning() when finish_reason == "length" and content is empty but thinking is populated. The actionable signal is "raise max_tokens"; previously this was silent.
  • Streaming path captures reasoning_content deltas and the trailing finish_reason on a best-effort basis.
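
The normalization and warning rules listed above can be sketched roughly as follows. These are illustrative stand-ins written from the PR description, not the package's actual .normalize_anthropic_stop_reason() or .warn_if_truncated(); the function names, signatures, and warning text are assumptions.

```r
# Hypothetical stand-ins; the real helpers may differ in name and signature.

# OpenAI-compatible message: prefer reasoning_content, fall back to reasoning.
# `[[` matches names exactly, so "reasoning" never picks up "reasoning_content".
extract_thinking_oai <- function(message) {
  for (field in c("reasoning_content", "reasoning")) {
    value <- message[[field]]
    if (is.character(value) && length(value) == 1L && nzchar(value)) {
      return(value)
    }
  }
  NULL
}

# Map Anthropic stop_reason onto OpenAI finish_reason vocabulary.
# Unmapped values (tool_use, stop_sequence, ...) pass through unchanged.
normalize_stop_reason <- function(stop_reason) {
  if (is.null(stop_reason)) {
    return(NULL)
  }
  switch(stop_reason,
    end_turn = "stop",
    max_tokens = "length",
    stop_reason
  )
}

# Warn only in the silent-truncation case: hit the token limit, no content,
# but reasoning text is present. Returns TRUE/FALSE invisibly.
warn_if_truncated <- function(content, thinking, finish_reason) {
  content_empty <- !isTRUE(nzchar(content))  # NULL, character(0), or ""
  has_thinking <- isTRUE(nzchar(thinking))
  truncated <- identical(finish_reason, "length") && content_empty && has_thinking
  if (truncated) {
    warning(
      "Model hit max_tokens while reasoning and returned no content; ",
      "consider raising max_tokens.",
      call. = FALSE
    )
  }
  invisible(truncated)
}
```

Keeping these as pure functions of already-parsed fields is what lets them be tested without API calls.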

Verified live

| Scenario | Result |
| --- | --- |
| kimi-k2.5, max_tokens=50 (truncated) | warns; $content="", $thinking 209 chars, $finish_reason="length" |
| kimi-k2.5, max_tokens=4000 (normal) | no warning; both content + thinking populated; $finish_reason="stop" |
| claude-sonnet-4-6 (no extended thinking) | content populated; $thinking=NULL; $finish_reason="stop" (normalized from end_turn) |
| Anthropic mock with thinking + text blocks | text/thinking split correctly; stop_reason normalization round-trips |
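
For the Anthropic mock case, the text/thinking split might look something like the sketch below. split_anthropic_content() is a hypothetical name, and joining multiple blocks with a newline is an assumption; only the block shapes (type "thinking" with a thinking field, type "text" with a text field) come from the Anthropic Messages API.

```r
# Hypothetical helper: separate Anthropic content[] blocks into user-facing
# text and reasoning text.
split_anthropic_content <- function(blocks) {
  # Collect the given field from every block of the given type.
  collect <- function(type, field) {
    parts <- character(0)
    for (block in blocks) {
      if (identical(block[["type"]], type) && is.character(block[[field]])) {
        parts <- c(parts, block[[field]])
      }
    }
    if (length(parts) == 0L) NULL else paste(parts, collapse = "\n")
  }
  text <- collect("text", "text")
  list(
    content = if (is.null(text)) "" else text,  # empty string when no text block
    thinking = collect("thinking", "thinking")  # NULL when no thinking block
  )
}

# Synthetic blocks, not real API output.
mock_blocks <- list(
  list(type = "thinking", thinking = "Compare both options first."),
  list(type = "text", text = "Option B is cheaper.")
)
result <- split_anthropic_content(mock_blocks)
```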

11 new unit tests cover .normalize_anthropic_stop_reason() and .warn_if_truncated() (pure functions, no API calls).

Pre-existing issues not addressed here

  • inst/tinytest/test_config.R still expects claude-3-5-sonnet-latest; it needs updating to claude-sonnet-4-6, the default since d07b4b6. It was failing before this PR and still fails on this branch. Separate fix.

Test plan

  • tinytest::run_test_file("inst/tinytest/test_reasoning.R") passes (11/11)
  • Live test against Moonshot with kimi-k2.5 (truncated + normal cases)
  • Live test against Anthropic with claude-sonnet-4-6
  • Mocked Anthropic extraction logic with synthetic content blocks

Reasoning models (DeepSeek-R1, Moonshot Kimi, Anthropic extended
thinking, OpenRouter) put their chain-of-thought in a separate field
that was previously being dropped on the floor. When the model burned
its budget on hidden reasoning, callers saw content="" with no
indication anything went wrong.

* Add $thinking and $finish_reason to the chat() return list.
  Normalized across providers: reasoning_content (DeepSeek/Moonshot/
  vLLM/SGLang) and reasoning (OpenRouter) for OAI-compatible; thinking
  blocks scanned out of content[] for Anthropic. stop_reason mapped to
  OpenAI vocabulary (end_turn -> stop, max_tokens -> length).
* Warn on the silent-truncation case: finish_reason == "length" with
  empty content but populated thinking. Actionable signal is "raise
  max_tokens".
* Streaming path captures reasoning deltas and the trailing
  finish_reason on a best-effort basis.
* Unit tests for both helpers (.normalize_anthropic_stop_reason,
  .warn_if_truncated).

CI was failing on a stale claude-3-5-sonnet-latest assertion left over
from before the default was bumped to claude-sonnet-4-6.
@TroyHernandez merged commit 28d8a70 into main on May 8, 2026
4 checks passed
@TroyHernandez deleted the reasoning-content branch on May 8, 2026 18:46