chat: surface reasoning content and finish_reason by TroyHernandez · Pull Request #9 · cornball-ai/llm.api

TroyHernandez · 2026-05-08T15:55:23Z

Summary

Reasoning models (DeepSeek-R1, Moonshot Kimi k2.5/k2.6, Anthropic extended thinking, OpenRouter) return their chain-of-thought in a separate field that chat() was silently dropping. When the model burned its entire max_tokens budget on hidden reasoning, callers saw content == "" with no indication anything went wrong.

This was hit live: tiny's morning briefing started returning empty Matrix messages on 2026-05-07 because kimi-k2.6 with max_tokens: 800 ran out of tokens mid-reasoning and never emitted a user-facing answer.

Changes

* chat() now returns $thinking and $finish_reason in addition to existing fields. The change is additive; existing callers reading $content, $model, $usage, $history are unaffected.
* $thinking is normalized across providers: message$reasoning_content (DeepSeek/Moonshot/vLLM/SGLang), falling back to message$reasoning (OpenRouter), on OAI-compatible endpoints; content[] blocks where type == "thinking" for Anthropic.
* $finish_reason is normalized to OpenAI vocabulary across providers. Anthropic's end_turn -> stop, max_tokens -> length. Other values (tool_use, stop_sequence, etc.) pass through.
* New warning() when finish_reason == "length" and content is empty but thinking is populated. The actionable signal is "raise max_tokens"; previously this was silent.
* Streaming path captures reasoning_content deltas and the trailing finish_reason on a best-effort basis.
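A minimal sketch of the extraction rules listed above, written against response bodies parsed into nested R lists (as jsonlite::fromJSON(..., simplifyVector = FALSE) would produce). The function names extract_thinking_openai(), extract_thinking_anthropic() and accumulate_stream() are hypothetical; the PR does not show how llm.api structures this code. Joining several thinking blocks with a newline is also an assumption.

```r
# Hypothetical helpers (not llm.api's actual internals). Inputs are response
# bodies parsed with jsonlite::fromJSON(..., simplifyVector = FALSE).
# [[ ]] is used instead of $ because $ partially matches list names:
# msg$reasoning would silently return msg$reasoning_content.

# OAI-compatible non-streaming message: reasoning_content first
# (DeepSeek/Moonshot/vLLM/SGLang), then reasoning (OpenRouter).
extract_thinking_openai <- function(message) {
  for (field in c("reasoning_content", "reasoning")) {
    value <- message[[field]]
    if (is.character(value) && length(value) == 1 && nzchar(value)) {
      return(value)
    }
  }
  NULL
}

# Anthropic: content is a list of typed blocks; split text from thinking.
extract_thinking_anthropic <- function(content_blocks) {
  types <- vapply(content_blocks, function(b) b[["type"]], character(1))
  pick <- function(type, field) {
    vapply(content_blocks[types == type], function(b) b[[field]], character(1))
  }
  thinking <- pick("thinking", "thinking")
  list(
    content = paste(pick("text", "text"), collapse = ""),
    # Assumption: multiple thinking blocks are joined with a newline.
    thinking = if (length(thinking) > 0) paste(thinking, collapse = "\n") else NULL
  )
}

# Streaming: fold OpenAI-style chunks (each a parsed SSE `data:` payload)
# into the same fields; finish_reason usually arrives on a late chunk.
accumulate_stream <- function(chunks) {
  content <- character(0)
  thinking <- character(0)
  finish_reason <- NULL
  for (chunk in chunks) {
    if (length(chunk[["choices"]]) == 0) next  # e.g. trailing usage-only chunk
    choice <- chunk[["choices"]][[1]]
    delta <- choice[["delta"]]
    content <- c(content, delta[["content"]])  # c(x, NULL) leaves x unchanged
    reasoning <- delta[["reasoning_content"]]
    if (is.null(reasoning)) reasoning <- delta[["reasoning"]]
    thinking <- c(thinking, reasoning)
    if (!is.null(choice[["finish_reason"]])) {
      finish_reason <- choice[["finish_reason"]]
    }
  }
  list(
    content = paste(content, collapse = ""),
    thinking = if (length(thinking) > 0) paste(thinking, collapse = "") else NULL,
    finish_reason = finish_reason
  )
}
```

Returning NULL rather than "" when no reasoning is present matches the live result for claude-sonnet-4-6 without extended thinking ($thinking=NULL), so callers can test is.null() to tell "no reasoning" apart from "empty reasoning".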

Verified live

Scenario	Result
kimi-k2.5, max_tokens=50 (truncated)	warns; $content="", $thinking 209 chars, $finish_reason="length"
kimi-k2.5, max_tokens=4000 (normal)	no warning; both content + thinking populated; $finish_reason="stop"
claude-sonnet-4-6 (no extended thinking)	content populated; $thinking=NULL; $finish_reason="stop" (normalized from end_turn)
Anthropic mock with thinking + text blocks	text/thinking split correctly; stop_reason normalization round-trips

11 new unit tests cover .normalize_anthropic_stop_reason() and .warn_if_truncated() (pure functions, no API calls).
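Both helpers are pure functions, which is what lets them be unit-tested without API calls. A minimal sketch of what they might look like, assuming they take plain strings (or NULL); the names normalize_stop_reason() and warn_if_truncated() and the warning text are hypothetical, since the PR does not show the implementations of .normalize_anthropic_stop_reason() or .warn_if_truncated().

```r
# Hypothetical sketches of the two pure helpers; the real
# .normalize_anthropic_stop_reason() and .warn_if_truncated() may differ.

normalize_stop_reason <- function(stop_reason) {
  if (is.null(stop_reason)) return(NULL)
  switch(stop_reason,
    end_turn = "stop",
    max_tokens = "length",
    stop_reason  # unnamed default: tool_use, stop_sequence, ... pass through
  )
}

warn_if_truncated <- function(content, thinking, finish_reason) {
  truncated <- identical(finish_reason, "length") &&
    (is.null(content) || !nzchar(content)) &&
    !is.null(thinking) && nzchar(thinking)
  if (truncated) {
    # Wording is illustrative; the point is to name the fix.
    warning("finish_reason is \"length\": reasoning used the whole budget and ",
            "no content was returned; consider raising max_tokens.",
            call. = FALSE)
  }
  invisible(truncated)
}
```

In the package's tinytest files the same checks would more naturally be written with expect_equal() and expect_warning().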

Pre-existing issues not addressed here

inst/tinytest/test_config.R still expects claude-3-5-sonnet-latest; it needs updating to claude-sonnet-4-6, the default since d07b4b6. It was failing before this PR and still fails on this branch; that is a separate fix.

Test plan

* tinytest::run_test_file("inst/tinytest/test_reasoning.R") passes (11/11)
* Live test against Moonshot with kimi-k2.5 (truncated + normal cases)
* Live test against Anthropic with claude-sonnet-4-6
* Mocked Anthropic extraction logic with synthetic content blocks

Reasoning models (DeepSeek-R1, Moonshot Kimi, Anthropic extended thinking, OpenRouter) put their chain-of-thought in a separate field that was previously being dropped on the floor. When the model burned its budget on hidden reasoning, callers saw content="" with no indication anything went wrong.

* Add $thinking and $finish_reason to the chat() return list. Normalized across providers: reasoning_content (DeepSeek/Moonshot/vLLM/SGLang) and reasoning (OpenRouter) for OAI-compatible endpoints; thinking blocks scanned out of content[] for Anthropic. stop_reason mapped to OpenAI vocabulary (end_turn -> stop, max_tokens -> length).
* Warn on the silent-truncation case: finish_reason == "length" with empty content but populated thinking. Actionable signal is "raise max_tokens".
* Streaming path captures reasoning deltas and the trailing finish_reason on a best-effort basis.
* Unit tests for both helpers (.normalize_anthropic_stop_reason, .warn_if_truncated).


TroyHernandez added 2 commits May 8, 2026 10:47

tests: align anthropic default model with d07b4b6 bump

4abedbe

CI was failing on a stale claude-3-5-sonnet-latest assertion left over from before the default was bumped to claude-sonnet-4-6.

TroyHernandez merged commit 28d8a70 into main May 8, 2026
4 checks passed

TroyHernandez deleted the reasoning-content branch May 8, 2026 18:46
