Add history_tool_calls() helpers + ollama id fix #11
Merged
Conversation
Walking the history that llm.api returns requires consumers to know that
Anthropic uses content blocks ({type: tool_use} / {type: tool_result})
while OpenAI / moonshot / ollama use a separate tool_calls field plus
role='tool' result messages. corteza had to handle both shapes in
three places (count, render, unfinished-detect); other consumers
would hit the same wall.
Keep history provider-native (it's what each API expects on the next
turn) and add canonical helpers instead.
`history_tool_calls(history)` returns a list of paired records:
`id`, `name`, `arguments`, `result`, `completed`,
`call_message_index`, `result_message_index`, `provider_shape`.
`history_count_tool_calls(history, completed_only = FALSE)` is a thin
wrapper for trigger logic that just needs counts.
Bonus fix: .agent_ollama() now writes the synthesized tool-call id
back into the assistant message when the upstream response omits one.
Without this, assistant.tool_calls[i].id and the corresponding
role='tool' tool_call_id could disagree, breaking history walks that
pair calls with results.
Tests: 36 new offline asserts covering both shapes, completed and
unfinished cases, mixed-shape history, and the `completed_only` filter.
Full suite: 92 / 92 pass.
Codex's review noted the helper tests don't directly exercise the `.agent_ollama()` branch where the synthesized id gets written back into `assistant_message[i]`. Adding a targeted offline test:

- Stubs `llm.api:::.post_json` via `assignInNamespace` to return an Ollama-shaped response with a tool_call missing the `id` field.
- Calls `.agent_ollama` directly (avoids needing to handle the agent loop or a real network).
- Asserts the synthesized id appears in BOTH the canonical `tool_calls` list AND `assistant_message` (the bug).
- Restores the real `.post_json` before returning.
- End-to-end: feeds the `assistant_message` into a fake history with a matching `role='tool'` message and asserts `history_tool_calls()` pairs them correctly via the synthesized id.

8 new asserts. Full suite: 100 / 100 pass.
TroyHernandez added a commit to cornball-ai/corteza that referenced this pull request on May 10, 2026:
archival: delegate to llm.api::history_tool_calls() with fallback (#68)

* archival: delegate to llm.api::history_tool_calls() with fallback

Three archival walks were Anthropic-shape-only:

- archival_count_tool_calls (fixed in #67 to handle both shapes manually).
- archival_slice_has_unfinished_tool_use (still Anthropic-only — always returned FALSE on moonshot/kimi/ollama).
- archival_render_block / archival_history_entry_to_text (also Anthropic-only — assistant tool_calls were invisible to the summary prompt and to the on-disk JSONL dump, so summaries on OpenAI-shape providers missed the actual tool work).

llm.api 0.1.3 (cornball-ai/llm.api#11) ships history_tool_calls(history), returning canonical records that pair calls with results regardless of provider shape. Delegate to it when available; fall back to a byte-for-byte equivalent local walk when it isn't. Once corteza pins Imports: llm.api (>= 0.1.3) at release time, the fallback can be deleted.

archival_render_transcript now builds the summary prompt off the canonical record walk: tool calls render as [tool_use: name(args)] [tool_result: text] in both shapes, with a (pending) marker for unfinished calls. The on-disk archival_history_entry_to_text picks up OpenAI tool_calls fields and role='tool' result messages, so JSONL dumps from non-Anthropic sessions are no longer empty assistant entries.

Tests:

- test_archival_history_shim.R — fallback covers both shapes, the unfinished case, and proves delegate-vs-fallback parity on a fixture (4 + 6 + 4 + 6 + 4 asserts).
- test_archival_render.R — adds OpenAI-shape transcript and entry_to_text coverage (6 new asserts).

Full suite: 1366/1366 pass.

* archival: fix entry_to_text — multi-char content + double role=='tool' render

Two low-severity bugs Codex caught in the OpenAI-shape entry-to-text rewrite:

1. `if (nzchar(cnt))` errored when content was a length-N character vector (the condition is length-N; R warns and uses only the first element on R <= 4.1, errors on >= 4.2). Collapse first, then nzchar the result.
2. role=='tool' messages double-rendered: first the raw content landed via `is.character(cnt)`, then `[tool_result: ...]` got appended at the bottom. Large tool outputs ended up duplicated in the on-disk JSONL.

Restructured: handle role=='tool' first as its own branch, then fall through to the generic character / list-of-blocks branches. Block text fields also get collapse-then-nzchar treatment so a multi-line text block doesn't trip the same bug.

Tests: 4 new regression asserts (role=='tool' renders once; multi-element character content collapses correctly).

Full suite: 1369/1369 pass.

Co-authored-by: llamaR <test@example.com>
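The collapse-then-nzchar fix and the role=='tool' branch ordering can be sketched as follows. This is a simplified illustration, not corteza's actual archival_history_entry_to_text; the real function also renders list-of-blocks content and tool_calls fields.

```r
# Sketch of the two fixes.
# Fix 1: collapse before nzchar, so length-N character content never
#        feeds a length-N condition into if ().
# Fix 2: handle role == "tool" in its own branch up front, so the
#        result renders exactly once and never falls through.
entry_to_text <- function(msg) {
  collapse <- function(x) paste(unlist(x), collapse = "\n")
  if (identical(msg$role, "tool")) {
    # Dedicated branch: render the tool result once.
    return(sprintf("[tool_result: %s]", collapse(msg$content)))
  }
  cnt <- msg$content
  if (is.character(cnt)) {
    cnt <- collapse(cnt)          # collapse first ...
    if (nzchar(cnt)) return(cnt)  # ... then test a length-1 string
  }
  ""
}
```

A length-2 character vector such as `c("a", "b")` now collapses to `"a\nb"` before the `nzchar()` test, instead of erroring under R >= 4.2's length-1 condition check.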
Summary
Walking the message history that `agent()` returns required consumers to know two distinct provider shapes: Anthropic's content blocks (`{type: tool_use}` / `{type: tool_result}`) versus the OpenAI / moonshot / ollama shape of a separate `tool_calls` field plus `role = "tool"` result messages.
corteza recently hit this in three places (call counting, transcript rendering, unfinished-call detection) and would've kept reinventing the same walk. Other consumers will hit the same wall.
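For concreteness, the two shapes look roughly like this as R message lists. The field layouts follow the respective APIs; the ids and values here are invented for illustration:

```r
# Anthropic shape: tool use and result travel inside content blocks.
anthropic_history <- list(
  list(role = "assistant", content = list(
    list(type = "tool_use", id = "toolu_01", name = "get_weather",
         input = list(city = "Oslo"))
  )),
  list(role = "user", content = list(
    list(type = "tool_result", tool_use_id = "toolu_01", content = "4C, rain")
  ))
)

# OpenAI / moonshot / ollama shape: a tool_calls field on the assistant
# message plus a separate role = "tool" result message.
openai_history <- list(
  list(role = "assistant", content = NULL, tool_calls = list(
    list(id = "call_01", type = "function",
         `function` = list(name = "get_weather",
                           arguments = '{"city":"Oslo"}'))
  )),
  list(role = "tool", tool_call_id = "call_01", content = "4C, rain")
)
```

Both histories describe the same single tool call; pairing logic has to match `id` with `tool_use_id` in one shape and `id` with `tool_call_id` in the other.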
Design
History stays provider-native. `agent()` sends history straight back into the next provider request (`R/agent.R:70`); canonicalizing it would create a translation layer on every turn. Each provider's API requires its own shape on input.
Add canonical helpers instead.
```r
history_tool_calls(history)
#> list of paired records:
#>   id, name, arguments, result, completed,
#>   call_message_index, result_message_index, provider_shape

history_count_tool_calls(history, completed_only = FALSE)
#> thin wrapper for trigger logic that just needs counts
```
Each record carries enough to drive count, render, and unfinished-detect off one walk. `provider_shape` is `"anthropic"` or `"openai"` — useful when a consumer needs to branch on shape (rare).
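A minimal sketch of what such a shape-agnostic walk involves; this is illustrative only, not the packaged `history_tool_calls()` implementation, which handles more edge cases:

```r
# Simplified sketch: collect calls from either shape, keyed by id,
# then attach results by id. Records come out in call order.
pair_tool_calls <- function(history) {
  records <- list()
  for (i in seq_along(history)) {
    msg <- history[[i]]
    # OpenAI shape: assistant message carries a tool_calls field.
    if (identical(msg$role, "assistant") && !is.null(msg$tool_calls)) {
      for (tc in msg$tool_calls) {
        records[[tc$id]] <- list(
          id = tc$id, name = tc$`function`$name,
          arguments = tc$`function`$arguments,
          result = NULL, completed = FALSE,
          call_message_index = i, result_message_index = NA_integer_,
          provider_shape = "openai")
      }
    }
    # Anthropic shape: tool_use blocks inside assistant content.
    if (identical(msg$role, "assistant") && is.list(msg$content)) {
      for (blk in msg$content) {
        if (identical(blk$type, "tool_use")) {
          records[[blk$id]] <- list(
            id = blk$id, name = blk$name, arguments = blk$input,
            result = NULL, completed = FALSE,
            call_message_index = i, result_message_index = NA_integer_,
            provider_shape = "anthropic")
        }
      }
    }
    # Results: role = "tool" messages (OpenAI shape) ...
    if (identical(msg$role, "tool") && length(records) > 0 &&
        !is.null(records[[msg$tool_call_id]])) {
      records[[msg$tool_call_id]]$result <- msg$content
      records[[msg$tool_call_id]]$completed <- TRUE
      records[[msg$tool_call_id]]$result_message_index <- i
    }
    # ... or tool_result blocks inside user content (Anthropic shape).
    if (identical(msg$role, "user") && is.list(msg$content) &&
        length(records) > 0) {
      for (blk in msg$content) {
        if (identical(blk$type, "tool_result") &&
            !is.null(records[[blk$tool_use_id]])) {
          records[[blk$tool_use_id]]$result <- blk$content
          records[[blk$tool_use_id]]$completed <- TRUE
          records[[blk$tool_use_id]]$result_message_index <- i
        }
      }
    }
  }
  unname(records)
}
```

Note the walk is keyed by id, which is why the ollama id write-back below matters: without a consistent id on both sides, the pairing step can't find the call.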
Bonus fix: ollama id write-back
`.agent_ollama()` synthesizes a call id when the upstream response omits one (some Ollama models do). The synthesized id was flowing into the canonical `tool_calls` list but not back into the assistant message stored in history. Result: `assistant.tool_calls[i].id` and the corresponding `role = "tool"` message's `tool_call_id` could disagree, breaking history walks that pair calls with results. Latent bug, exposed the moment a consumer tried to pair them — which the new helper does.
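The shape of the fix, sketched with hypothetical names (`normalize_ollama_tool_calls` and the `call_synth_` id format are illustrative, not the package's actual internals):

```r
# Hypothetical sketch: synthesize missing ids, then write them back
# into the assistant message itself -- the part the bug was missing.
normalize_ollama_tool_calls <- function(assistant_message) {
  tcs <- assistant_message$tool_calls
  for (i in seq_along(tcs)) {
    if (is.null(tcs[[i]]$id) || !nzchar(tcs[[i]]$id)) {
      # Synthesize an id when the upstream response omits one.
      tcs[[i]]$id <- sprintf("call_synth_%d", i)
    }
  }
  # The fix: store the (possibly synthesized) ids on the assistant
  # message in history, so role = "tool" tool_call_id values can pair
  # with assistant.tool_calls[i].id on later history walks.
  assistant_message$tool_calls <- tcs
  assistant_message
}
```

Before the fix, only the canonical `tool_calls` list saw the synthesized id; the message stored in history kept the missing one, so the two sides could never agree.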
Test plan
36 new offline asserts covering both shapes, completed and unfinished cases, mixed-shape history, and the `completed_only` filter. Full suite: 92 / 92 pass.
Why this PR exists
Right after corteza shipped its archival runtime, a moonshot test session showed `/agents` empty even after multi-tool turns. `archival_count_tool_calls` only knew Anthropic content blocks; on moonshot the count returned 0 and archival never fired. Patched that one site in corteza, but two more (`archival_slice_has_unfinished_tool_use`, `archival_render_block`) had the same blind spot. Codex AI's review pointed out that this is fundamentally a layering issue: llm.api is the abstraction, so llm.api should expose the canonical walk. Hence this PR.