
Add history_tool_calls() helpers + ollama id fix#11

Merged
TroyHernandez merged 2 commits into main from history-tool-calls-helpers on May 10, 2026

Conversation

@TroyHernandez
Contributor

Summary

Walking the message history that `agent()` returns required consumers to know two distinct provider shapes:

  • Anthropic: assistant content is a list of typed blocks; tool calls are `{type: "tool_use"}` blocks; results are `{type: "tool_result"}` blocks in user messages.
  • OpenAI / moonshot / ollama: assistant has a separate `tool_calls` field; results are individual `role: "tool"` messages.
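Hand-built fixtures make the two shapes concrete (illustrative only; these are not taken from the package's test suite, and the field values are invented):

```r
# Anthropic shape: calls and results live inside message content blocks.
anthropic_history <- list(
  list(role = "assistant", content = list(
    list(type = "tool_use", id = "call_1", name = "get_weather",
         input = list(city = "Oslo"))
  )),
  list(role = "user", content = list(
    list(type = "tool_result", tool_use_id = "call_1", content = "4C, overcast")
  ))
)

# OpenAI / moonshot / ollama shape: a separate tool_calls field on the
# assistant message, plus one role = "tool" message per result.
openai_history <- list(
  list(role = "assistant", content = NULL, tool_calls = list(
    list(id = "call_1", `function` = list(
      name = "get_weather", arguments = '{"city":"Oslo"}'))
  )),
  list(role = "tool", tool_call_id = "call_1", content = "4C, overcast")
)
```

Note how even the join key differs: Anthropic pairs via `tool_use_id` inside a user-message block, the OpenAI shape via `tool_call_id` on a standalone message.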

corteza recently hit this in three places (call counting, transcript rendering, unfinished-call detection) and would've kept reinventing the same walk. Other consumers will hit the same wall.

Design

History stays provider-native. `agent()` sends history straight back into the next provider request (`R/agent.R:70`); canonicalizing it would create a translation layer on every turn. Each provider's API requires its own shape on input.

Add canonical helpers instead.

```r
history_tool_calls(history)
# -> list of paired records, each with:
#      id, name, arguments, result, completed,
#      call_message_index, result_message_index, provider_shape

history_count_tool_calls(history, completed_only = FALSE)
# -> thin wrapper for trigger logic that just needs counts
```

Each record carries enough to drive count, render, and unfinished-detect off one walk. `provider_shape` is `"anthropic"` or `"openai"` — useful when a consumer needs to branch on shape (rare).
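A minimal sketch of the walk such a helper performs, under the record fields listed above (this is illustrative code, not the package source; the real `history_tool_calls()` will differ in detail):

```r
# Sketch: pair tool calls with their results across both provider shapes.
walk_tool_calls <- function(history) {
  records <- list()
  for (i in seq_along(history)) {
    msg <- history[[i]]
    if (identical(msg$role, "assistant")) {
      # OpenAI shape: explicit tool_calls field on the assistant message.
      for (tc in msg$tool_calls) {
        records[[length(records) + 1L]] <- list(
          id = tc$id, name = tc$`function`$name,
          arguments = tc$`function`$arguments,
          result = NULL, completed = FALSE,
          call_message_index = i, result_message_index = NA_integer_,
          provider_shape = "openai")
      }
      # Anthropic shape: tool_use blocks inside the content list.
      if (is.list(msg$content)) for (blk in msg$content) {
        if (identical(blk$type, "tool_use")) {
          records[[length(records) + 1L]] <- list(
            id = blk$id, name = blk$name, arguments = blk$input,
            result = NULL, completed = FALSE,
            call_message_index = i, result_message_index = NA_integer_,
            provider_shape = "anthropic")
        }
      }
    }
    # Find the record a result id belongs to.
    find_call <- function(id) {
      which(vapply(records, function(r) identical(r$id, id), logical(1)))
    }
    if (identical(msg$role, "tool")) {
      # OpenAI shape: standalone role = "tool" result message.
      j <- find_call(msg$tool_call_id)
      if (length(j)) {
        j <- j[1]
        records[[j]]$result <- msg$content
        records[[j]]$completed <- TRUE
        records[[j]]$result_message_index <- i
      }
    } else if (identical(msg$role, "user") && is.list(msg$content)) {
      # Anthropic shape: tool_result blocks inside a user message.
      for (blk in msg$content) if (identical(blk$type, "tool_result")) {
        j <- find_call(blk$tool_use_id)
        if (length(j)) {
          j <- j[1]
          records[[j]]$result <- blk$content
          records[[j]]$completed <- TRUE
          records[[j]]$result_message_index <- i
        }
      }
    }
  }
  records
}
```

Any call never matched by a result keeps `completed = FALSE` and `result_message_index = NA`, which is exactly what unfinished-call detection needs.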

Bonus fix: ollama id write-back

`.agent_ollama()` synthesizes a call id when the upstream response omits one (some Ollama models do). The synthesized id was flowing into the canonical `tool_calls` list but not back into the assistant message stored in history. Result: `assistant.tool_calls[i].id` and the corresponding `role = "tool"` message's `tool_call_id` could disagree, breaking history walks that pair calls with results. Latent bug, exposed the moment a consumer tried to pair them — which the new helper does.
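The shape of the fix can be sketched like this (hypothetical code with an invented synthesized-id scheme; the real `.agent_ollama()` differs):

```r
# Sketch of the id write-back (not the package source). When the upstream
# response omits a tool-call id, synthesize one and write it into the
# assistant message itself -- not only into the canonical tool_calls list.
fix_missing_ids <- function(assistant_message) {
  tool_calls <- assistant_message$tool_calls
  for (i in seq_along(tool_calls)) {
    if (is.null(tool_calls[[i]]$id) || !nzchar(tool_calls[[i]]$id)) {
      # Invented scheme for illustration; any unique id works.
      tool_calls[[i]]$id <- paste0("synth_call_", i)
    }
  }
  # The bug was stopping short of this line: the synthesized ids flowed
  # into the canonical list but the assistant message kept the missing
  # ids, so the later role = "tool" pairing had nothing to match against.
  assistant_message$tool_calls <- tool_calls
  assistant_message
}
```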

Test plan

  • Offline: `test_history_tool_calls.R` — 36 new asserts covering Anthropic shape, OpenAI shape (used by moonshot / kimi / ollama), unfinished calls in both shapes, the `completed_only` filter, mixed-shape history, and trivial no-tool histories.
  • Full `tinytest::test_package("llm.api")` — 92 / 92 pass.
  • Live integration via corteza after this lands: replace `archival_count_tool_calls`, `archival_slice_has_unfinished_tool_use`, and the OpenAI branch of `archival_render_block` with calls to `llm.api::history_tool_calls()`.

Why this PR exists

Right after corteza shipped its archival runtime, a moonshot test session showed `/agents` empty even after multi-tool turns. `archival_count_tool_calls` only knew Anthropic content blocks; on moonshot the count returned 0 and archival never fired. Patched that one site in corteza, but two more (`archival_slice_has_unfinished_tool_use`, `archival_render_block`) had the same blind spot. Codex AI's review pointed out that this is fundamentally a layering issue: llm.api is the abstraction, llm.api should expose the canonical walk. Hence this PR.

Walking the history llm.api returns required consumers to know that
Anthropic uses content blocks ({type: tool_use} / {type: tool_result})
while OpenAI / moonshot / ollama use a separate tool_calls field plus
role='tool' result messages. corteza had to handle both shapes in
three places (count, render, unfinished-detect); other consumers
would hit the same wall.

Keep history provider-native (it's what each API expects on the next
turn) and add canonical helpers instead.

history_tool_calls(history) returns a list of paired records:
  id, name, arguments, result, completed,
  call_message_index, result_message_index, provider_shape

history_count_tool_calls(history, completed_only = FALSE) is a thin
wrapper that's enough for trigger logic that just needs counts.

Bonus fix: .agent_ollama() now writes the synthesized tool-call id
back into the assistant message when the upstream response omits one.
Without this, assistant.tool_calls[i].id and the corresponding
role='tool' tool_call_id could disagree, breaking history walks that
pair calls with results.

Tests: 36 new offline asserts covering both shapes, completed and
unfinished cases, mixed-shape history, and the completed_only filter.
Full suite: 92 / 92 pass.
Codex's review noted the helper tests don't directly exercise the
.agent_ollama() branch where the synthesized id gets written back
into assistant_message[i]. Adding a targeted offline
test:

- Stubs llm.api:::.post_json via assignInNamespace to return an
  Ollama-shaped response with a tool_call missing the id field.
- Calls .agent_ollama directly (avoids needing to handle the agent
  loop or a real network).
- Asserts the synthesized id appears in BOTH the canonical tool_calls
  list AND assistant_message (the bug).
- Restores the real .post_json before returning.
- End-to-end: feeds the assistant_message into a fake history with a
  matching role='tool' message and asserts history_tool_calls() pairs
  them correctly via the synthesized id.

8 new asserts. Full suite: 100 / 100 pass.
@TroyHernandez TroyHernandez merged commit 57b5705 into main May 10, 2026
3 of 4 checks passed
@TroyHernandez TroyHernandez deleted the history-tool-calls-helpers branch May 10, 2026 04:45
TroyHernandez added a commit to cornball-ai/corteza that referenced this pull request May 10, 2026
#68)

* archival: delegate to llm.api::history_tool_calls() with fallback

Three archival walks were Anthropic-shape-only:

- archival_count_tool_calls (fixed in #67 to handle both shapes
  manually).
- archival_slice_has_unfinished_tool_use (still Anthropic-only —
  always returned FALSE on moonshot/kimi/ollama).
- archival_render_block / archival_history_entry_to_text (also
  Anthropic-only — assistant tool_calls were invisible to the
  summary prompt and to the on-disk JSONL dump, so summaries on
  OpenAI-shape providers missed the actual tool work).

llm.api 0.1.3 (cornball-ai/llm.api#11) ships
history_tool_calls(history) returning canonical records that pair
calls with results regardless of provider shape. Delegate to it
when available; fall back to a byte-for-byte equivalent local walk
when it isn't. Once corteza pins Imports: llm.api (>= 0.1.3) at
release time the fallback can be deleted.
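The delegate-when-available pattern can be sketched generically (illustrative; the corteza function names and the simplified fallback here are assumptions, not the repo's code):

```r
# Sketch: delegate to the llm.api helper when a new-enough version is
# installed, otherwise run a local fallback walk.
count_tool_calls <- function(history) {
  has_helper <- requireNamespace("llm.api", quietly = TRUE) &&
    utils::packageVersion("llm.api") >= "0.1.3"
  if (has_helper) {
    return(llm.api::history_count_tool_calls(history))
  }
  # Simplified fallback: count OpenAI-shape tool_calls entries and
  # Anthropic-shape tool_use content blocks.
  n <- 0L
  for (msg in history) {
    if (identical(msg$role, "assistant")) {
      n <- n + length(msg$tool_calls)
      if (is.list(msg$content)) {
        n <- n + sum(vapply(msg$content,
          function(b) identical(b$type, "tool_use"), logical(1)))
      }
    }
  }
  n
}
```

Once the package pins `Imports: llm.api (>= 0.1.3)`, the version check and the fallback branch both disappear.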

archival_render_transcript now builds the summary prompt off the
canonical record walk: tool calls render as
  [tool_use: name(args)]
  [tool_result: text]
in both shapes, with a (pending) marker for unfinished calls. The
on-disk archival_history_entry_to_text picks up OpenAI tool_calls
fields and role='tool' result messages so JSONL dumps from
non-Anthropic sessions are no longer empty assistant entries.

Tests:
- test_archival_history_shim.R — fallback covers both shapes, the
  unfinished case, and proves delegate-vs-fallback parity on a
  fixture (4 + 6 + 4 + 6 + 4 asserts).
- test_archival_render.R — adds OpenAI-shape transcript and
  entry_to_text coverage (6 new asserts).
Full suite: 1366/1366 pass.

* archival: fix entry_to_text — multi-char content + double role=='tool' render

Two low-severity bugs Codex caught in the OpenAI-shape entry-to-text
rewrite:

1. `if (nzchar(cnt))` errored when content was a length-N character
   vector (the condition is length-N, R warns and uses only the first
   element on R<=4.1, errors on >=4.2). Collapse first, then nzchar
   the result.

2. role=='tool' messages double-rendered: first the raw content
   landed via `is.character(cnt)`, then `[tool_result: ...]` got
   appended at the bottom. Large tool outputs ended up duplicated in
   the on-disk JSONL.

Restructured: handle role=='tool' first as its own branch, then fall
through to the generic character / list-of-blocks branches. Block
text fields also get collapse-then-nzchar treatment so a multi-line
text block doesn't trip the same bug.
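The length bug reproduces in isolation (illustrative snippet, not the corteza source):

```r
cnt <- c("first line", "second line")  # multi-element content vector

# Buggy: nzchar(cnt) has length 2, so the if() condition is non-scalar --
# an error on R >= 4.2, a warning (first element used) on older R:
#   if (nzchar(cnt)) out <- cnt

# Fixed: collapse first, then test the scalar result.
collapsed <- paste(cnt, collapse = "\n")
if (nzchar(collapsed)) out <- collapsed
```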

Tests: 4 new regression asserts (role=='tool' renders once;
multi-element character content collapses correctly). Full suite:
1369/1369 pass.

---------

Co-authored-by: llamaR <test@example.com>