Skip to content

WIP: stop persisting empty assistant messages, tolerate corrupt tool_use args#123

Draft
birdayz wants to merge 2 commits into
mainfrom
jb/empty-content-poisoning
Draft

WIP: stop persisting empty assistant messages, tolerate corrupt tool_use args#123
birdayz wants to merge 2 commits into
mainfrom
jb/empty-content-poisoning

Conversation

@birdayz
Copy link
Copy Markdown
Contributor

@birdayz birdayz commented Apr 28, 2026

What

Two related changes that together unwedge sessions hit by the empty-content / corrupt-tool-args failure modes around #116.

  1. Agent loop skips persistence when len(resp.Message.Content) == 0. Empty assistant messages no longer land in session state, where Anthropic and most other providers reject every subsequent replay with messages.N.content: Field required.
  2. ToolRequest.ArgumentsAsObject() collapses nil / invalid Arguments to an empty object, so request mappers can feed provider tool_use APIs without faceplanting on truncated streaming state. Bedrock and Gemini request mappers adopt it. Anthropic and OpenAI Compatible request mappers still need to switch over — that is the WIP part.

Why

#116 didn't introduce a new bug — it exposed a pre-existing one. Pre-#116, a stream cut off mid-tool-args left partial JSON in Content, so the message was non-empty and the persistence path was fine; the corruption surfaced differently (the JSON-parse-error wedge that #116 already covers). Post-#116, the partial tool_use is correctly dropped, but if it was the only block, Content is empty — and the agent loop appends that empty Message to session state, locking the session forever.

Sources of empty content observed in production:

In all of these the FinishReason carries the truth (Length, ContentFilter, etc.) and the agent loop's existing terminal-reason handling fires. The bug was that persistence ran before the terminal-reason check, so the empty Message landed in session state regardless of how cleanly the loop terminated. Stayed wedged across pod restarts because the empty Message was already persisted.

The corrupt-tool-args side of the same problem: a session that already has truncated JSON in its history needs to be replayable. Erroring at json.Unmarshal time means the model can never recover, even though the paired tool_result carries the original parse error and the model has enough context to retry.

Implementation details

Persistence guard at the session-store boundary (agent/llmagent/llmagent.go). One-line check: skip the append when content is empty. The MessageEvent still fires (observers see what happened) and the FinishReason still propagates (terminal-reason handling fires below). Single chokepoint for "what gets persisted" — guard it there, not in every provider. Same shape as adk-go's AppendEvent honouring Event.Partial.

Provider response mappers do not error on empty content. Empty content is a real wire state; the FinishReason already encodes it correctly. Erroring at the mapper would swallow the Length / ContentFilter signal that the agent loop needs to terminate cleanly. Earlier iteration of this PR enforced the invariant at the mapper — reverted in favour of the session-store guard.

ToolRequest.ArgumentsAsObject() in llm/part.go. Decodes Arguments into a JSON object, falling back to an empty object on nil bytes or invalid JSON. The paired tool_result already carries the original parse error so the model has context to retry instead of every replay dying in mapping.

WIP scope. Anthropic and OpenAI Compatible request mappers still call json.Unmarshal directly on tool arguments and will fail on truncated state. The wedged-session reproducer in providers/anthropic/request_mapper_test.go is currently red and pins the next step.

References

@birdayz birdayz force-pushed the jb/empty-content-poisoning branch 8 times, most recently from d88383c to 99088ec Compare April 28, 2026 12:05
birdayz added 2 commits April 30, 2026 09:17
Sibling case to PR #116. Where #116 dealt with corrupt content
(truncated tool args during streaming), this addresses absent content:
empty assistant messages landing in session state, where Anthropic and
most other providers reject any subsequent replay with
`messages.N.content: Field required` and the conversation is
permanently wedged.

## Why

#116 didn't introduce a new bug — it exposed a pre-existing one.
Pre-#116, a stream cut off mid-tool-args left partial JSON in
`Content`, so the message was non-empty and the persistence path was
fine; the corruption surfaced differently (the JSON-parse-error wedge
that #116 already covers). Post-#116, the partial tool_use is
correctly dropped, but if it was the only block, `Content` is empty
— and the agent loop appends that empty `Message` to session state,
locking the session forever.

Sources of empty content we have observed:

- max_tokens hit before any content block was emitted
- the only content block was a partial tool_use that stream
  finalisation correctly dropped (#116)
- refusal / safety filter with no text emitted

In all of these the FinishReason carries the truth (Length,
ContentFilter, etc.) and the agent loop's existing terminal-reason
handling fires. The bug was that the persistence step at line 273
ran *before* the terminal-reason check at line 285, so the empty
Message landed in session state regardless of how cleanly the loop
otherwise terminated.

Discovered after #116 deployed: an agent session that had been
running for ~30 minutes started failing every call with
`messages.15.content: Field required`. Stayed wedged across pod
restarts because the empty Message was already persisted.

## Fix

One-line guard at the session-store boundary
(agent/llmagent/llmagent.go:273): skip persistence when
`len(resp.Message.Content) == 0`. The MessageEvent still fires
(observers see what happened) and the FinishReason still propagates
(terminal-reason handling fires below); only persistence is skipped.

Same shape as adk-go's `AppendEvent` honouring `Event.Partial`. The
session-store boundary is the single chokepoint for "what gets
persisted" — guard it there, don't try to enforce the invariant in
every provider.

Provider response mappers are unchanged. Empty content is a real
state in the wild (truncation, refusal, dropped partial blocks); the
FinishReason already encodes it correctly. Erroring at the provider
would swallow the Length / ContentFilter signal that the agent loop
needs to terminate cleanly.

## Tests

agent/llmagent/empty_response_test.go locks in:
- empty Message not persisted to session
- non-empty Message still persisted (negative control — guard must
  not over-fire)
- FinishReason still propagates (Length terminal handling fires)
- MessageEvent still fires (observers see the empty response)

## References

PR #116: surface truncation instead of corrupting state when tool_use is cut off
…ng at providers

Two related changes that together let wedged sessions recover and
narrow provider responsibility:

1. ToolRequest.ArgumentsAsObject() — decode Arguments into a JSON
   object, falling back to an empty object on nil bytes or invalid
   JSON. Lets request mappers feed provider APIs (Anthropic
   tool_use.input, Bedrock tool_use.Input, Gemini function_call.args)
   without faceplanting on truncated streaming state. The paired
   tool_result already carries the original parse error so the model
   has context to retry.

2. Drop empty-content error in provider response mappers. Empty
   content is a real wire state (max_tokens before any block,
   refusal, dropped partial tool_use). FinishReason already encodes
   it correctly; erroring at the mapper swallowed Length /
   ContentFilter signals the agent loop needs. The session-store
   guard added in the previous commit is the right place for the
   invariant.

Bedrock and Gemini request mappers switch to ArgumentsAsObject().
Anthropic and OpenAI Compatible request mappers still use raw
json.Unmarshal — TODO before lift.

Tests:
- llm/part_test.go: ArgumentsAsObject contract
- providers/anthropic/request_mapper_test.go: wedged-session replay
  reproducer (currently red — anthropic request mapper not yet
  switched over)

WIP: anthropic and openaicompat request mappers still need to adopt
ArgumentsAsObject for the wedged-session replay path to work
end-to-end.
@birdayz birdayz force-pushed the jb/empty-content-poisoning branch from 99088ec to bcbf132 Compare April 30, 2026 07:18
@birdayz birdayz changed the title fix: surface and stop persisting empty-content responses WIP: stop persisting empty assistant messages, tolerate corrupt tool_use args Apr 30, 2026
@birdayz birdayz marked this pull request as draft April 30, 2026 07:18
@blacksmith-sh
Copy link
Copy Markdown

blacksmith-sh Bot commented Apr 30, 2026

Found 2 test failures on Blacksmith runners:

Failures

Test View Logs
github.com/redpanda-data/ai-sdk-go/plugins/retry/TestRequestMapper_EmptyToolArguments View Logs
github.com/redpanda-data/ai-sdk-go/plugins/retry/TestRequestMapper_WedgedSessionReplay View Logs

Fix in Cursor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant