WIP: stop persisting empty assistant messages, tolerate corrupt tool_use args#123
Draft
birdayz wants to merge 2 commits into
Draft
WIP: stop persisting empty assistant messages, tolerate corrupt tool_use args#123birdayz wants to merge 2 commits into
birdayz wants to merge 2 commits into
Conversation
d88383c to
99088ec
Compare
Sibling case to PR #116. Where #116 dealt with corrupt content (truncated tool args during streaming), this addresses absent content: empty assistant messages landing in session state, where Anthropic and most other providers reject any subsequent replay with `messages.N.content: Field required` and the conversation is permanently wedged. ## Why #116 didn't introduce a new bug — it exposed a pre-existing one. Pre-#116, a stream cut off mid-tool-args left partial JSON in `Content`, so the message was non-empty and the persistence path was fine; the corruption surfaced differently (the JSON-parse-error wedge that #116 already covers). Post-#116, the partial tool_use is correctly dropped, but if it was the only block, `Content` is empty — and the agent loop appends that empty `Message` to session state, locking the session forever. Sources of empty content we have observed: - max_tokens hit before any content block was emitted - the only content block was a partial tool_use that stream finalisation correctly dropped (#116) - refusal / safety filter with no text emitted In all of these the FinishReason carries the truth (Length, ContentFilter, etc.) and the agent loop's existing terminal-reason handling fires. The bug was that the persistence step at line 273 ran *before* the terminal-reason check at line 285, so the empty Message landed in session state regardless of how cleanly the loop otherwise terminated. Discovered after #116 deployed: an agent session that had been running for ~30 minutes started failing every call with `messages.15.content: Field required`. Stayed wedged across pod restarts because the empty Message was already persisted. ## Fix One-line guard at the session-store boundary (agent/llmagent/llmagent.go:273): skip persistence when `len(resp.Message.Content) == 0`. The MessageEvent still fires (observers see what happened) and the FinishReason still propagates (terminal-reason handling fires below); only persistence is skipped. Same shape as adk-go's `AppendEvent` honouring `Event.Partial`. The session-store boundary is the single chokepoint for "what gets persisted" — guard it there, don't try to enforce the invariant in every provider. Provider response mappers are unchanged. Empty content is a real state in the wild (truncation, refusal, dropped partial blocks); the FinishReason already encodes it correctly. Erroring at the provider would swallow the Length / ContentFilter signal that the agent loop needs to terminate cleanly. ## Tests agent/llmagent/empty_response_test.go locks in: - empty Message not persisted to session - non-empty Message still persisted (negative control — guard must not over-fire) - FinishReason still propagates (Length terminal handling fires) - MessageEvent still fires (observers see the empty response) ## References PR #116: surface truncation instead of corrupting state when tool_use is cut off
…ng at providers Two related changes that together let wedged sessions recover and narrow provider responsibility: 1. ToolRequest.ArgumentsAsObject() — decode Arguments into a JSON object, falling back to an empty object on nil bytes or invalid JSON. Lets request mappers feed provider APIs (Anthropic tool_use.input, Bedrock tool_use.Input, Gemini function_call.args) without faceplanting on truncated streaming state. The paired tool_result already carries the original parse error so the model has context to retry. 2. Drop empty-content error in provider response mappers. Empty content is a real wire state (max_tokens before any block, refusal, dropped partial tool_use). FinishReason already encodes it correctly; erroring at the mapper swallowed Length / ContentFilter signals the agent loop needs. The session-store guard added in the previous commit is the right place for the invariant. Bedrock and Gemini request mappers switch to ArgumentsAsObject(). Anthropic and OpenAI Compatible request mappers still use raw json.Unmarshal — TODO before lift. Tests: - llm/part_test.go: ArgumentsAsObject contract - providers/anthropic/request_mapper_test.go: wedged-session replay reproducer (currently red — anthropic request mapper not yet switched over) WIP: anthropic and openaicompat request mappers still need to adopt ArgumentsAsObject for the wedged-session replay path to work end-to-end.
99088ec to
bcbf132
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Two related changes that together unwedge sessions hit by the empty-content / corrupt-tool-args failure modes around #116.
len(resp.Message.Content) == 0. Empty assistant messages no longer land in session state, where Anthropic and most other providers reject every subsequent replay withmessages.N.content: Field required.ToolRequest.ArgumentsAsObject()collapses nil / invalid Arguments to an empty object, so request mappers can feed provider tool_use APIs without faceplanting on truncated streaming state. Bedrock and Gemini request mappers adopt it. Anthropic and OpenAI Compatible request mappers still need to switch over — that is the WIP part.Why
#116 didn't introduce a new bug — it exposed a pre-existing one. Pre-#116, a stream cut off mid-tool-args left partial JSON in
Content, so the message was non-empty and the persistence path was fine; the corruption surfaced differently (the JSON-parse-error wedge that #116 already covers). Post-#116, the partial tool_use is correctly dropped, but if it was the only block,Contentis empty — and the agent loop appends that emptyMessageto session state, locking the session forever.Sources of empty content observed in production:
In all of these the FinishReason carries the truth (Length, ContentFilter, etc.) and the agent loop's existing terminal-reason handling fires. The bug was that persistence ran before the terminal-reason check, so the empty Message landed in session state regardless of how cleanly the loop terminated. Stayed wedged across pod restarts because the empty Message was already persisted.
The corrupt-tool-args side of the same problem: a session that already has truncated JSON in its history needs to be replayable. Erroring at
json.Unmarshaltime means the model can never recover, even though the pairedtool_resultcarries the original parse error and the model has enough context to retry.Implementation details
Persistence guard at the session-store boundary (
agent/llmagent/llmagent.go). One-line check: skip the append when content is empty. TheMessageEventstill fires (observers see what happened) and the FinishReason still propagates (terminal-reason handling fires below). Single chokepoint for "what gets persisted" — guard it there, not in every provider. Same shape as adk-go'sAppendEventhonouringEvent.Partial.Provider response mappers do not error on empty content. Empty content is a real wire state; the FinishReason already encodes it correctly. Erroring at the mapper would swallow the Length / ContentFilter signal that the agent loop needs to terminate cleanly. Earlier iteration of this PR enforced the invariant at the mapper — reverted in favour of the session-store guard.
ToolRequest.ArgumentsAsObject()inllm/part.go. Decodes Arguments into a JSON object, falling back to an empty object on nil bytes or invalid JSON. The pairedtool_resultalready carries the original parse error so the model has context to retry instead of every replay dying in mapping.WIP scope. Anthropic and OpenAI Compatible request mappers still call
json.Unmarshaldirectly on tool arguments and will fail on truncated state. The wedged-session reproducer inproviders/anthropic/request_mapper_test.gois currently red and pins the next step.References
docs/empty-content-persistence.md— design notes on where the invariant should live