fix(summarization): tag summary LLM calls nostream to stop phantom stream messages (#2503) #3378

Open
18062706139fcz wants to merge 1 commit into bytedance:main from 18062706139fcz:fix/summarization-stream-pollution-2503

Conversation

@18062706139fcz
Contributor

Problem

Fixes #2503. The SummarizationMiddleware runs its summary LLM call inside a before_model hook. Without a nostream tag, the summary tokens are captured by LangGraph's messages-tuple stream callback and broadcast to the frontend as a phantom AI message.
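To make the failure mode concrete, here is a minimal sketch of a tag-based stream filter. `StreamCollector` and `on_token` are hypothetical, simplified stand-ins and not LangGraph's actual callback API; the only detail taken from this PR is that a run tagged "nostream" is skipped.

```python
NOSTREAM_TAG = "nostream"  # tag value named in this PR


class StreamCollector:
    """Hypothetical, simplified stand-in for a token-streaming callback."""

    def __init__(self):
        self.emitted = []

    def on_token(self, token, tags=None):
        # Skip tokens from any run explicitly tagged as non-streaming.
        if tags and NOSTREAM_TAG in tags:
            return
        self.emitted.append(token)


collector = StreamCollector()
# A normal, user-facing model reply is streamed.
collector.on_token("Hello", tags=["agent"])
# A summary call without the tag leaks into the stream (the phantom message).
collector.on_token("Summary so far", tags=["middleware:summarize"])
# The same call with the tag is skipped.
collector.on_token("Summary so far", tags=["middleware:summarize", NOSTREAM_TAG])

print(collector.emitted)  # ['Hello', 'Summary so far']
```

The second call is the bug described above: nothing distinguishes the internal summary run from a user-facing one, so its tokens reach the frontend.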

Fix

  • Build a dedicated summary model copy tagged with "nostream" so the LangGraph streaming handler skips it (TAG_NOSTREAM).
  • The tag is merged on top of any existing tags (e.g. middleware:summarize) instead of overwriting them, so RunJournal / tracing attribution is preserved (RunnableBinding.with_config does a shallow merge).
  • Override _create_summary / _acreate_summary to invoke the tagged model directly, rather than temporarily swapping the shared self.model. The previous swap approach would leak the RunnableBinding across concurrent runs and break parent logic that inspects the raw model (profile / _get_ls_params).
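The tag-merge rule in the second bullet can be sketched in plain Python. `merge_tags` and `tagged_config` are hypothetical helper names used for illustration, not the functions in this PR; in the real middleware the resulting config would presumably be passed to `with_config` on the model.

```python
from typing import Iterable, Optional

NOSTREAM_TAG = "nostream"  # tag value named in this PR


def merge_tags(existing: Optional[Iterable[str]], *extra: str) -> list:
    """Return existing tags followed by any extra tags not already present."""
    merged = []
    for tag in [*(existing or ()), *extra]:
        if tag not in merged:
            merged.append(tag)
    return merged


def tagged_config(config: Optional[dict], *extra: str) -> dict:
    """Copy a config dict, replacing only its 'tags' entry with the merged list."""
    base = dict(config or {})  # shallow copy; the caller's dict is not mutated
    base["tags"] = merge_tags(base.get("tags"), *extra)
    return base


summary_config = tagged_config({"tags": ["middleware:summarize"]}, NOSTREAM_TAG)
print(summary_config["tags"])  # ['middleware:summarize', 'nostream']
```

Merging explicitly matters because a shallow config merge replaces the whole `tags` list; passing only `["nostream"]` would drop `middleware:summarize` and with it the tracing attribution.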

Tests

Added regression tests covering:

  • nostream tagging on the summary model
  • concurrent-run isolation (shared self.model is never mutated mid-await)
  • raw model preservation for parent profile inspection
  • existing-tag merge (middleware:summarize + nostream, no duplicates)

All tests pass: pytest tests/test_summarization_middleware.py → 28 passed.
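The concurrent-run isolation property can be illustrated with a small asyncio sketch. `FakeModel`, `with_tags`, and `Middleware` are hypothetical stand-ins and do not reproduce the project's test code; the sketch only shows the design of keeping a dedicated tagged copy while the shared `self.model` is never reassigned.

```python
import asyncio


class FakeModel:
    """Hypothetical stand-in for a chat model that carries a tuple of tags."""

    def __init__(self, tags=()):
        self.tags = tuple(tags)

    def with_tags(self, *extra):
        # Return a new tagged copy; the original object is left untouched.
        return FakeModel(self.tags + tuple(t for t in extra if t not in self.tags))

    async def ainvoke(self, prompt):
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"summary of {prompt}"


class Middleware:
    def __init__(self, model):
        self.model = model  # shared raw model, never reassigned
        self._summary_model = model.with_tags("nostream")  # dedicated copy

    async def acreate_summary(self, prompt, seen):
        result = await self._summary_model.ainvoke(prompt)
        seen.append(self.model)  # what a concurrent reader would observe
        return result


async def main():
    raw = FakeModel(tags=["middleware:summarize"])
    mw = Middleware(raw)
    seen = []
    results = await asyncio.gather(
        *(mw.acreate_summary(f"run-{i}", seen) for i in range(3))
    )
    return raw, mw, seen, results


raw, mw, seen, results = asyncio.run(main())
```

Because no coroutine ever writes to `self.model`, every concurrent run observes the same raw object, which is the property a swap-and-restore approach cannot guarantee across an `await`.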

…ream messages (bytedance#2503)

The SummarizationMiddleware runs its summary LLM call inside a before_model
hook. Without a nostream tag the summary tokens were captured by LangGraph's
messages-tuple stream callback and broadcast to the frontend as a phantom AI
message.

Generate a dedicated summary model copy tagged with "nostream" (merged on top
of any existing tags such as "middleware:summarize" so RunJournal attribution
is preserved) and override _create_summary / _acreate_summary to invoke it
directly. This avoids temporarily swapping the shared self.model, which would
otherwise leak the RunnableBinding across concurrent runs and break parent
logic that inspects the raw model (profile / _get_ls_params).

Add regression tests covering nostream tagging, concurrent-run isolation, raw
model preservation, and existing-tag merge.
@github-actions (bot) added the labels needs-validation, risk:high, size/M, and area:agents, and removed the labels size/M, risk:high, and needs-validation, on Jun 4, 2026.
Labels

area:agents Agents, subagents, graph wiring, prompts, langgraph.json

Development

Successfully merging this pull request may close these issues.

SummarizationMiddleware's internal LLM calls pollute the conversation stream