fix(summarization): tag summary LLM calls nostream to stop phantom stream messages (#2503) by 18062706139fcz · Pull Request #3378 · bytedance/deer-flow

18062706139fcz · 2026-06-04T03:08:38Z

Problem

Fixes #2503. The SummarizationMiddleware runs its summary LLM call inside a before_model hook. Without a nostream tag, the summary tokens are captured by LangGraph's messages-tuple stream callback and broadcast to the frontend as a phantom AI message.
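The filtering behaviour the fix relies on can be illustrated with a toy model. This is a minimal sketch, not LangGraph's actual callback: `ToyStreamHandler` and `on_llm_token` are hypothetical names, and only the tag value "nostream" comes from this PR.

```python
TAG_NOSTREAM = "nostream"  # tag value named in this PR


class ToyStreamHandler:
    """Hypothetical stand-in for a messages-stream callback."""

    def __init__(self):
        self.emitted = []

    def on_llm_token(self, token, tags):
        # Runs carrying the nostream tag are skipped, so their tokens
        # never reach the frontend.
        if TAG_NOSTREAM in tags:
            return
        self.emitted.append(token)


handler = ToyStreamHandler()

# Untagged summary call: its tokens leak into the stream (the phantom message).
handler.on_llm_token("summary-untagged", tags=["middleware:summarize"])

# Tagged summary call: skipped by the handler.
handler.on_llm_token("summary-tagged", tags=["middleware:summarize", TAG_NOSTREAM])

# Regular model output: streamed as usual.
handler.on_llm_token("answer", tags=[])
```

In this sketch only the untagged summary and the regular answer end up in `handler.emitted`; the tagged summary call is dropped.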

Fix

Build a dedicated summary model copy tagged with "nostream" so the LangGraph streaming handler skips it (TAG_NOSTREAM).
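The tag is merged on top of any existing tags (e.g. middleware:summarize) instead of overwriting them, so RunJournal / tracing attribution is preserved. The explicit merge is needed because RunnableBinding.with_config does a shallow merge, so passing only the new tag would replace the existing tags list.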
Override _create_summary / _acreate_summary to invoke the tagged model directly, rather than temporarily swapping the shared self.model. The previous swap approach would leak the RunnableBinding across concurrent runs and break parent logic that inspects the raw model (profile / _get_ls_params).
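The difference between a shallow config merge and an explicit tag merge can be sketched with plain dictionaries. This is an illustration under assumptions, not the code in this PR: `merge_tags` is a hypothetical helper, and the `run_name` key is included only to show that other config entries are untouched.

```python
def merge_tags(existing, extra):
    """Return existing tags followed by any extra tags not already present."""
    merged = list(existing or [])
    for tag in extra:
        if tag not in merged:
            merged.append(tag)
    return merged


# Hypothetical config already carrying an attribution tag.
base_config = {"tags": ["middleware:summarize"], "run_name": "summary"}

# Shallow merge: the whole "tags" list is replaced, so attribution is lost.
overwritten = {**base_config, "tags": ["nostream"]}

# Explicit merge: existing tags survive and "nostream" is appended once.
merged = {**base_config, "tags": merge_tags(base_config.get("tags"), ["nostream"])}

# Merging again does not introduce a duplicate.
merged_twice = merge_tags(merged["tags"], ["nostream"])
```

Building a new list rather than appending in place also leaves `base_config` unmodified, which matters when the original config is shared.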

Tests

Added regression tests covering:

nostream tagging on the summary model
concurrent-run isolation (shared self.model is never mutated mid-await)
raw model preservation for parent profile inspection
existing-tag merge (middleware:summarize + nostream, no duplicates)

All tests pass: pytest tests/test_summarization_middleware.py → 28 passed.
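The concurrent-run isolation case can be reproduced in miniature with asyncio. The sketch below is not taken from the test file: `ToyMiddleware`, `run_swap`, `run_direct`, and `peek` are hypothetical names, and strings stand in for model objects. It shows that a temporary swap of a shared attribute is visible to another coroutine while the swapping coroutine is suspended, whereas invoking a dedicated copy leaves the shared attribute alone.

```python
import asyncio


class ToyMiddleware:
    """Hypothetical stand-in that only mimics the shared `model` attribute."""

    def __init__(self, model):
        self.model = model                        # shared across concurrent runs
        self.summary_model = f"{model}+nostream"  # dedicated tagged copy


async def run_swap(mw):
    # Old approach: temporarily replace the shared model.
    original = mw.model
    mw.model = mw.summary_model
    try:
        await asyncio.sleep(0)  # yield control, as a real LLM call would
    finally:
        mw.model = original


async def run_direct(mw):
    # New approach: a real implementation would invoke mw.summary_model here;
    # the shared mw.model is never reassigned.
    await asyncio.sleep(0)


async def peek(mw):
    # A concurrent run inspecting the shared model mid-summary.
    return mw.model


async def demo():
    mw = ToyMiddleware("raw-model")
    _, seen_during_swap = await asyncio.gather(run_swap(mw), peek(mw))
    _, seen_during_direct = await asyncio.gather(run_direct(mw), peek(mw))
    return seen_during_swap, seen_during_direct, mw.model


result = asyncio.run(demo())
```

Here the first element of `result` is the swapped-in value that leaked to the concurrent reader, while the second and third are the raw model.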


18062706139fcz wants to merge 1 commit into bytedance:main from 18062706139fcz:fix/summarization-stream-pollution-2503
