Conversation

@keenborder786
Contributor

@github-actions github-actions bot added langchain `langchain` package issues & PRs feature For PRs that implement a new feature; NOT A FEATURE REQUEST labels Jan 1, 2026
@mdrxy
Member

mdrxy commented Jan 5, 2026

Thanks for the PR! The sanitization approach does help reduce token usage, but after investigation we found a simpler solution that fully resolves the issue.

The root cause: when `{messages}` is formatted into the summary prompt, Python's `str()` is called on the message list, which includes every Pydantic metadata field (`additional_kwargs={}`, `response_metadata={}`, etc.), even when empty.

The sanitization approach reduces the ratio from 2.4x to ~1.8x, but doesn't fully invert the inequality. Using `get_buffer_string()` (which already exists in `langchain-core`) produces output like:

```
Human: What's the weather?
AI: Let me check...[tool_calls]
Tool: 72°F and sunny
AI: It's 72°F in NYC!
```

This is more token-efficient and aligns with how `count_tokens_approximately` estimates tokens internally.
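To illustrate why the role-prefixed format is so much smaller, here is a simplified stand-in for the buffer-string formatter (the `Message` class, role names, and `to_buffer_string` below are illustrative, not the real `langchain-core` API):

```python
# Simplified sketch of buffer-string formatting: keep only a role prefix
# and the content, dropping all metadata fields that str()/repr() would emit.
# This is NOT langchain-core's implementation; it only mimics the output shape.
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str                              # e.g. "human", "ai", "tool"
    content: str
    metadata: dict = field(default_factory=dict)  # fields repr() would also print


PREFIXES = {"human": "Human", "ai": "AI", "tool": "Tool"}


def to_buffer_string(messages: list[Message]) -> str:
    # One "Prefix: content" line per message; metadata is never serialized.
    return "\n".join(f"{PREFIXES[m.role]}: {m.content}" for m in messages)


msgs = [
    Message("human", "What's the weather?", {"additional_kwargs": {}}),
    Message("ai", "Let me check...", {"response_metadata": {}}),
    Message("tool", "72°F and sunny"),
]
compact = to_buffer_string(msgs)
verbose = str(msgs)  # dataclass repr includes the (empty) metadata dicts

print(compact)
print(len(compact), "<", len(verbose))
```

Even with empty metadata dicts, the repr-style string is substantially longer, which is the inflation the fix removes.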

Superseding with a simpler fix in #34607 that uses `get_buffer_string()` directly in `_create_summary` and `_acreate_summary`. The change is just:

```python
formatted_messages = get_buffer_string(trimmed_messages)
response = self.model.invoke(self.summary_prompt.format(messages=formatted_messages))
```

Thanks for working on this - your PR helped identify that metadata was the issue!

@mdrxy mdrxy closed this Jan 5, 2026
mdrxy added a commit that referenced this pull request Jan 7, 2026
(#34607)

Fixes #34517

Supersedes #34557, #34570

Fixes token inflation in `SummarizationMiddleware` that caused context
window overflow during summarization.

**Root cause:** When formatting messages for the summary prompt,
`str(messages)` was implicitly called, which includes all Pydantic
metadata fields (`usage_metadata`, `response_metadata`,
`additional_kwargs`, etc.). This caused the stringified representation
to use ~2.5x more tokens than `count_tokens_approximately` estimates.

**Problem:**
- Summarization triggers at 85% of context window based on
`count_tokens_approximately`
- But `str(messages)` in the prompt uses 2.5x more tokens
- Results in `ContextLengthExceeded`
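The arithmetic behind the overflow can be sketched directly (the context window size below is an arbitrary example; the 2.5x factor comes from the measurement above):

```python
# Illustrative arithmetic only: shows why an 85% trigger combined with a
# 2.5x stringification inflation guarantees a context-window overflow.
context_window = 100_000          # example model context window (tokens)
trigger = 0.85 * context_window   # summarization fires at 85000.0 estimated tokens
inflation = 2.5                   # str(messages) vs count_tokens_approximately

actual_prompt_tokens = trigger * inflation
print(actual_prompt_tokens)       # 212500.0 -- more than twice the window

# The prompt only fits while the *estimate* stays below window / inflation,
# i.e. 40% of the window, which is well under the 85% trigger point.
safe_estimate = context_window / inflation
print(safe_estimate)              # 40000.0
```

Because the safe threshold (40%) sits below the trigger threshold (85%), summarization can never run without overflowing, regardless of the window size.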

**Fix:** Use `get_buffer_string()` to format messages, which produces
compact output:

```
Human: What's the weather?
AI: Let me check...[tool_calls]
Tool: 72°F and sunny
```

Instead of the verbose Pydantic repr:

```python
[HumanMessage(content="What's the weather?", additional_kwargs={}, response_metadata={}), ...]
```
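The size gap between the two formats can be measured on string length alone (a crude proxy for tokens; the ratio here is for a single hand-written message, not the ~2.5x whole-conversation figure measured above):

```python
# Compare the compact buffer-string rendering of one message against a
# repr-style rendering of the same message. Both strings are hand-written
# examples matching the formats shown above.
compact = "Human: What's the weather?"
verbose = ('HumanMessage(content="What\'s the weather?", '
           'additional_kwargs={}, response_metadata={})')

print(len(compact), len(verbose))
print(round(len(verbose) / len(compact), 1))  # per-message inflation factor
```

Note that the metadata fields alone account for more characters than the actual message content.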

Development

Successfully merging this pull request may close these issues.

SummarizationMiddleware includes metadata in prompt causing context length overflow

2 participants