Conversation

@keenborder786
Contributor

@github-actions github-actions bot added langchain `langchain` package issues & PRs feature For PRs that implement a new feature; NOT A FEATURE REQUEST labels Jan 1, 2026
@mdrxy
Member

mdrxy commented Jan 5, 2026

Thanks for the PR! The sanitization approach does help reduce token usage, but after investigation we found a simpler solution that fully resolves the issue.

The root cause: when `{messages}` is formatted into the summary prompt, Python's `str()` is called on the message list, which includes every Pydantic metadata field (`additional_kwargs={}`, `response_metadata={}`, etc.), even when empty.

The sanitization approach reduces the ratio from 2.4x to ~1.8x, but doesn't fully invert the inequality. Using `get_buffer_string()` (which already exists in `langchain-core`) produces output like:

```
Human: What's the weather?
AI: Let me check...[tool_calls]
Tool: 72°F and sunny
AI: It's 72°F in NYC!
```

This is more token-efficient and aligns with how `count_tokens_approximately` estimates tokens internally.
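To illustrate why the role-prefixed format is so much smaller, here is a simplified stand-in for the buffer-string formatter (the `Message` class, role names, and `to_buffer_string` below are illustrative, not the real `langchain-core` API):

```python
# Simplified sketch of buffer-string formatting: keep only a role prefix
# and the content, dropping all metadata fields that str()/repr() would emit.
# This is NOT langchain-core's implementation; it only mimics the output shape.
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str                              # e.g. "human", "ai", "tool"
    content: str
    metadata: dict = field(default_factory=dict)  # fields repr() would also print


PREFIXES = {"human": "Human", "ai": "AI", "tool": "Tool"}


def to_buffer_string(messages: list[Message]) -> str:
    # One "Prefix: content" line per message; metadata is never serialized.
    return "\n".join(f"{PREFIXES[m.role]}: {m.content}" for m in messages)


msgs = [
    Message("human", "What's the weather?", {"additional_kwargs": {}}),
    Message("ai", "Let me check...", {"response_metadata": {}}),
    Message("tool", "72°F and sunny"),
]
compact = to_buffer_string(msgs)
verbose = str(msgs)  # dataclass repr includes the (empty) metadata dicts

print(compact)
print(len(compact), "<", len(verbose))
```

Even with empty metadata dicts, the repr-style string is substantially longer, which is the inflation the fix removes.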

Superseding with a simpler fix in #34607 that uses `get_buffer_string()` directly in `_create_summary` and `_acreate_summary`. The change is just:

```python
formatted_messages = get_buffer_string(trimmed_messages)
response = self.model.invoke(self.summary_prompt.format(messages=formatted_messages))
```

Thanks for working on this - your PR helped identify that metadata was the issue!

@mdrxy mdrxy closed this Jan 5, 2026
mdrxy added a commit that referenced this pull request Jan 7, 2026
(#34607)

Fixes #34517

Supersedes #34557, #34570

Fixes token inflation in `SummarizationMiddleware` that caused context
window overflow during summarization.

**Root cause:** When formatting messages for the summary prompt,
`str(messages)` was implicitly called, which includes all Pydantic
metadata fields (`usage_metadata`, `response_metadata`,
`additional_kwargs`, etc.). This caused the stringified representation
to use ~2.5x more tokens than `count_tokens_approximately` estimates.

**Problem:**
- Summarization triggers at 85% of context window based on
`count_tokens_approximately`
- But `str(messages)` in the prompt uses 2.5x more tokens
- Results in `ContextLengthExceeded`
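The arithmetic behind the overflow can be sketched directly (the context window size below is an arbitrary example; the 2.5x factor comes from the measurement above):

```python
# Illustrative arithmetic only: shows why an 85% trigger combined with a
# 2.5x stringification inflation guarantees a context-window overflow.
context_window = 100_000          # example model context window (tokens)
trigger = 0.85 * context_window   # summarization fires at 85000.0 estimated tokens
inflation = 2.5                   # str(messages) vs count_tokens_approximately

actual_prompt_tokens = trigger * inflation
print(actual_prompt_tokens)       # 212500.0 -- more than twice the window

# The prompt only fits while the *estimate* stays below window / inflation,
# i.e. 40% of the window, which is well under the 85% trigger point.
safe_estimate = context_window / inflation
print(safe_estimate)              # 40000.0
```

Because the safe threshold (40%) sits below the trigger threshold (85%), summarization can never run without overflowing, regardless of the window size.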

**Fix:** Use `get_buffer_string()` to format messages, which produces
compact output:

```
Human: What's the weather?
AI: Let me check...[tool_calls]
Tool: 72°F and sunny
```

Instead of the verbose Pydantic repr:

```python
[HumanMessage(content="What's the weather?", additional_kwargs={}, response_metadata={}), ...]
```
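The size gap between the two formats can be measured on string length alone (a crude proxy for tokens; the ratio here is for a single hand-written message, not the ~2.5x whole-conversation figure measured above):

```python
# Compare the compact buffer-string rendering of one message against a
# repr-style rendering of the same message. Both strings are hand-written
# examples matching the formats shown above.
compact = "Human: What's the weather?"
verbose = ('HumanMessage(content="What\'s the weather?", '
           'additional_kwargs={}, response_metadata={})')

print(len(compact), len(verbose))
print(round(len(verbose) / len(compact), 1))  # per-message inflation factor
```

Note that the metadata fields alone account for more characters than the actual message content.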

Development

Successfully merging this pull request may close these issues.

SummarizationMiddleware includes metadata in prompt causing context length overflow

2 participants