
Conversation

@sridhar852002

Description

Fixes #34517.

Context
Currently, SummarizationMiddleware sends the full message objects to the LLM when generating a summary. This includes all metadata (token_usage, logprobs, etc.), which wastes a significant number of tokens and can trigger ContextLengthExceeded errors even when the actual text content is small.
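To see where the waste comes from: stringifying a message list falls back to each message's repr, which serializes every Pydantic field. A minimal illustration (field values invented, output abbreviated):

```python
from langchain_core.messages import AIMessage

msg = AIMessage(
    content="72°F and sunny",
    usage_metadata={"input_tokens": 12, "output_tokens": 5, "total_tokens": 17},
)

# The repr carries every metadata field alongside the actual text, so a long
# history inflates the summary prompt far beyond the visible content:
print(str([msg]))
# [AIMessage(content='72°F and sunny', additional_kwargs={}, response_metadata={},
#  usage_metadata={'input_tokens': 12, 'output_tokens': 5, 'total_tokens': 17}, ...)]
```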

Changes

  • Added _format_clean_history to strip metadata before sending history to the summarization model (a sketch of the approach follows this list).
  • The middleware now converts history to a simple Role: Content string format to save tokens.
  • Added logic to flatten multimodal content (e.g. image blocks) into text to prevent formatting errors.
  • Added unit tests to cover these cases.
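
The helper itself isn't shown in this thread; below is a minimal sketch of the approach described above, reusing the PR's `_format_clean_history` name (the actual implementation in the PR may differ):

```python
from langchain_core.messages import BaseMessage


def _format_clean_history(messages: list[BaseMessage]) -> str:
    """Render history as plain `Role: Content` lines, dropping all metadata."""
    lines = []
    for message in messages:
        content = message.content
        # Flatten multimodal content (e.g. image blocks) into plain text so
        # the formatted history never embeds raw content dicts.
        if isinstance(content, list):
            parts = []
            for block in content:
                if isinstance(block, str):
                    parts.append(block)
                elif isinstance(block, dict) and block.get("type") == "text":
                    parts.append(block.get("text", ""))
                else:
                    parts.append("[non-text content]")
            content = " ".join(parts)
        lines.append(f"{message.type}: {content}")
    return "\n".join(lines)
```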

Checklist

  • Ran reproduction script to verify the fix
  • Added unit tests
  • Ran make lint and make format

@github-actions github-actions bot added the langchain (`langchain` package issues & PRs) and fix (For PRs that implement a fix) labels Jan 1, 2026
@sridhar852002 sridhar852002 changed the title fix(agents): sanitize SummarizationMiddleware history to prevent token overflow fix(langchain): sanitize SummarizationMiddleware history to prevent token overflow Jan 1, 2026
@mdrxy mdrxy self-assigned this Jan 5, 2026
@mdrxy
Member

mdrxy commented Jan 5, 2026

Thanks for the contribution @sridhar852002! Your analysis of the root cause was spot on: the issue is that str(messages) includes all metadata, which inflates the token count significantly.

langchain-core already has a utility designed exactly for this purpose: get_buffer_string(). It converts messages to a clean Role: Content format (similar to what your _format_clean_history does) and is already used throughout the codebase for this exact scenario.
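
For reference, a quick illustration of `get_buffer_string`, importable from `langchain_core.messages`:

```python
from langchain_core.messages import AIMessage, HumanMessage, get_buffer_string

history = [
    HumanMessage(content="What's the weather?"),
    AIMessage(content="Let me check..."),
]

# Renders a compact transcript with no metadata fields:
print(get_buffer_string(history))
# Human: What's the weather?
# AI: Let me check...
```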

I'm implementing the fix in #34607, which is a simpler change (just updating the import and using the existing utility) that achieves the same result.

Closing this in favor of that approach, but thanks again for identifying the issue and proposing a solution!

@mdrxy mdrxy closed this Jan 5, 2026
mdrxy added a commit that referenced this pull request Jan 7, 2026 (#34607):

Fixes #34517

Supersedes #34557, #34570

Fixes token inflation in `SummarizationMiddleware` that caused context
window overflow during summarization.

**Root cause:** When formatting messages for the summary prompt,
`str(messages)` was implicitly called, which includes all Pydantic
metadata fields (`usage_metadata`, `response_metadata`,
`additional_kwargs`, etc.). This caused the stringified representation
to use ~2.5x more tokens than `count_tokens_approximately` estimates.

**Problem:**
- Summarization triggers at 85% of context window based on
`count_tokens_approximately`
- But `str(messages)` in the prompt uses 2.5x more tokens
- Results in `ContextLengthExceeded`
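
Concretely, assuming a hypothetical 100k-token context window, the mismatch looks like this:

```python
context_window = 100_000               # hypothetical model limit
trigger = int(0.85 * context_window)   # summarization fires at ~85,000 approx. tokens
prompt_tokens = int(2.5 * trigger)     # str(messages) inflates this to ~212,500 tokens
assert prompt_tokens > context_window  # -> ContextLengthExceeded
```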

**Fix:** Use `get_buffer_string()` to format messages, which produces
compact output:

```
Human: What's the weather?
AI: Let me check...[tool_calls]
Tool: 72°F and sunny
```

Instead of verbose Pydantic repr:

```python
[HumanMessage(content="What's the weather?", additional_kwargs={}, response_metadata={}), ...]
```


Linked issue

SummarizationMiddleware includes metadata in prompt causing context length overflow