fix: context window exceeded #165
Conversation
Pull Request Overview
This PR fixes a critical bug where the context window token limit could be exceeded when messages are too large, and implements proactive chunk truncation to prevent oversized prompts from being sent to the model.
Key Changes:
- Added `_limit_chunkspans()` function to proactively truncate retrieved chunks before adding them to context, with automatic per-tool adjustment when multiple tool calls occur (see the sketch after this list)
- Enhanced `_clip()` function with explicit edge case handling to return an empty list when even the last message exceeds the token limit
- Added `_num_queries` tracking to `RAGLiteConfig` to enable dynamic token limit adjustment across multiple tool calls
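A rough sketch of what such proactive truncation could look like is below; the `ChunkSpan` shape, the `count_tokens` helper, and the budget arithmetic are illustrative assumptions, not the PR's actual implementation:

```python
# Illustrative sketch only; not the actual code in src/raglite/_rag.py.
from dataclasses import dataclass


@dataclass
class ChunkSpan:
    """Stand-in for a retrieved chunk span (the real class holds more metadata)."""
    content: str


def count_tokens(text: str) -> int:
    """Crude token estimate; the real code would use the model's tokenizer."""
    return max(1, len(text) // 4)


def _limit_chunkspans(
    chunk_spans: list[ChunkSpan], max_tokens: int, num_queries: int = 1
) -> list[ChunkSpan]:
    """Keep only the chunk spans that fit within a per-query token budget."""
    # When multiple tool calls run in one turn, each query gets a proportional share.
    budget = max_tokens // max(1, num_queries)
    kept: list[ChunkSpan] = []
    used = 0
    for span in chunk_spans:
        tokens = count_tokens(span.content)
        if used + tokens > budget:
            break  # Drop this span and everything after it to stay within the budget.
        kept.append(span)
        used += tokens
    return kept
```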
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/raglite/_rag.py | Implements chunk truncation logic, edge case handling in message clipping, and tool call query counting |
| src/raglite/_config.py | Adds `_num_queries` field to track the number of concurrent queries for token limit calculations (see the sketch below) |
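For illustration, the `_num_queries` addition could look something like this minimal sketch; the surrounding fields and defaults are assumptions, not RAGLite's actual `RAGLiteConfig`:

```python
# Minimal sketch; the real RAGLiteConfig in src/raglite/_config.py has many more fields.
from dataclasses import dataclass


@dataclass
class RAGLiteConfig:
    llm: str = "gpt-4o-mini"  # Assumed example default.
    # Number of concurrent queries (tool calls) in the current turn; used to split
    # the context token budget so each query's retrieved chunks get a fair share.
    _num_queries: int = 1
```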
I think we need better handling for truncating multiple tool calls
Fix: Improved Context Management and Context Window Handling
Previously, context size was managed only by `_clip`, which failed when even the last message exceeded the model's context window: in those cases, it could return all messages. In other cases it could drop the user query entirely, leading to responses built only from tool messages without the original user query. This update introduces robust context limiting and proportional chunk allocation:
New `_limit_chunkspans`
Proactively truncates retrieved chunk spans so that the resulting context fits within the model's token limit, splitting the budget across multiple tool calls.

Updated `_clip`
Explicitly handles the edge case where even the last message exceeds the token limit by returning an empty list.

Updated `add_context`
`config` is now a required parameter, so ChunkSpans can be limited.

Updated `test_rag`
The file was updated so that `chunk_spans` is only asserted when the LLM does not start with `"llama-cpp-python"`. This change was made because the test could fail when all chunk spans are dropped by `_limit_chunkspans`, especially when using the `sat-1l-sm` sentence splitter.

Integration
`_limit_chunkspans` is applied in `add_context` and `_run_tools` to constrain retrieved chunk spans before building messages (see the sketch below).

These changes prevent context overflows and ensure that context truncation is handled with clear warnings.
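Putting the pieces together, the integration described above might look roughly like the following sketch. It reuses the illustrative `ChunkSpan`, `count_tokens`, `_limit_chunkspans`, and `RAGLiteConfig` stand-ins from the earlier sketches; the message format, signatures, and the 8192-token budget are assumptions, not the PR's exact code:

```python
# Illustrative sketch; actual signatures in src/raglite/_rag.py may differ.
import warnings


def _clip(messages: list[dict[str, str]], max_tokens: int) -> list[dict[str, str]]:
    """Keep the most recent messages that fit; return [] if even the last one is too large."""
    clipped: list[dict[str, str]] = []
    used = 0
    for message in reversed(messages):
        tokens = count_tokens(message["content"])
        if used + tokens > max_tokens:
            break
        clipped.insert(0, message)
        used += tokens
    if not clipped:
        warnings.warn("Even the last message exceeds the context window; returning no messages.")
    return clipped


def add_context(
    user_prompt: str, chunk_spans: list[ChunkSpan], config: RAGLiteConfig
) -> list[dict[str, str]]:
    """Build a user message from retrieved chunk spans, limiting them proactively first."""
    max_context_tokens = 8192  # Assumed; would normally be derived from the model's context size.
    chunk_spans = _limit_chunkspans(chunk_spans, max_context_tokens, config._num_queries)
    context = "\n\n".join(span.content for span in chunk_spans)
    return [{"role": "user", "content": f"{context}\n\n{user_prompt}"}]
```

Because whole spans are dropped rather than cut mid-span, `chunk_spans` can end up empty, which is consistent with the `test_rag` change above where the assertion on `chunk_spans` is skipped for models whose spans may all be dropped.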