
Conversation


@jirastorza jirastorza commented Oct 21, 2025

Fix: Improved Context Management and Context Window Handling

Previously, context size was managed only by _clip, which failed when even the last message exceeded the model's context window: in that case it could return all messages unclipped. In other cases it could drop the user query entirely, producing responses built only from tool messages, without the original user query.

This update introduces robust context limiting and proportional chunk allocation:

  • New _limit_chunkspans

    • Allocates the available context tokens proportionally across tools, based on the size of each tool's retrieved chunks (see the sketch after this list).
    • Enforces the model's context limit while accounting for the token counts of the other messages (user, system, tool messages, TEMPLATE).
    • Logs warnings when chunks are truncated due to window constraints.
  • Updated _clip

    • Ensures the user query is always preserved after clipping (if possible).
    • Warns when no messages fit or when clipping would remove the user query.
  • Updated add_context

    • Added a new required config parameter so that ChunkSpans can be limited.
  • Updated test_rag

    • chunk_spans is now asserted only when the LLM name does not start with "llama-cpp-python". This change was made because the test could fail when all chunk spans are dropped by _limit_chunkspans, especially when using the sat-1l-sm sentence splitter.

  • Integration

    • _limit_chunkspans applied in add_context and _run_tools to constrain retrieved chunk spans before building messages.
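
For reference, here is a minimal sketch of the proportional allocation idea. The ChunkSpan stand-in, the count_tokens heuristic, and the function signature are assumptions for illustration, not raglite's actual implementation:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class ChunkSpan:
    """Hypothetical stand-in for raglite's retrieved chunk span."""
    text: str


def count_tokens(text: str) -> int:
    """Crude token estimate; the real code would use the model's tokenizer."""
    return max(1, len(text) // 4)


def limit_chunkspans(
    spans_per_tool: list[list[ChunkSpan]],
    context_window: int,
    other_messages_tokens: int,
) -> list[list[ChunkSpan]]:
    """Allocate the remaining context budget proportionally across tools."""
    budget = context_window - other_messages_tokens
    if budget <= 0:
        logger.warning("No context budget left for retrieved chunks.")
        return [[] for _ in spans_per_tool]
    sizes = [sum(count_tokens(s.text) for s in spans) for spans in spans_per_tool]
    total = sum(sizes)
    if total <= budget:
        return spans_per_tool  # Everything fits; nothing to truncate.
    limited: list[list[ChunkSpan]] = []
    for spans, size in zip(spans_per_tool, sizes):
        # Each tool gets a share of the budget proportional to its chunk sizes.
        share = budget * size // total
        kept, used = [], 0
        for span in spans:
            tokens = count_tokens(span.text)
            if used + tokens > share:
                logger.warning("Dropping a chunk span to fit the context window.")
                continue
            kept.append(span)
            used += tokens
        limited.append(kept)
    return limited
```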

Together, these changes prevent context overflows and ensure that any truncation is surfaced with clear warnings. A sketch of the updated clipping behavior follows.
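
The following is a minimal sketch of the user-query-preserving clipping described above, assuming a simple role/content message format and a crude token estimate; it is an illustration, not the actual _clip implementation:

```python
import logging

logger = logging.getLogger(__name__)


def num_tokens(message: dict[str, str]) -> int:
    """Crude token estimate; the real code would use the model's tokenizer."""
    return max(1, len(message["content"]) // 4)


def clip(messages: list[dict[str, str]], max_tokens: int) -> list[dict[str, str]]:
    """Drop oldest messages until the rest fit, preserving the user query."""
    # Reserve budget for the most recent user message so the query survives.
    user_idx = next(
        (i for i in range(len(messages) - 1, -1, -1) if messages[i]["role"] == "user"),
        None,
    )
    budget = max_tokens
    if user_idx is not None:
        user_cost = num_tokens(messages[user_idx])
        if user_cost > budget:
            logger.warning("Even the user query alone exceeds the context window.")
            return []
        budget -= user_cost
    kept = [user_idx] if user_idx is not None else []
    for i in range(len(messages) - 1, -1, -1):
        if i == user_idx:
            continue  # Already reserved above.
        cost = num_tokens(messages[i])
        if cost > budget:
            break  # This message and everything older is dropped.
        kept.append(i)
        budget -= cost
    if not kept:
        logger.warning("No messages fit within the context window.")
    return [messages[i] for i in sorted(kept)]
```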

@jirastorza jirastorza requested a review from emilradix October 21, 2025 08:11
@jirastorza jirastorza marked this pull request as draft October 21, 2025 08:58
@jirastorza jirastorza closed this Oct 21, 2025
@jirastorza jirastorza reopened this Oct 21, 2025
@jirastorza jirastorza changed the title from "Fix/context window exceeded" to "fix: context window exceeded" Oct 21, 2025
@jirastorza jirastorza self-assigned this Oct 22, 2025
@jirastorza jirastorza marked this pull request as ready for review October 22, 2025 07:58
@emilradix emilradix requested a review from Copilot October 22, 2025 08:02

Copilot AI left a comment


Pull Request Overview

This PR fixes a critical bug where the context window token limit could be exceeded when messages are too large, and implements proactive chunk truncation to prevent oversized prompts from being sent to the model.

Key Changes:

  • Added limit_chunkspans() function to proactively truncate retrieved chunks before adding them to context, with automatic per-tool adjustment when multiple tool calls occur
  • Enhanced _clip() function with explicit edge case handling to return an empty list when even the last message exceeds the token limit
  • Added _num_queries tracking to RAGLiteConfig to enable dynamic token limit adjustment across multiple tool calls (illustrated in the sketch below)
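
A tiny sketch of how such a query count could feed into a per-tool budget; the even split and the helper name are assumptions for illustration, not the PR's actual arithmetic:

```python
def per_tool_chunk_budget(
    context_window: int, other_messages_tokens: int, num_queries: int
) -> int:
    """Split the remaining chunk budget evenly across concurrent tool calls.

    num_queries mirrors the _num_queries field added to RAGLiteConfig;
    the even split is an assumed simplification for illustration.
    """
    remaining = max(0, context_window - other_messages_tokens)
    return remaining // max(1, num_queries)
```

For example, with an 8192-token window, 2048 tokens of other messages, and 3 concurrent queries, each tool call would get a 2048-token chunk budget.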

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • src/raglite/_rag.py: Implements chunk truncation logic, edge case handling in message clipping, and tool call query counting
  • src/raglite/_config.py: Adds _num_queries field to track the number of concurrent queries for token limit calculations


Contributor

@emilradix emilradix left a comment


I think we need better handling for truncating multiple tool calls.

@Robbe-Superlinear Robbe-Superlinear self-requested a review October 22, 2025 08:35
@jirastorza jirastorza marked this pull request as draft October 23, 2025 08:20
@jirastorza jirastorza closed this Oct 27, 2025