Conversation

@jirastorza commented Oct 15, 2025

This PR includes two fixes:

Fix: Improved Context Management and Context Window Handling

Previously, context size was managed only by _clip, which failed when even the last message exceeded the model’s context window: in that case it could return all messages unclipped. In other cases it could drop the user query entirely, leading to responses built only from tool messages without the original question.

This update introduces robust context limiting and proportional chunk allocation:

  • New _limit_chunkspans

    • Allocates the available context tokens proportionally across tools based on the sizes of their retrieved chunks (see the sketch below this list).
    • Enforces the model’s context limit while accounting for the token counts of the other messages (user, system, tool messages, and the RAG template).
    • Logs warnings when chunks are truncated due to window constraints.
  • Updated _clip

    • Ensures the user query is always preserved after clipping (if possible).
    • Warns when no messages fit or when clipping would remove the user query.
  • Updated add_context

    • Added a new required config parameter so that ChunkSpans can be limited.
  • Updated test_rag

    • The file was updated so that chunk_spans is only asserted when the LLM name does not start with "llama-cpp-python". This change was made because the test could fail when all chunk spans are dropped by _limit_chunkspans, especially when using the sat-1l-sm sentence splitter.
  • Integration

    • _limit_chunkspans is applied in add_context and _run_tools to constrain retrieved chunk spans before building messages.

These changes prevent context overflows and ensure that context truncation is reported with clear warnings.
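
For reference, here is a minimal sketch of the proportional allocation idea behind _limit_chunkspans. The ChunkSpan dataclass, the count_tokens helper, and the ~4-characters-per-token estimate are illustrative assumptions, not the actual raglite implementation:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class ChunkSpan:
    """Hypothetical stand-in for a retrieved chunk span; only the text is used here."""
    text: str


def count_tokens(text: str) -> int:
    """Crude token estimate (~4 characters per token); a real tokenizer would be used instead."""
    return max(1, len(text) // 4)


def limit_chunk_spans(
    spans: list[ChunkSpan], context_window: int, other_messages_tokens: int
) -> list[ChunkSpan]:
    """Truncate chunk spans so they fit in the remaining context budget.

    The budget left after the system, user, and tool messages (plus the RAG template)
    is split across spans proportionally to their original token counts.
    """
    budget = context_window - other_messages_tokens
    if budget <= 0:
        logger.warning("No context budget left for chunk spans; dropping all of them.")
        return []
    sizes = [count_tokens(span.text) for span in spans]
    total = sum(sizes)
    if total <= budget:
        return spans  # Everything already fits.
    limited = []
    for span, size in zip(spans, sizes):
        share = max(1, budget * size // total)  # Proportional share of the budget.
        if share < size:
            logger.warning("Truncating a chunk span from %d to ~%d tokens.", size, share)
            span = ChunkSpan(text=span.text[: share * 4])  # Approximate cut back to characters.
        limited.append(span)
    return limited
```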

Fix: Change sentence splitter to "sat-1l-sm"

This pull request updates the sentence splitting logic by switching the SaT (Segment-any-Text) model used in _load_sat from "sat-3l-sm" to "sat-1l-sm" in src/raglite/_split_sentences.py.
The change reduces model size while maintaining comparable segmentation accuracy.
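
For context, the model swap boils down to changing the identifier passed to wtpsplit's SaT class. A self-contained example of that kind of call (assuming the standard wtpsplit API, not the exact _load_sat code):

```python
from wtpsplit import SaT  # Segment-any-Text models from the wtpsplit package.

# Load the smaller 1-layer model instead of the previous 3-layer one.
sat = SaT("sat-1l-sm")

# Split raw text into sentences before chunking.
sentences = sat.split("RAGLite first splits documents into sentences. It then groups them into chunks.")
print(sentences)
```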

[Benchmark plot comparing "sat-1l-sm" and "sat-3l-sm" on the CUAD dataset]

Benchmark results on the CUAD dataset (see the attached plot) show that "sat-1l-sm" achieves performance similar to the larger "sat-3l-sm" model under the default RAGLite benchmark setup.
This makes "sat-1l-sm" a better trade-off between efficiency and accuracy.

| Model | English Score | Multilingual Score |
| --- | --- | --- |
| sat-1l | 88.5 | 84.3 |
| sat-1l-sm | 88.2 | 87.9 |
| sat-3l-sm | 96.5 | 93.5 |

Source: https://github.com/segment-any-text/wtpsplit

We selected "sat-1l-sm" over "sat-1l" because it provides better multilingual performance with only a small trade-off in English accuracy.

@jirastorza (Author) commented Oct 15, 2025

Somehow, changing the model has led to larger chunks, which apparently causes the context window size to be exceeded (this did not happen when benchmarking).

@lsorber (Member) commented Oct 19, 2025

My guess at why the tests are failing is that sat-1l-sm yields larger sentences and/or chunks, which end up taking up more context than we have room for in the tests. EDIT: Just saw your comment too @jirastorza!

@emilradix (Contributor) commented Oct 20, 2025

EDIT: the bug seems to be in the clip function: if the last message exceeds max_tokens, then first_message is set to zero, hence we return all messages.
https://github.com/superlinear-ai/raglite/blob/main/src/raglite/_rag.py#L88-L92

So if I understand correctly, the reason the test is failing is that we are not specifying the model's max_tokens in the test config?
Edit: actually, this should be inferred automatically here:
https://github.com/superlinear-ai/raglite/blob/main/src/raglite/_litellm.py#L329-L348
which seems to be failing.

If this were detected correctly in test_rag.py, would the text get clipped at max_tokens?
https://github.com/superlinear-ai/raglite/blob/main/src/raglite/_rag.py#L188

@lsorber @jirastorza @Robbe-Superlinear Is this correct?

If so, this leads to a perhaps more important point: surely we do not want to simply allow the context length to exceed the maximum token size and then clip the input down to the context size. We need a cleverer way to limit this.

For example, if we are passing chunk spans, the actual chunk needs to be in there, which as far as I can tell is not guaranteed (e.g. chunks at the end of a section).

@jirastorza (Author) commented Oct 20, 2025

The context window is being exceeded because the system prompt, retrieved chunk spans, and user prompt are all combined into a single message:

"""
---
You are a friendly and knowledgeable assistant that provides complete and insightful answers.
Whenever possible, use only the provided context to respond to the question at the end.
When responding, you MUST NOT reference the existence of the context, directly or indirectly.
Instead, you MUST treat the context as if its contents are entirely part of your working memory.
---

<context>
{context}
</context>

{user_prompt}
"""

In a new conversation, this results in only one message in the messages list. The _clip method is designed to remove entire messages from the left when the total token count exceeds the context window. However, when there is only one (very large) message, the cumulative token count already exceeds max_tokens at its first element, so np.searchsorted(cum_tokens, max_tokens) returns 0 and the slice messages[-0:] evaluates to the entire messages list; nothing is clipped. This means that if the message itself is too large, _clip cannot reduce its size, and the context window is still exceeded.
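
To make the failure mode concrete, here is a minimal, self-contained sketch of this kind of clipping logic (an approximation with a crude tokenizer, not the exact raglite code):

```python
import numpy as np


def clip(messages: list[str], max_tokens: int) -> list[str]:
    """Simplified left-clipping: keep only the most recent messages that fit in max_tokens."""
    # Cumulative token counts from the most recent message backwards (~4 characters per token).
    cum_tokens = np.cumsum([len(m) // 4 for m in reversed(messages)])
    # Number of trailing messages that still fit, negated to use as a slice start.
    first_message = -int(np.searchsorted(cum_tokens, max_tokens))
    return messages[first_message:]


# If the only message already exceeds max_tokens, searchsorted returns 0,
# -0 == 0, and messages[0:] is the whole list: nothing gets clipped.
print(len(clip(["x" * 10_000], max_tokens=100)))  # 1: the oversized message is returned unchanged.
```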

Additionally, although the maximum chunk size is set to 2048, the most relevant chunks are expanded with their neighbors using the retrieve_chunk_spans method. This means the actual retrieved context is much larger than expected, further increasing the risk of exceeding the context window.

edit: To address the context window issue, I will create a new branch named fix/context_window_exceeded and open a PR with a solution: when the context window is exceeded, the user will be warned and advised to reduce the number of retrieved chunks or use a model with a larger context window.

Do you agree? @emilradix @Robbe-Superlinear @lsorber

Copilot AI left a comment

Pull Request Overview

This PR changes the sentence splitting model from "sat-3l-sm" to "sat-1l-sm" to optimize the trade-off between model size and accuracy. Benchmark results show that the smaller model achieves comparable performance on the CUAD dataset while reducing resource requirements.

Key Changes:

  • Switched to a lighter-weight sentence segmentation model ("sat-1l-sm") that provides better multilingual support with minimal impact on English accuracy
  • Updated inline comment to reflect the new model's characteristics


@jirastorza marked this pull request as draft October 24, 2025 09:58
@jirastorza (Author) commented

Let's first solve the exceeded context window problem in the fix/context_window_exceeded branch. Once that is solved, the tests for this branch should pass.

@jirastorza closed this Oct 27, 2025
@jirastorza reopened this Oct 27, 2025
@jirastorza changed the title from "feat: change sentence splitter to sat-1l-sm" to "feat: change sentence splitter to sat-1l-sm and fix context management." Oct 27, 2025
@jirastorza changed the title from "feat: change sentence splitter to sat-1l-sm and fix context management." to "feat: change sentence splitter to sat-1l-sm and fix: context management." Oct 27, 2025
@jirastorza (Author) commented

I merged the fix/context_window_exceeded branch into this branch, so now we change the sentence splitter and solve the context window problem in the same PR.
