feat: change sentence splitter to sat-1l-sm and fix: context management. #164
base: main
Conversation
Somehow changing the model has led to larger chunks, apparently causing the context window size to be exceeded (this did not happen when benchmarking).
My guess at why the tests are failing is that

EDIT: the bug seems to be in the clip function: if the last message exceeds max_tokens, then first_message is set to zero, hence we return all messages.

So if I understand correctly, the reason the test is failing is that the config for the tests does not specify the model's max_tokens? If this were detected correctly in test_rag.py, the text would get clipped at the max token count? @lsorber @jirastorza @Robbe-Superlinear Is this correct?

If so, this leads to a perhaps more important point: surely we do not want to simply allow the context length to exceed the max token size and then clip the input down to the context size. We need a more clever way to limit this. For example, if we are passing chunk spans, the actual chunk needs to be in there, which as far as I can tell is not guaranteed (e.g. for chunks at the end of a section).
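For illustration, here is a minimal sketch of the failure mode described above, using hypothetical names and a crude token estimate; the real `_clip` in RAGLite may differ in detail:

```python
# Hypothetical reconstruction of the described bug, not RAGLite's actual code.
def clip(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the longest suffix of messages that fits within max_tokens."""

    def count_tokens(text: str) -> int:
        return len(text) // 4  # Crude token estimate for illustration.

    # Token count of the suffix starting at each index.
    suffix_tokens = [sum(count_tokens(m) for m in messages[i:]) for i in range(len(messages))]
    # First index whose suffix fits the budget; if *no* suffix fits (even the
    # last message alone exceeds max_tokens), this defaults to 0 and the
    # function returns all messages instead of clipping anything.
    first_message = next((i for i, n in enumerate(suffix_tokens) if n <= max_tokens), 0)
    return messages[first_message:]


# The last message alone exceeds the budget, so nothing gets clipped:
print(len(clip(["a" * 400, "b" * 400, "c" * 4000], max_tokens=100)))  # 3
```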
The context window is being exceeded because the system prompt, retrieved chunk spans, and user prompt are all combined into a single message. In a new conversation, this results in only one message in the messages list.

Additionally, although the maximum chunk size is set to 2048, the most relevant chunks are expanded with their neighbors using the

edit: To address the context window issue, I will create a new branch named

Do you agree? @emilradix @Robbe-Superlinear @lsorber
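To illustrate the point (with made-up helper names and prompt layout, not RAGLite's actual API): when everything is concatenated into one message, a message-level clip has a single oversized unit to work with and cannot drop individual chunk spans.

```python
# Illustrative only: names and prompt layout are assumptions, not RAGLite's API.
def build_rag_message(system_prompt: str, chunk_spans: list[str], user_prompt: str) -> list[str]:
    """Combine the system prompt, context, and user prompt into a single message."""
    context = "\n\n".join(chunk_spans)
    return [f"{system_prompt}\n\n{context}\n\n{user_prompt}"]


messages = build_rag_message(
    system_prompt="You are a helpful assistant.",
    chunk_spans=["lorem ipsum " * 1000 for _ in range(8)],  # Neighbor-expanded chunks add up fast.
    user_prompt="What does the contract say about termination?",
)
# len(messages) == 1: a suffix-based clip must keep or drop this one message
# wholesale; it cannot trim the chunk spans packed inside it.
```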
Pull Request Overview
This PR changes the sentence splitting model from "sat-3l-sm" to "sat-1l-sm" to optimize the trade-off between model size and accuracy. Benchmark results show that the smaller model achieves comparable performance on the CUAD dataset while reducing resource requirements.
Key Changes:
- Switched to a lighter-weight sentence segmentation model (`"sat-1l-sm"`) that provides better multilingual support with minimal impact on English accuracy
- Updated inline comment to reflect the new model's characteristics
Let's first solve the exceeded context window problem in
I merged
This PR includes two fixes:
Fix: Improved Context Management and Context Window Handling
Previously, context size was managed only by `_clip`, which failed when even the last message exceeded the model's context window. In those cases, it could return all messages; in other cases, it could drop the user query entirely, leading to responses built only from tool messages without the original user query. This update introduces robust context limiting and proportional chunk allocation:

- New `_limit_chunkspans`.
- Updated `_clip`.
- Updated `add_context`: `config` is now a required parameter, so ChunkSpans can be limited.
- Updated `test_rag` for `"llama-cpp-python"`: this change was made because the test could fail when all chunk spans are dropped by `_limit_chunkspans`, especially when using the `sat-1l-sm` sentence splitter.
- Integration: `_limit_chunkspans` is applied in `add_context` and `_run_tools` to constrain retrieved chunk spans before building messages.

These changes prevent context overflows and ensure that context truncation is handled with clear warnings.
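As a rough illustration of the budget-based limiting described above (the actual `_limit_chunkspans` signature, token counting, and `ChunkSpan` type will differ), the idea is to keep the most relevant spans until a token budget is exhausted and warn when spans are dropped:

```python
import warnings


def limit_chunk_spans(chunk_spans: list[str], max_tokens: int) -> list[str]:
    """Keep the leading (most relevant) chunk spans that fit within max_tokens."""

    def count_tokens(text: str) -> int:
        return len(text) // 4  # Crude token estimate for illustration.

    kept: list[str] = []
    used = 0
    for span in chunk_spans:  # Assumed to be sorted by descending relevance.
        span_tokens = count_tokens(span)
        if used + span_tokens > max_tokens:
            break
        kept.append(span)
        used += span_tokens
    if len(kept) < len(chunk_spans):
        warnings.warn(
            f"Dropped {len(chunk_spans) - len(kept)} chunk span(s) to stay within {max_tokens} tokens.",
            stacklevel=2,
        )
    return kept
```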
Fix: Change sentence splitter to `"sat-1l-sm"`
This pull request updates the sentence splitting logic by switching the SaT (Segment-any-Text) model used in `_load_sat` from `"sat-3l-sm"` to `"sat-1l-sm"` in `src/raglite/_split_sentences.py`. The change reduces model size while maintaining comparable segmentation accuracy.
Benchmark results on the CUAD dataset (see attached plot) show that `"sat-1l-sm"` achieves similar performance to the larger `"sat-3l-sm"` model under the default Raglite Benchmark setup. This makes it a better trade-off between efficiency and accuracy.
We selected `"sat-1l-sm"` over `"sat-1l"` because it provides better multilingual performance with only a small trade-off in English accuracy.