La-Loutre
Automatically truncate prompts to fit model context with safety margin.
Transient supports enabling/disabling or setting a custom fraction (0.1-1.0).

#995

@meain
Contributor

meain commented Aug 9, 2025

A better way to manage long chats might be to warn users when we are reaching the end of the context window and allow them to call something to compact the chat. This is what tools like claude-code or opencode do.

@La-Loutre
Author

I don't think there is an objectively better way; it really depends on who is in front of the computer. For my use case, I do not want to call commands or modify the chat history. I just want what was already working, the conversation with the LLM, to continue seamlessly, without interruption.

It's disabled by default, so it will not be used without the user's understanding and choice.
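For illustration, the safety-margin idea from the PR description boils down to multiplying the model's context size by the configured fraction to get a usable token budget. This is only a sketch of that arithmetic; `my/truncate-fraction` and `my/token-budget` are hypothetical names, not part of this PR or of gptel:

#+begin_src elisp
;; Sketch only: the safety-margin arithmetic described in the PR.
;; `my/truncate-fraction' and `my/token-budget' are hypothetical names.
(defvar my/truncate-fraction 0.8
  "Fraction of the model's context window to use (0.1-1.0).")

(defun my/token-budget (context-size)
  "Return the usable token budget for CONTEXT-SIZE with a safety margin."
  (floor (* context-size my/truncate-fraction)))

;; (my/token-budget 8192) => 6553
#+end_src

Anything beyond this budget would be truncated from the prompt before sending.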

@bajsicki

I don't know about other APIs, but llama.cpp supports getting the context window via the /props endpoint. It's the n_ctx key.

Could query that for accuracy.

Here's some code that should be easy to adapt (I use this for automatically switching presets/model backends with llama.cpp):

#+begin_src elisp :results none
(defun gptel-got--model ()
  "Retrieve the model name reported by the current GPTel backend.

Send a GET request to the backend's /v1/models endpoint to fetch the
available models, and return the model name (a string) from the first
entry in the response, which should be sufficient for local,
single-model inference."
  (let ((result (gptel--url-retrieve
                 (concat "http://" (gptel-backend-host gptel-backend) "/v1/models")
                 :method "GET")))
    (file-name-nondirectory
     (plist-get (aref (plist-get result :models) 0) :model))))

(defun gptel-got--match-model ()
  "Find the closest matching model in the backend's model list.

Use `string-distance' (Levenshtein distance) to compare the target
model name (from `gptel-got--model') against all models defined for
the backend, and return the best-matching model (a symbol)."
  (let ((got-model (gptel-got--model))
        (min most-positive-fixnum)
        (best-match nil))
    (cl-loop for model in (gptel-backend-models gptel-backend)
             do (let ((distance (string-distance got-model (symbol-name model))))
                  (when (< distance min)
                    (setq min distance)
                    (setq best-match model))))
    best-match))
#+end_src
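For completeness, querying the /props endpoint mentioned above might look like the following. This is a sketch only: `my/llamacpp-n-ctx` is a hypothetical helper, and the exact location of `n_ctx` in the JSON response can vary by llama.cpp server version (recent builds nest it under `default_generation_settings`):

#+begin_src elisp
;; Sketch only: fetch the context size from llama.cpp's /props endpoint.
;; `my/llamacpp-n-ctx' is a hypothetical name; the response layout may
;; differ between llama.cpp server versions.
(require 'url)
(require 'json)

(defun my/llamacpp-n-ctx (host)
  "Return the context size reported by the llama.cpp server at HOST."
  (with-current-buffer
      (url-retrieve-synchronously (concat "http://" host "/props"))
    (goto-char url-http-end-of-headers)
    (let* ((json-object-type 'plist)
           (props (json-read)))
      ;; Look for a top-level `n_ctx', then fall back to the nested key
      ;; used by newer server builds.
      (or (plist-get props :n_ctx)
          (plist-get (plist-get props :default_generation_settings)
                     :n_ctx)))))

;; (my/llamacpp-n-ctx "localhost:8080")
#+end_src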

@La-Loutre
Copy link
Author

Fetching that info from the server looks interesting, although I think it's a slightly different topic. This PR truncates the prompt based on the context size configured in gptel. Whether that gptel context size should itself be set by fetching info from the server is a separate question.

3 participants