La-Loutre
Automatically truncate prompts to fit model context with safety margin.
Transient supports enabling/disabling or setting a custom fraction (0.1-1.0).

#995

@meain
Contributor

meain commented Aug 9, 2025

A better way to manage long chats might be to warn users when we are reaching the end of the context window and allow them to call something to compact the chat. This is what tools like claude-code or opencode do.

@La-Loutre
Author

I don't think there is an objectively better way; it really depends on who is in front of the computer. For my use case, I do not want to call commands or modify the chat history. I just want what was already working, the conversation with the LLM, to continue seamlessly, without interruption.

It's disabled by default, so it will not be used without the user's understanding and choice.
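For illustration, the safety-margin idea from the PR description boils down to multiplying the model's context size by the configured fraction to get a usable token budget. This is only a sketch of that arithmetic; `my/truncate-fraction` and `my/token-budget` are hypothetical names, not part of this PR or of gptel:

#+begin_src elisp
;; Sketch only: the safety-margin arithmetic described in the PR.
;; `my/truncate-fraction' and `my/token-budget' are hypothetical names.
(defvar my/truncate-fraction 0.8
  "Fraction of the model's context window to use (0.1-1.0).")

(defun my/token-budget (context-size)
  "Return the usable token budget for CONTEXT-SIZE with a safety margin."
  (floor (* context-size my/truncate-fraction)))

;; (my/token-budget 8192) => 6553
#+end_src

Anything beyond this budget would be truncated from the prompt before sending.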

@bajsicki

I don't know about other APIs, but llama.cpp supports getting the context window via the /props endpoint. It's the n_ctx key.

Could query that for accuracy.

Here's some code that should be easy to adapt (I use this for automatically switching presets/model backends with llama.cpp):

#+begin_src elisp :results none
(defun gptel-got--model ()
  "Retrieve the model name reported by the current GPTel backend.

Send a GET request to the backend's /v1/models endpoint to fetch the
available models, and return the model name (a string) from the first
entry in the response, which should be sufficient for local,
single-model inference."
  (let ((result (gptel--url-retrieve
                 (concat "http://" (gptel-backend-host gptel-backend) "/v1/models")
                 :method "GET")))
    (file-name-nondirectory
     (plist-get (aref (plist-get result :models) 0) :model))))

(defun gptel-got--match-model ()
  "Find the closest matching model in the backend's model list.

Use `string-distance' (Levenshtein distance) to compare the target
model name (from `gptel-got--model') against all models defined for
the backend, and return the best-matching model (a symbol)."
  (let ((got-model (gptel-got--model))
        (min most-positive-fixnum)
        (best-match nil))
    (cl-loop for model in (gptel-backend-models gptel-backend)
             do (let ((distance (string-distance got-model (symbol-name model))))
                  (when (< distance min)
                    (setq min distance)
                    (setq best-match model))))
    best-match))
#+end_src
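For completeness, querying the /props endpoint mentioned above might look like the following. This is a sketch only: `my/llamacpp-n-ctx` is a hypothetical helper, and the exact location of `n_ctx` in the JSON response can vary by llama.cpp server version (recent builds nest it under `default_generation_settings`):

#+begin_src elisp
;; Sketch only: fetch the context size from llama.cpp's /props endpoint.
;; `my/llamacpp-n-ctx' is a hypothetical name; the response layout may
;; differ between llama.cpp server versions.
(require 'url)
(require 'json)

(defun my/llamacpp-n-ctx (host)
  "Return the context size reported by the llama.cpp server at HOST."
  (with-current-buffer
      (url-retrieve-synchronously (concat "http://" host "/props"))
    (goto-char url-http-end-of-headers)
    (let* ((json-object-type 'plist)
           (props (json-read)))
      ;; Look for a top-level `n_ctx', then fall back to the nested key
      ;; used by newer server builds.
      (or (plist-get props :n_ctx)
          (plist-get (plist-get props :default_generation_settings)
                     :n_ctx)))))

;; (my/llamacpp-n-ctx "localhost:8080")
#+end_src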

@La-Loutre
Copy link
Author

Fetching that info from the server looks interesting, although I think it's a slightly different topic. This PR truncates the prompt based on the context size configured in gptel. Whether that gptel context size should itself be set by fetching info from the server is a separate question.

3 participants