Support FIM for models using ChatML format #142
Comments
Thanks @ChrisDeadman. This is very interesting; I will look into it for a future release.
Hey @ChrisDeadman, I tried this but didn't get much luck. How can we deal with prefix and suffix? Can we write a custom template for it?
Yes, that should be possible, and it would make it easier for me to try out - if I find something usable I could make a PR of a ChatML template then.
On second thought, the templates need to support some kind of flag which tells your response parser that the last part of the template (e.g. the 3 backticks) should be prepended to the actual model response before parsing it.
Under Python, using Hugging Face transformers chat templates, you can get the "generation prompt" like this:
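A minimal sketch of one way to do it, assuming you render the template twice with `apply_chat_template` and take the difference (the model name is only an example of a ChatML model):

```python
from transformers import AutoTokenizer

# Any ChatML model works here; this one is only an example.
tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

messages = [{"role": "user", "content": "Complete the following code."}]

# Render the chat template once without and once with the generation prompt.
without_gen = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=False
)
with_gen = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# The difference between the two renderings is the generation prompt itself,
# e.g. "<|im_start|>assistant\n" for ChatML models.
generation_prompt = with_gen[len(without_gen):]
print(repr(generation_prompt))
```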
Because `add_generation_prompt=True` only appends the assistant turn header to the rendered conversation, that part can be isolated - so only the generation prompt is returned. Maybe something similar could be done with the hbs templates.
Hey, sorry, but I'm still unsure about it. If you could adapt it to an hbs template it might be clearer? I did try to adapt the FIM completions to use templates but didn't know what format it should be.
Ollama automatically wraps whatever you pass to the /generate endpoint with a template (unless you turn it off with the `raw` option). The default Mistral template is pretty boring - https://ollama.com/library/mistral:latest/blobs/e6836092461f - but a lot of them follow that same format - https://ollama.com/library/dolphincoder:latest/blobs/62fbfd9ed093. The Ollama generate endpoint does allow overriding both the default model template and system message. Not sure if other systems (like vLLM) do, though.
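For illustration, a minimal sketch of a raw request against Ollama's /api/generate in Python; the model name and the CodeLlama infill tokens are just examples, not what this extension actually sends:

```python
import requests

# "raw": True tells Ollama to skip the model's built-in prompt template,
# so the FIM prompt reaches the model exactly as written.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "codellama:7b-code",  # example model with infill support
        "prompt": "<PRE> def add(a, b):\n <SUF>\n\nprint(add(1, 2)) <MID>",
        "raw": True,
        "stream": False,
        "options": {"num_predict": 64},
    },
)
print(resp.json()["response"])
```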
If I understand the hbs syntax correctly, this should work for your existing FIM stuff:
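For example (the variable names `context`, `prefix` and `suffix` are assumptions about what the extension passes in, and the CodeLlama infill tokens are only one possible stop-token format):

```hbs
{{!-- Stop-token style FIM prompt in the CodeLlama infill format. --}}
{{!-- Triple braces keep the code from being HTML-escaped. --}}
<PRE> {{{context}}}{{{prefix}}} <SUF>{{{suffix}}} <MID>
```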
Just supply the 3 variables as args or pass only
For ChatML it could be something like this (not tested):
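Something in this direction, again assuming hypothetical `prefix`/`suffix` variables; the system prompt wording is only illustrative:

```hbs
{{!-- ChatML-style FIM prompt; the assistant turn is left open so the model
     continues from there. --}}
<|im_start|>system
You are a code completion assistant. Write the code that belongs between the prefix and the suffix. Respond with code only.<|im_end|>
<|im_start|>user
Prefix:
{{{prefix}}}

Suffix:
{{{suffix}}}<|im_end|>
<|im_start|>assistant
```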
I believe I understand. I'll add another template into my PR (#174) on the next update. @ChrisDeadman, are you using Ollama as the backend? (Or if not, what are you using?)
I wrote a custom server - I added an Ollama-compatible API to run this extension over it.
Ah. Ollama does normally wrap the prompt passed to /generate with a template unless `raw` is set. I think probably all autocomplete requests should use `raw` mode. Long term, it'd be cool to be able to edit both the chat and FIM templates as HBS in VS Code the same way command templates currently can be. For now I'll just add an extra template to the code.
Hey, FYI, this should now work with any ChatML endpoint as the provider, as I added the ability to edit and choose a custom FIM template. The first test would be using the OpenAI API through LiteLLM with GPT-3.5 or GPT-4. I am still unsure if Ollama supports ChatML? Also should I add

Here is a template I have been using with GPT-4 with pretty good success:
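Not the exact template from this comment, but a sketch of the same idea: an instruction-style prompt with a fill marker that chat models such as GPT-4 tend to handle well (the variable names and the `<FILL_HERE>` marker are assumptions):

```hbs
{{!-- Instruction-style FIM prompt for chat models. The model is asked to
     return only the code that replaces the marker. --}}
You are a code completion engine. The code below is missing a section,
marked with <FILL_HERE>. Reply with only the code that replaces the marker.
Do not add explanations or markdown fences.

{{{prefix}}}<FILL_HERE>{{{suffix}}}
```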
imo this looks like a great approach 👍🏻
Sorry if I just missed it, but I don't really understand how to make this work. I'm currently running a fairly large model through Ollama (https://ollama.com/wojtek/beyonder), and it'd be great if I could use it for FIM as well. Some additional info:
- Editor: VSCodium
- Model's HuggingFace page: https://huggingface.co/mlabonne/Beyonder-4x7B-v3-GGUF
Thanks 😅 🙏 💜
So I tested this briefly with Llama-3 by selecting "custom template" in the FIM Provider settings and modifying the templates like so:
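Roughly this shape, using the Llama-3 turn format; this is a sketch and not necessarily the exact templates used in that test:

```hbs
{{!-- Llama-3 style FIM prompt; note the turn tokens differ from ChatML. --}}
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a code completion assistant. Write the code that belongs between the prefix and the suffix. Respond with code only.<|eot_id|><|start_header_id|>user<|end_header_id|>

Prefix:
{{{prefix}}}

Suffix:
{{{suffix}}}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```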
What I found is:
But other than that it seems to work 😃 It would be nice to be able to repeat the last line of the prefix as the model response.
Thanks @ChrisDeadman, I think this can be arranged. Is there anything else data-wise you'd like passed to the template? Many thanks,
Thanks @rjmacarthy! I cannot think of anything else that is missing at the moment; that should be enough to support ChatML and Llama-3 templates imo.
I've added the language to the template now, but I think there are still some inconsistencies in how it works which I need to iron out. I still had mixed results with llama3:8b.
Did you also add an option to get the last line of the prefix?
No I didn't actually, I can add it.
That would be awesome, I will do some tests when it's ready 😃
First of all: your extension is awesome, thanks for all your effort in constantly making it better! 👍🏻
FIM doesn't work for Mistral-7B-Instruct-v0.2-code-ft.
I know that the ChatML format is mostly suited for turn-based conversations.
However, except for the suggestions, you've already refactored your code to use the turn-based Ollama endpoint...
I get the reason why you have to use the generate endpoint of Ollama for FIM, which sucks a bit because you then have to support all the different turn templates.
This is still a big issue for all client applications that want to make models answer in a specific way.
If you were to try out ChatML support though, you could try appending the start of the expected model response after the template, e.g.:
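For example, ending the rendered prompt with an opened assistant turn (the surrounding messages here are only placeholders):

````text
<|im_start|>user
...the FIM instructions, prefix and suffix go here...<|im_end|>
<|im_start|>assistant
```python
````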
I have tried this out manually and it works. Basically, everything you write after `<|im_start|>assistant` will make the model think it started its answer like that (this works for basically all models, not just ChatML-based ones).

For reference, this is the correct huggingface tokenizer template for ChatML:
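In its commonly used Jinja form it looks like this:

```jinja
{# Standard ChatML chat template; exact whitespace handling can vary between models. #}
{% for message in messages %}{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
```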