Replies: 2 comments
-
Support for Gemma was just added and released in 2.9.0 15 hours ago. #1734
-
@ithax-wb

config.yaml:

```yaml
- name: gemma-2b-it
  context_size: 2048
  f16: true
  gpu_layers: 90
  mmap: true
  trimsuffix:
  - "\n"
  parameters:
    model: gemma-2b-it-q8_0.gguf
    # temperature: 0.2
    # top_k: 40
    # top_p: 0.95
    # seed: -1
  template:
    chat_message: chat
    chat: chat-block
    completion: completion
```

chat.tmpl:

```
<start_of_turn>{{if eq .RoleName "assistant"}}model{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
{{if .Content}}{{.Content}}{{end}}
<end_of_turn>
```

chat-block.tmpl:

```
<bos>{{.Input}}
<start_of_turn>model
```

completion.tmpl:

```
{{.Input}}
```
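With the config above in place, the model can be exercised through LocalAI's OpenAI-compatible chat endpoint. A minimal sketch, assuming LocalAI is listening on http://localhost:8080 (its default port) and the model name matches the config entry:

```python
# Minimal sketch: send a chat request to LocalAI's OpenAI-compatible
# /v1/chat/completions endpoint for the gemma-2b-it entry defined above.
# Assumes LocalAI is reachable at http://localhost:8080.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "gemma-2b-it",
        "messages": [
            {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```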
-
I would like to use Google's new open-source models Gemma 2B, Gemma 2B Instruct, Gemma 7B, and Gemma 7B Instruct with LocalAI. I tried to build the YAML file myself, but I just can't get it to work. Can somebody help me?
Here is some information provided by Google regarding the prompt format: https://github.com/huggingface/blog/blob/main/gemma.md#prompt-format
Prompt format
The base models have no prompt format. Like other base models, they can be used to continue an input sequence with a plausible continuation or for zero-shot/few-shot inference. They are also a great foundation for fine-tuning on your own use cases. The Instruct versions have a very simple conversation structure:
```
<start_of_turn>user
knock knock<end_of_turn>
<start_of_turn>model
who is there<end_of_turn>
<start_of_turn>user
LaMDA<end_of_turn>
<start_of_turn>model
LaMDA who?<end_of_turn>
```
This format has to be exactly reproduced for effective use. We’ll later show how easy it is to reproduce the instruct prompt with the chat template available in transformers:
```python
from transformers import AutoTokenizer, pipeline
import torch

model = "google/gemma-7b-it"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    add_special_tokens=True,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)
print(outputs[0]["generated_text"][len(prompt):])
```
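For reference, with the messages above, apply_chat_template should render a prompt roughly like this (the exact special tokens come from the model's tokenizer config, so treat this as an illustration rather than authoritative output):

```
<bos><start_of_turn>user
Who are you? Please, answer in pirate-speak.<end_of_turn>
<start_of_turn>model
```

This is the same turn structure a LocalAI chat template needs to reproduce.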