Oxen-AI/Llama-Fine-Tune

Example of fine-tuning Llama 2 and exporting to GGML to run on CPU


How to fine-tune and export to GGML

Grab the data

oxen clone http://hub.oxen.ai/ox/Dad-Jokes
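Note: the commands below assume the dataset ends up under `~/Datasets/Dad-Jokes`; adjust the paths if you cloned it somewhere else.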

Fine-tune with LoRA

time python fine_tune.py ~/Datasets/Dad-Jokes/data/train.parquet results/
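For orientation, here is a minimal sketch of what a LoRA fine-tuning script like `fine_tune.py` might look like, built on Hugging Face `transformers` and `peft`. The `"text"` column name and all hyperparameters are assumptions, not values taken from this repo:

```python
import sys

import pandas as pd
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

train_parquet, output_dir = sys.argv[1], sys.argv[2]

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tok.pad_token = tok.eos_token  # Llama 2 ships without a pad token

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)

# Attach low-rank adapters to the attention projections; only these train.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
    task_type="CAUSAL_LM", target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base, lora)

# Assumes the parquet has a "text" column holding the training examples.
ds = Dataset.from_pandas(pd.read_parquet(train_parquet))
ds = ds.map(
    lambda row: tok(row["text"], truncation=True, max_length=512),
    remove_columns=ds.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,
    ),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
model.save_pretrained(f"{output_dir}/final_checkpoint")
```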

Run trained model on GPU

python run_fine_tuned.py results/final_checkpoint/
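A sketch of what `run_fine_tuned.py` plausibly does: load the base model on GPU, attach the saved LoRA adapter, and generate. The prompt string here is illustrative:

```python
import sys

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_dir = sys.argv[1]  # e.g. results/final_checkpoint/

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_dir)  # attach the trained adapter

inputs = tok("Tell me a dad joke about computers.", return_tensors="pt").to(base.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```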

Merge the LoRA weights

python merge_lora_model.py results/final_checkpoint/ results/merged_model
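`merge_lora_model.py` presumably uses peft's `merge_and_unload` to fold the adapter weights back into the base model; a minimal version might look like this:

```python
import sys

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_dir, merged_dir = sys.argv[1], sys.argv[2]

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
# Fold the low-rank deltas into the base weights and drop the adapter layers,
# leaving an ordinary Hugging Face checkpoint that convert.py can read.
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
merged.save_pretrained(merged_dir)
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf").save_pretrained(merged_dir)
```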

Convert the merged model from Hugging Face format to GGML

python ~/Code/3rdParty/strutive07/llama.cpp/convert.py results/merged_model/ --outtype f16 --outfile results/merged.bin --vocab-dir meta-llama/Llama-2-7b-hf --vocabtype hf
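Here `--outtype f16` writes the weights as 16-bit floats and `--outfile` names the GGML output, while `--vocab-dir` / `--vocabtype hf` have the converter read the vocabulary from the Hugging Face tokenizer files, which appears to be why this example points at the strutive07 fork of llama.cpp rather than upstream.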

Run the ggml model on CPU

python run_on_cpu.py --model results/merged.bin --prompt prompts/joke_prompt.txt
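`run_on_cpu.py` is not shown here; a minimal equivalent, assuming the `llama-cpp-python` bindings and mirroring the `--model` / `--prompt` flags above, could look like this:

```python
import argparse

from llama_cpp import Llama  # pip install llama-cpp-python

parser = argparse.ArgumentParser()
parser.add_argument("--model", required=True)   # path to the ggml .bin file
parser.add_argument("--prompt", required=True)  # text file containing the prompt
args = parser.parse_args()

llm = Llama(model_path=args.model)  # plain CPU inference by default
with open(args.prompt) as f:
    prompt = f.read()

result = llm(prompt, max_tokens=128)
print(result["choices"][0]["text"])
```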

Quantize the model to q8_0

~/Code/3rdParty/strutive07/llama.cpp/build/bin/quantize results/merged.bin results/merged_ggml_q8_0.bin q8_0
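q8_0 is llama.cpp's 8-bit block-quantized format (8-bit integer weights plus a per-block scale), which roughly halves the size of the f16 file with little quality loss.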

Run the q8_0 model

python run_on_cpu.py --model results/merged_ggml_q8_0.bin --prompt prompts/joke_prompt.txt

Quantize the model to q4_0

~/Code/3rdParty/strutive07/llama.cpp/build/bin/quantize results/merged.bin results/merged_ggml_q4_0.bin q4_0

Run the q4_0 model

python run_on_cpu.py --model results/merged_ggml_q4_0.bin --prompt prompts/joke_prompt.txt
