Grab the data
oxen clone http://hub.oxen.ai/ox/Dad-Jokes
Fine-tune with LoRA
time python fine_tune.py ~/Datasets/Dad-Jokes/data/train.parquet results/
Run the fine-tuned model on GPU
python run_fine_tuned.py results/final_checkpoint/
Merge the LoRA weights into the base model
python merge_lora_model.py results/final_checkpoint/ results/merged_model
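To make the merge step concrete, here is a minimal pure-Python sketch of what merging a LoRA adapter does to a single weight matrix, assuming the standard LoRA update W' = W + (alpha / r) * B A. The function names and toy matrices are hypothetical and are not taken from merge_lora_model.py, which may well do this through a library instead.

```python
# Pure-Python sketch of LoRA merging for one weight matrix.
# All names are illustrative assumptions, not the contents of merge_lora_model.py.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy 2x2 base weight with a rank-1 adapter.
weight = [[1.0, 0.0],
          [0.0, 1.0]]
lora_a = [[1.0, 2.0]]       # shape (rank, in_features)
lora_b = [[0.5], [-0.5]]    # shape (out_features, rank)

merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # [[2.0, 2.0], [-1.0, -1.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the later conversion step can treat results/merged_model as an ordinary model directory.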
Convert the merged model from Hugging Face format to GGML
python ~/Code/3rdParty/strutive07/llama.cpp/convert.py results/merged_model/ --outtype f16 --outfile results/merged.bin --vocab-dir meta-llama/Llama-2-7b-hf --vocabtype hf
Run the f16 GGML model on CPU
python run_on_cpu.py --model results/merged.bin --prompt prompts/joke_prompt.txt
Quantize the model to q8_0
~/Code/3rdParty/strutive07/llama.cpp/build/bin/quantize results/merged.bin results/merged_ggml_q8_0.bin q8_0
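As a rough illustration of what q8_0 means, the sketch below quantizes one block of 32 weights to 8-bit integers with a single per-block scale, then restores them. It assumes the usual q8_0 scheme (scale = largest magnitude / 127). It is a simplification: the real quantize tool stores the scale as a 16-bit float and rounds half values away from zero, while this sketch keeps a full-precision scale and uses Python's round(). All names are hypothetical.

```python
# Simplified q8_0-style block quantization (illustrative, not llama.cpp code).
BLOCK_SIZE = 32  # q8_0 groups weights into blocks of 32

def quantize_q8_0(values):
    """Return one (scale, int8 values) pair per block of 32 floats."""
    assert len(values) % BLOCK_SIZE == 0
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)       # largest magnitude in the block
        scale = amax / 127.0
        inv = 1.0 / scale if scale else 0.0     # avoid dividing by zero
        quants = [round(v * inv) for v in block]
        blocks.append((scale, quants))
    return blocks

def dequantize_q8_0(blocks):
    """Rebuild approximate floats from (scale, quants) pairs."""
    return [scale * q for scale, quants in blocks for q in quants]

weights = [(i - 16) / 16.0 for i in range(32)]  # toy values in [-1, 1)
blocks = quantize_q8_0(weights)
restored = dequantize_q8_0(blocks)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max round-trip error: {max_error:.5f}")
```

The round-trip error is bounded by half the block scale, which is why q8_0 output is usually very close to the f16 model.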
Run the q8_0 quantized model
python run_on_cpu.py --model results/merged_ggml_q8_0.bin --prompt prompts/joke_prompt.txt
Quantize the model to q4_0
~/Code/3rdParty/strutive07/llama.cpp/build/bin/quantize results/merged.bin results/merged_ggml_q4_0.bin q4_0
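For comparison, here is a simplified sketch of q4_0-style quantization: each block of 32 weights shares one scale and every weight is stored as a 4-bit value in 0..15. It assumes the commonly described q4_0 scheme, where the scale is the signed largest-magnitude value divided by -8. It skips the 16-bit scale storage and the nibble packing that the real tool performs, and all names are hypothetical.

```python
# Simplified q4_0-style block quantization (illustrative, not llama.cpp code).
BLOCK_SIZE = 32  # q4_0 also works on blocks of 32 weights

def quantize_q4_0(values):
    """Return one (scale, 4-bit values) pair per block of 32 floats."""
    assert len(values) % BLOCK_SIZE == 0
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        extreme = max(block, key=abs)           # largest magnitude, sign kept
        scale = extreme / -8.0
        inv = 1.0 / scale if scale else 0.0     # avoid dividing by zero
        # Shift into 0..16, truncate, then clamp to the 4-bit maximum of 15.
        quants = [min(15, int(v * inv + 8.5)) for v in block]
        blocks.append((scale, quants))
    return blocks

def dequantize_q4_0(blocks):
    """Rebuild approximate floats; stored values are offset by 8."""
    return [scale * (q - 8) for scale, quants in blocks for q in quants]

weights = [(i - 16) / 16.0 for i in range(32)]  # toy values in [-1, 1)
blocks = quantize_q4_0(weights)
restored = dequantize_q4_0(blocks)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max round-trip error: {max_error:.5f}")
```

With only 16 levels per block the round-trip error is far larger than with 8-bit quantization, so it is worth comparing the q4_0 jokes against the f16 output before settling on this format.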
Run the q4_0 quantized model
python run_on_cpu.py --model results/merged_ggml_q4_0.bin --prompt prompts/joke_prompt.txt
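To get a feel for the trade-off between the three files, the sketch below estimates their sizes from bytes per block of 32 weights. It assumes a block layout with a 2-byte scale per block and a round 7 billion parameters. Real files will differ somewhat, since some tensors may stay at higher precision and the file also holds metadata and the vocabulary.

```python
# Back-of-the-envelope file size estimate per format (assumed block layouts).
BYTES_PER_BLOCK = {
    "f16": 32 * 2,    # 32 half-precision floats
    "q8_0": 2 + 32,   # 2-byte scale + 32 one-byte values
    "q4_0": 2 + 16,   # 2-byte scale + 32 four-bit values packed into 16 bytes
}

def estimated_size_gb(n_params, fmt):
    """Estimate the weight storage in GB (10^9 bytes) for a given format."""
    return n_params / 32 * BYTES_PER_BLOCK[fmt] / 1e9

n_params = 7_000_000_000  # assumed round figure for a 7B model

for fmt in ("f16", "q8_0", "q4_0"):
    print(f"{fmt}: ~{estimated_size_gb(n_params, fmt):.2f} GB")
```

Under these assumptions q8_0 costs 8.5 bits per weight and q4_0 costs 4.5 bits per weight, compared with 16 bits for f16.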