This notebook provides a comprehensive workflow to fine-tune Meta's LLaMA 2 language model using Hugging Face's `transformers` and `datasets` libraries. The pipeline covers data preparation, prompt formatting (especially for instruction-tuned models), tokenizer setup, LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning, and final model training.
LLaMA 2 models are powerful foundation LLMs. Fine-tuning them on domain-specific or task-specific data can significantly improve performance on those tasks. This notebook implements:
- Installation of required libraries.
- Use of chat-style prompt templates.
- Instruction tuning using a defined input/output prompt schema.
- Use of Hugging Face datasets for loading and formatting data.
- Efficient fine-tuning using the LoRA technique.
- Model saving and inference.
- Original dataset: https://huggingface.co/datasets/timdettmers/openassistant-guanaco
- Dataset reformatted to the Llama 2 template (1k samples): https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k
- Complete dataset reformatted to the Llama 2 template: https://huggingface.co/datasets/mlabonne/guanaco-llama2
For LLaMA 2 instruction tuning, each example combines an optional system prompt, a user instruction, and the assistant's answer using the Llama 2 chat template:

```
<s>[INST] <<SYS>>
System prompt (optional)
<</SYS>>

User question or instruction [/INST] Model answer </s>
```

This aligns the model's behavior with a chat-like, instruction-following format.
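As a concrete illustration, a small helper can assemble one training string from these roles. This is a minimal sketch; the function name and the `system`, `instruction`, and `response` arguments are placeholders for whatever your data provides.

```python
def build_llama2_prompt(instruction: str, response: str, system: str = "") -> str:
    """Assemble one training example in the Llama 2 chat format."""
    sys_block = f"<<SYS>>\n{system}\n<</SYS>>\n\n" if system else ""
    return f"<s>[INST] {sys_block}{instruction} [/INST] {response} </s>"

# Example usage:
print(build_llama2_prompt("What is LoRA?", "LoRA adapts a model with low-rank update matrices."))
```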
The notebook installs the following key packages:
- `transformers`
- `peft`
- `accelerate`
- `bitsandbytes`
- `datasets`
- `scipy`
- `sentencepiece`
The notebook automatically installs all the required Python libraries via pip.
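In a notebook cell, the installation typically looks like the following (package list taken from above; any version pins in the actual notebook may differ):

```
!pip install -q transformers peft accelerate bitsandbytes datasets scipy sentencepiece
```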
Load datasets using `load_dataset()` and format them using the chat prompt schema. Ensure your dataset has columns such as `instruction`, `input`, and `output`.
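For the pre-formatted 1k-sample Guanaco dataset referenced above, loading is a one-liner; for your own instruction/input/output data, map a formatting helper (such as the one sketched earlier) over each row. The column names in the commented option are assumptions about your data, not the notebook's exact code.

```python
from datasets import load_dataset

# Option 1: the Guanaco subset already formatted with the Llama 2 template.
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# Option 2 (sketch): a custom dataset with instruction / input / output columns.
# raw = load_dataset("json", data_files="my_data.json", split="train")
# dataset = raw.map(lambda row: {"text": build_llama2_prompt(
#     instruction=f"{row['instruction']}\n{row['input']}".strip(),
#     response=row["output"])})
```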
The tokenizer is initialized with the pretrained LLaMA 2 tokenizer. Prompts are tokenized with appropriate padding and truncation.
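For example (a sketch assuming the `meta-llama/Llama-2-7b-chat-hf` base model listed below; the padding choices mirror common Llama 2 fine-tuning setups rather than the notebook's exact settings):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer.pad_token = tokenizer.eos_token   # Llama 2 has no dedicated pad token
tokenizer.padding_side = "right"            # avoids issues with fp16 training

tokens = tokenizer("Example prompt", padding="max_length", truncation=True, max_length=512)
```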
LoRA is used for parameter-efficient tuning. Key parameters configured:
- `r`: Low-rank dimension.
- `alpha`: Scaling factor.
- `target_modules`: Specific modules (e.g., `q_proj`, `v_proj`) to be adapted.

Training uses Hugging Face's `Trainer` API, as sketched below.
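The following is a minimal sketch of this setup, reusing the `dataset` loaded earlier. The hyperparameter values (r, alpha, dropout, learning rate, batch size) and output paths are illustrative assumptions, not the notebook's exact configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
# In practice a bitsandbytes quantization_config is often added here to fit GPU memory.
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA: train small low-rank adapters on the attention projections only.
lora_config = LoraConfig(
    r=64,                                 # low-rank dimension
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # modules to adapt
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only a small fraction of weights is trainable

# Tokenize the formatted "text" column so Trainer can consume it.
tokenized = dataset.map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=512),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./llama2-lora", per_device_train_batch_size=4,
                           num_train_epochs=1, learning_rate=2e-4, logging_steps=25),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```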
After training, the adapter model and tokenizer are saved for inference.
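Saving and reloading follow the standard `peft` / `transformers` pattern (the output paths below are illustrative):

```python
# Save only the LoRA adapter weights plus the tokenizer.
model.save_pretrained("./llama2-lora-adapter")
tokenizer.save_pretrained("./llama2-lora-adapter")

# Later, for inference: load the base model and attach the adapter.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
inference_model = PeftModel.from_pretrained(base_model, "./llama2-lora-adapter")
```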
- Base Model: `meta-llama/Llama-2-7b-chat-hf`
- Trainer: Hugging Face `transformers.Trainer`
- Training Technique: LoRA from `peft`
- Dataset Format: Instruction-following with system/user/assistant roles
The notebook includes a visual of the prompt formatting style used in instruction-tuned LLaMA 2 models.
This fine-tuning approach is ideal for:
- Chatbots tailored to specific domains (e.g., healthcare, finance).
- Document summarization and Q&A systems.
- Conversational agents with domain expertise.
- Academic or organizational knowledge assistants.
Running the notebook produces a fine-tuned LLaMA 2 checkpoint adapted to your custom task. The training process records logs and evaluation metrics for analysis.
Feel free to fork, modify, and expand upon this notebook for your own applications!