
LLaMA 3.1 8B Fine-tuning with Unsloth

Overview

This repository contains an implementation of fine-tuning LLaMA 3.1 8B with Unsloth's optimized training pipeline. The model is fine-tuned on the Alpaca dataset to improve its instruction-following capabilities.

Model Details

- Base Model: unsloth/Meta-Llama-3.1-8B
- Training Method: LoRA (Low-Rank Adaptation)
- Quantization: 4-bit
- Max Sequence Length: 2048 tokens
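
For reference, the base model is loaded in 4-bit with Unsloth roughly as in the sketch below (mirroring the public Unsloth examples; dtype=None lets Unsloth pick an appropriate precision):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    dtype=None,          # auto-detect precision (fp16 on a T4)
    load_in_4bit=True,   # 4-bit quantization via bitsandbytes
)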

Training Configuration

- Batch Size per Device: 2
- Gradient Accumulation Steps: 4
- Total Batch Size: 8
- Training Steps: 60
- Learning Rate: 2e-4
- Optimizer: AdamW (8-bit)
- Weight Decay: 0.01
- LR Scheduler: Linear
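
The hyperparameters above map onto a standard TrainingArguments setup roughly as follows (a sketch; the warmup and fp16 settings are assumptions, not stated in the list above):

from transformers import TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size: 2 * 4 = 8
    max_steps=60,
    learning_rate=2e-4,
    optim="adamw_8bit",              # 8-bit AdamW (bitsandbytes)
    weight_decay=0.01,
    lr_scheduler_type="linear",
    warmup_steps=5,                  # assumption: a short warmup is typical
    fp16=True,                       # assumption: the T4 has no bf16 support
    logging_steps=1,
    output_dir="outputs",
)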

LoRA Parameters

- Rank (r): 16
- Alpha: 16
- Dropout: 0
- Target Modules: 
  - Query Projection
  - Key Projection
  - Value Projection
  - Output Projection
  - Gate Projection
  - Up/Down Projections
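
In Unsloth this configuration typically corresponds to a get_peft_model call along these lines (a sketch; use_gradient_checkpointing is an assumption taken from the standard notebook, not listed above):

from unsloth import FastLanguageModel

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention: query/key/value/output projections
        "gate_proj", "up_proj", "down_proj",      # MLP: gate/up/down projections
    ],
    bias="none",
    use_gradient_checkpointing="unsloth",   # assumption: Unsloth's memory-saving default
)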

Results & Performance Metrics

Training Efficiency

- Training Time: 9.6 minutes (575.85 seconds)
- Steps: 60
- Dataset Size: 51,760 examples (60 steps × effective batch size 8 ≈ 480 examples seen during training)

Model Performance

- Initial Loss: 1.587
- Best Loss: 0.734 (step 38)
- Final Loss: 0.885
- Loss Reduction: 44.2% relative to the initial loss ((1.587 − 0.885) / 1.587 ≈ 0.442)

Memory Optimization

- Peak Memory Usage: 7.371 GB
- Base Memory (model load): 6.004 GB
- Training Overhead: 1.367 GB
- Memory Efficiency: training overhead is only 9.269% of available GPU memory (1.367 GB of 14.748 GB)
- Maximum GPU Memory: 14.748 GB
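
These figures can be reproduced from torch.cuda memory statistics, roughly as in the sketch below (measuring reserved memory before and after training):

import torch

gpu = torch.cuda.get_device_properties(0)
max_memory = round(gpu.total_memory / 1024**3, 3)   # total VRAM, ~14.748 GB on a T4

start_reserved = round(torch.cuda.max_memory_reserved() / 1024**3, 3)  # after loading the model
# trainer.train() runs here
peak_reserved = round(torch.cuda.max_memory_reserved() / 1024**3, 3)   # after training

overhead = round(peak_reserved - start_reserved, 3)
print(f"Peak: {peak_reserved} GB | Training overhead: {overhead} GB "
      f"({round(overhead / max_memory * 100, 3)} % of {max_memory} GB)")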

Key Achievements

- Achieved sub-1.0 loss within 10 training steps
- Maintained stable loss despite the small batch size
- Memory-efficient training with 4-bit quantization
- Fast convergence with minimal computational resources
- Demonstrated strong few-shot learning capabilities in testing

Inference Performance

Example output (Fibonacci sequence continuation):

Input: 1, 1, 2, 3, 5, 8
Output: 13
Explanation: The next number in the Fibonacci sequence is the sum of the previous two. The previous two numbers are 5 and 8, so the next number is 5 + 8 = 13.
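
A sketch of how such an output is generated, assuming the standard Alpaca prompt template used for this dataset (the instruction text below is illustrative):

alpaca_prompt = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{}\n\n### Input:\n{}\n\n### Response:\n{}"
)

inputs = tokenizer(
    [alpaca_prompt.format("Continue the fibonacci sequence.", "1, 1, 2, 3, 5, 8", "")],
    return_tensors="pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])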

Hardware Requirements

- GPU: Tesla T4
- VRAM: 14.748 GB
- CUDA: 12.1
- PyTorch: 2.5.1+cu121
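
A quick environment check against the versions listed above (a sketch):

import torch

print(torch.__version__)              # expect 2.5.1+cu121
print(torch.version.cuda)             # expect 12.1
print(torch.cuda.get_device_name(0))  # expect Tesla T4
print(round(torch.cuda.get_device_properties(0).total_memory / 1024**3, 3))  # ~14.748 GB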

Usage

Installation

pip install unsloth
pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

Training

See Llama 3.1-8b.ipynb for the complete training pipeline.

Inference

from unsloth import FastLanguageModel

# Load the fine-tuned model (4-bit base + LoRA adapters) and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path_to_saved_model",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Switch Unsloth into its optimized inference mode
model = FastLanguageModel.for_inference(model)
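
For completeness, "path_to_saved_model" above is a placeholder; after training, the LoRA adapters and tokenizer are typically written to that directory with something like:

model.save_pretrained("path_to_saved_model")      # writes the LoRA adapter weights
tokenizer.save_pretrained("path_to_saved_model")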

License

MIT

Acknowledgements

- Unsloth for the optimization framework
- Meta AI for the LLaMA model
- Yahma for the cleaned Alpaca dataset
