
LLaMA 3.1 8B Fine-tuning with Unsloth

Overview

This repository contains an implementation of fine-tuning LLaMA 3.1 8B with Unsloth's optimized training pipeline. The model is fine-tuned on the Alpaca dataset to improve its instruction-following capabilities.

Model Details

- Base Model: unsloth/Meta-Llama-3.1-8B
- Training Method: LoRA (Low-Rank Adaptation)
- Quantization: 4-bit
- Max Sequence Length: 2048 tokens
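
For reference, the base model is loaded in 4-bit with Unsloth roughly as in the sketch below (mirroring the public Unsloth examples; dtype=None lets Unsloth pick an appropriate precision):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    dtype=None,          # auto-detect precision (fp16 on a T4)
    load_in_4bit=True,   # 4-bit quantization via bitsandbytes
)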

Training Configuration

- Batch Size per Device: 2
- Gradient Accumulation Steps: 4
- Total Batch Size: 8
- Training Steps: 60
- Learning Rate: 2e-4
- Optimizer: AdamW (8-bit)
- Weight Decay: 0.01
- LR Scheduler: Linear
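
The hyperparameters above map onto a standard TrainingArguments setup roughly as follows (a sketch; the warmup and fp16 settings are assumptions, not stated in the list above):

from transformers import TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size: 2 * 4 = 8
    max_steps=60,
    learning_rate=2e-4,
    optim="adamw_8bit",              # 8-bit AdamW (bitsandbytes)
    weight_decay=0.01,
    lr_scheduler_type="linear",
    warmup_steps=5,                  # assumption: a short warmup is typical
    fp16=True,                       # assumption: the T4 has no bf16 support
    logging_steps=1,
    output_dir="outputs",
)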

LoRA Parameters

- Rank (r): 16
- Alpha: 16
- Dropout: 0
- Target Modules: 
  - Query Projection
  - Key Projection
  - Value Projection
  - Output Projection
  - Gate Projection
  - Up/Down Projections
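
In Unsloth this configuration typically corresponds to a get_peft_model call along these lines (a sketch; use_gradient_checkpointing is an assumption taken from the standard notebook, not listed above):

from unsloth import FastLanguageModel

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention: query/key/value/output projections
        "gate_proj", "up_proj", "down_proj",      # MLP: gate/up/down projections
    ],
    bias="none",
    use_gradient_checkpointing="unsloth",   # assumption: Unsloth's memory-saving default
)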

Results & Performance Metrics

Training Efficiency

- Training Time: 9.6 minutes (575.85 seconds)
- Steps: 60
- Dataset Size: 51,760 examples (60 steps × effective batch size 8 ≈ 480 examples seen during training)

Model Performance

- Initial Loss: 1.587
- Best Loss: 0.734 (step 38)
- Final Loss: 0.885
- Loss Reduction: 44.2% relative to the initial loss ((1.587 − 0.885) / 1.587 ≈ 0.442)

Memory Optimization

- Peak Memory Usage: 7.371 GB
- Base Memory (model load): 6.004 GB
- Training Overhead: 1.367 GB
- Memory Efficiency: training overhead is only 9.269% of available GPU memory (1.367 GB of 14.748 GB)
- Maximum GPU Memory: 14.748 GB
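
These figures can be reproduced from torch.cuda memory statistics, roughly as in the sketch below (measuring reserved memory before and after training):

import torch

gpu = torch.cuda.get_device_properties(0)
max_memory = round(gpu.total_memory / 1024**3, 3)   # total VRAM, ~14.748 GB on a T4

start_reserved = round(torch.cuda.max_memory_reserved() / 1024**3, 3)  # after loading the model
# trainer.train() runs here
peak_reserved = round(torch.cuda.max_memory_reserved() / 1024**3, 3)   # after training

overhead = round(peak_reserved - start_reserved, 3)
print(f"Peak: {peak_reserved} GB | Training overhead: {overhead} GB "
      f"({round(overhead / max_memory * 100, 3)} % of {max_memory} GB)")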

Key Achievements

- Achieved sub-1.0 loss within 10 training steps
- Maintained stable loss despite the small batch size
- Memory-efficient training with 4-bit quantization
- Fast convergence with minimal computational resources
- Demonstrated strong few-shot learning capabilities in testing

Inference Performance

Example output (Fibonacci sequence continuation):

Input: 1, 1, 2, 3, 5, 8
Output: 13
Explanation: The next number in the Fibonacci sequence is the sum of the previous two. The previous two numbers are 5 and 8, so the next number is 5 + 8 = 13.
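
A sketch of how such an output is generated, assuming the standard Alpaca prompt template used for this dataset (the instruction text below is illustrative):

alpaca_prompt = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{}\n\n### Input:\n{}\n\n### Response:\n{}"
)

inputs = tokenizer(
    [alpaca_prompt.format("Continue the fibonacci sequence.", "1, 1, 2, 3, 5, 8", "")],
    return_tensors="pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])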

Hardware Requirements

- GPU: Tesla T4
- VRAM: 14.748 GB
- CUDA: 12.1
- PyTorch: 2.5.1+cu121
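
A quick environment check against the versions listed above (a sketch):

import torch

print(torch.__version__)              # expect 2.5.1+cu121
print(torch.version.cuda)             # expect 12.1
print(torch.cuda.get_device_name(0))  # expect Tesla T4
print(round(torch.cuda.get_device_properties(0).total_memory / 1024**3, 3))  # ~14.748 GB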

Usage

Installation

pip install unsloth
pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

Training

See Llama 3.1-8b.ipynb for the complete training pipeline.

Inference

from unsloth import FastLanguageModel

# Load the fine-tuned model (4-bit base + LoRA adapters) and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path_to_saved_model",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Switch Unsloth into its optimized inference mode
model = FastLanguageModel.for_inference(model)
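
For completeness, "path_to_saved_model" above is a placeholder; after training, the LoRA adapters and tokenizer are typically written to that directory with something like:

model.save_pretrained("path_to_saved_model")      # writes the LoRA adapter weights
tokenizer.save_pretrained("path_to_saved_model")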

License

MIT

Acknowledgements

- Unsloth for the optimization framework
- Meta AI for the LLaMA model
- Yahma for the cleaned Alpaca dataset
