Closed
Problem
The 5090 tower (RTX 5090, 32GB VRAM) has PEFT, transformers, and bitsandbytes installed, but NOT Unsloth. Unsloth is our preferred training framework (roughly 2x faster LoRA training, lower memory use). Without it, fine-tuning is slower and uses more VRAM.
Additionally, the Qwen3.5-35B-A3B model is MoE (Mixture of Experts) which has specific LoRA training considerations.
MoE Training Challenges
- Memory: all 35B parameters are loaded during training (not just the 3B active per token). At FP16 that is ~70GB, which won't fit in 32GB VRAM without quantization
- Quantized training: Need QLoRA (4-bit base + LoRA adapters) — PEFT + bitsandbytes support this
- Target layers: LoRA attaches to shared attention layers (q_proj, k_proj, v_proj, o_proj). Expert FFN layers are usually frozen.
- Router gradients: whether the MoE gate/router needs gradient updates during LoRA training is an open question (default LoRA target lists leave it frozen)
- Unsloth MoE: Unsloth added Qwen3 MoE support — need to verify with this specific model
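The memory claim above follows from simple arithmetic. A quick sanity check (the helper name is illustrative; real usage adds activations, optimizer state, and quantization overhead on top of the weights):

```python
# Rough weight-memory estimate for a 35B-parameter MoE model.
# Estimates only: activations, LoRA adapters, and optimizer state add more.

def model_weight_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

fp16 = model_weight_gb(35e9, 2.0)   # FP16: 2 bytes per parameter
nf4 = model_weight_gb(35e9, 0.5)    # 4-bit NF4: ~0.5 bytes per parameter

print(f"FP16 weights: ~{fp16:.0f} GB")   # ~70 GB -- over the 32 GB budget
print(f"4-bit weights: ~{nf4:.1f} GB")   # ~17.5 GB -- fits, with headroom for LoRA
```

This is why QLoRA (4-bit base + LoRA adapters) is the only viable path on a single 32GB card.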
Steps
- Install Unsloth on the 5090 tower: pip install unsloth
- Verify Unsloth loads Qwen3.5-35B-A3B with 4-bit quantization
- Test LoRA training on a small dataset (10 examples, 1 epoch)
- Measure VRAM usage during training (target: <30GB with QLoRA)
- If Unsloth doesn't support MoE: fall back to PEFT directly
- Wire successful training path into Academy pipeline
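The Unsloth-first path with the PEFT fallback can be sketched as follows. This is an assumption-laden sketch, not a tested implementation: MODEL_ID, the 4-bit kwargs, and the LoRA hyperparameters all need to be verified against current Unsloth docs and the actual Safetensors repo.

```python
# Sketch: try Unsloth first; fall back to plain PEFT + bitsandbytes (QLoRA).
# MODEL_ID and hyperparameters are placeholders -- verify before use.

MODEL_ID = "Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled"

# LoRA targets the shared attention projections; expert FFNs stay frozen.
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]

def lora_kwargs(r: int = 16, alpha: int = 32) -> dict:
    """LoRA hyperparameters shared by both code paths."""
    return {"r": r, "lora_alpha": alpha, "lora_dropout": 0.0,
            "target_modules": TARGET_MODULES}

def load_model():
    try:
        from unsloth import FastLanguageModel
        model, tokenizer = FastLanguageModel.from_pretrained(
            MODEL_ID, max_seq_length=4096, load_in_4bit=True)
        model = FastLanguageModel.get_peft_model(model, **lora_kwargs())
    except ImportError:
        # Unsloth missing (or no MoE support for this model): plain QLoRA.
        import torch
        from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                  BitsAndBytesConfig)
        from peft import LoraConfig, get_peft_model
        bnb = BitsAndBytesConfig(load_in_4bit=True,
                                 bnb_4bit_quant_type="nf4",
                                 bnb_4bit_compute_dtype=torch.bfloat16)
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID, quantization_config=bnb, device_map="auto")
        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
        model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM",
                                                 **lora_kwargs()))
    return model, tokenizer
```

For the VRAM measurement step, `torch.cuda.reset_peak_memory_stats()` before the test run and `torch.cuda.max_memory_allocated()` after it give the peak usage to compare against the 30GB target.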
Hardware
- RTX 5090: 32GB VRAM, CUDA 12, 170 SMs
- Model: /home/joel/.continuum/models/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-Q4_K_M.gguf
- Note: training needs the Safetensors version (not GGUF). May need to download Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled in Safetensors format.
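If the Safetensors weights aren't already local, a download along these lines should work. The repo id is taken from the note above; verify it actually exists on the Hub (and publishes Safetensors shards) before running, and the `local_dir` path is a placeholder.

```python
# Hypothetical download helper for the Safetensors weights.
# Repo id from the note above -- confirm on the Hub before running.

REPO_ID = "Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled"

# Skip any GGUF files: training needs the sharded .safetensors weights
# plus the config and tokenizer files.
ALLOW_PATTERNS = ["*.safetensors", "*.json", "tokenizer*"]

def download(local_dir: str) -> str:
    """Fetch only the files needed for training; returns the local path."""
    from huggingface_hub import snapshot_download
    return snapshot_download(REPO_ID, local_dir=local_dir,
                             allow_patterns=ALLOW_PATTERNS)
```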
Related
- Evaluate Qwen3.5-35B-A3B as local inference model — Opus reasoning distilled, 3B active #417 (model evaluation)
- Academy: no full training session proven end-to-end #377 (full academy session e2e)
- Academy teacher requires cloud API key — train local teacher adapter #374 (local teacher — this model could BE the teacher)
- Ship a LoRA-tuned local model that passes coding challenges via our tool system #344 (ship LoRA-tuned model)