Closed
Problem
The 5090 tower (RTX 5090, 32GB VRAM) has PEFT, transformers, and bitsandbytes installed, but NOT Unsloth. Unsloth is our preferred training framework (roughly 2x faster LoRA training, lower memory use). Without it, fine-tuning is slower and uses more VRAM.
Additionally, the Qwen3.5-35B-A3B model is MoE (Mixture of Experts) which has specific LoRA training considerations.
MoE Training Challenges
- Memory: all 35B parameters are loaded during training (not just the 3B active per token). At FP16 that is ~70GB, which won't fit in 32GB VRAM without quantization
- Quantized training: Need QLoRA (4-bit base + LoRA adapters) — PEFT + bitsandbytes support this
- Target layers: LoRA attaches to shared attention layers (q_proj, k_proj, v_proj, o_proj). Expert FFN layers are usually frozen.
- Router gradients: whether the MoE gate/router needs gradient updates during LoRA training is an open question (default LoRA target lists leave it frozen)
- Unsloth MoE: Unsloth added Qwen3 MoE support — need to verify with this specific model
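The memory claim above follows from simple arithmetic. A quick sanity check (the helper name is illustrative; real usage adds activations, optimizer state, and quantization overhead on top of the weights):

```python
# Rough weight-memory estimate for a 35B-parameter MoE model.
# Estimates only: activations, LoRA adapters, and optimizer state add more.

def model_weight_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

fp16 = model_weight_gb(35e9, 2.0)   # FP16: 2 bytes per parameter
nf4 = model_weight_gb(35e9, 0.5)    # 4-bit NF4: ~0.5 bytes per parameter

print(f"FP16 weights: ~{fp16:.0f} GB")   # ~70 GB -- over the 32 GB budget
print(f"4-bit weights: ~{nf4:.1f} GB")   # ~17.5 GB -- fits, with headroom for LoRA
```

This is why QLoRA (4-bit base + LoRA adapters) is the only viable path on a single 32GB card.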
Steps
- Install Unsloth on the 5090 tower: pip install unsloth
- Verify Unsloth loads Qwen3.5-35B-A3B with 4-bit quantization
- Test LoRA training on a small dataset (10 examples, 1 epoch)
- Measure VRAM usage during training (target: <30GB with QLoRA)
- If Unsloth doesn't support MoE: fall back to PEFT directly
- Wire successful training path into Academy pipeline
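The Unsloth-first path with the PEFT fallback can be sketched as follows. This is an assumption-laden sketch, not a tested implementation: MODEL_ID, the 4-bit kwargs, and the LoRA hyperparameters all need to be verified against current Unsloth docs and the actual Safetensors repo.

```python
# Sketch: try Unsloth first; fall back to plain PEFT + bitsandbytes (QLoRA).
# MODEL_ID and hyperparameters are placeholders -- verify before use.

MODEL_ID = "Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled"

# LoRA targets the shared attention projections; expert FFNs stay frozen.
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]

def lora_kwargs(r: int = 16, alpha: int = 32) -> dict:
    """LoRA hyperparameters shared by both code paths."""
    return {"r": r, "lora_alpha": alpha, "lora_dropout": 0.0,
            "target_modules": TARGET_MODULES}

def load_model():
    try:
        from unsloth import FastLanguageModel
        model, tokenizer = FastLanguageModel.from_pretrained(
            MODEL_ID, max_seq_length=4096, load_in_4bit=True)
        model = FastLanguageModel.get_peft_model(model, **lora_kwargs())
    except ImportError:
        # Unsloth missing (or no MoE support for this model): plain QLoRA.
        import torch
        from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                  BitsAndBytesConfig)
        from peft import LoraConfig, get_peft_model
        bnb = BitsAndBytesConfig(load_in_4bit=True,
                                 bnb_4bit_quant_type="nf4",
                                 bnb_4bit_compute_dtype=torch.bfloat16)
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID, quantization_config=bnb, device_map="auto")
        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
        model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM",
                                                 **lora_kwargs()))
    return model, tokenizer
```

For the VRAM measurement step, `torch.cuda.reset_peak_memory_stats()` before the test run and `torch.cuda.max_memory_allocated()` after it give the peak usage to compare against the 30GB target.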
Hardware
- RTX 5090: 32GB VRAM, CUDA 12, 170 SMs
- Model: /home/joel/.continuum/models/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-Q4_K_M.gguf
- Note: training needs the Safetensors version (not GGUF). May need to download Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled in Safetensors format.
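If the Safetensors weights aren't already local, a download along these lines should work. The repo id is taken from the note above; verify it actually exists on the Hub (and publishes Safetensors shards) before running, and the `local_dir` path is a placeholder.

```python
# Hypothetical download helper for the Safetensors weights.
# Repo id from the note above -- confirm on the Hub before running.

REPO_ID = "Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled"

# Skip any GGUF files: training needs the sharded .safetensors weights
# plus the config and tokenizer files.
ALLOW_PATTERNS = ["*.safetensors", "*.json", "tokenizer*"]

def download(local_dir: str) -> str:
    """Fetch only the files needed for training; returns the local path."""
    from huggingface_hub import snapshot_download
    return snapshot_download(REPO_ID, local_dir=local_dir,
                             allow_patterns=ALLOW_PATTERNS)
```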
Related
- Evaluate Qwen3.5-35B-A3B as local inference model — Opus reasoning distilled, 3B active #417 (model evaluation)
- Academy: no full training session proven end-to-end #377 (full academy session e2e)
- Academy teacher requires cloud API key — train local teacher adapter #374 (local teacher — this model could BE the teacher)
- Ship a LoRA-tuned local model that passes coding challenges via our tool system #344 (ship LoRA-tuned model)