## The Vision
Take a 35B MoE model. Extract just the code expert (~4-5B). Fine-tune it. Publish it.
OpenCode/VSCode users get a 4B model with the coding depth of a 35B.
## Why This Is Huge
MoE models have N experts, but users typically need 1-2. A coder needs the code expert.
A reasoning tool needs the reasoning expert. A creative writer needs the creative expert.
Current reality: You download 35B and use 3B of it. Wasteful.
With MoE surgery: You download 4B and get 35B-quality in your domain.
## Technical Approach
- Load the full 35B model (needs >32GB, use CPU offload or multi-GPU)
- Identify experts by domain — the router/gate network decides which expert activates for which tokens. Run coding prompts through the model, track which expert(s) activate most for code tokens
- Extract expert weights — save just the router + shared attention + target expert(s)
- Convert to dense — remove the MoE routing, make it a standard dense model
- Fine-tune the extracted expert — LoRA on top for Continuum tool system
- Publish — `continuum-ai/qwen3.5-4b-code-cont` (extracted from 35B MoE code expert)
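The expert-identification step can be sketched with synthetic data. In a real run you would hook the gate of each MoE layer in the loaded model and feed it coding prompts; the toy router below (random weights, made-up shapes, not the real Qwen gate) just shows the counting logic:

```python
import numpy as np

def route_tokens(hidden, gate_w, top_k=2):
    """Toy top-k MoE router: pick the top_k experts for each token."""
    logits = hidden @ gate_w                        # (n_tokens, n_experts)
    return np.argsort(logits, axis=-1)[:, -top_k:]  # expert indices per token

rng = np.random.default_rng(0)
n_experts, d_model = 8, 16
gate_w = rng.normal(size=(d_model, n_experts))   # stand-in for gate weights
code_hidden = rng.normal(size=(100, d_model))    # stand-in for code-token activations

# Count how often each expert wins on "code" tokens; the max is the code expert
chosen = route_tokens(code_hidden, gate_w)
counts = np.bincount(chosen.ravel(), minlength=n_experts)
code_expert = int(counts.argmax())
print(f"activation counts: {counts.tolist()}, code expert: {code_expert}")
```

On a real model you would repeat this per MoE layer, since the "code expert" index can differ layer to layer.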
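Steps 3–4 (extract weights, convert to dense) amount to re-keying a state dict: keep the shared weights, re-key the chosen expert's FFN under ordinary dense names, and drop the router along with the other experts. A minimal sketch, assuming hypothetical Qwen-style key names (`model.layers.N.mlp.experts.E.*`, `...mlp.gate.*`) that vary by architecture:

```python
def extract_expert(state_dict, expert_idx):
    """Build a dense state dict from an MoE one, keeping a single expert.

    Key layout is a hypothetical example; check the real model's keys first.
    """
    dense = {}
    for name, w in state_dict.items():
        if ".mlp.experts." in name:
            prefix, rest = name.split(".mlp.experts.", 1)
            idx, param = rest.split(".", 1)
            if int(idx) == expert_idx:
                # Re-key the chosen expert as an ordinary dense FFN
                dense[f"{prefix}.mlp.{param}"] = w
        elif ".mlp.gate." in name:
            continue  # router is not needed in a dense model
        else:
            dense[name] = w  # embeddings, attention, norms are shared
    return dense
```

If two experts are kept instead of one, the router must stay and this step is skipped; the dense conversion only applies to the single-expert case.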
## What Makes This Novel
- Nobody publishes individual extracted experts from MoE models
- The naming convention tells users exactly what they're getting
- Plasticity compaction on top → even smaller
- Candle can load the extracted dense model (no MoE support needed)
## Size Estimates
| Config | Params | GGUF Q4_K_M | Fits MacBook Air? |
|---|---|---|---|
| Full MoE | 35B | 21.2GB | No |
| 1 expert extracted | ~4-5B | ~3GB | Yes |
| 2 experts extracted | ~8-9B | ~6GB | Yes |
| Compacted 1 expert | ~2-3B | ~2GB | Yes (phone?) |
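The GGUF sizes above follow from the quantization bit-width: Q4_K_M averages roughly 4.85 bits per weight (an approximation; the exact figure varies with the tensor mix), so file size ≈ params × bits-per-weight / 8:

```python
Q4_K_M_BPW = 4.85  # approximate average bits per weight for Q4_K_M

def gguf_gb(params_billion, bpw=Q4_K_M_BPW):
    """Rough GGUF file size in GB for a given parameter count."""
    return params_billion * bpw / 8

for label, params in [("Full MoE", 35), ("1 expert", 4.5), ("2 experts", 8.5)]:
    print(f"{label}: ~{gguf_gb(params):.1f} GB")
```

The 35B case lands on the table's 21.2 GB; the extracted rows are "~" estimates because the expert's exact parameter count isn't known until extraction.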
## Related
- MoE expert paging: load only the needed expert on demand, page rest from HF cache #433 (MoE expert paging — runtime version of this)
- Evaluate Qwen3.5-35B-A3B as local inference model — Opus reasoning distilled, 3B active #417 (Qwen3.5 evaluation)
- 5090 tower: install Unsloth + verify MoE LoRA training works #430 (MoE training)
- Plasticity compaction pipeline