MoE surgery: extract individual experts for targeted training + tiny deployment #439

@joelteply

Description

The Vision

Take a 35B MoE model. Extract just the code expert (~4-5B). Fine-tune it. Publish it.
OpenCode/VSCode users get a 4B model with the coding depth of a 35B.

Why This Is Huge

MoE models have N experts, but users typically need 1-2. A coder needs the code expert.
A reasoning tool needs the reasoning expert. A creative writer needs the creative expert.

Current reality: you download all 35B of weights, but only ~3B are active for any given token. Wasteful.
With MoE surgery: you download ~4B and get 35B-quality in your domain.

Technical Approach

  1. Load the full 35B model (needs >32 GB of memory; use CPU offload or multi-GPU)
  2. Identify experts by domain — the router/gate network decides which expert activates for which tokens. Run coding prompts through the model, track which expert(s) activate most for code tokens
  3. Extract expert weights — save just the router + shared attention + target expert(s)
  4. Convert to dense — remove the MoE routing, make it a standard dense model
  5. Fine-tune the extracted expert — LoRA on top for the Continuum tool system
  6. Publish continuum-ai/qwen3.5-4b-code-cont (extracted from the 35B MoE's code expert)
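Steps 2–4 can be sketched end-to-end on a toy layer. This is a minimal NumPy illustration, not the real model: the dimensions, single MoE layer, ReLU experts, and top-1 routing are all simplifying assumptions, and a real extraction would walk every MoE block in the checkpoint.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, N_EXPERTS = 16, 32, 8  # hidden dim, FFN dim, expert count (toy sizes)

# Toy MoE FFN layer: a router plus N independent expert MLPs.
W_router = rng.normal(size=(D, N_EXPERTS))
experts = [
    {"w_in": rng.normal(size=(D, H)), "w_out": rng.normal(size=(H, D))}
    for _ in range(N_EXPERTS)
]

def route(tokens):
    """Top-1 routing: index of the expert that fires for each token."""
    return (tokens @ W_router).argmax(axis=-1)

# Step 2: push domain activations through the router, count which
# expert fires most (stand-in for hidden states from coding prompts).
code_tokens = rng.normal(size=(512, D))
counts = np.bincount(route(code_tokens), minlength=N_EXPERTS)
target = int(counts.argmax())

# Steps 3-4: keep only the dominant expert's weights and drop the
# router entirely -- the layer is now a standard dense FFN.
dense = experts[target]

def dense_ffn(tokens):
    return np.maximum(tokens @ dense["w_in"], 0.0) @ dense["w_out"]

print("activation counts:", counts, "-> extracting expert", target)
print("dense output shape:", dense_ffn(code_tokens[:4]).shape)
```

The same counting pass also reveals when two experts share the load for a domain, which is when the 2-expert extraction in the table below makes sense.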

What Makes This Novel

  • Nobody publishes individual extracted experts from MoE models
  • The naming convention tells users exactly what they're getting
  • Plasticity compaction on top → even smaller
  • Candle can load the extracted dense model (no MoE support needed)

Size Estimates

| Config | Params | GGUF Q4_K_M | Fits MacBook Air? |
|---|---|---|---|
| Full MoE | 35B | 21.2 GB | No |
| 1 expert extracted | ~4-5B | ~3 GB | Yes |
| 2 experts extracted | ~8-9B | ~6 GB | Yes |
| Compacted 1 expert | ~2-3B | ~2 GB | Yes (phone?) |
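The table rows follow from a single rate: 21.2 GB for 35B params implies Q4_K_M averages ~4.85 bits per parameter. Treat that rate as an assumption back-derived from the first row, not an official figure:

```python
# Assumed average quantization rate, derived from 21.2 GB / 35B params.
BITS_PER_PARAM = 4.85

def q4km_gb(params_billion):
    """Estimated GGUF Q4_K_M file size in GB for a given param count."""
    return params_billion * 1e9 * BITS_PER_PARAM / 8 / 1e9

for name, p in [("Full MoE", 35), ("1 expert", 4.5),
                ("2 experts", 8.5), ("Compacted", 2.5)]:
    print(f"{name}: ~{q4km_gb(p):.1f} GB")
# Full MoE lands at ~21.2 GB, matching the table's first row.
```

Actual GGUF sizes vary slightly with tensor shapes, since Q4_K_M keeps some tensors at higher precision.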
