Rust/Candle forge pipeline — replace Python with native inference + training #88
Why
Python is the liability. Every OOM and every memory leak traces back to Python's inability to control memory at the level we need, and the GIL adds its own bottleneck on top. We already forked Candle and run inference natively in continuum. The forging pipeline should run natively too.
What Moves to Rust
Phase 1: Inference-only operations (low risk)
- Model loading with explicit memory mapping (mmap, no Python allocator)
- Perplexity evaluation (forward pass only)
- Head importance computation (L2 norm of weight slices)
- Head pruning (zeroing weight slices or applying attention masks)
- Text generation for output samples
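As a rough illustration of the head importance and pruning items above, here is a minimal sketch in plain Rust. It uses a `Vec<f32>` in place of a Candle tensor so it stays self-contained. The `HeadWeights` type, its field names, and the assumption that each head owns a contiguous block of `head_dim` rows in a row-major projection matrix are all hypothetical; the real layout depends on the model.

```rust
/// Hypothetical stand-in for one attention projection weight,
/// stored row-major with shape [num_heads * head_dim, hidden].
pub struct HeadWeights {
    pub data: Vec<f32>,
    pub num_heads: usize,
    pub head_dim: usize,
    pub hidden: usize,
}

impl HeadWeights {
    /// Index range of the contiguous slice assumed to belong to one head.
    fn head_range(&self, head: usize) -> std::ops::Range<usize> {
        let len = self.head_dim * self.hidden;
        head * len..(head + 1) * len
    }

    /// Importance score per head: the L2 norm of that head's weight slice.
    pub fn head_importance(&self) -> Vec<f32> {
        (0..self.num_heads)
            .map(|h| {
                self.data[self.head_range(h)]
                    .iter()
                    .map(|w| w * w)
                    .sum::<f32>()
                    .sqrt()
            })
            .collect()
    }

    /// Zero the `count` lowest-scoring heads in place and return their indices.
    pub fn prune_lowest(&mut self, count: usize) -> Vec<usize> {
        let scores = self.head_importance();
        let mut order: Vec<usize> = (0..self.num_heads).collect();
        // Ascending by score, so the weakest heads come first.
        order.sort_by(|&a, &b| scores[a].total_cmp(&scores[b]));
        let pruned: Vec<usize> = order.into_iter().take(count).collect();
        for &h in &pruned {
            let range = self.head_range(h);
            self.data[range].fill(0.0);
        }
        pruned
    }
}
```

Zeroing in place keeps tensor shapes unchanged, so a pruned checkpoint would still load with the original model config; whether the real pipeline zeroes weights or masks attention is left open above.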
Phase 2: Training operations (needs Candle training support)
- LoRA adapter creation and forward/backward
- Gradient computation through attention + MLP layers
- AdamW optimizer with 8-bit state support
- Gradient checkpointing (selective recomputation)
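For reference, the LoRA forward pass listed above can be sketched as follows, assuming the usual formulation y = Wx + (alpha / r) * B(Ax) with a frozen base weight and two small trainable matrices. This is plain Rust over slices, not Candle code; `LoraLinear`, `matvec`, and all field names are hypothetical, and the backward pass is not shown.

```rust
/// Row-major matrix-vector product: `m` is [rows, cols], `x` has `cols` entries.
fn matvec(m: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert_eq!(m.len(), rows * cols);
    assert_eq!(x.len(), cols);
    m.chunks_exact(cols)
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Hypothetical LoRA-wrapped linear layer (no bias).
pub struct LoraLinear {
    pub w: Vec<f32>, // frozen base weight, [out_dim, in_dim]
    pub a: Vec<f32>, // trainable down-projection, [rank, in_dim]
    pub b: Vec<f32>, // trainable up-projection, [out_dim, rank]
    pub in_dim: usize,
    pub out_dim: usize,
    pub rank: usize,
    pub alpha: f32,
}

impl LoraLinear {
    /// y = W x + (alpha / rank) * B (A x)
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        let base = matvec(&self.w, self.out_dim, self.in_dim, x);
        let down = matvec(&self.a, self.rank, self.in_dim, x);
        let up = matvec(&self.b, self.out_dim, self.rank, &down);
        let scale = self.alpha / self.rank as f32;
        base.iter().zip(&up).map(|(y, d)| y + scale * d).collect()
    }
}
```

With `b` initialized to zeros the adapter contributes nothing, so the layer starts out identical to the base model and only diverges as training updates `a` and `b`.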
Phase 3: Full pipeline
- continuum-core IPC command: model/forge
- Orchestrated from jtag, executed in Rust worker
- Memory budget enforced at allocation time, not after OOM
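One way to read "enforced at allocation time" is a shared byte budget that every large allocation must reserve against before it happens, so an oversized request fails with an error instead of triggering an OOM later. The sketch below is an assumption about how that could look, not the continuum-core design; `MemoryBudget` and `BudgetExceeded` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Returned when a reservation would push usage past the limit.
#[derive(Debug, PartialEq)]
pub struct BudgetExceeded {
    pub requested: usize,
    pub available: usize,
}

/// Hypothetical byte budget shared by all allocations in a worker.
pub struct MemoryBudget {
    limit: usize,
    used: AtomicUsize,
}

impl MemoryBudget {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: AtomicUsize::new(0) }
    }

    /// Reserve `bytes` before allocating; fails without changing state
    /// if the reservation does not fit in the remaining budget.
    pub fn reserve(&self, bytes: usize) -> Result<(), BudgetExceeded> {
        self.used
            .fetch_update(Ordering::AcqRel, Ordering::Relaxed, |used| {
                // `used` never exceeds `limit`, so the subtraction cannot underflow.
                if bytes <= self.limit - used { Some(used + bytes) } else { None }
            })
            .map(|_| ())
            .map_err(|used| BudgetExceeded {
                requested: bytes,
                available: self.limit - used,
            })
    }

    /// Return `bytes` to the budget after the allocation is freed.
    pub fn release(&self, bytes: usize) {
        let _ = self.used.fetch_update(Ordering::AcqRel, Ordering::Relaxed, |used| {
            Some(used.saturating_sub(bytes))
        });
    }

    pub fn used(&self) -> usize {
        self.used.load(Ordering::Acquire)
    }
}
```

A caller would check `reserve` before each tensor allocation and decide what to do on `Err`, for example shrinking the batch or aborting the cycle cleanly.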
Architecture
jtag model/forge --model Qwen/Qwen3.5-27B --domain code
→ IPC → continuum-core Rust worker
→ mmap model weights (zero-copy loading)
→ evaluate baseline (Candle forward pass)
→ for each cycle:
→ compute head importance (weight L2 norms)
→ zero pruned heads (direct tensor mutation)
→ LoRA training (Candle backward + AdamW)
→ evaluate post-training
→ save safetensors (direct write, no Python serialization)
→ generate samples
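The two evaluate steps in the flow above come down to perplexity over a held-out token stream. A minimal sketch of that arithmetic, assuming the forward pass yields one logit vector per position: take the log-probability of each target token via a numerically stable log-softmax, then exponentiate the mean negative log-likelihood. The function names are hypothetical and the code works on plain slices instead of Candle tensors.

```rust
/// Log-probability of `target` under softmax(logits), computed with the
/// max-subtraction trick so large logits do not overflow in exp().
pub fn token_log_prob(logits: &[f64], target: usize) -> f64 {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let log_sum_exp = max + logits.iter().map(|l| (l - max).exp()).sum::<f64>().ln();
    logits[target] - log_sum_exp
}

/// Perplexity = exp(mean negative log-likelihood) over the evaluated tokens.
/// Returns None for an empty input, where the mean is undefined.
pub fn perplexity(token_log_probs: &[f64]) -> Option<f64> {
    if token_log_probs.is_empty() {
        return None;
    }
    let mean_nll = -token_log_probs.iter().sum::<f64>() / token_log_probs.len() as f64;
    Some(mean_nll.exp())
}
```

Comparing this value before and after each cycle gives a single number for how much pruning cost and how much LoRA training recovered.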
Candle Status
Our forked Candle already handles:
- Qwen model loading and inference
- Safetensors read/write
- CUDA kernel dispatch
- Mixed precision (fp16/bf16)
Still needed:
- LoRA adapter layer implementation
- Backward pass for attention + MLP (autograd)
- Optimizer state management
- Gradient checkpointing
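To make the optimizer state item concrete, here is a small sketch of a single AdamW update with decoupled weight decay, following the standard published update rule. It keeps the first and second moment buffers in full f32 precision; the 8-bit state mentioned under Phase 2 would quantize `m` and `v` and is not attempted here. The `AdamW` struct and its defaults are hypothetical, not Candle's optimizer API.

```rust
/// Hypothetical AdamW optimizer over a flat parameter slice.
pub struct AdamW {
    pub lr: f32,
    pub beta1: f32,
    pub beta2: f32,
    pub eps: f32,
    pub weight_decay: f32,
    t: i32,      // number of steps taken so far
    m: Vec<f32>, // first moment estimate, one entry per parameter
    v: Vec<f32>, // second moment estimate, one entry per parameter
}

impl AdamW {
    pub fn new(num_params: usize, lr: f32, weight_decay: f32) -> Self {
        Self {
            lr,
            beta1: 0.9,
            beta2: 0.999,
            eps: 1e-8,
            weight_decay,
            t: 0,
            m: vec![0.0; num_params],
            v: vec![0.0; num_params],
        }
    }

    /// Apply one update in place. Weight decay is applied to the parameter
    /// directly (decoupled), not folded into the gradient.
    pub fn step(&mut self, params: &mut [f32], grads: &[f32]) {
        assert_eq!(params.len(), self.m.len());
        assert_eq!(grads.len(), self.m.len());
        self.t += 1;
        let bias1 = 1.0 - self.beta1.powi(self.t);
        let bias2 = 1.0 - self.beta2.powi(self.t);
        for i in 0..params.len() {
            let g = grads[i];
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
            let m_hat = self.m[i] / bias1;
            let v_hat = self.v[i] / bias2;
            let update = m_hat / (v_hat.sqrt() + self.eps) + self.weight_decay * params[i];
            params[i] -= self.lr * update;
        }
    }
}
```

The state here is two f32 buffers the size of the trainable parameters, which is why restricting training to LoRA adapters keeps optimizer memory small relative to the base model.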
Dependencies
- model/forge and model/compact as jtag commands — not scripts #83 — model/forge as jtag command
- Rust/Candle forge pipeline — replace Python with native inference + training #88 — Python forge pipeline (the thing this replaces)
- continuum Candle fork
See Also
- Rust/Candle forge pipeline — replace Python with native inference + training #88 — Python implementation (ship now, replace later)