Rust/Candle forge pipeline — replace Python with native inference + training

## Why

Python is the liability. The OOMs and memory leaks we keep hitting trace back to Python's inability to control memory at the level we need, and the GIL adds a parallelism bottleneck on top. We already forked Candle and run inference natively in continuum. The forging pipeline should work the same way.

## What Moves to Rust

### Phase 1: Inference-only operations (low risk)
- Model loading with explicit memory mapping (mmap, no Python allocator)
- Perplexity evaluation (forward pass only)
- Head importance computation (L2 norm of each head's weight slice)
- Head pruning (zero the head's weight slices, or apply an attention-head mask)
- Text generation for output samples
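
The head importance and pruning steps above can be sketched in plain Rust without any tensor library. This is a minimal illustration, not the fork's actual code: it assumes a projection weight stored row-major as `[num_heads * head_dim, hidden]` so that each head owns one contiguous slice, and the function names `head_importance` and `prune_lowest_heads` are hypothetical.

```rust
/// L2 norm of each head's weight slice.
/// Assumption: `weights` is row-major with shape [num_heads * head_dim, hidden],
/// so each head owns `head_dim * hidden` contiguous values.
pub fn head_importance(
    weights: &[f32],
    num_heads: usize,
    head_dim: usize,
    hidden: usize,
) -> Vec<f32> {
    let per_head = head_dim * hidden;
    assert!(per_head > 0, "head_dim and hidden must be non-zero");
    assert_eq!(weights.len(), num_heads * per_head, "weight shape mismatch");
    weights
        .chunks_exact(per_head)
        .map(|head| head.iter().map(|w| w * w).sum::<f32>().sqrt())
        .collect()
}

/// Zero the `k` lowest-scoring heads in place and return their indices,
/// ordered from least to most important.
pub fn prune_lowest_heads(
    weights: &mut [f32],
    scores: &[f32],
    k: usize,
    per_head: usize,
) -> Vec<usize> {
    assert_eq!(weights.len(), scores.len() * per_head, "weight shape mismatch");
    let mut order: Vec<usize> = (0..scores.len()).collect();
    // total_cmp gives a total order on f32, so NaN scores cannot panic the sort.
    order.sort_by(|&a, &b| scores[a].total_cmp(&scores[b]));
    let pruned: Vec<usize> = order.into_iter().take(k).collect();
    for &h in &pruned {
        weights[h * per_head..(h + 1) * per_head].fill(0.0);
    }
    pruned
}
```

Zeroing in place avoids allocating a second copy of the weights. In a real model the same head indices would presumably need to be zeroed across the Q, K, V and output projections, which this sketch leaves out.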

### Phase 2: Training operations (needs Candle training support)
- LoRA adapter creation and forward/backward
- Gradient computation through attention + MLP layers
- AdamW optimizer with 8-bit state support
- Gradient checkpointing (selective recomputation)
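
As a reference point for the LoRA adapter item, the forward pass is the frozen base projection plus a scaled low-rank correction: `y = W x + (alpha / rank) * B (A x)`. The sketch below uses plain `Vec<f32>` and a hypothetical `LoraLinear` type rather than Candle tensors, and it covers the forward pass only. The backward pass is the part that depends on Candle's autograd.

```rust
/// Row-major matrix-vector product: `m` has shape [rows, cols].
fn matvec(m: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert_eq!(m.len(), rows * cols, "matrix shape mismatch");
    assert_eq!(x.len(), cols, "input length mismatch");
    (0..rows)
        .map(|r| {
            m[r * cols..(r + 1) * cols]
                .iter()
                .zip(x)
                .map(|(a, b)| a * b)
                .sum::<f32>()
        })
        .collect()
}

/// Hypothetical LoRA-wrapped linear layer (illustrative, not a Candle type).
pub struct LoraLinear {
    pub w: Vec<f32>, // frozen base weight, shape [out_dim, in_dim]
    pub a: Vec<f32>, // trainable down-projection, shape [rank, in_dim]
    pub b: Vec<f32>, // trainable up-projection, shape [out_dim, rank]
    pub in_dim: usize,
    pub out_dim: usize,
    pub rank: usize,
    pub alpha: f32,
}

impl LoraLinear {
    /// y = W x + (alpha / rank) * B (A x)
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        let base = matvec(&self.w, self.out_dim, self.in_dim, x);
        let down = matvec(&self.a, self.rank, self.in_dim, x);
        let up = matvec(&self.b, self.out_dim, self.rank, &down);
        let scale = self.alpha / self.rank as f32;
        base.iter().zip(&up).map(|(y, d)| y + scale * d).collect()
    }
}
```

With `b` initialised to zeros, as is conventional for LoRA, the layer reproduces the base model exactly at the start of training, so only `a` and `b` need gradients and optimizer state.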

### Phase 3: Full pipeline
- `continuum-core` IPC command: `model/forge`
- Orchestrated from jtag, executed in Rust worker
- Memory budget enforced at allocation time, not after OOM
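
One way to read "enforced at allocation time" is a shared byte budget that every large allocation must reserve against before it happens. The sketch below is an assumption about how that could look, not the `continuum-core` design: `MemoryBudget`, `Reservation` and `BudgetExceeded` are hypothetical names, and it only does the accounting, not the allocation itself.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Returned when a reservation would push usage past the limit.
#[derive(Debug, PartialEq)]
pub struct BudgetExceeded {
    pub requested: usize,
    pub available: usize,
}

/// Hypothetical shared byte budget, checked before each allocation.
pub struct MemoryBudget {
    limit: usize,
    used: AtomicUsize,
}

/// RAII guard: the reserved bytes return to the budget when it is dropped.
pub struct Reservation<'a> {
    budget: &'a MemoryBudget,
    bytes: usize,
}

impl MemoryBudget {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: AtomicUsize::new(0) }
    }

    pub fn used(&self) -> usize {
        self.used.load(Ordering::Acquire)
    }

    /// Reserve `bytes`, or fail up front instead of hitting an OOM later.
    pub fn try_reserve(&self, bytes: usize) -> Result<Reservation<'_>, BudgetExceeded> {
        let mut current = self.used.load(Ordering::Relaxed);
        loop {
            let next = current
                .checked_add(bytes)
                .filter(|&n| n <= self.limit)
                .ok_or(BudgetExceeded {
                    requested: bytes,
                    available: self.limit - current,
                })?;
            // Retry if another thread changed `used` between the load and the swap.
            match self.used.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Ok(Reservation { budget: self, bytes }),
                Err(actual) => current = actual,
            }
        }
    }
}

impl Drop for Reservation<'_> {
    fn drop(&mut self) {
        self.budget.used.fetch_sub(self.bytes, Ordering::AcqRel);
    }
}
```

A caller would hold the `Reservation` for as long as the buffer lives, so releasing memory and releasing budget cannot drift apart. A failed reservation becomes an ordinary `Err` the worker can report back over IPC.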

## Architecture

```
jtag model/forge --model Qwen/Qwen3.5-27B --domain code
  → IPC → continuum-core Rust worker
    → mmap model weights (zero-copy loading)
    → evaluate baseline (Candle forward pass)
    → for each cycle:
        → compute head importance (weight L2 norms)
        → zero pruned heads (direct tensor mutation)
        → LoRA training (Candle backward + AdamW)
        → evaluate post-training
    → save safetensors (direct write, no Python serialization)
    → generate samples
```
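
The two evaluate steps in the flow above reduce to perplexity over a held-out token stream: the exponential of the mean negative log-likelihood of each target token. A minimal sketch follows, assuming the forward pass has already produced one logit vector per position. The function names and the `Vec<f32>` representation are illustrative, not Candle's API.

```rust
/// Log-probability of `target` under softmax(logits), using the
/// log-sum-exp trick so large logits do not overflow.
/// Panics if `target` is out of range for `logits`.
pub fn target_log_prob(logits: &[f32], target: usize) -> f64 {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max) as f64;
    let sum_exp: f64 = logits.iter().map(|&l| (l as f64 - max).exp()).sum();
    logits[target] as f64 - (max + sum_exp.ln())
}

/// Perplexity = exp(mean negative log-likelihood of the target tokens).
/// Returns None for empty input or mismatched lengths.
pub fn perplexity(logits: &[Vec<f32>], targets: &[usize]) -> Option<f64> {
    if logits.is_empty() || logits.len() != targets.len() {
        return None;
    }
    let total_nll: f64 = logits
        .iter()
        .zip(targets)
        .map(|(row, &t)| -target_log_prob(row, t))
        .sum();
    Some((total_nll / logits.len() as f64).exp())
}
```

Accumulating in `f64` keeps the sum stable over long evaluation sets even when the model itself runs in fp16 or bf16. A model that is uniformly uncertain over a vocabulary of size V scores a perplexity of V, which makes a convenient sanity check.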

## Candle Status

Our forked Candle already handles:
- Qwen model loading and inference
- Safetensors read/write
- CUDA kernel dispatch
- Mixed precision (fp16/bf16)

Still needed:
- LoRA adapter layer implementation
- Backward pass for attention + MLP (autograd)
- Optimizer state management
- Gradient checkpointing
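
For the optimizer state item, the state in question is two moment buffers per trainable parameter plus a step counter. The sketch below is a standard AdamW step with decoupled weight decay in plain Rust. It assumes fp32 state and fixed betas of 0.9 and 0.999, so it does not cover the 8-bit state mentioned in Phase 2, and the `AdamW` struct here is a hypothetical stand-in rather than the fork's type.

```rust
/// Minimal AdamW with fp32 moment buffers (illustrative only).
pub struct AdamW {
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    weight_decay: f32,
    t: i32,      // number of steps taken so far
    m: Vec<f32>, // first-moment (mean) estimate per parameter
    v: Vec<f32>, // second-moment (uncentered variance) estimate per parameter
}

impl AdamW {
    pub fn new(num_params: usize, lr: f32, weight_decay: f32) -> Self {
        Self {
            lr,
            beta1: 0.9,
            beta2: 0.999,
            eps: 1e-8,
            weight_decay,
            t: 0,
            m: vec![0.0; num_params],
            v: vec![0.0; num_params],
        }
    }

    /// Apply one update in place.
    pub fn step(&mut self, params: &mut [f32], grads: &[f32]) {
        assert_eq!(params.len(), self.m.len(), "parameter count mismatch");
        assert_eq!(grads.len(), self.m.len(), "gradient count mismatch");
        self.t += 1;
        // Bias correction for the zero-initialised moment estimates.
        let bc1 = 1.0 - self.beta1.powi(self.t);
        let bc2 = 1.0 - self.beta2.powi(self.t);
        for i in 0..params.len() {
            // Decoupled weight decay: applied to the parameter, not the gradient.
            params[i] -= self.lr * self.weight_decay * params[i];
            let g = grads[i];
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
            let m_hat = self.m[i] / bc1;
            let v_hat = self.v[i] / bc2;
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
        }
    }
}
```

With LoRA only the adapter matrices are trainable, so `m` and `v` are sized to the adapter parameters rather than the full model, which is where most of the optimizer memory saving comes from.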

## Dependencies
- #83 — model/forge as jtag command
- #88 — Python forge pipeline (the thing this replaces)
- continuum Candle fork

## See Also
- #88 — Python implementation (ship now, replace later)
