
Rust/Candle forge pipeline — replace Python with native inference + training #88

@joelteply


Why

Python is the liability. Every OOM, memory leak, and GIL bottleneck we have hit traces back to Python: it gives us no control over memory at the level we need, and the GIL serializes work we want to run in parallel. We already run inference natively in continuum on our Candle fork. The forging pipeline should work the same way.

What Moves to Rust

Phase 1: Inference-only operations (low risk)

  • Model loading with explicit memory mapping (mmap, no Python allocator)
  • Perplexity evaluation (forward pass only)
  • Head importance computation (L2 norm of weight slices)
  • Head pruning (zero weight slices or attention masks)
  • Text generation for output samples
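
Both head operations above reduce to slicing a per-head block out of an attention projection. A std-only sketch, assuming the query projection is a row-major `[num_heads * head_dim, hidden]` matrix stored in a flat `Vec<f32>` (so head `h` owns a contiguous block of rows). `head_importance` and `prune_heads` are hypothetical names, not functions from our Candle fork; a Candle version would use tensor slicing instead of raw indexing.

```rust
/// Hypothetical layout: row-major query projection [num_heads * head_dim, hidden].
/// Head `h` owns rows h*head_dim .. (h+1)*head_dim, i.e. one contiguous block.
fn head_importance(w: &[f32], num_heads: usize, head_dim: usize, hidden: usize) -> Vec<f32> {
    assert_eq!(w.len(), num_heads * head_dim * hidden);
    let head_len = head_dim * hidden; // elements per head block
    (0..num_heads)
        .map(|h| {
            let block = &w[h * head_len..(h + 1) * head_len];
            // L2 norm of the head's weight slice
            block.iter().map(|x| x * x).sum::<f32>().sqrt()
        })
        .collect()
}

/// Zero the `k` least important heads in place and return their indices.
fn prune_heads(w: &mut [f32], scores: &[f32], k: usize, head_dim: usize, hidden: usize) -> Vec<usize> {
    let head_len = head_dim * hidden;
    let mut order: Vec<usize> = (0..scores.len()).collect();
    // Ascending importance; total_cmp gives a total order even if a score is NaN.
    order.sort_by(|&a, &b| scores[a].total_cmp(&scores[b]));
    let pruned: Vec<usize> = order.into_iter().take(k).collect();
    for &h in &pruned {
        w[h * head_len..(h + 1) * head_len].fill(0.0);
    }
    pruned
}

fn main() {
    // 3 heads, head_dim = 2, hidden = 2: each head owns 4 consecutive weights.
    let mut w: Vec<f32> = vec![
        1.0, 1.0, 1.0, 1.0, // head 0: L2 norm 2
        0.1, 0.1, 0.1, 0.1, // head 1: L2 norm ~0.2
        3.0, 0.0, 0.0, 4.0, // head 2: L2 norm 5
    ];
    let scores = head_importance(&w, 3, 2, 2);
    let pruned = prune_heads(&mut w, &scores, 1, 2, 2);
    println!("scores = {:?}, pruned = {:?}", scores, pruned);
}
```

On a real Qwen layer, pruning a head would also zero the matching input columns of the output projection, and with grouped-query attention the key/value heads do not map one-to-one onto query heads; the sketch ignores both.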

Phase 2: Training operations (needs Candle training support)

  • LoRA adapter creation and forward/backward
  • Gradient computation through attention + MLP layers
  • AdamW optimizer with 8-bit state support
  • Gradient checkpointing (selective recomputation)
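
LoRA adds a trainable low-rank update to a frozen weight, so the forward and backward math is small. A std-only sketch for a single input vector, assuming plain `Vec<f32>` matrices; `Mat` and `LoraLinear` are hypothetical names, not types from our Candle fork. The hand-written `backward` shows the gradients an autograd implementation would have to produce for the adapter.

```rust
/// Row-major dense matrix (hypothetical helper, not a Candle type).
struct Mat { rows: usize, cols: usize, data: Vec<f32> }

impl Mat {
    fn zeros(rows: usize, cols: usize) -> Self {
        Mat { rows, cols, data: vec![0.0; rows * cols] }
    }

    /// Computes self * x.
    fn matvec(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.cols);
        (0..self.rows)
            .map(|i| {
                let row = &self.data[i * self.cols..(i + 1) * self.cols];
                row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>()
            })
            .collect()
    }

    /// Computes self^T * y.
    fn matvec_t(&self, y: &[f32]) -> Vec<f32> {
        assert_eq!(y.len(), self.rows);
        let mut out = vec![0.0; self.cols];
        for i in 0..self.rows {
            for j in 0..self.cols {
                out[j] += self.data[i * self.cols + j] * y[i];
            }
        }
        out
    }
}

/// LoRA around a frozen weight W [out, in]: y = W x + (alpha / r) * B (A x),
/// with trainable A [r, in] and B [out, r].
struct LoraLinear { w: Mat, a: Mat, b: Mat, scale: f32 }

impl LoraLinear {
    fn new(w: Mat, a: Mat, alpha: f32) -> Self {
        assert_eq!(a.cols, w.cols);
        let r = a.rows;
        // B starts at zero, so the adapted layer initially equals the frozen one.
        let b = Mat::zeros(w.rows, r);
        LoraLinear { w, a, b, scale: alpha / r as f32 }
    }

    fn forward(&self, x: &[f32]) -> Vec<f32> {
        let delta = self.b.matvec(&self.a.matvec(x));
        let base = self.w.matvec(x);
        base.iter().zip(&delta).map(|(y, d)| y + self.scale * d).collect()
    }

    /// Gradients (dA, dB) for one input x and upstream gradient dy.
    /// W is frozen, so no gradient is computed for it.
    fn backward(&self, x: &[f32], dy: &[f32]) -> (Mat, Mat) {
        let ax = self.a.matvec(x);
        let bt_dy = self.b.matvec_t(dy);
        let mut da = Mat::zeros(self.a.rows, self.a.cols);
        let mut db = Mat::zeros(self.b.rows, self.b.cols);
        // dL/dB[i][k] = scale * dy[i] * (A x)[k]
        for i in 0..db.rows {
            for k in 0..db.cols {
                db.data[i * db.cols + k] = self.scale * dy[i] * ax[k];
            }
        }
        // dL/dA[k][j] = scale * (B^T dy)[k] * x[j]
        for k in 0..da.rows {
            for j in 0..da.cols {
                da.data[k * da.cols + j] = self.scale * bt_dy[k] * x[j];
            }
        }
        (da, db)
    }
}

fn main() {
    let w = Mat { rows: 2, cols: 2, data: vec![1.0, 0.0, 0.0, 1.0] }; // identity
    let a = Mat { rows: 1, cols: 2, data: vec![1.0, 0.0] }; // rank r = 1
    let layer = LoraLinear::new(w, a, 1.0);
    println!("{:?}", layer.forward(&[2.0, 3.0])); // B = 0, so this equals W x
}
```

Zero-initializing `B` (with `A` random in practice) is the usual LoRA choice: the first forward pass reproduces the base model exactly, and training only moves it away from there.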

Phase 3: Full pipeline

  • continuum-core IPC command: model/forge
  • Orchestrated from jtag, executed in Rust worker
  • Memory budget enforced at allocation time, not after OOM
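
Enforcing the budget at allocation time means every large buffer asks for room before it exists, and a request that does not fit fails as an ordinary error. A minimal sketch under stated assumptions: `MemoryBudget`, `Reservation`, and `BudgetExceeded` are hypothetical names, the budget only counts bytes that callers report (it does not hook the CUDA allocator), and an RAII guard hands bytes back when the buffer is released.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical byte budget shared by a forge worker's allocations.
struct MemoryBudget { limit: usize, used: AtomicUsize }

/// Returned instead of allocating when a request does not fit.
#[derive(Debug, PartialEq)]
struct BudgetExceeded { requested: usize, available: usize }

/// RAII guard: the reserved bytes are returned to the budget on drop.
struct Reservation<'a> { budget: &'a MemoryBudget, bytes: usize }

impl MemoryBudget {
    fn new(limit: usize) -> Self {
        MemoryBudget { limit, used: AtomicUsize::new(0) }
    }

    /// Reserve `bytes` before allocating; fails up front if the cap would be exceeded.
    fn reserve(&self, bytes: usize) -> Result<Reservation<'_>, BudgetExceeded> {
        let mut current = self.used.load(Ordering::Acquire);
        loop {
            let available = self.limit - current; // invariant: current <= limit
            if bytes > available {
                return Err(BudgetExceeded { requested: bytes, available });
            }
            match self.used.compare_exchange_weak(
                current,
                current + bytes,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Ok(Reservation { budget: self, bytes }),
                Err(actual) => current = actual, // lost a race with another thread; retry
            }
        }
    }

    fn in_use(&self) -> usize {
        self.used.load(Ordering::Acquire)
    }
}

impl Drop for Reservation<'_> {
    fn drop(&mut self) {
        self.budget.used.fetch_sub(self.bytes, Ordering::AcqRel);
    }
}

fn main() {
    let budget = MemoryBudget::new(1_000);
    let weights = budget.reserve(600).expect("fits");
    match budget.reserve(500) {
        Ok(_) => println!("reserved"),
        Err(e) => println!("refused before allocating: {:?}", e),
    }
    drop(weights);
    println!("in use after release: {}", budget.in_use());
}
```

A worker could size each reservation from tensor shape and dtype (element count times 2 bytes for bf16) and return `BudgetExceeded` over IPC to jtag rather than crashing mid-cycle.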

Architecture

jtag model/forge --model Qwen/Qwen3.5-27B --domain code
  → IPC → continuum-core Rust worker
    → mmap model weights (zero-copy loading)
    → evaluate baseline (Candle forward pass)
    → for each cycle:
        → compute head importance (weight L2 norms)
        → zero pruned heads (direct tensor mutation)
        → LoRA training (Candle backward + AdamW)
        → evaluate post-training
    → save safetensors (direct write, no Python serialization)
    → generate samples
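
The baseline and post-training evaluation steps need only a forward pass followed by a perplexity reduction over the logits. A std-only sketch of that reduction, assuming the forward pass has already produced one logit row per position; `log_prob` and `perplexity` are illustrative names. Accumulating in f64 with a max-shifted log-sum-exp keeps the result stable even when the model itself runs in bf16.

```rust
/// log p(target) from one row of logits, via a max-shifted log-sum-exp.
fn log_prob(logits: &[f32], target: usize) -> f64 {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max) as f64;
    let sum_exp: f64 = logits.iter().map(|&z| (z as f64 - max).exp()).sum();
    logits[target] as f64 - (max + sum_exp.ln())
}

/// Perplexity = exp(mean negative log-likelihood of the target tokens).
/// logits[i] is the output row at position i; targets[i] is the token that follows.
fn perplexity(logits: &[Vec<f32>], targets: &[usize]) -> f64 {
    assert_eq!(logits.len(), targets.len());
    assert!(!targets.is_empty());
    let nll: f64 = logits
        .iter()
        .zip(targets)
        .map(|(row, &t)| -log_prob(row, t))
        .sum();
    (nll / targets.len() as f64).exp()
}

fn main() {
    // Uniform logits over a 4-token vocabulary: every target has p = 1/4,
    // so perplexity is 4 up to rounding.
    let logits = vec![vec![0.0_f32; 4]; 3];
    println!("ppl = {}", perplexity(&logits, &[0, 1, 2]));
}
```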

Candle Status

Our forked Candle already handles:

  • Qwen model loading and inference
  • Safetensors read/write
  • CUDA kernel dispatch
  • Mixed precision (fp16/bf16)

Still needed:

  • LoRA adapter layer implementation
  • Backward pass for attention + MLP (autograd)
  • Optimizer state management
  • Gradient checkpointing
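
Optimizer state management is mostly bookkeeping: two moment buffers per trainable parameter plus a step counter. A minimal fp32 AdamW sketch over flat slices, assuming PyTorch-style defaults (β1 = 0.9, β2 = 0.999, ε = 1e-8) and decoupled weight decay. `AdamW` here is a hypothetical struct written from scratch, not an existing Candle type, and it does not implement the 8-bit state listed in Phase 2.

```rust
/// Hypothetical fp32 AdamW over a flat parameter slice.
/// State: first/second moment buffers (m, v) and a step counter t.
struct AdamW {
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    weight_decay: f32,
    t: i32,
    m: Vec<f32>,
    v: Vec<f32>,
}

impl AdamW {
    fn new(num_params: usize, lr: f32, weight_decay: f32) -> Self {
        AdamW {
            lr,
            beta1: 0.9,
            beta2: 0.999,
            eps: 1e-8,
            weight_decay,
            t: 0,
            m: vec![0.0; num_params],
            v: vec![0.0; num_params],
        }
    }

    fn step(&mut self, params: &mut [f32], grads: &[f32]) {
        assert_eq!(params.len(), self.m.len());
        assert_eq!(grads.len(), self.m.len());
        self.t += 1;
        // Bias correction for the zero-initialized moment estimates.
        let bc1 = 1.0 - self.beta1.powi(self.t);
        let bc2 = 1.0 - self.beta2.powi(self.t);
        for i in 0..params.len() {
            let g = grads[i];
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
            let m_hat = self.m[i] / bc1;
            let v_hat = self.v[i] / bc2;
            // Decoupled weight decay: shrink the weight directly, not through g.
            params[i] *= 1.0 - self.lr * self.weight_decay;
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
        }
    }
}

fn main() {
    let mut params = vec![1.0_f32, -0.5];
    let mut opt = AdamW::new(params.len(), 1e-2, 0.01);
    for _ in 0..3 {
        // Gradient of sum(p^2) is 2p.
        let grads: Vec<f32> = params.iter().map(|p| 2.0 * p).collect();
        opt.step(&mut params, &grads);
    }
    println!("{:?}", params);
}
```

Because only the LoRA adapters are trainable, `m` and `v` scale with the adapter size (8 bytes per trainable parameter in fp32) rather than with the full base model.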

Dependencies

See Also
