Python forge pipeline — ship Qwen3.5 forged models now #89

@joelteply

Description

Status

scripts/forge_model.py v3 is committed and ready. Memory-tiered architecture (each tier's VRAM plus headroom fills a 32GB card):

  • Tier A (4B fp16): ~11GB VRAM, 21GB headroom
  • Tier B (9B fp16): ~21GB VRAM, 11GB headroom
  • Tier C (27B 4-bit): ~18GB VRAM, 14GB headroom
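As a sketch, the auto tier selection over these three tiers could reduce to a name-based lookup. This is a hypothetical helper, not the actual logic in scripts/forge_model.py, which may also probe free VRAM at runtime; the tier table is taken from the figures above.

```python
# Tier table from the issue: estimated VRAM on a 32GB card.
TIERS = {
    "A": {"params": "4B",  "quant": "fp16", "vram_gb": 11},
    "B": {"params": "9B",  "quant": "fp16", "vram_gb": 21},
    "C": {"params": "27B", "quant": "4bit", "vram_gb": 18},
}

def select_tier(model_name: str) -> str:
    """Pick a tier from the parameter count in the model name (sketch)."""
    name = model_name.lower()
    if "27b" in name:
        return "C"  # 27B only fits with 4-bit quantization
    if "9b" in name:
        return "B"
    return "A"      # default: smallest tier
```

Keying on the model name keeps the selection deterministic; a production version would likely cross-check `vram_gb` against actual free memory before committing to fp16.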

What's Done

  • LoRA training (r=16, alpha=32, q/k/v/o_proj)
  • Gradient accumulation (configurable per tier)
  • use_reentrant=False for LoRA + gradient checkpointing
  • Forward-hook pruning for 4-bit models
  • GQA-aware head importance and pruning
  • Nested config handling (Qwen3.5 VLM architecture)
  • 8-bit AdamW for Tier C
  • Domain required in model names
  • VRAM monitoring at phase boundaries
  • Auto tier selection
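For reference, the LoRA hyperparameters listed above, expressed as the kwargs that would go to `peft.LoraConfig` (parameter names per the PEFT API; the exact call in scripts/forge_model.py may differ). The `use_reentrant=False` requirement pairs with these — the reentrant checkpointing path does not play well with frozen-base LoRA training.

```python
# LoRA settings from the issue (r=16, alpha=32, q/k/v/o projections).
lora_kwargs = dict(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Gradient checkpointing must use the non-reentrant path with LoRA, e.g.:
# model.gradient_checkpointing_enable(
#     gradient_checkpointing_kwargs={"use_reentrant": False}
# )
gc_kwargs = {"use_reentrant": False}
```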

What's Left

  • Validate on Qwen3.5-4B (tower needs a reboot after an OOM crash)
  • Forge and publish: continuum-ai/qwen3.5-4b-general-forged
  • Forge and publish: continuum-ai/qwen3.5-9b-general-forged
  • Forge and publish: continuum-ai/qwen3.5-27b-general-forged (the headline)
  • Code domain: forge with code training data (The Stack / StarCoder)
  • Update model cards with domain-specific generation samples

Usage

python scripts/forge_model.py Qwen/Qwen3.5-4B --domain general
python scripts/forge_model.py Qwen/Qwen3.5-9B --domain general  
python scripts/forge_model.py Qwen/Qwen3.5-27B --domain general  # auto 4-bit
python publish_forged.py output/forged/qwen3.5-27b/ --domain general

This is temporary

Python is the fast path to published models. The real pipeline will be Rust/Candle — see #88.
