Compile small open-source LLMs into FPGA-ready Verilog + weight files.
Most tools let you run a model on hardware. Fornax does something different — it translates the model into hardware.
HuggingFace Model → [ Fornax ] → Verilog + Weights
Inspired by what Taalas is doing with custom ASICs: instead of loading weights at runtime, the weights become the circuit.
Fornax brings this idea to the open-source world, targeting FPGA developers first.
✅ M4 is complete for Qwen2-0.5B (INT8).

What is validated today:

- End-to-end graph: embedding -> 24 layers -> final_rmsnorm -> lm_head
- Native hidden size: DIM=896
- Strict attention path: QK^T -> scale -> softmax -> scores@V
- RTL exact match:
  - token e2e with vocab4k: PASS
  - token e2e with vocab16k: PASS
- PyTorch reference:
  - QK/Softmax/Context/LM Head logits: MAE=0, COS=1
  - top-1 token: match
  - full vocab reference (151936): PASS
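The strict attention path and the match metrics above can be sketched in NumPy. This is a hedged illustration of what "MAE=0, COS=1" means, not the project's actual verify code; the shapes and function names here are assumptions:

```python
import numpy as np

def strict_attention(q, k, v):
    # The strict path: QK^T -> scale -> softmax -> scores@V
    d = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

def compare(rtl_out, ref_out):
    # Mean absolute error and cosine similarity between RTL and reference tensors
    mae = np.abs(rtl_out - ref_out).mean()
    cos = float(rtl_out.flatten() @ ref_out.flatten()
                / (np.linalg.norm(rtl_out) * np.linalg.norm(ref_out)))
    return mae, cos

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 64)) for _ in range(3))
out = strict_attention(q, k, v)
mae, cos = compare(out, out.copy())  # identical tensors give MAE=0, COS=1
```

An exact RTL match corresponds to `mae == 0` and `cos == 1` on every compared tensor.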
1. Parse — extract weights + compute graph from any small HuggingFace model
2. Convert — quantize (INT8) + normalize operators into a standard IR
3. Generate — emit Verilog modules + .mem weight files
4. Verify — simulate with iverilog, compare against PyTorch reference
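Steps 2 and 3 can be sketched as symmetric per-tensor INT8 quantization followed by a `.mem` dump in the hex format Verilog's `$readmemh` reads. This is a minimal sketch under assumed conventions; Fornax's actual IR and file layout may differ:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor INT8: map the largest magnitude to 127
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def write_mem(q, path):
    # One two-digit hex word per line, as $readmemh expects
    with open(path, "w") as f:
        for byte in q.flatten().astype(np.uint8):
            f.write(f"{byte:02x}\n")

w = np.array([[0.5, -1.0], [0.25, 1.0]], dtype=np.float32)
q, scale = quantize_int8(w)
write_mem(q, "weights.mem")
```

Dequantization is `q * scale`; the scale would travel alongside the `.mem` file so the generated RTL (or a host wrapper) can rescale accumulator outputs.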
Block / layer-stack RTL:

```bash
./run_m3_validation.sh 128
```

Strict attention (seq_len=1):

```bash
FORNAX_STRICT_ATTN=1 ./run_m3_validation.sh 128
```

Strict attention (seq_len=4):

```bash
FORNAX_STRICT_ATTN=1 FORNAX_STRICT_SEQ_LEN=4 ./run_m3_validation.sh 128
```

Multi-layer chain (example: 2 layers):

```bash
FORNAX_NUM_LAYERS=2 FORNAX_STRICT_ATTN=1 ./run_m3_validation.sh 128
```

Strict attention + PyTorch reference metrics:

```bash
FORNAX_STRICT_ATTN=1 FORNAX_STRICT_SEQ_LEN=4 FORNAX_TORCH_REF=1 ./run_m3_validation.sh 128
```

Token e2e RTL (staged vocab):

```bash
FORNAX_NUM_LAYERS=24 FORNAX_STRICT_ATTN=1 FORNAX_STRICT_SEQ_LEN=1 \
FORNAX_ENABLE_EMBED=1 FORNAX_ENABLE_FINAL_NORM=1 FORNAX_ENABLE_LM_HEAD=1 \
FORNAX_TOKEN_ID=42 FORNAX_VOCAB_LIMIT=4096 FORNAX_TORCH_REF=1 \
./run_m3_validation.sh 896

FORNAX_NUM_LAYERS=24 FORNAX_STRICT_ATTN=1 FORNAX_STRICT_SEQ_LEN=1 \
FORNAX_ENABLE_EMBED=1 FORNAX_ENABLE_FINAL_NORM=1 FORNAX_ENABLE_LM_HEAD=1 \
FORNAX_TOKEN_ID=42 FORNAX_VOCAB_LIMIT=16384 FORNAX_TORCH_REF=1 \
./run_m3_validation.sh 896
```

Full-vocab reference compare (no RTL run):

```bash
FORNAX_NUM_LAYERS=24 FORNAX_STRICT_ATTN=1 FORNAX_TOKEN_ID=42 \
./run_m4_full_vocab_ref.sh 896
```

CI regression:

```bash
python verify/run_regression.py
python verify/run_regression.py --strict-attn
FORNAX_STRICT_SEQ_LEN=4 python verify/run_regression.py --strict-attn
```

Notes:

- `run_m3_validation.sh` writes to `output/` by default.
- Use a different `FORNAX_OUTPUT_DIR` for parallel or overnight runs:

```bash
FORNAX_OUTPUT_DIR=./artifacts/run_a FORNAX_NUM_LAYERS=24 FORNAX_STRICT_ATTN=1 ./run_m3_validation.sh 128
FORNAX_OUTPUT_DIR=./artifacts/run_b FORNAX_NUM_LAYERS=2 FORNAX_STRICT_ATTN=1 FORNAX_STRICT_SEQ_LEN=4 ./run_m3_validation.sh 896
```

If parsed model data is not in `./output`:

```bash
FORNAX_MODEL_DATA_DIR=/path/to/model_dump FORNAX_OUTPUT_DIR=./artifacts/run_a ./run_m3_validation.sh 128
```

Debug mode:

```bash
# Default: no TRACE/LN-DEBUG/VCD (faster)
# Enable debug traces and wave dump only when needed:
FORNAX_DEBUG=1 ./run_m3_validation.sh 128
```

Fornax targets small open-source models under 2B parameters: Qwen2-0.5B, Llama 3.2 1B, Phi-3 Mini, Gemma 2B.
- No open-source tool converts Transformer models to Verilog
- TVM is too general-purpose
- FINN only supports CNNs
- Taalas is closed source
Fornax fills that gap.
Fornax — Latin for "furnace". Where models are recast into silicon.