@@ -0,0 +1,38 @@
# Approach D: TurboQuant-Guided Mixed Precision

**Status: Testing**

## Concept

Apply TurboQuant+ findings (ICLR 2026) to weight quantization:

1. **V compression is free** — V/O projection weights at int3 in middle layers
2. **All quality degradation comes from K** — Q/K projection weights at int5 (high precision)
3. **Boundary layers are sensitive** — first 2 + last 2 layers at int5 for all weights

## Bit Width Assignment

| Weight type | Boundary layers (0,1,9,10) | Middle layers (2-8) |
|-------------|---------------------------|---------------------|
| Q, K projections | int5 | int5 |
| V, O projections | int5 | **int3** |
| MLP weights | int5 | **int3** |

Effective average: ~4.2 bits/param (vs 5.0 for uniform int5)
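The table above can be sketched as a small policy function. This is a hypothetical helper for illustration (the labels `"qk"`, `"vo"`, `"mlp"` and the constant names are assumptions, not identifiers from `train_gpt.py`); it assumes 11 layers numbered 0–10, so the boundary set is {0, 1, 9, 10} and the middle is 2–8:

```python
N_LAYERS = 11                                    # layers 0-10
BOUNDARY = {0, 1, N_LAYERS - 2, N_LAYERS - 1}    # {0, 1, 9, 10}

def bit_width(layer: int, weight_type: str) -> int:
    """Return quantization bits for a weight tensor.

    weight_type is one of 'qk', 'vo', 'mlp' (assumed labels).
    """
    if layer in BOUNDARY:
        return 5            # boundary layers: int5 for all weights
    if weight_type == "qk":
        return 5            # K sensitivity -> keep Q/K at int5 everywhere
    return 3                # middle-layer V/O and MLP weights drop to int3
```

A per-tensor quantizer would query this once per weight matrix when building its codebook or clip range.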

## Expected Impact

- Smaller artifact → more headroom for a bigger model or less pruning
- QAT-aligned: each CastedLinear uses its own clip range during training
- GPTQ-aware: Hessian-based quantization respects per-tensor bit widths
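The QAT-alignment point — each linear layer quantizing against its own clip range — can be illustrated with a minimal per-tensor fake-quant pass. This is a sketch under stated assumptions, not the repo's actual `CastedLinear` implementation: symmetric quantization, a caller-supplied clip, and a straight-through estimator so gradients flow through the rounding:

```python
import torch

def fake_quant(w: torch.Tensor, bits: int, clip: float) -> torch.Tensor:
    """Symmetric per-tensor fake quantization (illustrative sketch only).

    Clamps to [-clip, clip], rounds onto a signed (2**(bits-1) - 1)-level
    grid, then uses a straight-through estimator: forward emits the
    quantized values, backward passes gradients through unchanged.
    """
    levels = 2 ** (bits - 1) - 1              # 15 for int5, 3 for int3
    scale = clip / levels
    q = torch.clamp(w, -clip, clip).div(scale).round().mul(scale)
    return w + (q - w).detach()               # straight-through estimator
```

With `bits=3` the weight tensor collapses onto just 7 representable values, which is why keeping Q/K at int5 matters when K carries the quality loss.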

## Architecture

Same as Approach B (d=576, 33.6M params, MLP 3.5x) with mixed precision quantization.

## Rule Compliance

- GPTQ calibration within 600s training budget
- No TTT re-scoring
- Artifact < 16MB (asserted)
- Eval < 600s (asserted)
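The two asserted budgets could be enforced with checks along these lines — a hedged sketch, since the actual assertions live in the training/eval code and may be written differently (`check_artifact` and `timed_eval` are hypothetical names):

```python
import os
import time

ARTIFACT_LIMIT = 16 * 1024 * 1024   # 16MB artifact cap from the rules
EVAL_LIMIT_S = 600                  # eval wall-clock budget in seconds

def check_artifact(path: str) -> None:
    """Assert the serialized artifact fits under the 16MB cap."""
    size = os.path.getsize(path)
    assert size < ARTIFACT_LIMIT, f"artifact {size} bytes exceeds 16MB cap"

def timed_eval(eval_fn):
    """Run eval_fn and assert it finishes inside the 600s budget."""
    t0 = time.monotonic()
    result = eval_fn()
    assert time.monotonic() - t0 < EVAL_LIMIT_S, "eval exceeded 600s"
    return result
```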
@@ -0,0 +1,6 @@
#!/bin/bash
# Approach D: TurboQuant-guided mixed precision quantization
# V/O and MLP at int3 in middle layers, Q/K at int5 everywhere, boundary layers at int5
pip install --break-system-packages zstandard 2>/dev/null
NCCL_IB_DISABLE=1 SEED=${SEED:-1337} \
torchrun --standalone --nproc_per_node=8 train_gpt.py 2>&1 | tee /workspace/run_d.log
@@ -0,0 +1,10 @@
{
"author": "ibarrajo",
"github_id": "ibarrajo",
"name": "TurboQuant-Guided Mixed Precision (V@int3, QK@int5, boundary@int5)",
"blurb": "Mixed precision quantization guided by TurboQuant+ findings: V/O projections and MLP at int3 in middle layers, Q/K at int5 everywhere, boundary layers (first 2 + last 2) at int5. Per-tensor QAT alignment. Based on ICLR 2026 findings that V compression is free and all quality loss comes from K compression.",
"date": "2026-03-31",
"val_bpb": 0.0,
"val_loss": 0.0,
"bytes_total": 0
}