Non-record: GPTQ-lite Scale Clamp Fix + 6-bit Packing + Depth Recurrence on Stack B#1389
Open
Rome-1 wants to merge 1 commit into openai:main from
Conversation
…th recurrence

Three quantization contributions on the Stack B foundation:
1. GPTQ-lite scale clamp bug fix: the original clamp_min(1/clip_range) wastes ~90% of int6 dynamic range when weight magnitudes are small. Fix: clamp_min(1e-7).
2. 6-bit packing: pack 4 int6 values into 3 bytes (25% payload reduction).
3. Forced int8 for depth-recurrence shared layers: quantization error amplifies through weight reuse, so shared MLP layers get int8 while non-shared layers keep int6.

Non-record submission: validated on 1xH100 NVL (717 steps, undertrained). No 8xH100 run yet.
Summary
Non-record submission with three quantization contributions on the Stack B foundation (PR #1218 / #1260 lineage):
1. GPTQ-lite Scale Clamp Bug Fix
The original GPTQ-lite computes scale = (row_clip / clip_range).clamp_min(1/clip_range). For int6 (clip_range=31), this floors the scale at 1/31 ≈ 0.032, but typical weight row maxima are O(0.01–0.05), so the clamp fires on most rows and wastes ~90% of the quantization dynamic range. Fix: clamp_min(1e-7), the same floor the int8 path uses. One-line fix; affects every int6-quantized tensor.
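The effect of the fix can be sketched in plain Python (the PR's code is torch; the function names below are illustrative stand-ins for the one-line change to clamp_min):

```python
CLIP_RANGE = 31  # int6: symmetric integer levels in [-31, 31]

def int6_scale_buggy(row):
    # Original: floor at 1/clip_range ~= 0.032 fires whenever the row's
    # max |w| is below 1/31, which is typical (row maxima ~ 0.01-0.05).
    row_clip = max(abs(w) for w in row)
    return max(row_clip / CLIP_RANGE, 1.0 / CLIP_RANGE)

def int6_scale_fixed(row):
    # Fix: tiny epsilon floor (1e-7), same as the int8 path.
    row_clip = max(abs(w) for w in row)
    return max(row_clip / CLIP_RANGE, 1e-7)

row = [0.03, -0.01, 0.005]  # max |w| = 0.03, below 1/31
q_buggy = [round(w / int6_scale_buggy(row)) for w in row]
q_fixed = [round(w / int6_scale_fixed(row)) for w in row]
# q_buggy collapses to a couple of levels near zero;
# q_fixed spans the full [-31, 31] range for this row.
```

With the buggy clamp the row quantizes to [1, 0, 0]; with the fix it quantizes to [31, -10, 5], using the dynamic range the format actually provides.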
2. 6-bit Packing
Pack 4 int6 values into 3 bytes instead of storing each in a full int8 byte: a 25% payload reduction for all int6 tensors, which directly helps fit under the 16 MB budget. ~10 lines each for pack/unpack, with negligible eval overhead.
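A minimal sketch of such a pack/unpack pair (not the PR's exact code; signed int6 values in [-31, 31] are assumed to be offset to unsigned [0, 62] before packing):

```python
def pack_int6(vals):
    """Pack unsigned 6-bit ints (0..63), 4 values -> 3 bytes.

    Length must be a multiple of 4; pad beforehand if needed.
    Bit layout per quad (a, b, c, d): aaaaaabb bbbbcccc ccdddddd
    """
    assert len(vals) % 4 == 0
    out = bytearray()
    for i in range(0, len(vals), 4):
        a, b, c, d = vals[i:i + 4]
        out.append((a << 2) | (b >> 4))
        out.append(((b & 0xF) << 4) | (c >> 2))
        out.append(((c & 0x3) << 6) | d)
    return bytes(out)

def unpack_int6(data):
    """Inverse of pack_int6: 3 bytes -> 4 unsigned 6-bit ints."""
    vals = []
    for i in range(0, len(data), 3):
        x, y, z = data[i], data[i + 1], data[i + 2]
        vals.append(x >> 2)
        vals.append(((x & 0x3) << 4) | (y >> 4))
        vals.append(((y & 0xF) << 2) | (z >> 6))
        vals.append(z & 0x3F)
    return vals
```

8 values pack to 6 bytes instead of 8, the advertised 25% reduction.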
3. Forced Int8 for Depth-Recurrence Shared Layers
When layers share weights (depth recurrence), quantization error compounds through each reuse. Solution: force int8 (127 levels) for shared layers while keeping int6 for non-shared layers. Small byte cost, meaningful quality gain.
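The per-layer policy can be sketched as follows (layer names and the helper are hypothetical, not the PR's API):

```python
def choose_bits(name, shared_names, default_bits=6):
    # A shared layer's quantization error re-enters the residual stream on
    # every recurrence step, so it gets the finer int8 grid
    # (clip range 127) instead of int6 (clip range 31).
    return 8 if name in shared_names else default_bits

shared = {"mlp_shared.w_in", "mlp_shared.w_out"}  # assumed shared-layer names
layers = ["embed", "attn.qkv", "mlp_shared.w_in", "mlp_shared.w_out", "lm_head"]
plan = {name: choose_bits(name, shared) for name in layers}
# Shared MLP layers map to 8 bits; everything else stays at 6.
```

The byte cost is bounded: only the shared weights pay the int8 premium, while the bulk of the checkpoint keeps the packed int6 format.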
Architecture
Status
Non-record: validated on 1xH100 NVL (717 steps, undertrained). No 8xH100 run yet. Submitted for the quantization findings, which may help others.
Credits
Stack B: PR #1218, #1260. MuonEq-R/depth recurrence: @signalrush. XSA: @abaybektursun. LeakyReLU²: @parinzee, @sofiabod. GPTQ-lite baseline: PR #374, #1019.