
Non-record: TurboQuant mixed-precision int4/int5 (val_bpb=1.1521) #1238

Open
ibarrajo wants to merge 1 commit into openai:main from ibarrajo:approach-d

Conversation


@ibarrajo ibarrajo commented Apr 1, 2026

Summary

  • TurboQuant: role-based mixed-precision quantization — Q/K at int5, V/O and MLP at int4 in middle layers, boundary layers at int5
  • D2 variant (int4 dominant): saves 2.1MB vs uniform int5 but costs +0.034 BPB
  • D1 variant (int3): too aggressive at 1.73 BPB — int3 destroys model quality
  • Useful negative result documenting weight quantization sensitivity by layer role
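The role-based allocation above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the function name `assign_bits`, the role labels, and the boundary-layer test are all assumptions based on the summary bullets.

```python
# Sketch of role-based mixed-precision bit allocation (illustrative only).
# Q/K projections and boundary layers stay at int5; V/O and MLP weights
# in middle layers drop to int4, per the PR description.

def assign_bits(layer_idx: int, n_layers: int, role: str) -> int:
    """Return the weight bit-width for a projection given its layer and role."""
    is_boundary = layer_idx == 0 or layer_idx == n_layers - 1
    if role in ("q_proj", "k_proj") or is_boundary:
        return 5
    return 4  # v_proj, o_proj, mlp in middle layers

# Build a full allocation plan for a hypothetical 12-layer model.
plan = {
    (i, role): assign_bits(i, 12, role)
    for i in range(12)
    for role in ("q_proj", "k_proj", "v_proj", "o_proj", "mlp")
}
```

Under this scheme only middle-layer V/O/MLP weights (the bulk of the parameters) are stored at int4, which is where the 2.1 MB saving comes from.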

Results

  Metric                    Value
  val_bpb D2 (TTT s_0)      1.1521
  val_bpb D2 (base)         1.1751
  val_bpb D1 (int3)         1.73 (unusable)
  Artifact size D2          13.4 MB (2.6 MB headroom)
  Current SOTA val_bpb      1.1147

Key Findings

  • Int4 is viable but costly: +0.034 BPB vs int5, saves 2.1MB artifact space
  • Int3 is not viable: destroys model quality entirely (1.73 BPB)
  • Weight quantization sensitivity differs from KV cache activation sensitivity: layers that tolerate KV cache quantization may not tolerate weight quantization at the same precision
  • Role-based allocation helps: Q/K projections are most sensitive to quantization; keeping them at int5 while dropping V/O/MLP to int4 is better than uniform int4
  • Non-record: 1.1521 does not beat SOTA of 1.1147; the 2.6MB savings could theoretically fund a larger model but the quality gap is too large
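The int5 → int4 → int3 quality cliff is consistent with how quickly the number of representable levels shrinks (31 → 15 → 7 signed levels). A minimal round-to-nearest sketch makes the trend visible; this is a generic symmetric per-tensor quantizer for illustration, and the PR's actual scheme (per-channel scales, clipping strategy, etc.) may differ:

```python
import numpy as np

def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-tensor round-to-nearest quantize/dequantize (sketch)."""
    qmax = 2 ** (bits - 1) - 1               # 15 for int5, 7 for int4, 3 for int3
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
for bits in (5, 4, 3):
    mse = float(np.mean((w - fake_quantize(w, bits)) ** 2))
    print(f"int{bits} reconstruction MSE: {mse:.5f}")
```

Each bit dropped roughly quadruples the reconstruction error of this toy quantizer, which lines up qualitatively with int4 costing a modest +0.034 BPB while int3 collapses the model.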

Rule Compliance

  • Training time < 600s
  • Eval time < 600s
  • Artifact < 16MB (13.4MB)
  • No val tokens in artifact
  • Score-first TTT only (s_0 reported)
  • Single-pass evaluation

🤖 Generated with Claude Code

…al_bpb=1.1521)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
