# Record: Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence

**val_bpb: 1.0954 (3-seed mean)**

## Results

| Seed | val_bpb | Artifact Size |
|------|---------|---------------|
| 1337 | 1.0953 | 15.82 MB |
| 42 | 1.0950 | 15.81 MB |
| 2024 | 1.0958 | 15.83 MB |
| **Mean** | **1.0954** | **~15.82 MB** |
| **Std** | **0.0004** | |
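
The 3-seed mean and (sample) standard deviation in the table can be reproduced directly:

```python
from statistics import mean, stdev

seed_bpb = [1.0953, 1.0950, 1.0958]  # per-seed val_bpb from the table
assert round(mean(seed_bpb), 4) == 1.0954
assert round(stdev(seed_bpb), 4) == 0.0004  # sample std (n-1 denominator)
```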

## Base

This submission builds on **PR #1435** (11L Depth Recurrence + BigramHash + EMA 0.9965, by AbhayAnandUCSD). Full credit to the original architecture.

## Innovations

### 1. Frequency-Weighted GPTQ Calibration (novel)
Standard GPTQ calibration weights all tokens equally when accumulating Hessians. We instead give activations from the top-100 most frequent tokens (covering ~53% of all text, per Zipf's law) a 2x weight during Hessian accumulation. This biases GPTQ toward preferentially minimizing quantization error on high-frequency tokens, at zero artifact-size cost.
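
A minimal sketch of the weighted accumulation (function and variable names are ours, not taken from train_gpt.py; only the 2x boost follows the description above):

```python
import torch

def accumulate_hessian(H, x, token_ids, frequent_ids, boost=2.0):
    """GPTQ Hessian update H += sum_i w_i * x_i x_i^T, with w_i = boost
    for activations coming from high-frequency tokens, else 1."""
    w = torch.ones(x.shape[0], device=x.device)
    w[torch.isin(token_ids, frequent_ids)] = boost
    xw = x * w.sqrt().unsqueeze(1)  # sqrt so xw^T @ xw applies each w_i once
    return H + xw.T @ xw

# toy usage: 16 activations of width 8, top-100 token ids get the boost
H = accumulate_hessian(torch.zeros(8, 8), torch.randn(16, 8),
                       torch.randint(0, 1024, (16,)), torch.arange(100))
```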

### 2. Frequency-Weighted Embedding Quantization (novel, NothingLiVa)
Top-100 most frequent tokens -> INT8; the remaining 924 tokens -> INT6. High-frequency tokens disproportionately impact loss, so higher precision is allocated where it matters most.
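
A sketch of the split-precision scheme, assuming symmetric per-row quantization (helper names are hypothetical; the 1024-entry vocab is implied by 100 + 924):

```python
import torch

def quantize_rows(E, bits):
    """Symmetric per-row quantization to a signed `bits`-wide integer grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = E.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    return (E / scale).round().clamp(-qmax - 1, qmax).to(torch.int8), scale

def quantize_embeddings(E, top_ids):
    """Top-frequency rows -> INT8, all remaining rows -> INT6."""
    rest = torch.ones(E.shape[0], dtype=torch.bool)
    rest[top_ids] = False
    return quantize_rows(E[top_ids], bits=8), quantize_rows(E[rest], bits=6)

E = torch.randn(1024, 64)  # embed dim 64 is a placeholder
(q8, s8), (q6, s6) = quantize_embeddings(E, torch.arange(100))
```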

### 3. Sandwich Layer 10 -> INT8
The final transformer layer is quantized to INT8 instead of INT6, protecting signal quality just before the LM head. Uses ~0.75 MB of the available headroom.
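
The headroom arithmetic is simple: INT6 -> INT8 costs 2 extra bits per weight, so ~0.75 MB corresponds to roughly 3M weights in the final layer (a sketch; the exact per-layer parameter count is not stated above):

```python
N_LAYERS = 11  # 11L depth-recurrent base (PR #1435)
LAYER_BITS = {i: 8 if i == N_LAYERS - 1 else 6 for i in range(N_LAYERS)}

def extra_bytes(n_weights, old_bits=6, new_bits=8):
    """Artifact-size cost of raising one layer's weight precision."""
    return n_weights * (new_bits - old_bits) // 8

assert LAYER_BITS[10] == 8
assert extra_bytes(3_000_000) == 750_000  # ~0.75 MB for ~3M weights
```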

### 4. Hyperparameter Tuning
- LR 1.4x: matrix_lr 0.02 -> 0.028, scalar_lr 0.02 -> 0.028, tied_embed_lr 0.03 -> 0.042
- QK-Gain 6.0 (from 5.0): improved attention scaling
- Warmdown 0.60 (from 0.667): longer low-LR phase
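
The tuning above, written as a config override (key names are assumptions, not necessarily those used in train_gpt.py):

```python
BASE = {"matrix_lr": 0.02, "scalar_lr": 0.02, "tied_embed_lr": 0.03,
        "qk_gain": 5.0, "warmdown_frac": 0.667}
TUNED = {**BASE, "matrix_lr": 0.028, "scalar_lr": 0.028,
         "tied_embed_lr": 0.042, "qk_gain": 6.0, "warmdown_frac": 0.60}

# every learning rate is scaled by the same 1.4x factor
for k in ("matrix_lr", "scalar_lr", "tied_embed_lr"):
    assert abs(TUNED[k] / BASE[k] - 1.4) < 1e-6
```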

## Training Command

```bash
RUN_ID=freqgptq_combo_s10 \
SEED=1337 \
MAX_WALLCLOCK_SECONDS=600 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

## Hardware
8x NVIDIA H100 80GB SXM, ~590s training + ~97s sliding-window eval

## Checklist
- [x] Artifact < 16,000,000 bytes (all 3 seeds)
- [x] Training < 600s wall clock
- [x] Causal sliding-window evaluation (stride=64)
- [x] Credit to base PR #1435 (AbhayAnandUCSD)

## Acknowledgments
- Base architecture: PR #1435 by AbhayAnandUCSD
- Frequency-Weighted Embedding Quantization: based on my earlier PR #1042 (NothingLiVa), since closed
- Frequency-Weighted GPTQ Calibration: new contribution (this PR)
- OpenAI for hosting the Parameter Golf challenge
```json
{
  "name": "NothingLiVa",
  "github_id": "nothingLiVa",
  "val_bpb": 1.0954,
  "val_bpb_seeds": [1.0953, 1.0950, 1.0958],
  "seeds": [1337, 42, 2024],
  "artifact_size_bytes": [15817827, 15811465, 15826942],
  "train_time_seconds": 590,
  "hardware": "8x H100 80GB SXM",
  "base_pr": 1435,
  "description": "Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence BPB 1.0954"
}
```