# Record: Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence

**val_bpb: 1.0954 (3-seed mean)**

## Results

| Seed | val_bpb | Artifact Size |
|------|---------|---------------|
| 1337 | 1.0953 | 15.82 MB |
| 42 | 1.0950 | 15.81 MB |
| 2024 | 1.0958 | 15.83 MB |
| **Mean** | **1.0954** | **~15.82 MB** |
| **Std** | **0.0004** | |
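
The 3-seed mean and (sample) standard deviation in the table can be reproduced directly:

```python
from statistics import mean, stdev

seed_bpb = [1.0953, 1.0950, 1.0958]  # per-seed val_bpb from the table
assert round(mean(seed_bpb), 4) == 1.0954
assert round(stdev(seed_bpb), 4) == 0.0004  # sample std (n-1 denominator)
```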

## Base

This submission builds on **PR #1435** (11L Depth Recurrence + BigramHash + EMA 0.9965, by AbhayAnandUCSD). Full credit to the original architecture.

## Innovations

### 1. Frequency-Weighted GPTQ Calibration (novel)
Standard GPTQ calibration weights all tokens equally when accumulating Hessians. We instead give activations from the top-100 most frequent tokens (covering ~53% of all text, per Zipf's law) a 2x weight during Hessian accumulation. This biases GPTQ toward preferentially minimizing quantization error on high-frequency tokens, at zero artifact-size cost.
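
A minimal sketch of the weighted accumulation (function and variable names are ours, not taken from train_gpt.py; only the 2x boost follows the description above):

```python
import torch

def accumulate_hessian(H, x, token_ids, frequent_ids, boost=2.0):
    """GPTQ Hessian update H += sum_i w_i * x_i x_i^T, with w_i = boost
    for activations coming from high-frequency tokens, else 1."""
    w = torch.ones(x.shape[0], device=x.device)
    w[torch.isin(token_ids, frequent_ids)] = boost
    xw = x * w.sqrt().unsqueeze(1)  # sqrt so xw^T @ xw applies each w_i once
    return H + xw.T @ xw

# toy usage: 16 activations of width 8, top-100 token ids get the boost
H = accumulate_hessian(torch.zeros(8, 8), torch.randn(16, 8),
                       torch.randint(0, 1024, (16,)), torch.arange(100))
```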

### 2. Frequency-Weighted Embedding Quantization (novel, NothingLiVa)
Top-100 most frequent tokens -> INT8; the remaining 924 tokens -> INT6. High-frequency tokens disproportionately impact loss, so higher precision is allocated where it matters most.
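
A sketch of the split-precision scheme, assuming symmetric per-row quantization (helper names are hypothetical; the 1024-entry vocab is implied by 100 + 924):

```python
import torch

def quantize_rows(E, bits):
    """Symmetric per-row quantization to a signed `bits`-wide integer grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = E.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    return (E / scale).round().clamp(-qmax - 1, qmax).to(torch.int8), scale

def quantize_embeddings(E, top_ids):
    """Top-frequency rows -> INT8, all remaining rows -> INT6."""
    rest = torch.ones(E.shape[0], dtype=torch.bool)
    rest[top_ids] = False
    return quantize_rows(E[top_ids], bits=8), quantize_rows(E[rest], bits=6)

E = torch.randn(1024, 64)  # embed dim 64 is a placeholder
(q8, s8), (q6, s6) = quantize_embeddings(E, torch.arange(100))
```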

### 3. Sandwich Layer 10 -> INT8
The final transformer layer is quantized to INT8 instead of INT6, protecting signal quality just before the LM head. Uses ~0.75 MB of the available headroom.
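
The headroom arithmetic is simple: INT6 -> INT8 costs 2 extra bits per weight, so ~0.75 MB corresponds to roughly 3M weights in the final layer (a sketch; the exact per-layer parameter count is not stated above):

```python
N_LAYERS = 11  # 11L depth-recurrent base (PR #1435)
LAYER_BITS = {i: 8 if i == N_LAYERS - 1 else 6 for i in range(N_LAYERS)}

def extra_bytes(n_weights, old_bits=6, new_bits=8):
    """Artifact-size cost of raising one layer's weight precision."""
    return n_weights * (new_bits - old_bits) // 8

assert LAYER_BITS[10] == 8
assert extra_bytes(3_000_000) == 750_000  # ~0.75 MB for ~3M weights
```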

### 4. Hyperparameter Tuning
- LR 1.4x: matrix_lr 0.02 -> 0.028, scalar_lr 0.02 -> 0.028, tied_embed_lr 0.03 -> 0.042
- QK-Gain 6.0 (from 5.0): improved attention scaling
- Warmdown 0.60 (from 0.667): longer low-LR phase
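
The tuning above, written as a config override (key names are assumptions, not necessarily those used in train_gpt.py):

```python
BASE = {"matrix_lr": 0.02, "scalar_lr": 0.02, "tied_embed_lr": 0.03,
        "qk_gain": 5.0, "warmdown_frac": 0.667}
TUNED = {**BASE, "matrix_lr": 0.028, "scalar_lr": 0.028,
         "tied_embed_lr": 0.042, "qk_gain": 6.0, "warmdown_frac": 0.60}

# every learning rate is scaled by the same 1.4x factor
for k in ("matrix_lr", "scalar_lr", "tied_embed_lr"):
    assert abs(TUNED[k] / BASE[k] - 1.4) < 1e-6
```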

## Training Command

```bash
RUN_ID=freqgptq_combo_s10 \
SEED=1337 \
MAX_WALLCLOCK_SECONDS=600 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

## Hardware
8x NVIDIA H100 80GB SXM, ~590s training + ~97s sliding-window eval

## Checklist
- [x] Artifact < 16,000,000 bytes (all 3 seeds)
- [x] Training < 600s wall clock
- [x] Causal sliding-window evaluation (stride=64)
- [x] Credit to base PR #1435 (AbhayAnandUCSD)

## Acknowledgments
- Base architecture: PR #1435 by AbhayAnandUCSD
- Frequency-Weighted Embedding Quantization: based on my earlier PR #1042 (NothingLiVa), since closed
- Frequency-Weighted GPTQ Calibration: new contribution (this PR)
- OpenAI for hosting the Parameter Golf challenge
```json
{
  "name": "NothingLiVa",
  "github_id": "nothingLiVa",
  "val_bpb": 1.0954,
  "val_bpb_seeds": [1.0953, 1.0950, 1.0958],
  "seeds": [1337, 42, 2024],
  "artifact_size_bytes": [15817827, 15811465, 15826942],
  "train_time_seconds": 590,
  "hardware": "8x H100 80GB SXM",
  "base_pr": 1435,
  "description": "Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence BPB 1.0954"
}
```