Add Gap Map: PR-by-PR analysis through Master Equation
Maps every top entry through BPB = L + Q + T + M:
- openai#700 solved M (mixer) but has worst L (training)
- openai#609 solved Q (quant) but has zero T and M (no eval pipeline)
- openai#549 solved L (training) but has zero M (no mixer)
- Nobody has optimized all four terms simultaneously
- Theoretical optimal = 1.052 (combine best of each)
- Our Track B path to 1.025 via recurrence + FiLM-only TTT + Mixer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Kevin Francis Tan <kf.tan@lightarchitects.io>
OUR TRACK A TARGET: Close L gap (add features) + close Q gap (Hessian GPTQ) + keep T+M
OUR TRACK B TARGET: Better L via recurrence (wider, same N) + better T via FiLM-only TTT + M
```
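The targets above are stated as gaps in the terms of BPB = L + Q + T + M. As a minimal sketch of that bookkeeping, the snippet below plugs in the figures quoted for #700 in the PR-by-PR section and solves for the one term that entry does not state, Q. Reading the 1.1254 training figure as the L term, and treating the four terms as strictly additive, are assumptions; `implied_q` is a hypothetical helper name.

```python
# Figures quoted in the #700 entry. Treating 1.1254 as the L term is an assumption.
L_TRAIN = 1.1254       # L: training term
T_TTT = -0.041         # T: test-time-training gain (4-epoch TTT)
M_MIXER = -0.032       # M: mixer gain (5-expert Hedge Mixer)
BPB_REPORTED = 1.0541  # final score quoted for #700


def implied_q(bpb: float, l: float, t: float, m: float) -> float:
    """Solve BPB = L + Q + T + M for Q, assuming the terms are additive."""
    return bpb - (l + t + m)


q = implied_q(BPB_REPORTED, L_TRAIN, T_TTT, M_MIXER)
print(f"L + T + M = {L_TRAIN + T_TTT + M_MIXER:.4f}")
print(f"implied Q = {q:.4f}")
```

Under these assumptions the three stated terms sum to about 1.0524, leaving roughly 0.0017 for Q.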
### PR-by-PR Through the Master Equation
**#700 (1.0541)**: Solved M(k). 5-expert Hedge Mixer = -0.032. 4-epoch TTT = -0.041. WORSE training (1.1254) but BEST eval. Bet: eval > training.

**#609 (1.1154)**: Solved Q(b,m). Full Hessian GPTQ = 0.003 gap. XSA-all = free. T and M = ZERO. Left 0.06 BPB on the table.

**#714 (1.1187)**: Solved a BUG. RoPE NTK fix. One constant worth more than all architecture innovations. std=0.00024.

**#549 (1.1194)**: Solved L(N,S). Best training: 83.4ms/step, 7,185 steps. LeakyReLU² = -0.0021. M = ZERO. Irony: best training + best eval don't coexist.

**#414 (1.1233)**: The foundation. Invented GPTQ-lite. Every submission forks this.
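For readers unfamiliar with the M term, here is a minimal sketch of the textbook Hedge (multiplicative-weights) rule applied to mixing expert predictions. It is not the code from #700: the function name `hedge_mix`, the learning rate `eta`, and the toy two-expert setup are all hypothetical, and the actual experts and update schedule in that PR are not described here.

```python
import math


def hedge_mix(expert_probs, targets, eta=1.0):
    """Mix expert distributions with Hedge; return the average loss in bits.

    expert_probs[t][i] is expert i's probability list over symbols at step t.
    targets[t] is the index of the symbol that actually occurred at step t.
    Every expert must give the observed symbol non-zero probability.
    """
    n_experts = len(expert_probs[0])
    log_w = [0.0] * n_experts  # log-weights, uniform at the start
    total_bits = 0.0
    for dists, y in zip(expert_probs, targets):
        # Normalise the weights, shifting by the max for numerical stability.
        shift = max(log_w)
        w = [math.exp(lw - shift) for lw in log_w]
        z = sum(w)
        w = [wi / z for wi in w]
        # Score the mixture on the observed symbol before updating.
        p_mix = sum(wi * d[y] for wi, d in zip(w, dists))
        total_bits += -math.log2(p_mix)
        # Hedge update: down-weight each expert by its own log-loss.
        for i, d in enumerate(dists):
            log_w[i] -= eta * (-math.log(d[y]))
    return total_bits / len(targets)


# Toy usage: one expert that is usually right, one that is usually wrong.
good, bad = [0.9, 0.1], [0.1, 0.9]
avg_bits = hedge_mix([[good, bad]] * 20, [0] * 20)
print(f"average bits per symbol: {avg_bits:.3f}")
```

With `eta=1.0` and log-loss this reduces to a Bayesian mixture, so the mixer's total loss stays within log2(number of experts) bits of the best single expert, which is why a mixer can only help modestly per term but costs little.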
### Track B Path to 1.0
```
L(N=1.2× via recurrence at 640d, S=unlimited) ≈ 1.10 pre-quant