Commit a39f61c

Add Gap Map: PR-by-PR analysis through Master Equation
Maps every top entry through BPB = L + Q + T + M:
- openai#700 solved M (mixer) but has worst L (training)
- openai#609 solved Q (quant) but has zero T and M (no eval pipeline)
- openai#549 solved L (training) but has zero M (no mixer)
- Nobody has optimized all four terms simultaneously
- Theoretical optimal = 1.052 (combine best of each)
- Our Track B path to 1.025 via recurrence + FiLM-only TTT + Mixer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Kevin Francis Tan <kf.tan@lightarchitects.io>
1 parent 05c7ce3 commit a39f61c

1 file changed: +49 -0 lines changed

SESSION-STATE.md

Lines changed: 49 additions & 0 deletions
@@ -139,6 +139,55 @@ Features implemented:
All configurable via env vars. Default = simplified competitive config.
```

## THE GAP MAP: Why Nobody Has Won Completely

No top entry has optimized all four terms of BPB = L + Q + T + M at once: each one either leaves T or M at zero or pays for its eval gains with a weaker L.

```
┌───────┬────────┬────────┬────────┬────────┬───────┬────────────────────────────┐
│ Entry │ L(N,S) │ Q(b,m) │ T(p,e) │ M(k)   │ Total │ Fell short on              │
├───────┼────────┼────────┼────────┼────────┼───────┼────────────────────────────┤
│ #700  │ 1.130  │ +0.005 │ -0.041 │ -0.032 │ 1.054 │ L (worse training model)   │
│ #609  │ 1.118  │ +0.003 │ 0.000  │ 0.000  │ 1.115 │ T and M (no eval pipeline) │
│ #714  │ 1.120  │ +0.004 │ -0.003 │ 0.000  │ 1.119 │ M (no mixer)               │
│ #549  │ 1.122  │ +0.004 │ -0.003 │ 0.000  │ 1.119 │ M (no mixer)               │
│ #414  │ 1.125  │ +0.004 │ 0.000  │ 0.000  │ 1.123 │ T and M (no eval pipeline) │
├───────┼────────┼────────┼────────┼────────┼───────┼────────────────────────────┤
│ OURS  │ 1.186  │ +0.066 │ -0.020 │ (w/ T) │ 1.252 │ L (features), Q (Hessian)  │
└───────┴────────┴────────┴────────┴────────┴───────┴────────────────────────────┘

THEORETICAL OPTIMAL (nobody has built this):
L: #549's training (1.1218) + Q: #609's GPTQ (+0.003) + T: #700's TTT (-0.041) + M: #700's Mixer (-0.032)
= 1.052

OUR TRACK A TARGET: Close L gap (add features) + close Q gap (Hessian GPTQ) + keep T+M
OUR TRACK B TARGET: Better L via recurrence (wider, same N) + better T via FiLM-only TTT + M
```
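
The decomposition BPB = L + Q + T + M can be checked mechanically against the table. The sketch below is an illustration only: it copies the table values, sums the four terms per row, and reports how far each sum sits from the listed Total. The names `ROWS`, `bpb`, and `residuals` are hypothetical, and the OURS row enters M as 0.0 because the table folds it into T (an assumption). Nonzero residuals would suggest that the component columns are rounded estimates rather than an exact split of the reported score.

```python
# Gap Map rows copied from the table: (L, Q, T, M, listed total).
# ASSUMPTION: the OURS row lists M as "(w/ T)", so M is entered as 0.0 here.
ROWS = {
    "#700": (1.130, 0.005, -0.041, -0.032, 1.054),
    "#609": (1.118, 0.003, 0.000, 0.000, 1.115),
    "#714": (1.120, 0.004, -0.003, 0.000, 1.119),
    "#549": (1.122, 0.004, -0.003, 0.000, 1.119),
    "#414": (1.125, 0.004, 0.000, 0.000, 1.123),
    "OURS": (1.186, 0.066, -0.020, 0.000, 1.252),
}


def bpb(l, q, t, m):
    """Master Equation: training loss + quant gap + TTT gain + mixer gain."""
    return l + q + t + m


def residuals(rows):
    """Listed total minus the sum of the four components, per entry."""
    return {
        name: round(total - bpb(l, q, t, m), 3)
        for name, (l, q, t, m, total) in rows.items()
    }


# Best-of-each combination quoted under THEORETICAL OPTIMAL.
optimal = bpb(1.1218, 0.003, -0.041, -0.032)

for name, gap in residuals(ROWS).items():
    print(f"{name}: listed - summed = {gap:+.3f}")
print(f"theoretical optimal = {optimal:.3f}")
```

The same `bpb` helper can be reused to try other best-of-each combinations before committing compute to them.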

### PR-by-PR Through the Master Equation

**#700 (1.0541)**: Solved M(k). 5-expert Hedge Mixer = -0.032. 4-epoch TTT = -0.041. WORSE training (1.1254) but BEST eval. Bet: eval > training.

**#609 (1.1154)**: Solved Q(b,m). Full Hessian GPTQ = 0.003 gap. XSA-all = free. T and M = ZERO. Left 0.06 BPB on the table.

**#714 (1.1187)**: Solved a BUG. RoPE NTK fix. One constant worth more than all architecture innovations. std=0.00024.

**#549 (1.1194)**: Solved L(N,S). Best training: 83.4ms/step, 7,185 steps. LeakyReLU² = -0.0021. M = ZERO. Irony: best training + best eval don't coexist.

**#414 (1.1233)**: The foundation. Invented GPTQ-lite. Every submission forks this.

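
For readers unfamiliar with the M(k) term, the following is a minimal sketch of a Hedge-style mixer over k experts. It assumes the textbook Hedge (exponential weights) update with log loss; #700's actual mixer is not shown in this file and may differ. The function name `hedge_mix` and the toy probabilities are hypothetical. The score here is bits per token; converting to BPB would additionally require dividing by bytes per token.

```python
import math


def hedge_mix(expert_probs, eta=1.0):
    """Mix expert predictions online with Hedge (exponential weights).

    expert_probs: one list per step, holding each expert's probability
    (> 0) for the token that actually occurred at that step.
    Returns (mean bits per token of the mixture, final expert weights).
    """
    k = len(expert_probs[0])
    weights = [1.0 / k] * k  # uniform prior over experts
    total_bits = 0.0
    for probs in expert_probs:
        # Score the mixture BEFORE updating, so the weights never see the
        # current token in advance.
        mixed = sum(w * p for w, p in zip(weights, probs))
        total_bits += -math.log2(mixed)
        # Hedge update with log loss: w_i *= exp(-eta * -ln p_i) = p_i ** eta.
        weights = [w * p ** eta for w, p in zip(weights, probs)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return total_bits / len(expert_probs), weights


# Toy run: 5 hypothetical experts with fixed per-token probabilities.
steps = [[0.9, 0.5, 0.3, 0.2, 0.1]] * 50
mean_bits, final_weights = hedge_mix(steps)
print(f"mixture bits/token = {mean_bits:.4f}")
print("final weights =", [round(w, 4) for w in final_weights])
```

With `eta=1.0` the mixture's total loss exceeds the best single expert's by at most log2(k) bits, which is why a mixer can only help once there are several complementary experts to choose between.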
### Track B Path to 1.025

```
L(N=1.2× via recurrence at 640d, S=unlimited)  ≈ 1.10 pre-quant
Q(int5, GPTQ-lite + CROWN-Q)                   ≈ +0.005
T(FiLM-only TTT, 6 epochs)                     ≈ -0.04
M(5-expert Hedge Mixer)                        ≈ -0.04
                                               = 1.025
```
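
The T term above relies on test-time training that touches only FiLM parameters. As a rough illustration of that idea, the sketch below freezes a base weight and adapts only a scale `gamma` and shift `beta` for 6 epochs. Everything in it is an assumption made for illustration: the toy regression task, the squared-error loss, the learning rate, and the names `film_only_ttt` and `mse`. A real pipeline would adapt on a language-model loss over eval tokens, and this file does not show that code.

```python
def mse(xs, ys, w, gamma, beta):
    """Mean squared error of the FiLM-wrapped toy model."""
    return sum((gamma * (w * x) + beta - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def film_only_ttt(xs, ys, w, epochs=6, lr=0.1):
    """Adapt only FiLM scale/shift around a frozen weight w.

    Toy model: y_hat = gamma * (w * x) + beta. The base weight w is never
    updated; full-batch gradient descent moves gamma and beta only.
    """
    gamma, beta = 1.0, 0.0  # identity modulation: starts as the frozen model
    n = len(xs)
    for _ in range(epochs):
        grad_gamma = grad_beta = 0.0
        for x, y in zip(xs, ys):
            h = w * x                        # frozen base feature
            err = gamma * h + beta - y
            grad_gamma += 2.0 * err * h / n  # d(MSE)/d(gamma)
            grad_beta += 2.0 * err / n       # d(MSE)/d(beta)
        gamma -= lr * grad_gamma
        beta -= lr * grad_beta
    return gamma, beta


# Hypothetical distribution shift: the frozen model has w = 1.0, but the
# test data follows y = 1.5 * x + 0.2.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [1.5 * x + 0.2 for x in xs]
w = 1.0

before = mse(xs, ys, w, 1.0, 0.0)
gamma, beta = film_only_ttt(xs, ys, w)
after = mse(xs, ys, w, gamma, beta)
print(f"MSE before = {before:.4f}, after 6 epochs = {after:.4f}")
```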

---

## INFRASTRUCTURE

```
