Add Gap Map: PR-by-PR analysis through Master Equation

theLightArchitect · claude · theLightArchitect · commit a39f61c529ae · 2026-03-27T10:04:38.000-07:00
Maps every top entry through BPB = L + Q + T + M:
- openai#700 solved M (mixer) but has worst L (training)
- openai#609 solved Q (quant) but has zero T and M (no eval pipeline)
- openai#549 solved L (training) but has zero M (no mixer)
- Nobody has optimized all four terms simultaneously
- Theoretical optimal = 1.052 (combine best of each)
- Our Track B path to 1.025 via recurrence + FiLM-only TTT + Mixer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Kevin Francis Tan <kf.tan@lightarchitects.io>
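The two headline totals (1.052 and 1.025) can be checked by summing the four terms directly. The following is a minimal sketch that uses only the term values quoted in the Gap Map added by this diff. The `BpbTerms` class and its field names are hypothetical helpers for illustration and are not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BpbTerms:
    """Hypothetical container for the four terms of BPB = L + Q + T + M."""
    L: float  # training loss term
    Q: float  # quantization penalty (positive = worse)
    T: float  # test-time training gain (negative = better)
    M: float  # mixer gain (negative = better)

    def total(self) -> float:
        # Plain sum of the four terms, as written in the Gap Map.
        return self.L + self.Q + self.T + self.M


# "Theoretical optimal": #549's L, #609's Q, #700's T and M.
best_of_each = BpbTerms(L=1.1218, Q=0.003, T=-0.041, M=-0.032)

# Track B estimates, as listed in the Track B block of the diff.
track_b = BpbTerms(L=1.10, Q=0.005, T=-0.04, M=-0.04)

print(f"best of each: {best_of_each.total():.3f}")
print(f"track B:      {track_b.total():.3f}")
```

The sketch treats the terms as purely additive, which is the assumption the Gap Map itself makes. Whether the gains from different submissions actually stack this way is not established by the source.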
diff --git a/SESSION-STATE.md b/SESSION-STATE.md
@@ -139,6 +139,55 @@ Features implemented:
 All configurable via env vars. Default = simplified competitive config.
 ```
 
+## THE GAP MAP — Why Nobody Has Won Completely
+
+Every top entry leaves at least one term of BPB = L + Q + T + M unoptimized, and four of the five leave T or M at exactly zero.
+
+```
+┌───────┬────────┬────────┬────────┬────────┬───────┬────────────────────────────┐
+│ Entry │ L(N,S) │ Q(b,m) │ T(p,e) │  M(k)  │ Total │       Fell short on        │
+├───────┼────────┼────────┼────────┼────────┼───────┼────────────────────────────┤
+│ #700  │ 1.130  │ +0.005 │ -0.041 │ -0.032 │ 1.054 │ L (worse training model)   │
+│ #609  │ 1.118  │ +0.003 │ 0.000  │ 0.000  │ 1.115 │ T and M (no eval pipeline) │
+│ #714  │ 1.120  │ +0.004 │ -0.003 │ 0.000  │ 1.119 │ M (no mixer)               │
+│ #549  │ 1.122  │ +0.004 │ -0.003 │ 0.000  │ 1.119 │ M (no mixer)               │
+│ #414  │ 1.125  │ +0.004 │ 0.000  │ 0.000  │ 1.123 │ T and M (no eval pipeline) │
+├───────┼────────┼────────┼────────┼────────┼───────┼────────────────────────────┤
+│ OURS  │ 1.186  │ +0.066 │ -0.020 │ (w/ T) │ 1.252 │ L (features), Q (Hessian)  │
+└───────┴────────┴────────┴────────┴────────┴───────┴────────────────────────────┘
+
+THEORETICAL OPTIMAL (nobody has built this):
+  L: #549's training (1.1218) + Q: #609's GPTQ (+0.003) + T: #700's TTT (-0.041) + M: #700's Mixer (-0.032)
+  = 1.052
+
+OUR TRACK A TARGET: Close L gap (add features) + close Q gap (Hessian GPTQ) + keep T+M
+OUR TRACK B TARGET: Better L via recurrence (wider, same N) + better T via FiLM-only TTT + M
+```
+
+### PR-by-PR Through the Master Equation
+
+**#700 (1.0541)**: Solved M(k). 5-expert Hedge Mixer = -0.032. 4-epoch TTT = -0.041. WORSE training (1.1254) but BEST eval. Bet: eval > training.
+
+**#609 (1.1154)**: Solved Q(b,m). Full Hessian GPTQ = 0.003 gap. XSA-all = free. T and M = ZERO. Left 0.06 BPB on the table.
+
+**#714 (1.1187)**: Solved a BUG. RoPE NTK fix. One constant worth more than all architecture innovations. std=0.00024.
+
+**#549 (1.1194)**: Solved L(N,S). Best training: 83.4ms/step, 7,185 steps. LeakyReLU² = -0.0021. M = ZERO. Irony: best training + best eval don't coexist.
+
+**#414 (1.1233)**: The foundation. Invented GPTQ-lite. Every submission forks this.
+
+### Track B Path to 1.025
+
+```
+L(N=1.2× via recurrence at 640d, S=unlimited) ≈ 1.10 pre-quant
+Q(int5, GPTQ-lite + CROWN-Q)                  ≈ +0.005
+T(FiLM-only TTT, 6 epochs)                    ≈ -0.04
+M(5-expert Hedge Mixer)                        ≈ -0.04
+= 1.025
+```
+
+---
+
 ## INFRASTRUCTURE
 
 ```