Non-record: Phase 1 Legal Score-First TTT + Meta-TTT (FOMAML) — awaiting compute #494
george11642 wants to merge 1 commit into openai:main from
Non-record submission building on PR openai#462's architecture with:
- XSA on all 11 layers (was 4)
- Cosine TTT 30 epochs with per-layer LR groups
- GPTQ-lite optimal clip percentile search
- Legal score-first TTT protocol
- Meta-TTT (FOMAML) in development
Awaiting compute for validation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
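To make "Cosine TTT 30 epochs with per-layer LR groups" concrete, here is a minimal sketch of how such a schedule could be tabulated. Only the 30 epochs and 11 layers come from the commit message. The base learning rate, the per-layer decay factor, the choice to give deeper layers larger rates, and the helper names `cosine_lr`, `layer_group_lrs`, and `ttt_lr_table` are assumptions for illustration, not the PR's actual settings.

```python
import math

def cosine_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Cosine-annealed learning rate for a 0-indexed epoch."""
    progress = epoch / max(1, total_epochs - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def layer_group_lrs(num_layers, base_lr, decay):
    """One peak LR per layer group; the last layer gets base_lr and earlier
    layers are scaled down geometrically (a hypothetical grouping rule)."""
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

def ttt_lr_table(num_layers=11, total_epochs=30, base_lr=1e-3, decay=0.9):
    """Rows are epochs, columns are layer groups."""
    peaks = layer_group_lrs(num_layers, base_lr, decay)
    return [[cosine_lr(p, e, total_epochs) for p in peaks]
            for e in range(total_epochs)]

table = ttt_lr_table()
```

In a PyTorch training loop the same idea is usually expressed with one optimizer parameter group per layer plus a cosine scheduler; the table form above only shows the resulting values.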
Community Review — Non-record: Phase 1 Legal Score-First TTT + Meta-TTT (FOMAML) — awaiting compute
BPB: 0.012 (cache parse: may be delta/std, not val_bpb; check PR title) | Compliance: FLAG, Pre-Quant TTT runs multi-epoch on
What I found in the code (head SHA
At line 977 the pre-quant TTT function takes
Per Issue #402 and Issue #677 (@valerio-oai, 2026-03-27), TTT is valid only if each token is scored BEFORE the adapter trains on it; multi-epoch TTT that scores only on the final pass is explicitly called out as invalid. This implementation matches the pattern that closed PR #1376 (stukenov) and was subsequently confirmed in #1485/#1487/#1488/#1489/#1517/#1539; see the Issue #677 meta-comment from 2026-04-11, which lists the 6+ PRs in the cluster.
Contrast with the legal Pre-Quant TTT pattern (e.g. PR #1416 / PR #1423 lineage): those train the adapter on a held-out slice of training data (not
CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.03s, dim=512, layers=11, vocab=1024, code=68533 B, SMOKE_TEST_PASS
Verdict: COMPLIANCE FLAG, same pattern as the closed Pre-Quant TTT cluster.
Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: CLOSE under the same ruling as #1376 and the rest of the cluster. A resubmission with the TTT function taking a training-data slice instead of
Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based
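The ordering rule cited from Issue #402 and Issue #677 (each token is scored before the adapter trains on it) can be illustrated with a toy model. The sketch below is not the PR's code: it assumes a Laplace-smoothed byte unigram model as a stand-in for the adapter, and the names `UnigramAdapter`, `score_first_bpb`, and `train_then_score_bpb`, the chunk size, and the epoch count are all hypothetical. Because the tokens are bytes, bits per token here equals bits per byte.

```python
import math

class UnigramAdapter:
    """Toy stand-in for a TTT adapter: Laplace-smoothed byte counts."""
    def __init__(self, vocab_size=256):
        self.counts = [1] * vocab_size   # add-one smoothing
        self.total = vocab_size

    def nll_bits(self, tokens):
        """Total negative log2-likelihood of tokens under the current counts."""
        return sum(-math.log2(self.counts[t] / self.total) for t in tokens)

    def train(self, tokens):
        """Adapt on tokens by incrementing their counts."""
        for t in tokens:
            self.counts[t] += 1
            self.total += 1

def score_first_bpb(data, chunk_size=32):
    """Score-first ordering: each chunk is scored before it is trained on."""
    adapter = UnigramAdapter()
    bits = 0.0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        bits += adapter.nll_bits(chunk)   # 1) score with a model that has not seen chunk
        adapter.train(chunk)              # 2) only then adapt on it
    return bits / len(data)

def train_then_score_bpb(data, epochs=3):
    """Flagged ordering: adapt on all tokens for several epochs, score at the end."""
    adapter = UnigramAdapter()
    for _ in range(epochs):
        adapter.train(data)
    return adapter.nll_bits(data) / len(data)

data = list(b"abracadabra " * 40)
legal = score_first_bpb(data)
leaky = train_then_score_bpb(data)
print(f"score-first: {legal:.3f} bpb, train-then-score: {leaky:.3f} bpb")
```

On this toy data the train-then-score number comes out lower than the score-first number only because the model has already seen every token it is scored on, which is the kind of leak the review describes.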
Summary
Non-record submission building on PR #462's architecture (Star-ReLU + U-Net + XSA + AdamW TTT).
Awaiting compute credits for validation. BPB not yet measured.
Techniques
Phase 1 (implemented):
Phase 2 (in development):
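For readers unfamiliar with the FOMAML variant of Meta-TTT named in the title, the following is a minimal sketch of a first-order MAML update on a toy scalar regression problem. It assumes nothing about the PR's model or data: the tasks (lines with slope 1.0 or 3.0), the learning rates, the step counts, and the helper names `grad_mse`, `fomaml_step`, and `make_task` are all hypothetical.

```python
import random

def grad_mse(w, xs, ys):
    """Gradient with respect to w of mean((w * x - y) ** 2)."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def fomaml_step(w_meta, tasks, inner_lr=0.05, outer_lr=0.1, inner_steps=3):
    """One first-order MAML meta-update over a batch of tasks."""
    meta_grad = 0.0
    for (sx, sy), (qx, qy) in tasks:
        w = w_meta
        for _ in range(inner_steps):      # inner loop: adapt on the support set
            w -= inner_lr * grad_mse(w, sx, sy)
        # First-order approximation: take the query gradient at the adapted
        # weight and apply it to w_meta, ignoring how w depends on w_meta.
        meta_grad += grad_mse(w, qx, qy)
    return w_meta - outer_lr * meta_grad / len(tasks)

def make_task(slope, rng, n=8):
    """Return (support, query) sets sampled from the line y = slope * x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(2 * n)]
    ys = [slope * x for x in xs]
    return (xs[:n], ys[:n]), (xs[n:], ys[n:])

rng = random.Random(0)
w_meta = 0.0
for _ in range(200):
    tasks = [make_task(rng.choice([1.0, 3.0]), rng) for _ in range(4)]
    w_meta = fomaml_step(w_meta, tasks)
```

The meta-parameter settles between the two task slopes, a starting point from which a few inner steps can reach either task. Full MAML would also backpropagate through the inner updates; dropping that second-order term is what makes the method first-order.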
Architecture
Expected Results
Based on ablations from community PRs:
Will validate with 3 seeds on 8xH100 once compute is available.
Test plan
Generated with Claude Code