Non-record: Neuromodulatory Depth-Recurrent Transformer with FiLM-only TTT (WIP, val_bpb=1.3151) #1383
nirmathur wants to merge 2 commits into openai:main
Conversation
…y TTT: Depth-recurrent transformer with FiLM conditioning vectors inspired by cortical neuromodulation. 9 physical blocks, 11 virtual layers via partial weight sharing. Sliding window val_bpb = 1.3151 on 1xH100.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Community Review — Non-record: Neuromodulatory Depth-Recurrent Transformer with FiLM-only TTT (WIP, val_bpb=1.3151)

Compliance: NEEDS AUTHOR ACTION

What I found: the CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

ModuleNotFoundError: No module named 'flash_attn'

A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep:

Recommendation: Could you run

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags have been identified yet because the audit halts at the import step.

Reviewed by @MatoTeziTanka — The Agora. CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'flash_attn'. Classification via deterministic AST-based classifier.
Retraction — this IMPORT_FAIL was a flash_attn stub gap in my runner

Sorry @nirmathur, this one's on me. My CPU smoke runner already ships a stub for flash_attn, but the stub had a gap that your import path exposed. On the real eval image (8×H100 SXM, Python 3.10), the import succeeds.

Your PR is not broken. I'm retracting the IMPORT_FAIL classification.

Again — sorry for the noise.
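For readers hitting the same stub gap: a guarded import of the kind that avoids this failure mode on CPU or stub images is sketched below. The `attention` wrapper and the fallback to PyTorch's SDPA are illustrative assumptions, not code from this PR.

```python
import torch
import torch.nn.functional as F

# Guarded import: use the flash_attn kernel when it's installed
# (e.g. on the GPU eval image), otherwise fall back to PyTorch SDPA.
try:
    from flash_attn import flash_attn_func
    HAS_FLASH = True
except ImportError:
    flash_attn_func = None
    HAS_FLASH = False

def attention(q, k, v):
    """q, k, v: (batch, seqlen, nheads, headdim)."""
    if HAS_FLASH:
        # flash_attn_func takes (batch, seqlen, nheads, headdim) directly.
        return flash_attn_func(q, k, v, causal=True)
    # CPU / stub fallback: SDPA expects (batch, nheads, seqlen, headdim).
    return F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2),
        is_causal=True,
    ).transpose(1, 2)
```

This keeps a single call site working on both the smoke-test runner and the real eval image.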
Community Review — Non-record: Neuromodulatory Depth-Recurrent Transformer with FiLM-only TTT (WIP, val_bpb=1.3151)

BPB: 1.3151 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1416/#1423 pattern)

What I found in the code (head SHA): the TTT path at line 1169 implements the score-first-per-chunk pattern: each chunk is scored under the adapter state frozen before that chunk, and only then does the adapter update on it. Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.03s, dim=512, layers=11, vocab=1024, code=97109 B, SMOKE_TEST_PASS.

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.

Reviewed by @MatoTeziTanka — The Agora. Classification via deterministic AST-based classifier.
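For context on the review above, the score-first-per-chunk contract can be sketched as an eval loop in which every chunk is scored before the adapter sees it. `model` and `adapter_step` are placeholders, not names from this PR.

```python
import torch

def ttt_eval_score_first(model, adapter_step, chunks):
    """Score-first-per-chunk TTT: each chunk is scored under the
    adapter state frozen *before* that chunk, and only afterwards
    may the adapter update on it. No token is ever scored by
    weights that have already seen it."""
    total_loss, total_tokens = 0.0, 0
    for chunk in chunks:
        with torch.no_grad():
            loss = model(chunk)          # score under pre-update weights
        total_loss += loss.item() * chunk.numel()
        total_tokens += chunk.numel()
        adapter_step(model, chunk)       # adapt on the chunk just scored
    return total_loss / total_tokens
```

Reordering the two statements inside the loop (update, then score) is exactly the illegal variant the compliance check looks for.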
Work in progress. Depth-recurrent transformer (3 shared blocks x 4 loops) with FiLM conditioning vectors inspired by cortical neuromodulation. Sliding window val_bpb = 1.3151 on 1xH100. Artifact 12.87MB (3MB headroom). FiLM-only TTT implemented but crashed on a tensor comparison bug before credits ran out -- fix identified, rerun pending. Full write-up and ablations to follow.
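The architecture described above (shared blocks unrolled over depth loops, with FiLM conditioning per virtual layer) can be sketched as follows. This is a minimal illustration assuming a uniform 3 x 4 unroll and learned per-(loop, block) FiLM vectors; the PR reports 11 virtual layers via partial weight sharing, i.e. an uneven schedule, so the real code differs.

```python
import torch
import torch.nn as nn

class FiLMBlock(nn.Module):
    """One block whose output is modulated by FiLM
    (feature-wise linear modulation): y = gamma * h + beta."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x, gamma, beta):
        h = x + self.mlp(self.norm(x))
        return gamma * h + beta  # neuromodulatory conditioning

class DepthRecurrentStack(nn.Module):
    """n_blocks physical blocks unrolled n_loops times; weights are
    shared across loops but each (loop, block) pair gets its own
    FiLM vectors, so every virtual layer is conditioned differently."""
    def __init__(self, dim: int, n_blocks: int = 3, n_loops: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(FiLMBlock(dim) for _ in range(n_blocks))
        self.gamma = nn.Parameter(torch.ones(n_loops, n_blocks, dim))
        self.beta = nn.Parameter(torch.zeros(n_loops, n_blocks, dim))
        self.n_loops = n_loops

    def forward(self, x):
        for t in range(self.n_loops):
            for i, blk in enumerate(self.blocks):
                x = blk(x, self.gamma[t, i], self.beta[t, i])
        return x
```

Because only the FiLM vectors are per-virtual-layer, a "FiLM-only TTT" needs to adapt just these small gamma/beta tensors at eval time, which keeps the adapter state tiny.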
Summary
Results
Full training (4000 iters, 1xH100): sliding window val_bpb = 1.3151, artifact = 12.87 MB
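For readers unfamiliar with the metric: val_bpb is bits per byte. A common conversion, assuming the eval sums cross-entropy in nats over the stream and normalizes by the raw byte count (the repo's exact formula isn't shown here):

```python
import math

def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (in nats over the eval stream)
    to bits per byte: divide by ln(2) to get bits, then by bytes."""
    return total_nats / (math.log(2) * total_bytes)

# e.g. about 0.9116 nats per byte corresponds to roughly 1.315 bits per byte
```

Normalizing by bytes rather than tokens makes scores comparable across tokenizers.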
Test plan
(id(p) instead of p not in)

🤖 Generated with Claude Code
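The id(p)-versus-`p not in` fix named above comes from how Python membership tests interact with tensors. A minimal reproduction and the identity-based fix, illustrative rather than taken from the PR's code:

```python
import torch

params = [torch.randn(3) for _ in range(4)]

# Buggy pattern: `p in seen` falls back to Tensor.__eq__, which is
# elementwise, so bool() on the multi-element result raises
# "Boolean value of Tensor with more than one element is ambiguous".
try:
    _ = params[0] in [params[1]]
except RuntimeError as e:
    print("membership test crashed:", e)

# Fix: deduplicate by object identity, not by value comparison.
seen_ids, dedup = set(), []
for p in params:
    if id(p) not in seen_ids:
        seen_ids.add(id(p))
        dedup.append(p)
```

This is likely the shape of the tensor comparison crash mentioned in the description; comparing by `id(p)` sidesteps elementwise equality entirely.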