Add RIOM v1 shared-depth recurrence non-record submission #523
hesong0222-dev wants to merge 2 commits into openai:main
Conversation
Community Review — Add RIOM v1 shared-depth recurrence non-record submission

Compliance: NEEDS AUTHOR ACTION — What I found: the CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

ModuleNotFoundError: No module named 'mlx'

I've seen a few common patterns for this class of error in the 2026-04-11 sweep.

Recommendation: once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags have been identified yet because the audit halts at the import step.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'mlx'.

Reviewed by @MatoTeziTanka — The Agora.
Thanks for the smoke test. I pushed a fix for the import blocker on the RIOM v1 non-record script. The update guards the MLX and related optional imports so the script can still be imported on hosts without them. What I verified locally after the change: the script now parses and imports cleanly without MLX installed.

If you re-run the compliance pipeline, the import step should now pass cleanly.
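For reference, the usual shape of this kind of guard looks like the following. This is a minimal sketch with hypothetical helper names (`HAVE_MLX`, `require_mlx`), not the actual diff in this PR:

```python
# Make MLX an optional dependency: importing this module should succeed
# on hosts (like the CT2038 CPU smoke-test machine) that lack 'mlx'.
try:
    import mlx.core as mx  # Apple-Silicon-only backend
    HAVE_MLX = True
except ImportError:
    mx = None
    HAVE_MLX = False

def require_mlx():
    """Raise a clear error only when an MLX code path is actually used."""
    if not HAVE_MLX:
        raise RuntimeError(
            "This code path requires the 'mlx' package (Apple Silicon only); "
            "install it or use the CPU/stub path instead."
        )
```

The key property for the compliance audit is that the module-level import never raises; the failure is deferred to the first call that genuinely needs MLX.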
Follow-up: I re-ran the checks under Python 3.10 specifically.
Summary
This draft PR adds a non-record RIOM-style submission under `records/track_non_record_16mb/2026-03-23_RIOM_v1_recur`. The core idea is shared-depth recurrence: replace 9 distinct transformer blocks with 3 shared blocks repeated 3 times, with lightweight learned recurrence gates. The tokenizer and dataset remain unchanged from the official SP1024 FineWeb setup, and the change is isolated to the record-local `train_gpt.py`.

What actually ran
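For concreteness, the shared-depth recurrence idea can be sketched roughly as below. All names, shapes, and the scalar gate parameterization are hypothetical illustrations, not the submitted `train_gpt.py`:

```python
import numpy as np

# Sketch: 3 shared parameter sets applied 3 times each (9 effective
# layers, 3 stored weight matrices), with a learned gate mixing the
# block output into a residual stream.
rng = np.random.default_rng(0)
d_model, n_shared, n_repeat = 16, 3, 3

blocks = [rng.normal(scale=0.02, size=(d_model, d_model)) for _ in range(n_shared)]
gates = np.full((n_shared, n_repeat), 0.5)  # learned scalars in the real model

def forward(x):
    # Repeat the shared stack n_repeat times: effective depth is
    # n_shared * n_repeat = 9 while only n_shared = 3 blocks are stored.
    for r in range(n_repeat):
        for i, W in enumerate(blocks):
            h = np.tanh(x @ W)            # stand-in for a transformer block
            g = gates[i, r]
            x = g * h + (1.0 - g) * x     # gated residual recurrence
    return x

x = rng.normal(size=(4, d_model))
y = forward(x)
print(y.shape)  # (4, 16)
```

The gate per (block, repetition) pair is one plausible reading of "lightweight learned recurrence gates"; the submitted code may gate differently.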
This package is based on a real Apple Silicon MLX development run.
- 501,048,576 official validation tokens
- Recurrent package: val_loss=5.4143, val_bpb=3.2439
- Local baseline package: val_loss=5.42207763, val_bpb=3.24862008
- Counted artifact sizes: 51,404 / 2,273,437 / 2,324,841 bytes

Why this is interesting even if non-record
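As a sanity check on the figures above: `val_bpb` is typically derived from the nats-per-token loss via a bytes-per-token normalization. A minimal sketch, where the exact normalization constant used by the repo is an assumption recovered from the quoted numbers:

```python
import math

def bpb(loss_nats_per_token: float, bytes_per_token: float) -> float:
    # Convert cross-entropy from nats/token to bits/token, then
    # normalize by the dataset's average bytes per token.
    bits_per_token = loss_nats_per_token / math.log(2)
    return bits_per_token / bytes_per_token

# The quoted val_loss/val_bpb pair implies a bytes-per-token ratio:
ratio = (5.4143 / math.log(2)) / 3.2439
print(round(ratio, 3))               # about 2.408 bytes per token
print(round(bpb(5.4143, ratio), 4))  # recovers 3.2439
```

Applying the same ratio to the baseline pair (5.42207763 → 3.24862008) is consistent, which suggests both packages were scored against the same validation bytes.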
On the same local development prefix, this recurrent package improved `val_bpb` over the local baseline package while sharply reducing counted artifact size. That is the main RIOM hypothesis in compact form: more effective depth per byte through parameter sharing.

Limitations
- The local run evaluated with a truncated `VAL_MAX_TOKENS`; it should be rerun with `VAL_MAX_TOKENS=0` before any serious upstream submission.
- The change is confined to the record-local `train_gpt.py` path.
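On the first limitation: my reading of the `VAL_MAX_TOKENS` convention (an assumption, since the repo's handling is not shown here) is that `0` or unset means "evaluate on the full validation set" and any positive value truncates:

```python
import os

def effective_val_tokens(total_val_tokens: int) -> int:
    # Assumed semantics: VAL_MAX_TOKENS=0 (or unset) -> full validation set;
    # a positive value caps how many validation tokens are scored.
    cap = int(os.environ.get("VAL_MAX_TOKENS", "0"))
    return total_val_tokens if cap <= 0 else min(cap, total_val_tokens)

os.environ["VAL_MAX_TOKENS"] = "1000000"
print(effective_val_tokens(501_048_576))  # truncated local run: 1000000

os.environ["VAL_MAX_TOKENS"] = "0"
print(effective_val_tokens(501_048_576))  # full rerun: 501048576
```

A truncated validation prefix can bias `val_loss`/`val_bpb` in either direction, which is why the full-set rerun matters before upstream comparison.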