Minimal recurrent motif (sb1 rs2 g0.18) – non-record submission #323
megnat05-tmm wants to merge 10 commits into openai:main from
Conversation
Added checkpointing functionality for saving and loading model state, optimizers, and RNG states during training.
Community Review — Minimal recurrent motif (sb1 rs2 g0.18) – non-record submission

Compliance: NEEDS AUTHOR ACTION

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep:

Recommendation: Could you run

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.

Reviewed by @MatoTeziTanka — The Agora.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x9e in position 0: invalid start byte (line 1). Classification via
Retraction — this IMPORT_FAIL was a deleted-file artifact in my smoke runner

Sorry @megnat05-tmm, this one's on me. I re-audited the

What happened: Your PR deletes 16 old

Verified at head

The real

Your PR is not broken by this error. I'm retracting the IMPORT_FAIL classification. I'll re-queue the full compliance audit and post findings separately. Again — sorry for the noise. I'm adding a "don't fetch paths marked deleted in the PR diff" guard to the runner so this doesn't hit other PRs that delete/rename records folders.
This is a non-record submission.
Summary
This submission introduces a minimal recurrent motif architecture that achieves improved compression under the 16MB constraint by emphasizing structural reuse over explicit depth.
The model uses a single shared block (shared_block_size=1) with limited recurrence (recurrence_steps=2) and soft gating (recurrence_gate_init=0.18). This design was motivated by the idea that large effective structures can be generated through a compact operator rather than stored explicitly.
Results
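The three hyperparameters named in the submission can be gathered in a small config container. This is a hypothetical sketch for readability; the class name `MotifConfig` and the dataclass layout are assumptions, not the PR's actual config format, though the field names and default values come directly from the description above.

```python
from dataclasses import dataclass

@dataclass
class MotifConfig:
    # Field names and defaults taken from the PR description.
    shared_block_size: int = 1        # a single shared block, reused across steps
    recurrence_steps: int = 2         # number of recurrent applications
    recurrence_gate_init: float = 0.18  # initial value of the soft recurrence gate
```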
Final (roundtrip):
Artifact:
This configuration outperformed larger motif variants in both compression and efficiency.
Approach
The architecture explores recurrence as a structural closure mechanism. A compact shared operator is reused across steps to generate extended representations. This reduces parameter requirements while preserving model capacity.
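The reuse-with-gating idea above can be sketched in a few lines. This is a minimal illustration under assumptions, not the submission's code: `apply_recurrent_motif` and `shared_op` are hypothetical names, the gate is treated as a fixed scalar (in the actual model it is presumably learned, starting from the 0.18 init), and plain Python lists stand in for tensors.

```python
def apply_recurrent_motif(x, shared_op, steps=2, gate=0.18):
    # One compact operator is applied repeatedly; the soft gate blends
    # each step's output into the running state instead of replacing it,
    # so the effective depth grows without adding new parameters.
    h = x
    for _ in range(steps):
        update = shared_op(h)
        h = [(1.0 - gate) * a + gate * b for a, b in zip(h, update)]
    return h
```

With a small gate init, early in training each recurrence step is close to an identity map, which keeps the repeated application stable while still letting the shared operator contribute.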
Notes