Non-record Submission: Text Diffusion + Retrodiction + TTT + Depth Recurrence #1255
akaiHuang wants to merge 8 commits into openai:main from
Conversation
16L/512d/39M params, trained on an M1 Max (not 8xH100). Retrodiction: a reversed-sequence auxiliary loss from quantum information theory. Int6 + lzma = 14.8 MB (within the 16 MB limit).
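The int6 + lzma packaging step can be sketched as follows. This is a minimal illustration assuming symmetric per-tensor quantization and simple bit-packing; the function name and scheme are my assumptions, not the submission's actual code.

```python
import lzma

import numpy as np


def pack_int6_lzma(weights: np.ndarray) -> bytes:
    """Quantize float weights to 6-bit ints, bit-pack, then lzma-compress.

    Hypothetical sketch: symmetric per-tensor quantization into [-31, 31],
    shifted to [0, 62] so each value fits in 6 bits, then packed via numpy
    bit operations (4 values per 3 bytes).
    """
    scale = float(np.abs(weights).max()) or 1.0
    q = np.clip(np.round(weights * (31.0 / scale)), -31, 31).astype(np.int16) + 31
    flat = q.astype(np.uint8).ravel()[:, None]   # one value per row
    bits = np.unpackbits(flat, axis=1)[:, 2:]    # keep only the low 6 bits
    packed = np.packbits(bits.ravel())           # 6n bits -> ceil(6n/8) bytes
    return lzma.compress(packed.tobytes(), preset=9)
```

A 39M-parameter model stored this way costs 6 bits per weight before lzma, which is how a checkpoint of that size can land under the 16 MB cap.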
Update: This PR is for the non-record track. Current results:
Some exploratory H100 coarse-to-fine entries in the README are marked “no log saved”; they are research observations, not record claims. Next steps: run 3–5-seed reproducibility on 8xH100 and optimize for the strict 10-minute train/eval constraint.
This reverts commit a47409e.
…rting scripts

This commit promotes the v3.5 writeup (already present as README_v3_5_DRAFT.md since commit 5790ba7) to the canonical README.md so that PR openai#1255 reviewers see the verified version directly, and syncs the three supporting scripts from the meadow-golf research diary to the v3.5 versions.

README.md (v3.3 -> v3.5):
- Headline now reports the 5-seed mean delta (-0.0205 BPB) as the primary effect size, with the single best seed (-0.0290 BPB) as a post-hoc deployable-artifact reference, not as the headline number
- §3.1 is now the 11L multi-seed verification at the true final checkpoint (5 fresh shared seeds + 1 fresh causal-only control seed); the original 6-run scaling sweep is retained as §3.2 cross-scale evidence
- Adds §6.0 (5L multi-seed verification) as the gating follow-up
- Adds Appendix A consolidating legacy intermediate-checkpoint 11L numbers for traceability with v3.3
- §10 Compliance clarifies that the competition submission unit is int6.lzma (under the 16 MB cap); .npz files are a working format only
- §3.1 statistical caveat: no significance test is computed because the control side has only 1 fresh seed in this round; the second control seed in §6.0 closes this gap

train_cdm.py: writes step_final.pt at the end of training (addresses the intermediate-checkpoint bias from v3.3).

eval_cf_ablation.py: detects .npz files and loads them with the correct parameter dtype, so CF eval can run directly on the training final-state .npz output.

train_ablation_runner.py: adds a --seed flag that patches the module-level SEED constant in train_cdm.py and writes a per-seed patched script with _s<seed> in the filename, so seeds_run/run_p5.sh and run_phase_b.sh are self-contained against the unified runner.

The seeds_run/ reviewer spot-check bundle (logs, orchestration scripts, wrapper stdout) is already committed in 5790ba7.
The .npz / step_final.pt state files (~1.3 GB) are intentionally not committed; their location and on-request availability are documented in seeds_run/README.md and §10.
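The per-seed patching behavior described for train_ablation_runner.py could be sketched like this. Only the --seed flag, the module-level SEED constant patch, and the _s<seed> filename suffix come from the commit message; the function name and regex are my guesses.

```python
import argparse
import pathlib
import re


def write_seeded_copy(src: pathlib.Path, seed: int) -> pathlib.Path:
    """Patch the module-level SEED constant and write a per-seed copy.

    Hypothetical sketch: rewrites the first `SEED = <int>` line in the
    training script and saves the result alongside the original with an
    `_s<seed>` suffix, so shell runners can invoke a self-contained
    per-seed script.
    """
    text = src.read_text()
    patched, n = re.subn(r"(?m)^SEED\s*=\s*\d+", f"SEED = {seed}", text, count=1)
    if n == 0:
        raise ValueError(f"no module-level SEED constant found in {src}")
    out = src.with_name(f"{src.stem}_s{seed}{src.suffix}")
    out.write_text(patched)
    return out


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, required=True)
    parser.add_argument("--script", type=pathlib.Path,
                        default=pathlib.Path("train_cdm.py"))
    args = parser.parse_args()
    print(write_seeded_copy(args.script, args.seed))
```

Writing a patched copy (rather than passing the seed at runtime) keeps each seeds_run script reproducible from its own file, which matches the "self-contained against the unified runner" goal stated above.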
v3.5 update pushed to the PR head branch. Main changes in this revision:
This remains a non-record submission. The 11L rows are not claimed as record candidates; they are filed explicitly under the non-record track. Standalone research diary mirror: https://github.com/akaiHuang/meadow-golf
Community Review — Non-record Submission: Text Diffusion + Retrodiction + TTT + Depth Recurrence

Compliance: NEEDS AUTHOR ACTION

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep:
Recommendation: Could you run

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.

Reviewed by @MatoTeziTanka — The Agora.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'mlx'. Classification via
…s cleanly on Linux CPU

Addresses the @MatoTeziTanka community-review compliance check on PR openai#1255 that flagged ModuleNotFoundError("No module named 'mlx'") at the import step on a Linux CPU smoke-test environment (CT2038 proteus-engine, Python 3.10, torch 2.10.0+cpu).

The four affected files are Apple-Silicon-only pre-flight artifacts referenced in README §3.3 / §3.4 / §3.6 (the M1 Max 5L sweep and the leakage integrity test). They are not part of the H100 production training or evaluation path (train_cdm.py / eval_cf_ablation.py), but they live in the same PR folder and the reviewer's smoke test imports the whole folder.

Fix in each file:
- Wrap `import mlx.core / mlx.nn / mlx.optimizers / mlx.utils` in a try/except ImportError block.
- On ImportError, install minimal stubs so module-level definitions like `class Foo(nn.Module):` and `COMPUTE_DTYPE = mx.bfloat16` parse without raising. The stub class is permissive: any attribute access or call returns another stub instance, sufficient for class subclassing and module-level attribute assignment.
- Set `_HAS_MLX = True/False` so any future runtime check can gate behavior.

leakage_test.py specifically had ~95 lines of module-level executable code (including a sys.exit(1) on a missing checkpoint, and direct mx.array / mx.eval calls). All of that is now wrapped in a `_main()` function with an `if __name__ == "__main__": sys.exit(_main())` guard at the bottom, so importing the file is a no-op and only running it as a script triggers the test logic. The script also exits cleanly with a clear message when MLX is not installed.

Verification (this commit):
- All 4 files: py_compile passes on Python 3.10 syntax (verified with the default python3 in this environment).
- All 4 files: import succeeds on a machine with MLX installed (_HAS_MLX = True; real MLX path).
- All 4 files: import succeeds in a simulated mlx-blocked environment via a custom __import__ override (_HAS_MLX = False; stub path).
- Functional behavior on Apple Silicon is unchanged: the real `mlx.core`, `mlx.nn`, `mlx.optimizers`, and `mlx.utils` are imported when available. This commit only touches the four pre-flight scripts. No README, training code, eval code, or numbers change.
Thanks for the smoke-test details and for catching this before the official audit.

The four files that triggered the import error (

Fix pushed in commit

Locally verified:
No README, no scoring path, no numbers change. Please re-run the audit at your convenience — happy to help debug if anything else trips.
Re-audited at head SHA

Fix confirmed. The four MLX-dependent files (

As you noted, these are Apple Silicon pre-flight artifacts — not part of the H100 scored eval path (

Re-audit by @MatoTeziTanka. Verified MLX import guards in all 4 files, py_compile OK under Python 3.10.
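The "simulated mlx-blocked environment" check mentioned in the fix commit can be reproduced with a custom `__import__` override along these lines; the helper names and the blocked-module predicate are my assumptions.

```python
import builtins
import importlib

_real_import = builtins.__import__


def _blocking_import(name, *args, **kwargs):
    # Pretend mlx is not installed, even if it actually is.
    if name == "mlx" or name.startswith("mlx."):
        raise ImportError(f"No module named '{name}' (simulated)")
    return _real_import(name, *args, **kwargs)


def import_with_mlx_blocked(module_name):
    """Import `module_name` while any `import mlx...` statement inside it
    raises ImportError, exercising the stub fallback path."""
    builtins.__import__ = _blocking_import
    try:
        return importlib.import_module(module_name)
    finally:
        builtins.__import__ = _real_import
```

Note that the override intercepts `import` statements executed inside the target module (they go through `builtins.__import__`), which is exactly what the guard's except-branch needs to be hit.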
Summary
This PR adds a non-record 16MB submission under records/track_non_record_16mb/2026-04-02_Meadow_TextDiffusion_Retrodiction_TTT_DepthRecurrence.
Techniques included:
- Text diffusion
- Retrodiction (reversed-sequence auxiliary loss)
- Test-time training (TTT)
- Depth recurrence
Files
This is intended for the non-record track.
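The retrodiction technique (a reversed-sequence auxiliary loss, per the PR description) could be implemented roughly as below. This is a generic sketch, not the submission's actual training code; `logits_fn` is a stand-in for the model and `lam` is a hypothetical weighting parameter.

```python
import numpy as np


def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def retrodiction_loss(logits_fn, tokens, lam=0.5):
    """Next-token NLL on the forward sequence plus a weighted NLL on the
    time-reversed sequence (the "retrodiction" auxiliary term).

    logits_fn: maps an int array of T input tokens to (T, V) logits.
    """
    def nll(seq):
        logp = log_softmax(logits_fn(seq[:-1]))              # (T-1, V)
        return -logp[np.arange(len(seq) - 1), seq[1:]].mean()

    return nll(tokens) + lam * nll(tokens[::-1])
```

Under uniform logits the loss reduces to (1 + lam) * log(V), which makes the weighting of the auxiliary term easy to sanity-check.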