
Non-record Submission: Text Diffusion + Retrodiction + TTT + Depth Recurrence#1255

Open
akaiHuang wants to merge 8 commits into openai:main from akaiHuang:codex/nonrecord-textdiffusion-retrodiction-ttt

Conversation

@akaiHuang

Summary

This PR adds a non-record 16MB submission under records/track_non_record_16mb/2026-04-02_Meadow_TextDiffusion_Retrodiction_TTT_DepthRecurrence.

Techniques included:

  • Text Diffusion (CDM) with Sequential Unmasking eval
  • AR Retrodiction
  • Test-Time Training (full-model AdamW)
  • Depth recurrence experiments
  • Custom v4096 tokenizer

Files

  • README.md
  • submission.json
  • train_gpt.py
  • train_cdm.py
  • eval_sequential_unmasking.py
  • eval_ttt.py
  • bpe_v4096.model
  • train.log

This is intended for the non-record track.

16L/512d/39M params, trained on an M1 Max (not 8xH100).
Retrodiction: a reversed-sequence auxiliary loss inspired by quantum information theory.
Int6 quantization + lzma compression yields 14.8 MB (within the 16 MB limit).
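As an illustration of the compression pipeline named above, here is a minimal sketch; the symmetric per-tensor quantizer, the 6-bit packing scheme, and the function names are assumptions for illustration, not the submission's actual code:

```python
import lzma
import numpy as np

def quantize_int6(weights: np.ndarray) -> tuple[np.ndarray, float]:
    # Hypothetical symmetric per-tensor quantization to 6-bit ints in [-31, 31].
    scale = float(np.abs(weights).max()) / 31.0
    q = np.clip(np.round(weights / scale), -31, 31).astype(np.int8)
    return q, scale

def compressed_size_bytes(weights: np.ndarray) -> int:
    q, _scale = quantize_int6(weights)
    # Pack each value into 6 bits (drop the top 2 bits of the offset-encoded
    # uint8) so the on-disk footprint actually reflects the 6-bit width,
    # then lzma-compress the packed stream.
    bits = np.unpackbits((q + 32).astype(np.uint8)[:, None], axis=1)[:, 2:]
    packed = np.packbits(bits.ravel())
    return len(lzma.compress(packed.tobytes(), preset=9))

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)
print(compressed_size_bytes(w) < w.nbytes)  # True: int6+lzma beats raw fp32
```

The real submission would apply this per parameter tensor and also store the scales; the sketch only shows why the 6-bit + lzma combination lands well under the raw float32 size.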
@akaiHuang
Author

Update

This PR is for the non-record track.

Current results:

  • AR + Retrodiction (v4096): 1.497
  • AR + TTT (full-model AdamW): 1.492
  • Shared AR+CDM (single model): 1.503 (~2.3MB)
  • CDM + Sequential Unmasking: 2.570

Some exploratory H100 coarse-to-fine entries in the README are marked “no log saved” and are research observations, not record claims.

Next: run 3–5 seed reproducibility on 8xH100 and optimize for strict 10-minute train/eval constraints.

…rting scripts

This commit promotes the v3.5 writeup (already present as README_v3_5_DRAFT.md
since commit 5790ba7) to the canonical README.md so that PR openai#1255 reviewers
see the verified version directly, and syncs the three supporting scripts
from the meadow-golf research diary to the v3.5 versions.

README.md (v3.3 -> v3.5):
- Headline now reports the 5-seed mean delta (-0.0205 BPB) as the primary
  effect size, with the single best seed (-0.0290 BPB) as a post-hoc
  deployable-artifact reference, not as the headline number
- §3.1 is now the 11L multi-seed verification at the true final checkpoint
  (5 fresh shared seeds + 1 fresh causal-only control seed). Original
  6-run scaling sweep retained as §3.2 cross-scale evidence
- Adds §6.0 (5L multi-seed verification) as the gating follow-up
- Adds Appendix A consolidating legacy intermediate-checkpoint 11L numbers
  for traceability with v3.3
- §10 Compliance clarifies that competition submission unit is int6.lzma
  (under 16 MB cap); .npz files are working format only
- §3.1 statistical caveat: no significance test is computed because the
  control side has only 1 fresh seed in this round; the second control
  seed in §6.0 closes this gap

train_cdm.py: writes step_final.pt at end of training (addresses
intermediate-checkpoint bias from v3.3).

eval_cf_ablation.py: detects .npz files and loads them with the correct
parameter dtype, so CF eval can run directly on training final-state
.npz output.
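The .npz path described above can be sketched as follows; the helper name and dtype handling are hypothetical, since the actual eval_cf_ablation.py logic is not shown in this thread:

```python
import os
import tempfile
import numpy as np

def load_npz_state(path: str, target_dtype=np.float32) -> dict:
    # Load an .npz working-format checkpoint and cast every parameter
    # array to the dtype the eval graph expects (training may have
    # written reduced-precision working copies).
    with np.load(path) as data:
        return {k: data[k].astype(target_dtype) for k in data.files}

# Usage sketch: a half-precision training save comes back as float32.
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "step_final.npz")
    np.savez(p, w=np.ones(4, dtype=np.float16))
    state = load_npz_state(p)
    print(state["w"].dtype)  # float32
```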

train_ablation_runner.py: adds --seed flag that patches the module-level
SEED constant in train_cdm.py and writes a per-seed patched script with
_s<seed> in the filename, so seeds_run/run_p5.sh and run_phase_b.sh are
self-contained against the unified runner.
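The per-seed patching step can be sketched as below; a minimal illustration of the described --seed behavior, with a hypothetical helper name (the real train_ablation_runner.py may differ):

```python
import re
from pathlib import Path

def write_seeded_copy(script: Path, seed: int) -> Path:
    # Patch the module-level SEED constant and write a _s<seed> copy of
    # the training script, so each seed run is a self-contained file.
    patched, n = re.subn(r"(?m)^SEED\s*=\s*\d+", f"SEED = {seed}",
                         script.read_text())
    if n != 1:
        raise ValueError(f"expected exactly one SEED assignment, found {n}")
    out = script.with_name(f"{script.stem}_s{seed}{script.suffix}")
    out.write_text(patched)
    return out
```

A shell orchestrator like run_p5.sh could then simply invoke `python train_cdm_s1337.py` and so on, with no shared mutable state between seed runs.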

The seeds_run/ reviewer spot-check bundle (logs, orchestration scripts,
wrapper stdout) is already committed in 5790ba7. The .npz / step_final.pt
state files (~1.3 GB) are intentionally not committed; their location and
on-request availability are documented in seeds_run/README.md and §10.
@akaiHuang
Author

v3.5 update pushed to the PR head branch.

Main changes in this revision:

  • Canonical README.md is now the v3.5 writeup; reviewers no longer need to open README_v3_5_DRAFT.md.
  • The 11L headline now uses the true final training step with 5 fresh shared-model seeds and 1 fresh matched causal-only control seed.
  • Main method-level result at 11L: shared CF 1.3009 ± 0.005 vs matched causal-only final-checkpoint control 1.3214, for a 5-seed mean delta of -0.0205 BPB.
  • The single best shared seed (SEED=1337) is retained only as a post-hoc deployable-artifact reference: 1.2924, i.e. -0.0290 BPB versus the matched control.
  • The original 6-run 5L+11L sweep is retained as §3.2 cross-scale evidence, while legacy intermediate-checkpoint 11L numbers are moved to Appendix A for traceability with v3.3.
  • Supporting scripts are synchronized with the v3.5 methodology:
    • train_cdm.py now writes step_final.pt
    • train_ablation_runner.py supports --seed for the per-seed reruns in seeds_run/
    • eval_cf_ablation.py can load final-state .npz saves directly
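The headline deltas quoted above follow directly from the reported means; a quick arithmetic check using only numbers stated in this thread:

```python
# Numbers as reported in the v3.5 update (BPB).
shared_mean = 1.3009     # 5-seed shared CF mean
control_final = 1.3214   # matched causal-only final-checkpoint control
best_seed = 1.2924       # best shared seed (SEED=1337)

print(round(shared_mean - control_final, 4))  # -0.0205
print(round(best_seed - control_final, 4))    # -0.029
```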

This remains a non-record submission. The 11L rows are not claimed as record candidates; they are filed under the non-record track explicitly.

Standalone research diary mirror: https://github.com/akaiHuang/meadow-golf

@MatoTeziTanka

MatoTeziTanka commented Apr 11, 2026

Community Review — Non-record Submission: Text Diffusion + Retrodiction + TTT + Depth Recurrence

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

ModuleNotFoundError: No module named 'mlx'

This matches a few common patterns I've seen for this class of error in the 2026-04-11 sweep.

Recommendation: Could you run python3 -c "import py_compile; py_compile.compile('train_gpt.py')" on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time before any of the scored-eval logic runs.

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'mlx'. Classification via classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.

…s cleanly on Linux CPU

Addresses the @MatoTeziTanka community-review compliance check on PR openai#1255
that flagged ModuleNotFoundError("No module named 'mlx'") at the import step
on a Linux CPU smoke-test environment (CT2038 proteus-engine, Python 3.10,
torch 2.10.0+cpu).

The four affected files are Apple-Silicon-only pre-flight artifacts referenced
in README §3.3 / §3.4 / §3.6 (the M1 Max 5L sweep and the leakage integrity
test). They are not part of the H100 production training or evaluation path
(train_cdm.py / eval_cf_ablation.py), but they live in the same PR folder and
the reviewer's smoke test imports the whole folder.

Fix in each file:

- Wrap `import mlx.core / mlx.nn / mlx.optimizers / mlx.utils` in a
  try/except ImportError block.
- On ImportError, install minimal stubs so module-level definitions like
  `class Foo(nn.Module):` and `COMPUTE_DTYPE = mx.bfloat16` parse without
  raising. The stub class is permissive: any attribute access or call
  returns another stub instance, sufficient for class subclassing and
  module-level attribute assignment.
- Set `_HAS_MLX = True/False` so any future runtime check can gate behavior.

leakage_test.py specifically had ~95 lines of module-level executable code
(including a sys.exit(1) on missing checkpoint, and direct mx.array / mx.eval
calls). All of that is now wrapped in a `_main()` function with an
`if __name__ == "__main__": sys.exit(_main())` guard at the bottom, so
importing the file is a no-op and only running it as a script triggers the
test logic. The script also exits cleanly with a clear message when MLX is
not installed.

Verification (this commit):
- All 4 files: py_compile passes on Python 3.10 syntax (verified with the
  default python3 in this environment).
- All 4 files: import succeeds on a machine with MLX installed
  (_HAS_MLX = True; real MLX path).
- All 4 files: import succeeds on a simulated mlx-blocked environment
  via a custom __import__ override (_HAS_MLX = False; stub path).
- Functional behavior on Apple Silicon is unchanged: the real `mlx.core`,
  `mlx.nn`, `mlx.optimizers`, and `mlx.utils` are imported when available.

This commit only touches the four pre-flight scripts. No README, training
code, eval code, or numbers change.
@akaiHuang
Author

Thanks for the smoke-test details and for catching this before the official audit.

The four files that triggered the import error (eval_cf_dualbrain.py, eval_ttt.py, eval_sequential_unmasking.py, leakage_test.py) are Apple-Silicon-only pre-flight artifacts referenced in §3.3 / §3.4 / §3.6 of the README. They are not part of the H100 production training or evaluation path (which is train_cdm.py + eval_cf_ablation.py), but they live in the same PR folder so your folder-level smoke test correctly hit them.

Fix pushed in commit a4286a4: each file now wraps its `import mlx` in a try/except ImportError block, installs a minimal stub class on failure (so module-level definitions like `class Foo(nn.Module):` and `COMPUTE_DTYPE = mx.bfloat16` still parse), and sets `_HAS_MLX = False`. leakage_test.py additionally moves its module-level test logic into a `_main()` function gated by `if __name__ == "__main__":` so that importing the file is a no-op.

Locally verified:

  • py_compile passes on all four files under Python 3.10 syntax.
  • Import succeeds on a real Apple Silicon machine with MLX installed (_HAS_MLX=True, original behaviour preserved).
  • Import succeeds on a simulated mlx-blocked environment via a custom `__import__` override (_HAS_MLX=False, stub path).

No README, no scoring path, no numbers change. Please re-run the audit at your convenience — happy to help debug if anything else trips.

@MatoTeziTanka

Re-audited at head SHA a4286a4.

Fix confirmed. The four MLX-dependent files (eval_cf_dualbrain.py, eval_ttt.py, eval_sequential_unmasking.py, leakage_test.py) all now wrap import mlx in try/except ImportError with _HAS_MLX = False fallback and minimal stub classes. leakage_test.py moves its module-level logic into a _main() gated by if __name__ == "__main__". All files compile clean under Python 3.10.

As you noted, these are Apple Silicon pre-flight artifacts — not part of the H100 scored eval path (train_cdm.py + eval_cf_ablation.py). The fix is the right approach: guarded import so folder-level smoke tests don't false-positive while preserving full functionality on actual Apple Silicon machines.

train_gpt.py (the H100 path) compiles cleanly. I'll queue the full compliance audit on the active eval path for the next sweep.


Re-audit by @MatoTeziTanka. Verified MLX import guards in all 4 files, py_compile OK under Python 3.10.

