Non-record: Retrodiction Training (Petz Recovery Map) — val_bpb 1.508#1183

Closed
akaiHuang wants to merge 1 commit into openai:main from akaiHuang:main

@akaiHuang

    • 16L / 512d / 39M params, Retrodiction auxiliary loss (α=0.3)
    • Trained on M1 Max 64GB (not 8xH100, hence non-record)
    • val_bpb: 1.508 at 2000 steps (131M tokens)
    • Int6 + lzma = 14.8MB (within 16MB limit)
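A minimal sketch of how an "Int6 + lzma" artifact size could be checked. The PR does not describe its actual quantization or packing scheme, so everything here is an assumption for illustration: the symmetric per-tensor scale, the four-values-into-three-bytes packing, the helper names, and reading "16MB" as 16 MiB.

```python
import lzma
import random


def quantize_int6(weights, scale):
    """Map floats to signed 6-bit integers in [-32, 31] using one shared scale."""
    return [max(-32, min(31, round(w / scale))) for w in weights]


def pack_int6(values):
    """Pack 6-bit values (two's complement) four at a time into three bytes."""
    padded = list(values) + [0] * (-len(values) % 4)
    out = bytearray()
    for i in range(0, len(padded), 4):
        a, b, c, d = (v & 0x3F for v in padded[i:i + 4])
        word = (a << 18) | (b << 12) | (c << 6) | d
        out += word.to_bytes(3, "big")
    return bytes(out)


def compressed_size_bytes(weights, scale):
    """Size in bytes after int6 quantization, packing, and lzma compression."""
    return len(lzma.compress(pack_int6(quantize_int6(weights, scale))))


# Toy stand-in for model weights (hypothetical data, not the PR's checkpoint).
random.seed(0)
toy_weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
scale = max(abs(w) for w in toy_weights) / 31

LIMIT_BYTES = 16 * 1024 * 1024  # assumption: the 16MB limit read as 16 MiB
size = compressed_size_bytes(toy_weights, scale)
print(size, size <= LIMIT_BYTES)
```

Before compression, 6-bit packing stores each parameter in 0.75 bytes, so the compressed size depends on how concentrated the quantized values are.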

    Novel contribution

    Retrodiction: reversed sequence auxiliary loss inspired by the Petz recovery map
    from quantum information theory. The model trains on both forward and reversed
    sequences, learning bidirectional representations while maintaining causal attention.

    loss = AR_loss(forward) + 0.3 * AR_loss(reversed)

    Achieves 1–3.6% BPB improvement over pure AR at matched token counts. Zero
    inference cost (training-only technique).
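The combined objective above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in train_gpt.py: `model` is a hypothetical callable that returns next-token probabilities for a prefix, and the reversed pass simply feeds the same causal model the token sequence in reverse order.

```python
import math


def ar_loss(model, tokens):
    """Mean next-token negative log-likelihood (in nats) under a causal model.

    `model(prefix)` is a hypothetical callable returning a list of
    probabilities over the vocabulary for the token that follows `prefix`.
    """
    total = 0.0
    for t in range(1, len(tokens)):
        probs = model(tokens[:t])
        total += -math.log(probs[tokens[t]])
    return total / (len(tokens) - 1)


def retrodiction_loss(model, tokens, alpha=0.3):
    """AR loss on the sequence plus alpha times AR loss on its reversal."""
    forward = ar_loss(model, tokens)
    backward = ar_loss(model, tokens[::-1])
    return forward + alpha * backward


VOCAB = 4


def uniform_model(prefix):
    """Toy model: ignores the prefix and predicts every token equally."""
    return [1.0 / VOCAB] * VOCAB


loss = retrodiction_loss(uniform_model, [0, 1, 2, 3, 1])
print(loss)
```

With the uniform toy model both terms equal log(4), so the total is 1.3 × log(4). Because the auxiliary term only appears in the training objective, dropping it at inference leaves the forward model unchanged, which is consistent with the zero-inference-cost claim.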

    Why non-record

    Trained on M1 Max (65K tokens/step), not 8xH100. Planning to submit a record-track
    version once H100 access is available.

    Files

    • README.md — Detailed writeup with results tables
    • submission.json — Metadata
    • train_gpt.py — Complete training script (MLX)

16L/512d/39M params, trained on M1 Max (not 8xH100).
Retrodiction: reversed sequence auxiliary loss from quantum information theory.
Int6 + lzma = 14.8MB (within 16MB limit).
@MatoTeziTanka

MatoTeziTanka commented Apr 11, 2026

Community Review — Non-record: Retrodiction Training (Petz Recovery Map) — val_bpb 1.508

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

ModuleNotFoundError: No module named 'mlx'

A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep:

Recommendation: Could you run python3 -c "import py_compile; py_compile.compile('train_gpt.py')" on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time before any of the scored-eval logic runs.
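One point worth separating here: `py_compile` only checks that a file parses, while a `ModuleNotFoundError` is raised later, when the import statement actually runs. A rough sketch of checking both conditions follows. The helper name `check_script` and the module name `hypothetical_missing_pkg` are made up for the example, and the demo runs on a throwaway file rather than on train_gpt.py.

```python
import importlib.util
import os
import py_compile
import tempfile


def check_script(path, required_modules):
    """Return (parses, missing) for a script.

    parses  -- True if the file byte-compiles, i.e. has no syntax errors.
    missing -- top-level names in required_modules that cannot be found.
    """
    with tempfile.TemporaryDirectory() as tmp:
        try:
            py_compile.compile(path, cfile=os.path.join(tmp, "out.pyc"), doraise=True)
            parses = True
        except py_compile.PyCompileError:
            parses = False
    missing = [m for m in required_modules if importlib.util.find_spec(m) is None]
    return parses, missing


# Demo on a throwaway file that imports a module that does not exist.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "demo_script.py")
    with open(demo_path, "w") as f:
        f.write("import hypothetical_missing_pkg\n")
    result = check_script(demo_path, ["json", "hypothetical_missing_pkg"])

print(result)
```

The demo file compiles cleanly even though its import would fail at run time, which is why a passing `py_compile` check alone would not rule out the `mlx` failure reported above.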

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL (ModuleNotFoundError: No module named 'mlx'). Classification via classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.

@akaiHuang
Author

Thanks for the smoke-test details.

This PR predates and is now superseded by #1255, which uses a unified PyTorch H100 stack (train_cdm.py) and includes 5-seed multi-seed verification of the matched-compute headline at the true final checkpoint. The Petz-recovery retrodiction line in this PR is included as §7 of the #1255 writeup as a documented negative result, so I am closing this older PR to avoid duplicate review surface and to consolidate the discussion in one place.

The mlx import issue is fixed in #1255 commit a4286a4 (guarded try / except ImportError + minimal stubs in the four M1-only pre-flight files). No action needed on this PR — please direct any follow-up to #1255. Sorry for the extra audit cycle and thanks again for the careful review.
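For readers unfamiliar with the pattern, a guarded import with a minimal stub might look roughly like the following. This is a generic sketch, not the contents of commit a4286a4: the `HAVE_MLX` flag, the `SimpleNamespace` stand-in, and `require_mlx` are assumptions chosen for illustration.

```python
import types

try:
    import mlx.core as mx  # only installable on Apple silicon
    HAVE_MLX = True
except ImportError:
    # Minimal stand-in so this module still imports where MLX is unavailable.
    mx = types.SimpleNamespace()
    HAVE_MLX = False


def require_mlx():
    """Fail loudly at call time, rather than at import time, if MLX is missing."""
    if not HAVE_MLX:
        raise RuntimeError("MLX is required for this code path")


print("MLX available:", HAVE_MLX)
```

`ModuleNotFoundError` is a subclass of `ImportError`, so the `except` clause covers the error seen in the smoke test. Deferring the failure to `require_mlx()` lets an import-only check pass while still stopping any code path that genuinely needs MLX.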

@akaiHuang
Author

Closing as superseded by #1255 (see comment thread above).

@akaiHuang akaiHuang closed this Apr 12, 2026
@MatoTeziTanka

No worries at all — consolidating into #1255 is the right move. I'll direct the re-audit there. Thanks for the clean handoff.
