WIP: Add adaptive eval-time context non-record MLX submission #62
stpcoder wants to merge 1 commit into openai:main
b7e874d to e13f5a4
Community Review - WIP: Add adaptive eval-time context non-record MLX submission

Compliance: NEEDS AUTHOR ACTION

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep:
Recommendation: Could you run

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.

Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL (ModuleNotFoundError: No module named 'mlx'). Classification via |

Summary
This PR adds a non-record MLX submission under records/track_non_record_16mb/ for adaptive eval-time context.

The setup is a local Apple Silicon run, not a leaderboard claim. The main change in this snapshot is the final evaluation path: instead of using one fixed policy over the whole validation stream, it does a coarse pass first and then rescores the harder windows from that pass with a finer stride.
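
The coarse-then-fine evaluation described above can be sketched roughly as follows. This is a minimal illustration, not the code in train_gpt.py: the `Scorer` interface, `score_span`, `adaptive_eval`, `toy_scorer`, and the `seq_len` parameter are all hypothetical names, and it assumes that "harder" means the coarse blocks with the highest mean loss, which the description does not spell out. It returns mean loss in nats per token rather than bpb.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: scorer(tokens, ctx_start, start, end) returns one
# loss (nats) per token in tokens[start:end], conditioned on tokens[ctx_start:start].
Scorer = Callable[[Sequence[int], int, int, int], List[float]]


def score_span(tokens: Sequence[int], scorer: Scorer, lo: int, hi: int,
               stride: int, seq_len: int) -> List[float]:
    """Score tokens[lo:hi] in chunks of `stride` tokens.

    Each chunk sits at the end of a window of at most `seq_len` tokens, so a
    smaller stride leaves more left context for the tokens being scored.
    """
    losses: List[float] = []
    for start in range(lo, hi, stride):
        end = min(start + stride, hi)
        ctx_start = max(0, end - seq_len)
        losses.extend(scorer(tokens, ctx_start, start, end))
    return losses


def adaptive_eval(tokens: Sequence[int], scorer: Scorer, seq_len: int,
                  coarse_stride: int = 256, fine_stride: int = 64,
                  hard_fraction: float = 0.25) -> float:
    """Return mean loss per token after a coarse pass plus selective rescoring."""
    n = len(tokens)
    if n == 0:
        raise ValueError("tokens must be non-empty")
    if not 0 < fine_stride <= coarse_stride <= seq_len:
        raise ValueError("expected 0 < fine_stride <= coarse_stride <= seq_len")

    # Coarse pass: one block per coarse stride.
    blocks = [(lo, min(lo + coarse_stride, n)) for lo in range(0, n, coarse_stride)]
    block_losses = [score_span(tokens, scorer, lo, hi, coarse_stride, seq_len)
                    for lo, hi in blocks]

    # Assumption: rank blocks by mean coarse loss and treat the top fraction as hard.
    order = sorted(range(len(blocks)),
                   key=lambda i: sum(block_losses[i]) / len(block_losses[i]),
                   reverse=True)
    n_hard = int(len(blocks) * hard_fraction)

    # Fine pass: rescore only the hard blocks and overwrite their coarse losses.
    for i in order[:n_hard]:
        lo, hi = blocks[i]
        block_losses[i] = score_span(tokens, scorer, lo, hi, fine_stride, seq_len)

    return sum(sum(b) for b in block_losses) / n


def toy_scorer(tokens: Sequence[int], ctx_start: int, start: int, end: int) -> List[float]:
    """Stand-in for a model: token 1 is 'hard', and loss shrinks with more context."""
    return [(4.0 if tokens[pos] == 1 else 1.0) + 1.0 / (1 + pos - ctx_start)
            for pos in range(start, end)]


if __name__ == "__main__":
    stream = [0] * 1024 + [1] * 512 + [0] * 512
    print("coarse only:", adaptive_eval(stream, toy_scorer, 512, hard_fraction=0.0))
    print("adaptive   :", adaptive_eval(stream, toy_scorer, 512))
```

With the toy scorer, the adaptive pass lands between the coarse-only and fully fine-stride results, at the cost of extra scorer calls on the selected blocks. That is the same speed-versus-score tradeoff the description mentions.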
Included files
- README.md
- submission.json
- train.log
- compare_standard.log
- train_gpt.py

Run details
- 9x512, KV4, tied embeddings
- 32768 validation tokens
- 200 iterations
- coarse_stride=256, fine_stride=64, hard_fraction=0.25

Result in train.log
- val_bpb=2.4070
- val_bpb=2.40284524
- 11297911 bytes

Same-setup reference
compare_standard.loguses the same setup with standard final evaluation:val_bpb=2.41303630val_bpb=2.40284524So in this local fixed-step proxy, the adaptive pass improves the final roundtrip score by about
0.01019 bpb, but it also makes the final eval pass slower. That tradeoff is the reason this is being submitted as a WIP non-record result rather than as a performance claim.Validation
python -m py_compile records/track_non_record_16mb/2026-03-19_AdaptiveEvalContext_MLX_M4Pro_sp1024_200it/train_gpt.py
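
The command above only byte-compiles the script, while the reviewer's smoke test failed at import time because mlx was not installed on the CPU host. A minimal sketch of a check that reports the two conditions separately, assuming a hypothetical helper `check_script` that is not part of this PR or of the review pipeline:

```python
import importlib.util
import py_compile
from typing import Dict, List, Union


def check_script(path: str, required_modules: List[str]) -> Dict[str, Union[bool, List[str]]]:
    """Report whether `path` byte-compiles and which required modules are missing.

    `required_modules` should hold top-level names such as "mlx"; dotted names
    make find_spec import the parent package first, which can itself raise.
    """
    try:
        # Same check as `python -m py_compile`, but raising instead of printing.
        py_compile.compile(path, doraise=True)
        syntax_ok = True
    except py_compile.PyCompileError:
        syntax_ok = False

    # find_spec returns None for a top-level module that cannot be located.
    missing = [name for name in required_modules
               if importlib.util.find_spec(name) is None]
    return {"syntax_ok": syntax_ok, "missing_modules": missing}
```

On a machine without MLX, calling `check_script(path_to_train_gpt, ["mlx"])` would be expected to list mlx under `missing_modules` even when the syntax check passes, which separates an environment gap from a parse error in the script.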