
WIP: Add adaptive eval-time context non-record MLX submission#62

Draft
stpcoder wants to merge 1 commit into openai:main from stpcoder:research/clean-room

Conversation


@stpcoder stpcoder commented Mar 19, 2026

Summary

This PR adds a non-record MLX submission under records/track_non_record_16mb/ for adaptive eval-time context.

The setup is a local Apple Silicon run, not a leaderboard claim. The main change in this snapshot is the final evaluation path: instead of applying one fixed policy over the whole validation stream, it runs a coarse pass first and then rescores the harder windows from that pass with a finer stride.
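The two-pass idea can be sketched as follows. This is a minimal sketch, not the PR's actual implementation: `score_window` is a hypothetical stand-in for the real per-window bpb computation, and the defaults mirror the adaptive eval settings reported under "Run details":

```python
import numpy as np

def adaptive_eval(tokens, score_window, window=1024,
                  coarse_stride=256, fine_stride=64, hard_fraction=0.25):
    """Coarse pass over the validation stream, then rescore the hardest
    windows with a finer stride and average all collected scores."""
    # Coarse pass: score windows at the wide stride.
    starts = list(range(0, len(tokens) - window + 1, coarse_stride))
    coarse = [score_window(tokens[s:s + window]) for s in starts]

    # Pick the hardest fraction of coarse windows (highest bpb).
    k = max(1, int(len(starts) * hard_fraction))
    hard = sorted(range(len(starts)), key=lambda i: coarse[i], reverse=True)[:k]

    # Fine pass: rescore only the hard regions with the narrow stride.
    scores = dict(zip(starts, coarse))
    for i in hard:
        stop = min(starts[i] + coarse_stride, len(tokens) - window + 1)
        for s in range(starts[i], stop, fine_stride):
            scores[s] = score_window(tokens[s:s + window])
    return float(np.mean(list(scores.values())))
```

The fine pass only adds windows inside the hard coarse regions, which is where the extra eval cost (and the slowdown mentioned below) comes from.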

Included files

  • README.md
  • submission.json
  • train.log
  • compare_standard.log
  • train_gpt.py

Run details

  • hardware: Apple M4 Pro, 48 GB unified memory
  • model: SP-1024, 9x512, KV4, tied embeddings
  • train data: first FineWeb train shard
  • validation: first 32768 validation tokens
  • training length: 200 iterations
  • adaptive eval settings: coarse_stride=256, fine_stride=64, hard_fraction=0.25

Result in train.log

  • pre-quant eval at stop: val_bpb=2.4070
  • final int8+zlib roundtrip: val_bpb=2.40284524
  • total submission size: 11297911 bytes
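The "int8+zlib roundtrip" number means the model is re-evaluated after its weights survive quantization and compression. A minimal sketch of that roundtrip, assuming simple per-tensor symmetric quantization (the actual export path may differ):

```python
import zlib
import numpy as np

def int8_zlib_roundtrip(weights):
    """Quantize float weights to int8, zlib-compress, then recover them."""
    # Per-tensor symmetric scale (assumption: real exporter may use per-row).
    scale = float(np.max(np.abs(weights))) / 127.0 or 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

    # Compressed size is what counts against the submission byte budget.
    blob = zlib.compress(q.tobytes(), level=9)

    # Dequantize; evaluating on these weights gives the roundtrip val_bpb.
    raw = zlib.decompress(blob)
    restored = np.frombuffer(raw, dtype=np.int8).astype(np.float32) * scale
    return restored.reshape(weights.shape), len(blob)
```

Evaluating on the restored weights rather than the originals is what makes the roundtrip bpb an honest reflection of what the submitted bytes can reproduce.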

Same-setup reference

compare_standard.log uses the same setup with standard final evaluation:

  • standard final roundtrip: val_bpb=2.41303630
  • adaptive final roundtrip: val_bpb=2.40284524

So in this local fixed-step proxy, the adaptive pass improves the final roundtrip score by about 0.01019 bpb, but it also makes the final eval pass slower. That tradeoff is why this is submitted as a WIP non-record result rather than as a performance claim.
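The quoted improvement is just the difference of the two roundtrip numbers from the logs:

```python
standard = 2.41303630  # standard final roundtrip val_bpb
adaptive = 2.40284524  # adaptive final roundtrip val_bpb
delta = standard - adaptive
print(round(delta, 5))  # prints 0.01019
```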

Validation

  • python -m py_compile records/track_non_record_16mb/2026-03-19_AdaptiveEvalContext_MLX_M4Pro_sp1024_200it/train_gpt.py
  • included local MLX run logs for both adaptive and standard final eval under the same setup

@stpcoder stpcoder changed the title WIP: local MLX research support for eval/export experiments WIP: local MLX workflow for adaptive eval/export experiments Mar 19, 2026
@stpcoder stpcoder changed the title WIP: local MLX workflow for adaptive eval/export experiments WIP: local MLX workflow for adaptive eval-time context Mar 19, 2026
@stpcoder stpcoder force-pushed the research/clean-room branch from b7e874d to e13f5a4 Compare March 19, 2026 11:21
@stpcoder stpcoder changed the title WIP: local MLX workflow for adaptive eval-time context WIP: Add adaptive eval-time context non-record MLX submission Mar 19, 2026

MatoTeziTanka commented Apr 11, 2026

Community Review — WIP: Add adaptive eval-time context non-record MLX submission

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

ModuleNotFoundError: No module named 'mlx'

This matches a common pattern from the 2026-04-11 sweep for this class of error: mlx targets Apple Silicon, so it is absent on CPU-only eval hosts, and any unconditional top-level import of it fails before the script can run.

Recommendation: Could you run python3 -c "import py_compile; py_compile.compile('train_gpt.py')" on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time before any of the scored-eval logic runs.
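Beyond the parse check, one common way to keep the import step from failing outright on non-Apple hosts is an optional-dependency guard. This is a sketch of the general pattern, not something the PR currently does:

```python
# Guarded import: parsing and importing the module succeed even where
# mlx is not installed; only the MLX-dependent run paths are gated.
try:
    import mlx.core as mx
    HAVE_MLX = True
except ImportError:
    mx = None
    HAVE_MLX = False

def require_mlx():
    """Call at entry points that actually need MLX (Apple Silicon only)."""
    if not HAVE_MLX:
        raise RuntimeError(
            "mlx is required for this run path (Apple Silicon only); "
            "install mlx or skip the MLX-specific steps."
        )
```

With a guard like this, a CPU smoke test would get past the import step and the compliance audit could proceed to the scored-eval logic.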

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL: ModuleNotFoundError: No module named 'mlx'. Classification via classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.
