Midnight 12L — 1.10567949 val_bpb (seed 444) #1458
newjordan wants to merge 1 commit into openai:main from
Conversation
hey man, this doesn't have ngram on, it's just a dingleberry in the code. your bot can go away yesterday.
Correction + deep re-audit — PR #1458

@newjordan you're right about the n-gram flag. I re-audited the head SHA (

What I got wrong originally: my AST classifier found

What the re-audit found (three independent passes):

1. Compliance — CLEAN across all checks:
2. Code quality findings from the Codex pass (not compliance issues, but real bugs in dead code):
3. Submission mechanics — all check out:

Corrected verdict: LOOKS CLEAN. The 1.1057 BPB is a legitimate pure-neural result from a 12-layer architecture with mixed-int quantization + Brotli. No n-gram boost, no TTT, no SLOT.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: the original CLOSE flag is retracted. This is a clean pure-neural submission eligible for standard record-track checks.

For what it's worth, the two code quality findings above (the MTP optimizer gap and the distributed timeout race) were only caught because your comment prompted the re-audit, so the pushback was productive, even if the delivery was a bit much.

For context on how this happened and why we're here: we're running community compliance reviews across 900+ open PRs because the backlog has had zero maintainer triage for weeks. We batch-classified using an AST-based pattern matcher to cover ground fast, and that batch process is what misfired on your PR: it found the n-gram pattern in the file but didn't check whether the code path was live. Every author who has pushed back on a finding has gotten a full re-audit like this one, and every time we've been wrong, we've posted the correction publicly. That's what community review looks like when nobody else is doing it. The classifier bug that caused this has been patched.

Reviewed by @MatoTeziTanka — The Agora. Triple-pass re-audit: manual line-by-line + independent LLM peer audit + OpenAI Codex CLI review. Original false-positive retracted. Classifier dead-code detection bug identified and fixed.
The pod runs PyTorch 2.11.0+cu130 on CUDA 13.0 (vastai/pytorch:cuda-13.0.2-auto). Every prior version of Im_sorry_pod_setup.sh had the wrong CUDA tag (cu128), the /venv/main/bin PATH prepend was deleted, and the FA3 symlink fallback was removed by automated edits.

Changes:
- FA3 wheel URL: cu128 → cu130 (validated from pod instance 34775495 logs)
- Restore /venv/main/bin PATH prepend before any python3/pip calls
- Restore flash-attention/hopper symlink fallback in install_fa3()
- Remove cuda_tag_from_torch() dynamic detection (source of bugs)
- Remove BLOCK_CU124 variable; replace with positive cu130 assertion
- Remove WRITE_ACTIVATE_HELPER gate; always write activation helper
- Remove /workspace/venv_cu124 reference
- Add DO NOT EDIT warning header with validated pod environment specs
- Add frozen backup: Im_sorry_pod_setup.sh.cu130_frozen_20260414
- Update pod_stack.lock hash
- Add AGENTS.md (Codex reads this) with frozen-file rules
- Update CLAUDE.md with explicit frozen-file protection

Evidence: PR openai/parameter-golf#1458 seed 444 log; pod instance 34775495 status JSON (image_uuid=vastai/pytorch:cuda-13.0.2-auto); extracted pod log (Running PyTorch 2.11.0+cu130, CUDA Version: 13.0, Driver: 580.126.09).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
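The frozen-file protection in this commit amounts to recording a hash and detecting when automated edits change it. A minimal sketch (the file name follows the commit message; the lock-entry format is an assumption, and a temp file stands in for the real script):

```python
import hashlib
import pathlib
import tempfile

def sha256_of(path):
    """Hex digest of a file's bytes, as a pod_stack.lock entry might record."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

# Demo with a temp file standing in for the frozen Im_sorry_pod_setup.sh.
with tempfile.TemporaryDirectory() as d:
    script = pathlib.Path(d) / "Im_sorry_pod_setup.sh"
    script.write_text("#!/bin/sh\necho frozen\n")
    recorded = sha256_of(script)          # digest captured at freeze time

    unchanged = sha256_of(script) == recorded      # re-check passes while untouched
    script.write_text("#!/bin/sh\necho edited\n")  # an automated edit lands
    tampered = sha256_of(script) != recorded       # the digest mismatch exposes it

print(unchanged, tampered)
```

Re-running such a check before every pod boot would catch the kind of silent cu130 → cu128 regressions this commit describes.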
Midnight 12L
12-layer decoder model using GQA (8 query heads / 4 KV heads), Bigram-2048 Features, RoPE-16, and XSA on the last 11 layers.
Mixed-int quantization (attn=int5, mlp=int6, aux=int6, embed=int8, other=int8) plus Brotli-compressed mixed checkpoint artifacts.
Pure neural submission: this run targets model quality from training-time architecture/design, not eval-time adaptation.
No eval tricks: no TTT/SLOT/n-gram overlays, no eval-time optimizer loops, and a standard vocab.
Score comes from core neural learning + artifact compression/quantization while explicitly managing quantization loss.
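As an illustration of the mixed-int scheme above, per-tensor symmetric quantization to n bits can be sketched as follows (a pure-Python toy, not the submission's code; helper names are made up):

```python
def quantize_symmetric(weights, bits):
    """Map floats to signed n-bit ints with a single per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 15 for int5, 31 for int6
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

# attn=int5 in the scheme above: codes clamped to [-16, 15]
q, scale = quantize_symmetric([0.5, -1.0, 0.25], bits=5)
approx = dequantize(q, scale)
```

The "quantization loss" the description mentions is the rounding error visible in `approx`; fewer bits (int5 for attention vs int8 for embeddings) trade larger error for smaller checkpoint bytes before Brotli even runs.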
Results
3-seed exact mean: 1.10597186 · population std: 0.00031653
Hardware: 8xH100 SXM · 600s wallclock
bytes_code: 124698

Architecture changes
Reproduce
# From repo root, with flash-attention/hopper on PYTHONPATH
SKIP_GPTQ=1 SEED=444 torchrun --standalone --nproc_per_node=8 \
  records/track_10min_16mb/2026-04-07_Midnight_12L_8xH100/train_gpt.py

12L_3.mp4
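The reproduce command drives everything through environment variables; a hedged sketch of how a training script could read them (how train_gpt.py actually parses SKIP_GPTQ/SEED is an assumption, and the 444 default mirrors the command above):

```python
import os

# Flags from the reproduce command (hypothetical parsing, not the real script)
skip_gptq = os.environ.get("SKIP_GPTQ", "0") == "1"   # SKIP_GPTQ=1 disables GPTQ
seed = int(os.environ.get("SEED", "444"))             # SEED=444 from the command

# torchrun --nproc_per_node=8 sets RANK/WORLD_SIZE for each per-GPU process
rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "1"))

# a distinct per-rank seed keeps data shuffling decorrelated across workers
per_rank_seed = seed + rank
```

RANK and WORLD_SIZE are standard torchrun-provided variables; the per-rank seed offset is one common convention, not necessarily the one this record uses.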