Record: 11L EMA + BigramHash(12288) + Mixed Int5 + FA3 (1.1354) #466
simonbissonnette wants to merge 1 commit into openai:main
Conversation
12288 buckets is a nice bump over 10240, did you ablate that or just go bigger for the hell of it lol. Also, the FA3 kernel fetch disclosure is appreciated, that's good practice.
Thanks! After some initial quantization work, I ended up with a bit of spare artifact budget, so I used part of it to increase BigramHash and improve BPB. 12288 ended up being the best practical tradeoff for this submission after a few trial-and-error runs. It gave a real gain over the smaller setting while still keeping all 3 submission seeds under the 16 MB cap.
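For readers unfamiliar with the technique: the bucket count being discussed is the size of a hashed bigram embedding table, where each (previous token, current token) pair is hashed into one of N rows. A minimal pure-Python sketch of the bucketing step (the hash constant, padding id, and function names are my own illustration, not the PR's actual code):

```python
# Sketch of bigram-hash bucketing: map each (prev_token, cur_token) pair
# to one of `num_buckets` embedding rows. More buckets (12288 vs 10240)
# means fewer hash collisions, at the cost of a larger table in the artifact.

def bigram_bucket(prev_tok: int, cur_tok: int, num_buckets: int = 12288) -> int:
    # Cheap multiplicative hash of the pair (the constant is illustrative).
    return (prev_tok * 1000003 + cur_tok) % num_buckets

def bigram_buckets(tokens: list[int], num_buckets: int = 12288) -> list[int]:
    # Pair each token with its predecessor; position 0 gets a padding id of 0.
    prev = [0] + tokens[:-1]
    return [bigram_bucket(p, c, num_buckets) for p, c in zip(prev, tokens)]

ids = bigram_buckets([17, 42, 42, 9])
assert all(0 <= i < 12288 for i in ids)
```

In a real model each bucket id would index an `nn.Embedding(num_buckets, dim)` table; collisions between unrelated bigrams are the price of keeping the table fixed-size.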
Community Review — Record: 11L EMA + BigramHash(12288) + Mixed Int5 + FA3 (1.1354)

Compliance: NEEDS AUTHOR ACTION

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with `ModuleNotFoundError: No module named 'env_utils'`. A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep:
Recommendation: Could you run …

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet, because the audit halts at the import step.

Reviewed by @MatoTeziTanka — The Agora.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'env_utils'. Classification via …
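For context, the IMPORT_FAIL / PARSE_FAIL style of classification mentioned in this review can be sketched as a small subprocess check. The function name and exact classification rules here are my own approximation, not the reviewer's actual pipeline:

```python
import subprocess
import sys

def smoke_import(module_name: str) -> str:
    """Import a module in a clean subprocess and classify the outcome.

    Labels loosely mirror the classes used in the review above.
    """
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True, text=True,
    )
    if proc.returncode == 0:
        return "OK"
    if "SyntaxError" in proc.stderr:
        return "PARSE_FAIL"
    if "ModuleNotFoundError" in proc.stderr:
        return "IMPORT_FAIL"
    return "OTHER_FAIL"

print(smoke_import("json"))       # a stdlib module imports cleanly -> OK
print(smoke_import("env_utils"))  # IMPORT_FAIL on machines without it
```

Running the import in a subprocess keeps a broken submission from crashing or polluting the audit process itself.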
Hey, thanks for the review!
Retraction — this IMPORT_FAIL was a bug in my smoke runner

Sorry @simonbissonnette, this one's on me. I re-audited the IMPORT_FAIL I posted above and it was a false positive — the fault is in how my CPU smoke runner set up …

What happened: The runner imported your …

Verified at head …

On the real eval image (Python 3.10, …)

Your PR is not broken by this error. I'm retracting the IMPORT_FAIL classification. I'll re-queue the full compliance audit (BPB check, n-gram / TTT / SLOT flags, etc.) on the current head and post findings separately.

Again — sorry for the noise. These community reviews only work if I actually read what I'm reviewing, and I didn't in this case.
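The retraction doesn't show the runner code, but the failure class is easy to reproduce: if a runner stages a script into a scratch directory without its sibling modules, the import fails with exactly the reported error even though the repo itself is fine. A self-contained sketch (the staging mechanism and the toy file contents are my assumptions, not the reviewer's actual runner):

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# A toy "submission": train_gpt.py imports a sibling helper module.
repo = Path(tempfile.mkdtemp())
(repo / "env_utils.py").write_text("VALUE = 1\n")
(repo / "train_gpt.py").write_text("import env_utils\nprint('ok')\n")

# Buggy runner: stages train_gpt.py alone into a scratch dir, so the
# sibling env_utils.py is no longer importable -> spurious IMPORT_FAIL.
staging = Path(tempfile.mkdtemp())
shutil.copy(repo / "train_gpt.py", staging)
bad = subprocess.run([sys.executable, str(staging / "train_gpt.py")],
                     capture_output=True, text=True)
assert "No module named 'env_utils'" in bad.stderr

# Fixed runner: execute the script in place; sys.path[0] is the script's
# directory, so env_utils resolves and the import succeeds.
good = subprocess.run([sys.executable, str(repo / "train_gpt.py")],
                      capture_output=True, text=True)
assert good.returncode == 0 and good.stdout.strip() == "ok"
```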
Summary
This PR adds a main-track submission attempt for the Parameter Golf challenge based on an 11-layer, 512-dim model with:
- EMA (decay 0.997)
- BigramHash (12288 buckets, dim 128)
- Mixed Int5 quantization
- FA3 attention

3-Seed Results
| Run | val_bpb | Total size (bytes) |
| --- | --- | --- |
| 1 | 1.13593695 | 15,967,704 |
| 2 | 1.13389376 | 15,663,365 |
| 3 | 1.13626774 | 15,660,237 |

Mean val_bpb: 1.135366, sample std: 0.001286

Notes
- All three runs use the same `train_gpt.py` snapshot and the same hyperparameter recipe, differing only by seed.
- Attention uses `kernels-community/flash-attn3`, which fetches the FA3 kernel package at runtime.
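The aggregate numbers in the 3-Seed Results section can be double-checked directly from the three per-seed scores:

```python
from statistics import mean, stdev

# Per-seed validation bits-per-byte from the 3-seed results above.
val_bpb = [1.13593695, 1.13389376, 1.13626774]

print(f"mean:  {mean(val_bpb):.6f}")   # 1.135366 (matches the reported mean)
print(f"stdev: {stdev(val_bpb):.6f}")  # 0.001286 (sample standard deviation)
```

This confirms the unlabeled 0.001286 figure is the sample standard deviation across seeds.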