Add DeepFloor non-record submission candidate #1304
KenMalloy wants to merge 20 commits into openai:main
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mark Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… confirmations
- estimate_artifact_bytes() now only counts parameters for the active cross_token_mode (floor or fused) and output path, not both
- run_deepfloor_recipe_evolution recomputes best from confirm_results sorted by val BPB, not just short-fitness winner
- Added tests for both: inactive branch exclusion and confirm re-ranking
- Updated design doc budget table to match QKV+O block (16K params)
- Added implementation plan doc
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
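The two behavior changes in this commit message can be illustrated with a minimal sketch. This is not the PR's code: `BranchSizes`, `estimate_active_bytes`, `rerank_confirmed`, the one-byte-per-parameter default, and the result dictionary keys are all hypothetical names chosen for illustration, and the sketch covers only the `cross_token_mode` choice, not the output path.

```python
from dataclasses import dataclass


@dataclass
class BranchSizes:
    """Hypothetical parameter counts; not the PR's real model layout."""
    shared: int  # parameters used regardless of mode
    floor: int   # parameters used only when cross_token_mode == "floor"
    fused: int   # parameters used only when cross_token_mode == "fused"


def estimate_active_bytes(sizes: BranchSizes, cross_token_mode: str,
                          bytes_per_param: int = 1) -> int:
    """Count only the branch that is active, never both branches."""
    if cross_token_mode not in ("floor", "fused"):
        raise ValueError(f"unknown cross_token_mode: {cross_token_mode!r}")
    active = sizes.floor if cross_token_mode == "floor" else sizes.fused
    return (sizes.shared + active) * bytes_per_param


def rerank_confirmed(confirm_results: list[dict]) -> dict:
    """Pick the winner by confirmed val_bpb (lower is better),
    ignoring which candidate won the short-fitness round."""
    if not confirm_results:
        raise ValueError("no confirmation results to rank")
    return min(confirm_results, key=lambda r: r["val_bpb"])


sizes = BranchSizes(shared=1000, floor=200, fused=500)
floor_bytes = estimate_active_bytes(sizes, "floor")
fused_bytes = estimate_active_bytes(sizes, "fused")

results = [
    {"name": "a", "short_fitness": 0.9, "val_bpb": 4.5},  # short-fitness winner
    {"name": "b", "short_fitness": 0.7, "val_bpb": 4.1},  # better on confirmation
]
best = rerank_confirmed(results)
```

In this toy data the short-fitness winner `a` loses to `b` once the confirmation scores are compared, which is the situation the re-ranking change is meant to handle.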
Community Review: Add DeepFloor non-record submission candidate

Compliance: NEEDS AUTHOR ACTION

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep:
Recommendation: Could you run

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.

Reviewed by @MatoTeziTanka, The Agora. CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL (ModuleNotFoundError: No module named 'spectral_flood_walk_v0'). Classification via
Retraction: this IMPORT_FAIL was a bug in my smoke runner

Sorry @KenMalloy, this one's on me. I re-audited the IMPORT_FAIL I posted above and it was a false positive: the fault is in how my CPU smoke runner set up

What happened: The runner imported your

Verified at head

On the real eval image (Python 3.10,

Your PR is not broken by this error. I'm retracting the IMPORT_FAIL classification. I'll re-queue the full compliance audit (BPB check, n-gram / TTT / SLOT flags, etc.) on the current head and post findings separately.

Again, sorry for the noise. These community reviews only work if I actually read what I'm reviewing, and I didn't in this case.
This is a draft PR, and not my full bet. Please don't spend too much effort on it. Keep up the good work.
Understood — noted for the backlog. The retraction of the IMPORT_FAIL still stands either way. When you're ready to promote it from draft, the compliance pass will pick it up in the next sweep. |
Summary
This PR adds a new non-record submission candidate:
- 2026-04-03_DeepFloor
DeepFloor is a compact recurrent multi-view language model that replaces a large flat stack with:
Why this is interesting
This is not a leaderboard-quality run yet. The value of the submission is that it packages a real, unusual architecture direction that is reproducible inside the challenge record format:
Included artifacts
- README.md
- RESULTS.md
- submission.json
- train_gpt.py
- deepfloor_snapshot.py
- freeze_submission_snapshot.py
Current checked-in evidence
Best real-enwik8 small-box matrix candidate:
- fused_d32_v2
- val_bpb = 7.9221
- artifact_bytes = 8448
Frozen submission preflight:
- bytes_total = 56221
- bytes_model_estimated = 8192
- bytes_code = 48029
Checked-in candidate submission:
- submission.json built from candidate_result_seed1337.json
- bytes_total = 56477
- val_bpb = 7.9221
Validation
- enwik8 8x H100 fullbox suite, with synced smoke, matrix, launch_logs, and evolution artifacts under runs/fullbox/
Fullbox evidence
Best fullbox recipe-search result:
- frontier seed 2025
- val_bpb = 4.1101
- test_bpb = 4.0239
- artifact_mb = 0.1377
- floor
- scalar_decay
- 968
This is stronger evidence that DeepFloor is a real direction worth reviewing, but it is still not presented as a record claim or as three repeated fixed-candidate contest runs.
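For readers comparing the val_bpb figures above, here is a minimal sketch of the usual bits-per-byte conversion. It assumes that bpb here means bits per byte in the standard sense (summed cross-entropy in nats, divided by ln 2 and by the number of bytes scored); the PR text does not show how train_gpt.py actually computes it, and `nats_to_bpb` is a hypothetical helper name.

```python
import math


def nats_to_bpb(total_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy (in nats) into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nats / (math.log(2) * total_bytes)


# Reference point: a model that predicts every byte uniformly over 256
# values pays ln(256) nats per byte, which is exactly 8 bits per byte.
n_bytes = 1000
uniform_bpb = nats_to_bpb(math.log(256) * n_bytes, n_bytes)

# Reported figures from this PR, compared with that uniform baseline.
small_box_gap = uniform_bpb - 7.9221  # small-box matrix candidate
fullbox_gap = uniform_bpb - 4.1101    # fullbox recipe-search result
```

Under that assumption, the small-box candidate sits just under the 8.0 uniform baseline, while the fullbox result is well below it, which is consistent with the PR framing the fullbox run as the stronger evidence.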
Submission lane
This should be reviewed as a track_non_record_16mb research submission, not a record claim.
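The byte figures reported under "Current checked-in evidence" are consistent with a simple sum of model bytes and code bytes, as the sketch below checks. The additive rule is an observation from the numbers, not something the PR states, and the 16,000,000-byte cap is an assumption inferred from the track name; `total_bytes` and `ASSUMED_CAP_BYTES` are hypothetical names.

```python
# Assumption: the "16mb" in track_non_record_16mb means a decimal
# 16 MB cap. The PR text does not state the exact limit.
ASSUMED_CAP_BYTES = 16_000_000


def total_bytes(model_bytes: int, code_bytes: int) -> int:
    """Total submission size as model bytes plus code bytes."""
    return model_bytes + code_bytes


# Frozen submission preflight: bytes_model_estimated + bytes_code.
preflight_total = total_bytes(8192, 48029)

# Checked-in candidate: artifact_bytes of fused_d32_v2 + the same code size.
candidate_total = total_bytes(8448, 48029)

# Fraction of the assumed cap used by the checked-in candidate.
cap_fraction = candidate_total / ASSUMED_CAP_BYTES
```

Both sums reproduce the reported bytes_total values (56221 and 56477), and either total uses well under one percent of the assumed cap.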