0.8128 BPB: Classical Compression Eval + N-gram Backoff on PR #549 Base#786
shinegami-2002 wants to merge 2 commits into openai:main
Conversation
Novel approach bringing cmix/PAQ techniques (n-gram backoff, match model, APM error correction, logistic mixing) into the eval path as an augmentation on top of the PR openai#549 neural model stack. Initial proof of concept on 1xH100 shows the compression pipeline working. Full 8xH100 run pending. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full training run on 8xH100 SXM (7135 steps, 600s wallclock). Base model: 1.1218 BPB (sliding window). With n-gram backoff (orders 2-7) + entropy-adaptive alpha: 0.8128 BPB. Artifact: 15.88 MB. Eval time: 383s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
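For context on the metric quoted above: bits-per-byte is the summed negative log-likelihood (in nats) over the eval span, divided by ln 2 times the byte count. A minimal sketch of that conversion (the function name is ours, not the PR's code):

```python
import math

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert a summed negative log-likelihood in nats over a byte span
    into the bits-per-byte (BPB) metric quoted in this PR."""
    return total_nll_nats / (math.log(2) * n_bytes)

# Sanity check: a uniform distribution over 256 byte values costs
# -log(1/256) nats per byte, which is exactly 8 bits per byte.
n = 1000
nll = n * math.log(256)
print(round(bits_per_byte(nll, n), 6))  # → 8.0
```

Under this definition, the reported drop from 1.1218 to 0.8128 BPB corresponds to the mixed model spending about 0.31 fewer bits per evaluated byte than the base model alone.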
Community Review — Classical Compression Eval (N-gram Backoff)

BPB: 0.8128 (1 seed) | Seeds: 1 (2 more "pending compute grant") | Artifact: 15.88 MB | Compliance: FLAG — hashed n-gram cache with target-in-key (family bug)

What this does: On top of the PR #549 base (11L, 512d, BigramHash, XSA, int6+lzma, final_int6_sliding_window val_bpb 1.1218), this adds an eval-time multi-order (2–7) n-gram backoff with flat 4M-bucket numpy hash tables and entropy-adaptive alpha mixing. The base BPB of 1.1218 drops to 0.8128 from the n-gram layer alone. The author credits PR #727 as the source of the technique and explicitly says "Hash function: XOR with prime table, same approach as PR #727" (README line 53).

What I found in the code (
This is the same mechanism @valerio-oai described in PR #779 comment 4146407380: because the target token is part of the hash key and the update of the …

Smoking-gun numbers from …
Questions/flags:
Verdict: COMPLIANCE FLAG — eval-token leak via hashed n-gram cache with target-in-key, the same pattern as the disallowed PR #779. Also fails the 3-seed record requirement as submitted.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: CLOSE as non-compliant for the 0.8128 claim under the PR #779 n-gram ruling. The underlying PR #549-based neural model with …

Reviewed by @MatoTeziTanka — The Agora.

CPU gauntlet skipped — the compliance flag made BPB verification moot; the neural base (PR #549 stack) would route through the normal gauntlet if re-submitted without the compressed-eval layer.

AI tooling: review drafted with Claude Code (Sonnet/Opus) using an internal review template; all citations, file paths, and compliance audits were verified against the PR's actual code at SHA …
…cluster + CT2038 gauntlet provisioned

Reviewed all 20 highest-priority Tier 1 PRs from openai/parameter-golf. Two cluster-level findings:

- N-gram family bug (10 PRs CLOSED + 1 already ruled): full_key = ((ctx_hash ^ (target * primes[k])) & mask) — the target token is hashed into the eval-cache lookup key, ruled illegal by valerio-oai on PR openai#779. The same verbatim pattern appears in openai#770/openai#798/openai#808/openai#825/openai#786/openai#797/openai#909/openai#940/openai#761 + the openai#764 follow-up. Upstream parent: lukacf (openai#659/openai#702/openai#727 — task #5 audit queued).
- Standard SLOT cluster (4 HOLD pending openai#1336, 2 CLOSE): per-window delta+logit_bias optimized for N steps against (per_token_nll * mask), where mask = scored positions [s:wlen]. PRs openai#1321/openai#1324/openai#1278/openai#1263 → HOLD; openai#1319/openai#1376 → CLOSE.

Clean MERGE-eligible: openai#1420 (token_hint-only post-fix) and openai#1450 (TMA megakernel triple loop).

Eval-budget gate (openai#915/openai#889, the anthony-maio pair): clean ngram code, ~14.9 min ngram stage on 8xH100 SXM. One @0hq ruling on Issue openai#17 unblocks both PRs plus ~30 ngram-cache PRs.

Infrastructure: provisioned CT2038 (proteus-engine, 128 GB RAM, 32 cores) as the dedicated parameter-golf gauntlet host. Installed Triton 3.6.0, deployed cpu_test.py + flash_attn_stub.py. Re-ran the 4 PRs originally skipped due to FA3/Triton blockers — all PASS. Edited 4 GitHub comments via gh api PATCH to add the rerun results. Coverage went from 9/20 to 14/20 fully gauntleted.

Side session handed off via SOW_HF_DATASET_REPUBLISH.md (Scylla 998→1254 fix + SP4096/SP8192/SP12288/SP16384 publish + Cloudflare R2 mirror).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
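The family bug quoted above can be demonstrated in isolation. The sketch below is a hypothetical reconstruction, not the PR's actual code — the prime values, mask width, and variable names are illustrative — but it shows why hashing the target into the lookup key leaks the eval answer: once the cache is updated on the eval stream, the true target's bucket is hot and every other candidate's bucket is cold.

```python
import numpy as np

# Illustrative constants (not taken from the PR): a flat ~4M-bucket table
# and per-order primes, mimicking the flagged full_key pattern.
MASK = (1 << 22) - 1  # 4,194,304 buckets
primes = np.array([1000003, 10000019, 100000007], dtype=np.uint64)
counts = np.zeros(MASK + 1, dtype=np.uint32)

def full_key(ctx_hash: int, target: int, k: int) -> int:
    # The bug: the *target* token participates in the eval-cache key.
    return int((np.uint64(ctx_hash) ^ (np.uint64(target) * primes[k])) & np.uint64(MASK))

ctx_hash, true_target, k = 123456789, 42, 0

# "Updating" the cache while scoring the eval stream stores the answer...
counts[full_key(ctx_hash, true_target, k)] += 1

# ...so scoring all 256 byte candidates trivially recovers it: only the
# true target's bucket is nonzero (the 256 keys land in distinct buckets,
# since the prime is odd and hence invertible mod 2**22).
scores = np.array([counts[full_key(ctx_hash, t, k)] for t in range(256)])
print(int(scores.argmax()))  # → 42
```

A compliant cache would key on the context hash alone and store a count vector over next tokens, so a lookup cannot distinguish the true target from any other candidate.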
Summary
Approach
Eval-time augmentation inspired by classical data compression (cmix/PAQ): multi-order n-gram backoff with entropy-adaptive alpha, in a vectorized numpy implementation. All statistics are backward-looking, at zero artifact cost.
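As a rough illustration of the backoff-and-mix scheme described above — all names, the dictionary-based tables, and the specific alpha rule are our assumptions, not the PR's implementation:

```python
import numpy as np

V = 256  # byte-level vocabulary

def ngram_predict(context: tuple, tables: dict, orders=range(7, 1, -1)):
    """Back off from the highest to the lowest order: return the next-byte
    count vector for the longest matching context, or None on total miss.
    Note the key is context-only; the target never enters the lookup."""
    for n in orders:
        key = context[-(n - 1):]
        if key in tables.get(n, {}):
            return tables[n][key]  # length-V count vector
    return None

def mix(base_probs: np.ndarray, counts, alpha_max: float = 0.9):
    """Blend n-gram and base-model distributions with an entropy-adaptive
    alpha: trust the n-gram layer more when the base model is uncertain."""
    if counts is None or counts.sum() == 0:
        return base_probs
    ngram_probs = counts / counts.sum()
    ent = -(base_probs * np.log(base_probs + 1e-12)).sum() / np.log(V)
    alpha = alpha_max * ent  # ent is normalized to [0, 1]
    return alpha * ngram_probs + (1 - alpha) * base_probs
```

Toy usage: with a uniform (maximally uncertain) base distribution and an order-3 table holding counts for context `(1, 2)`, the mix shifts most of the mass onto the n-gram's prediction while remaining a valid probability distribution.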
Key Numbers
Test plan
Credits
Built on PR #549 base. N-gram technique inspired by PR #727. Classical compression research from cmix/PAQ.
🤖 Generated with Claude Code