
Record: N-gram Backoff + VRL + LeakyReLU² — val_bpb 0.9642 (3-seed mean)#889

Open
anthony-maio wants to merge 2 commits into openai:main from anthony-maio:submission/ngram-backoff-clean

Conversation

@anthony-maio

Summary

val_bpb = 0.9642 (3-seed mean, std 0.0002) | ~15.95 MB | 8×H100 SXM

3-Seed Results (8×H100 80GB SXM, PyTorch 2.9.1+cu128)

| Seed | step_avg | steps | Pre-ngram bpb | Post-ngram bpb | ng_helped | Artifact (bytes) |
|------|----------|-------|---------------|----------------|-----------|------------------|
| 1337 | 88.7ms | 6,765 | 1.1225 | 0.9640 | 38.5% | 15,981,848 |
| 42 | 88.6ms | 6,772 | 1.1224 | 0.9641 | 38.6% | 15,904,632 |
| 2025 | 88.6ms | 6,776 | 1.1231 | 0.9644 | 38.6% | 15,974,308 |
| Mean | 88.6ms | 6,771 | 1.1227 | 0.9642 (std 0.0002) | 38.6% | |

All artifacts under 16,000,000 bytes. All 3 train logs attached.

Key Innovation: Multi-Order N-gram Backoff Cache

Backward-looking n-gram cache built causally from already-scored tokens. Zero artifact cost.

Entropy-Adaptive Alpha: alpha = 0.05 + 0.55 * sigmoid(2*(H-4)). Neural-confident → alpha≈0.05. Neural-uncertain → alpha≈0.60.
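As a sanity check, the stated schedule can be reproduced in a few lines of Python (the function name is mine, not the PR's):

```python
import math

def adaptive_alpha(entropy: float) -> float:
    # alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4)), where H is the neural
    # model's predictive entropy at the current position.
    return 0.05 + 0.55 / (1.0 + math.exp(-2.0 * (entropy - 4.0)))
```

At H = 4 the mix sits at the midpoint alpha = 0.325; confident predictions (H near 0) keep alpha ≈ 0.05, uncertain ones (H well above 4) push it toward 0.60.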

Multi-Order Backoff (2-7gram): Highest matching order wins. 4M hash buckets per order. min_count=2 gate. Raw count ratios, no smoothing.
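The backoff scheme described above can be sketched as follows. This is an illustrative reimplementation, not the PR's code: the class and method names, the rolling hash, and the dict-backed tables are assumptions (the actual script uses fixed-size bucket arrays).

```python
MIN_COUNT = 2
NUM_BUCKETS = 4_000_000  # 4M hash buckets per order, per the description

class BackoffCache:
    """Multi-order (2-7 gram) backoff built from already-scored tokens."""

    def __init__(self, orders=range(2, 8)):
        self.orders = list(orders)
        self.ctx_counts = {o: {} for o in self.orders}  # context-hash -> count
        self.tgt_counts = {o: {} for o in self.orders}  # (ctx-hash, token) -> count

    def _hash_ctx(self, tokens, pos, ctx_w):
        # Rolling hash over the strict prefix tokens[pos-ctx_w : pos].
        h = 0
        for k in range(ctx_w):
            h = (h * 1000003 + int(tokens[pos - ctx_w + k])) % NUM_BUCKETS
        return h

    def update(self, tokens, start, end):
        # Ingest tokens[start:end) as targets; contexts lie strictly before them.
        for pos in range(start, end):
            for o in self.orders:
                ctx_w = o - 1
                if pos - ctx_w < 0:
                    continue
                h = self._hash_ctx(tokens, pos, ctx_w)
                self.ctx_counts[o][h] = self.ctx_counts[o].get(h, 0) + 1
                key = (h, int(tokens[pos]))
                self.tgt_counts[o][key] = self.tgt_counts[o].get(key, 0) + 1

    def predict(self, tokens, pos, target):
        # Highest matching order wins; raw count ratio, no smoothing.
        for o in reversed(self.orders):
            ctx_w = o - 1
            if pos - ctx_w < 0:
                continue
            h = self._hash_ctx(tokens, pos, ctx_w)
            denom = self.ctx_counts[o].get(h, 0)
            if denom >= MIN_COUNT:
                return self.tgt_counts[o].get((h, int(target)), 0) / denom
        return None  # no order fired; caller falls back to the pure neural prob
```

Because `predict` only looks up the count of the single true target given a strict-prefix context hash, it computes P(target | context) rather than selecting over candidate targets.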

Compliance: Score-first — every token scored before any table update. N-gram tables built from already-scored tokens only. No training data access during eval. No oracle selection.
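The score-first ordering can be illustrated with a self-contained sketch; the function name and the `score`/`update` callables are stand-ins, not the PR's API:

```python
def eval_score_first(tokens, windows, window_len, score, update):
    # Every position in a window is scored before that window's tokens
    # enter the cache, so no token contributes to its own prediction.
    scored_up_to = windows[0] if windows else 0
    losses = []
    for ws in windows:
        for t in range(window_len):
            losses.append(score(tokens, ws + t + 1))  # score first...
        new_end = ws + window_len + 1
        update(tokens, scored_up_to, new_end)         # ...update after
        scored_up_to = new_end
    return losses
```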

Training Architecture

PR #414 base + LeakyReLU² + VRL + lzma:
11L, 512d, 8H/4KV GQA, LeakyReLU(0.5)² MLP 3×, VRL, VE128, BigramHash(2048), XSA4, Partial RoPE 16/64, LN Scale, SmearGate, U-Net skips, EMA(0.997) + Tight SWA, Late QAT, GPTQ-lite int6 + lzma, FA3 Hopper, Muon WD=0.04

Credits

anthony-maio and others added 2 commits March 26, 2026 15:12
Sub-1.0 bpb via multi-order n-gram backoff (2-7gram) with entropy-adaptive
alpha mixing. 3-seed mean 0.9642, std 0.0002. All artifacts under 16MB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 26, 2026 19:13
Contributor

Copilot AI left a comment


Pull request overview

Adds a new record submission for track_10min_16mb showcasing a multi-order (2–7) n-gram backoff cache combined with VRL + LeakyReLU², along with reproducibility artifacts and metadata.

Changes:

  • Added training/eval script implementing n-gram backoff evaluation and model architecture used for the record.
  • Added attached training logs for multiple seeds and a README describing results/compliance/repro steps.
  • Added submission metadata JSON for the record entry.

Reviewed changes

Copilot reviewed 3 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
|------|-------------|
| records/track_10min_16mb/2026-03-26_NgramBackoff_VRL_LeakyReLU2/train_gpt.py | Training + evaluation script including sliding-window eval and n-gram backoff cache. |
| records/track_10min_16mb/2026-03-26_NgramBackoff_VRL_LeakyReLU2/train_seed42.log | Attached run log for seed 42 supporting reported metrics and artifact size. |
| records/track_10min_16mb/2026-03-26_NgramBackoff_VRL_LeakyReLU2/train_seed1337.log | Attached run log for seed 1337 supporting reported metrics and artifact size. |
| records/track_10min_16mb/2026-03-26_NgramBackoff_VRL_LeakyReLU2/submission.json | Record metadata (val_bpb/val_loss/bytes, hardware, etc.). |
| records/track_10min_16mb/2026-03-26_NgramBackoff_VRL_LeakyReLU2/README.md | Human-readable summary of the method, results, compliance, and reproduction steps. |


Comment on lines +974 to +977
all_tokens = val_tokens.cpu().numpy().astype(np.int32)
scored_up_to = my_windows[0] if my_windows else 0
ngram_helped = 0
ngram_total = 0

Copilot AI Mar 26, 2026


In distributed n-gram eval, each rank’s cache starts at scored_up_to = my_windows[0], so ranks whose first window does not start at 0 will not include earlier (globally previous) tokens in their cache. This makes the n-gram backoff results depend on world_size/window partitioning rather than matching a single causal pass over the validation stream. To make the cache behavior consistent with a global score-first causal ordering, either (mandatory): (a) initialize each rank’s cache with the prefix tokens up to the first token position it will score (e.g., update the cache over [0, first_scored_pos) before scoring), or (b) run the n-gram backoff evaluation on a single rank (rank 0) and skip the distributed aggregation for that phase.
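Option (a) might look like the following; the `warm_cache` helper and the `cache.update(tokens, start, end)` signature are assumptions inferred from the quoted code:

```python
def warm_cache(cache, all_tokens, my_windows):
    # Replay the global prefix [0, first_scored_pos) into this rank's cache
    # before scoring, so the cache matches a single causal pass regardless
    # of how windows were partitioned across ranks. Still strictly backward:
    # only tokens before the first scored position are ingested.
    first_scored_pos = my_windows[0] if my_windows else 0
    if first_scored_pos > 0:
        cache.update(all_tokens, 0, first_scored_pos)
    return first_scored_pos  # initial value for scored_up_to
```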

Comment on lines +995 to +996
probs = torch.exp(log_probs)
entropy = -(probs * log_probs).sum(dim=-1)

Copilot AI Mar 26, 2026


This computes and materializes both log_probs and probs for the full [B, T, V] tensor, which is large and increases peak memory/bandwidth. You can compute entropy directly from log_probs without keeping a separate probs tensor (e.g., using log_probs.exp() inline) to reduce memory pressure.

Suggested change:

- probs = torch.exp(log_probs)
- entropy = -(probs * log_probs).sum(dim=-1)
+ entropy = -(log_probs.exp() * log_probs).sum(dim=-1)

tokens = torch.cat([load_data_shard(file) for file in files]).contiguous()
usable = ((tokens.numel() - 1) // seq_len) * seq_len
if usable <= 0:
raise ValueError(f"Validation split is too short for TRAIN_SEQ_LEN={seq_len}")

Copilot AI Mar 26, 2026


The error message hardcodes TRAIN_SEQ_LEN even though the function parameter is seq_len (and the caller may pass a validation/eval seq length). Consider changing the message to refer to seq_len (or EVAL_SEQ_LEN when applicable) to avoid confusion when debugging validation setup.

Suggested change:

- raise ValueError(f"Validation split is too short for TRAIN_SEQ_LEN={seq_len}")
+ raise ValueError(f"Validation split is too short for seq_len={seq_len}")

Comment on lines +7 to +12
| Seed | step_avg | steps | Pre-ngram bpb | **Post-ngram bpb** | ng_helped | Artifact |
|------|----------|-------|--------------|-------------------|-----------|----------|
| 1337 | 88.7ms | 6,765 | 1.1225 | **0.9640** | 38.5% | 15,981,848 |
| 42 | 88.6ms | 6,772 | 1.1224 | **0.9641** | 38.6% | 15,904,632 |
| 2025 | 88.6ms | 6,776 | 1.1231 | **0.9644** | 38.6% | 15,974,308 |
| **Mean** | **88.6ms** | **6,771** | **1.1227** | **0.9642 (std 0.0002)** | **38.6%** | |

Copilot AI Mar 26, 2026


The results table rows start with ||, which renders as an extra empty column in standard Markdown table syntax. Use a single leading | per row so the table formats correctly on GitHub.

Comment on lines +9 to +11
"val_bpb": 0.9642,
"val_loss": 1.6279,
"bytes_total": 15953596,

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes_total appears to be an average across seeds (it doesn’t match the per-seed totals shown in the attached logs). If submission.json is meant to describe a specific submitted artifact, it should use the exact bytes_total (and ideally the exact val_loss/val_bpb) for that chosen artifact; otherwise consider adding explicit fields indicating these values are 3-seed means.

Suggested change:

- "val_bpb": 0.9642,
- "val_loss": 1.6279,
- "bytes_total": 15953596,
+ "val_bpb_mean_3seed": 0.9642,
+ "val_loss_mean_3seed": 1.6279,
+ "bytes_total_mean_3seed": 15953596,

@MatoTeziTanka

Community Review — Record-track N-gram Backoff (pre-fused-kernel twin of #915)

BPB: 0.9642 post-ngram (3-seed mean, std 0.0002) / 1.1225 pure-neural stride-64 | Seeds: 3 | Artifact: 15,981,848 B (seed 1337) | Track: record | Compliance: N-gram PASS (same code as #915), eval-budget FLAG (same gate as #915)

TL;DR for the mod team: This is the record-track filing of the same neural+n-gram stack that the author also filed as non-record in PR #915. I reviewed #915 on 2026-04-11 (comment). The n-gram compliance story is literally the same code, so the n-gram verdict is the same; the eval-wallclock question is also the same and gates a record listing the same way it gates the non-record listing.

Relationship to #915 (confirmed by byte-for-byte diff at SHA 50ec6bc):

N-gram compliance (identical analysis to my #915 review, line numbers shifted):

  • _hash_ctx at L919-923 reads tokens[pos - ctx_w + k] for k in [0, ctx_w). With pos = ws + t + 1 (the absolute token index of the target, L1006) and ctx_w >= 1, the hashed indices are strictly [pos - ctx_w, ..., pos - 1] — the prefix before the target. The lookup key therefore depends only on the prefix x_{pos-ctx_w}...x_{pos-1}, satisfying condition 1 of Issue #1017 (A Field Guide to Valid Submissions).
  • predict(tokens, pos, target) at L937-950 uses target only to index full_tables[oi][full_h] for the single true target — it's computing P(target | context), not an oracle argmax over candidate targets. Per @valerio-oai's ruling on PR #779 (comment 4145781641, 2026-03-27), the disallowed pattern is hashing the target into a key used to select the prediction over candidate targets. This code looks up one count for the one true target to compute a probability — standard n-gram P(target|ctx) = count(ctx, target) / count(ctx) — which Issue #1017 condition 1 ("p_t may depend only on the artifact and x_1...x_{t-1}") permits when combined with a backward-only cache.
  • Score-before-update at window granularity (L1005-1030): the scoring loop finishes a whole window before cache.update(all_tokens, scored_up_to, new_end) (L1029) adds that window's tokens. scored_up_to starts at the first-assigned window's left edge. No token ever contributes to its own prediction. Legal under Issues #402 (information leakage during TTT) and #677 (illegal submissions megathread).
  • Mixing (L1012-1016): mixed_p = (1 - alpha) * model_p + alpha * ng_p, floored at 1e-12. Linear in probability space, standard.
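The quoted mix can be checked with a minimal scalar sketch (function name is mine):

```python
def mix_probs(model_p: float, ng_p: float, alpha: float) -> float:
    # Linear mix in probability space, floored at 1e-12 as in the quoted code.
    return max((1.0 - alpha) * model_p + alpha * ng_p, 1e-12)
```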

Why the BPB is 0.9642 and not ~1.08: Same as #915: the pre-ngram stride-64 number is 1.1225 (exactly in the SP1024 11L VRL pack), and the n-gram cache buys ~0.16 BPB on top at large eval-wallclock cost. The 0.9642 is a post-processing number layered over a normal 1.12-ish neural eval.

Main flag — eval-budget compliance (IDENTICAL to #915, and this matters more for a record):

From train_seed1337.log:

final_int6_sliding_window val_loss:1.8953 val_bpb:1.1225 stride:64 eval_time:102169ms
final_ngram               val_loss:1.6277 val_bpb:0.9640 ngram_eval_time:895349ms

Gauntlet (CPU pre-flight on the PR head at SHA 50ec6bc):

[PASS] Import, Hyperparameters (dim=512, layers=11, heads=8, vocab=1024)
[PASS] Model: 26,993,766 params
[PASS] Forward pass: loss=6.9362
[PASS] Artifact: 4,635,892 B (29.0% of 16MB) via int6+lzma on freshly-initialized weights
[INFO] Code size: 67,048 B (matches submission.json bytes_code exactly)
[INFO] Est. 8×H100: 45.9 ms/step, 13,058 steps in 10 min

Gauntlet PASS on all checks. Unlike #915, this version has no fused-kernel compile step, so CPU imports straight through — no fallback path needed.

Seed coverage / artifact sizes (per README table, verified against train_seed1337.log):

  • seed 1337: 15,981,848 B, post-ngram 0.9640
  • seed 42: 15,904,632 B, post-ngram 0.9641
  • seed 2025: 15,974,308 B, post-ngram 0.9644
  • All under 16,000,000 B. Mean 0.9642, std 0.0002. Pre-ngram stride-64 is 1.1225 / 1.1224 / 1.1231. Tight.

Questions / flags:

  1. Eval wallclock (same flag as PR #915). The n-gram stage is pure-Python over a NumPy buffer and takes ~15 min on an 8-GPU image. If the eval budget is 10 min total wallclock (my reading), this doesn't fit; if it's 10 min per-GPU (the author's reading), it does. Record-track submissions need this resolved before listing.
  2. Is #889 (this PR, record-track) or #915 (non-record) the "authoritative" filing? Both point at the same records folder. #889 was created 2026-03-26 19:13 UTC; #915 came later. If this stack is going to land, the mod team should decide which PR to merge and close the other to avoid a duplicate records folder.
  3. Prior-art credit. README credits PR #727 (@Asukabot0) for the n-gram backoff, PR #414 for the neural base, PRs #493/#518 for LeakyReLU², PR #569 for VRL, and PR #287 for XSA. Clean attribution.

Verdict: NEEDS CLARIFICATION — on eval-budget interpretation. The technique is compliant (n-gram is backward-looking, score-first, no oracle; byte-identical to #915's clean implementation) and the engineering is clean, but a record-track listing of 0.9642 requires the eval wallclock question to be resolved.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica:


Reviewed by @MatoTeziTanka (The Agora). Gauntlet ran clean on CPU: all checks PASS, artifact budget 29.0%, no fused-kernel fallback needed (this PR doesn't contain one). AI tooling: review drafted with Claude Code (Opus) using an internal review template; all citations, file paths, and compliance audits were verified against the PR's actual code at SHA 50ec6bce1d6722caa8d20ad6f6f53fbec9abfdae, including a byte-for-byte diff of the n-gram cache against PR #915 at SHA 15a5cb8c.

MatoTeziTanka pushed a commit to MatoTeziTanka/parameter-golf that referenced this pull request Apr 11, 2026
…cluster + CT2038 gauntlet provisioned

Reviewed all 20 highest-priority Tier 1 PRs from openai/parameter-golf.
Two cluster-level findings:

- N-gram family bug (10 PRs CLOSED + 1 already ruled): full_key = ((ctx_hash
  ^ (target * primes[k])) & mask) — target token hashed into the eval-cache
  lookup key, ruled illegal by valerio-oai on PR openai#779. Same verbatim pattern
  in openai#770/openai#798/openai#808/openai#825/openai#786/openai#797/openai#909/openai#940/openai#761 + openai#764 follow-up. Upstream
  parent: lukacf (openai#659/openai#702/openai#727 — task #5 audit queued).

- Standard SLOT cluster (4 HOLD pending openai#1336, 2 CLOSE): per-window
  delta+logit_bias optimized N steps against (per_token_nll * mask) where
  mask = scored positions [s:wlen]. PRs openai#1321/openai#1324/openai#1278/openai#1263 → HOLD;
  openai#1319/openai#1376 → CLOSE.

Clean MERGE-eligible: openai#1420 (token_hint-only post-fix) and openai#1450 (TMA
megakernel triple loop).

Eval-budget gate (openai#915/openai#889 anthony-maio pair): clean ngram code, ~14.9 min
ngram stage on 8xH100 SXM. One @0hq ruling on Issue openai#17 unblocks both PRs
plus ~30 ngram-cache PRs.

Infrastructure: provisioned CT2038 (proteus-engine, 128 GB RAM, 32 cores)
as the dedicated parameter-golf gauntlet host. Installed Triton 3.6.0,
deployed cpu_test.py + flash_attn_stub.py. Re-ran the 4 PRs originally
skipped due to FA3/Triton blockers — all PASS. Edited 4 GitHub comments
via gh api PATCH to add the rerun results. Coverage went from 9/20 to
14/20 fully gauntleted.

Side session handed off via SOW_HF_DATASET_REPUBLISH.md (Scylla 998→1254
fix + SP4096/SP8192/SP12288/SP16384 publish + Cloudflare R2 mirror).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
