
Record: Parallel Muon + Parameter Banking — 81.87ms/step, val_bpb 1.1247 (3-seed mean)#399

Open
abaybektursun wants to merge 4 commits into openai:main from abaybektursun:submission/parallel-muon-82ms

Conversation

Contributor

@abaybektursun abaybektursun commented Mar 22, 2026

Novel Contribution: Parameter Banking + Parallel Muon

This submission introduces Parameter Banking, a weight layout restructuring that enables batched optimizer operations, combined with an adapted Parallel Muon communication strategy. Together, these provide a 3.4% training throughput improvement that is architecture-agnostic and composes with any Muon-based training stack. The approach has since been adopted by subsequent competition submissions (e.g., PR #549).

Pure systems optimization — model architecture and hyperparameters are unchanged.

3-Seed Results (8×H100 80GB SXM, PyTorch 2.9.1+cu128, 600s)

| Seed | step_avg | Steps | int6 sliding val_bpb | Artifact |
|------|----------|-------|----------------------|----------|
| 1337 | 81.86 ms | 7,331 | 1.1241 | 15,830,960 bytes |
| 42 | 81.88 ms | 7,328 | 1.1253 | 15,819,728 bytes |
| 2025 | 81.86 ms | 7,330 | 1.1247 | 15,796,052 bytes |
| **Mean** | **81.87 ms** | **7,330** | **1.1247** (std 0.0006) | ~15.8 MB |

Technical Approach

1. Parameter Banking (novel)

We restructure 66 separate nn.Linear weight matrices into 4 contiguous 3D nn.Parameter tensors, grouped by shape:

  • qo_bank: (22, 512, 512) — Q + Out projections
  • kv_bank: (22, 256, 512) — K + V projections
  • mlp_up_bank: (11, 1536, 512) — MLP up
  • mlp_down_bank: (11, 512, 1536) — MLP down

Forward pass uses F.linear(x, bank[layer_idx]) — compiles identically to nn.Linear under torch.compile. Verified: banked forward+backward = 72.33ms vs baseline 72.59ms.
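A minimal sketch of the banked forward (tensor names mirror the bank list above; shapes and init are illustrative):

```python
import torch
import torch.nn.functional as F

# One contiguous 3D parameter replaces 22 separate (512, 512) nn.Linear weights.
n_layers, d_out, d_in = 22, 512, 512
qo_bank = torch.nn.Parameter(torch.randn(n_layers, d_out, d_in) * 0.02)

def banked_linear(x, bank, layer_idx):
    # F.linear over a 2D slice of the bank; under torch.compile this traces
    # to the same GEMM an nn.Linear would produce.
    return F.linear(x, bank[layer_idx])

x = torch.randn(4, 64, d_in)       # (batch, seq, d_in)
y = banked_linear(x, qo_bank, 3)   # layer 3's Q/Out projection
# y.shape is torch.Size([4, 64, 512])
```

The forward path is unchanged numerically; only the storage layout differs, which is what makes the batched optimizer step below possible.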

The key benefit: Newton-Schulz orthogonalization (used by Muon) becomes a single torch.bmm over the batch dimension, replacing 66 sequential small GEMMs. This reduces optimizer time from 19.7ms to 1.3ms (15× faster).
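The batched orthogonalization can be sketched as follows (quintic coefficients are the standard Muon values; demo shapes are reduced from the real (22, 512, 512) banks):

```python
import torch

@torch.no_grad()
def batched_newton_schulz(G, steps=5, eps=1e-7):
    # G: (B, m, n) bank of gradients. Every matmul below is one batched GEMM
    # over the bank dimension instead of B sequential small GEMMs.
    a, b, c = 3.4445, -4.7750, 2.0315   # standard Muon quintic coefficients
    X = G / (G.norm(dim=(-2, -1), keepdim=True) + eps)
    transpose = X.shape[-2] > X.shape[-1]
    if transpose:                        # iterate on the wide orientation
        X = X.transpose(-2, -1)
    for _ in range(steps):
        A = X @ X.transpose(-2, -1)      # (B, m, m), batched
        B_ = b * A + c * (A @ A)
        X = a * X + B_ @ X
    if transpose:
        X = X.transpose(-2, -1)
    return X

bank_grads = torch.randn(22, 64, 64)     # demo; real banks are (22, 512, 512)
ortho = batched_newton_schulz(bank_grads)
```

Because `@` on 3D tensors lowers to a batched matmul, the whole bank is orthogonalized in one kernel launch sequence per iteration.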

2. Parallel Muon (adapted from arXiv:2511.07464)

Standard DDP interacts poorly with parameter banking: each bank's gradient aggregates contributions from all 11 layers and only becomes available at the end of the backward pass, destroying compute-communication overlap (+4ms regression).

Our solution removes DDP for banked parameters and schedules communication explicitly:

  1. Launch async reduce_scatter for all banks (biggest first)
  2. all_reduce + Adam step on small replicated params (while bank RS is in-flight)
  3. Wait for RS, local batched NS on each GPU's shard, async all_gather

This follows the DDP-free communication pattern from modded-nanogpt, adapted to work with our banking structure.
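The three-phase schedule can be sketched in a single process with stand-in collectives (in the real run these map to `torch.distributed` `reduce_scatter_tensor`, `all_reduce`, and `all_gather_into_tensor`; all names and the simplified SGD-style update here are illustrative, not the submission's actual API):

```python
import torch

class LocalComm:
    """Single-process stand-ins for the collectives, so the schedule can be
    exercised without a process group. Locally the 'shard' is the full tensor."""
    class _Handle:
        def wait(self): pass
    def reduce_scatter(self, g):
        return g, self._Handle()     # async in the real run; immediate here
    def all_reduce(self, g):
        return g
    def all_gather(self, shard):
        return shard

def parallel_muon_step(banks, small_params, ns_fn, comm, lr=0.02):
    # 1) launch async reduce-scatter for every bank gradient, biggest first
    inflight = [(b, comm.reduce_scatter(b.grad))
                for b in sorted(banks, key=lambda p: -p.grad.numel())]
    # 2) while the bank RS is in flight: all-reduce + step on small params
    for p in small_params:
        comm.all_reduce(p.grad)
        p.data.add_(p.grad, alpha=-lr)   # stand-in for the Adam step
    # 3) wait on each shard, batched NS locally, all-gather the update
    for bank, (shard, handle) in inflight:
        handle.wait()
        bank.data.add_(comm.all_gather(ns_fn(shard)), alpha=-lr)
```

The point of the ordering is that phase 2's small-parameter work hides the bank reduce-scatter latency, which is exactly the overlap that DDP's end-of-backward bucket destroys.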

Engineering notes

| Approach | Result | Lesson |
|----------|--------|--------|
| Non-surgery batching (keep 66 params, batch in optimizer) | 85.73ms | Gather/scatter kernel overhead offsets the speedup |
| DDP with banks | 88.8ms (+4ms) | Bank grads only available at end of backward |
| Polar Express (arXiv:2505.16932) | 82ms, 16.2MB | PE weights compress ~190KB worse than NS |
| Parameter Banking + Parallel Muon | 81.87ms, 15.8MB | Architecture-agnostic, composable |

Compatibility analysis

| Base PR | Speed | Score | Finding |
|---------|-------|-------|---------|
| #315 (EMA only) | -3.4% | -0.0006 BPB | Extra steps improve EMA monotonically |
| #374 (Tight SWA) | -3.5% | +0.001 | SWA averages warmdown weights; extra steps don't enter the window |
| #401 (EMA+SWA) | -2.8% | +0.0005 | Same SWA dilution |
| #398 (TTT) | -2.3% | +0.004 | More-converged model has less room for TTT adaptation |

Key finding: The throughput advantage translates to quality gains exclusively for EMA-based models, where every additional step monotonically refines the exponential moving average.

Credits

🤖 Generated with Claude Code

Systems optimization built on PR openai#315 by @jfprincz (11L XSA4+EMA, 1.1248 bpb).
Same architecture, same hyperparameters, only optimizer changed.

82.14ms/step vs 84.76ms baseline = 7,306 steps vs 7,079 in 600s.
Pre-quant val_bpb 1.1421 (identical to baseline).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abaybektursun and others added 2 commits March 22, 2026 00:13
…1.1248)

Unbank state dict before quantization so int6 per-row scales match baseline.
Rebank after dequantization for roundtrip eval.
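A hypothetical sketch of that unbank/rebank round-trip (the `_bank` key-naming convention and helper names are illustrative, not the submission's actual layout):

```python
import torch

def unbank(state_dict):
    # Split each (L, out, in) bank back into per-layer 2D weights so int6
    # per-row quantization sees the same tensors as the unbanked baseline.
    flat = {}
    for name, t in state_dict.items():
        if name.endswith("_bank"):
            for i, w in enumerate(t.unbind(0)):
                flat[f"{name}.{i}"] = w
        else:
            flat[name] = t
    return flat

def rebank(flat, bank_names):
    # Inverse transform for the round-trip eval after dequantization.
    sd, per_bank = {}, {b: {} for b in bank_names}
    for k, v in flat.items():
        for b in bank_names:
            if k.startswith(b + "."):
                per_bank[b][int(k.rsplit(".", 1)[1])] = v
                break
        else:
            sd[k] = v
    for b, layers in per_bank.items():
        sd[b] = torch.stack([layers[i] for i in sorted(layers)], dim=0)
    return sd
```

Usage: `rebank(unbank(sd), ["qo_bank", "kv_bank", ...])` should reproduce `sd` exactly when no quantization happens in between.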

Results: 82.13ms/step, 7,306 steps, int6 sliding window val_bpb 1.1238.
Artifact: 16.06MB (int6+zstd).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Seeds 42, 1337, 2025: mean 82.08ms/step, val_bpb 1.1239 (std 0.0001).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@abaybektursun abaybektursun changed the title Record: Parallel Muon + Parameter Banking — 82.14ms/step (3.1% faster than PR #315) Record: Parallel Muon + Parameter Banking — 82.08ms/step (3.2% faster than PR #315) Mar 22, 2026
@abaybektursun abaybektursun force-pushed the submission/parallel-muon-82ms branch from 5f4d141 to 4db0057 on March 22, 2026 15:24
@abaybektursun abaybektursun changed the title Record: Parallel Muon + Parameter Banking — 82.08ms/step (3.2% faster than PR #315) Record: Parallel Muon + Parameter Banking + Polar Express — 82.14ms/step (3.1% faster than PR #315) Mar 22, 2026
Replaced Polar Express with standard Newton-Schulz + switched to lzma compression.
3-seed results: 81.87ms/step mean, 1.1247 sliding bpb mean, all artifacts ~15.8MB.

Seed 1337: 7331 steps, 1.1241 bpb, 15,830,960 bytes
Seed 42:   7328 steps, 1.1253 bpb, 15,819,728 bytes
Seed 2025: 7330 steps, 1.1247 bpb, 15,796,052 bytes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@abaybektursun abaybektursun changed the title Record: Parallel Muon + Parameter Banking + Polar Express — 82.14ms/step (3.1% faster than PR #315) Record: Parallel Muon + Parameter Banking — 81.87ms/step, val_bpb 1.1247 (3-seed mean) Mar 22, 2026
abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request Mar 22, 2026
Legal score-first TTT (PR openai#461 recipe) applied to openai#414 stack with
Parameter Banking + Parallel Muon (first introduced in PR openai#399).

Pre-TTT: 1.1234, post-TTT: 1.1213 (-0.0021). TTT eval: 400s.
Artifact: 15.84 MB. Seed 1337, 8×H100 SXM, PyTorch 2.9.1+cu128.

Every token scored BEFORE model adapts (inference_mode enforced).
SGD+momentum(0.9), 3 epochs/32K chunk, freeze first 2 blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request Mar 23, 2026
abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request Mar 23, 2026
abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request Mar 23, 2026
abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request Mar 23, 2026
…d mean)

Legal score-first TTT (PR openai#461 recipe) + BigramHash(3072) + freeze=0
on openai#414 stack with Parameter Banking + Parallel Muon (PR openai#399).

3-seed results (BIGRAM=3072, 3ep, freeze=0, SGD+mom=0.9):
  Seed 1337: 1.1204 bpb, 413s TTT, 15.98 MB
  Seed 42:   1.1216 bpb, 406s TTT, 15.99 MB
  Seed 2025: 1.1221 bpb, 405s TTT, 15.99 MB
  Mean:      1.1214 (std 0.0009)

All artifacts under 16MB. All eval times under 600s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request Mar 23, 2026
…ed mean)

LeakyReLU(0.5)² activation (-0.003 vs relu²) + legal score-first TTT
(PR openai#461 recipe, 3ep SGD, all blocks unfrozen) + BigramHash(1536) on
openai#414 stack with Parameter Banking + Parallel Muon (PR openai#399).

3-seed results:
  Seed 42:   1.1200 bpb, 408s TTT, 15.88 MB
  Seed 2025: 1.1189 bpb, 408s TTT, 15.99 MB
  Seed 1337: pending (log will be added)
  Mean:      1.1195 (std 0.0008)

All artifacts under 16MB. All eval under 10 min.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request Mar 23, 2026
…ed mean)

LeakyReLU(0.5)² activation (-0.003 vs relu²) + legal score-first TTT
(PR openai#461 recipe, 3ep SGD, all blocks unfrozen) + BigramHash(1536) on
openai#414 stack with Parameter Banking + Parallel Muon (PR openai#399).

3-seed results:
  Seed 1337: 1.1192 bpb, 410s TTT, 15.98 MB
  Seed 42:   1.1200 bpb, 408s TTT, 15.88 MB
  Seed 2025: 1.1189 bpb, 408s TTT, 15.99 MB
  Mean:      1.1194 (std 0.0006)

All artifacts under 16MB. All eval under 10 min.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mistobaan pushed a commit to Mistobaan/parameter-golf that referenced this pull request Mar 25, 2026
TimS-ml referenced this pull request in TimS-ml/parameter-golf-autoresearch Mar 26, 2026
nedcut pushed a commit to nedcut/parameter-golf that referenced this pull request Mar 26, 2026
anish-krishnan pushed a commit to anish-krishnan/parameter-golf that referenced this pull request Mar 30, 2026
Itssshikhar pushed a commit to Itssshikhar/parameter-golf that referenced this pull request Mar 31, 2026
akaiHuang added a commit to akaiHuang/parameter-golf that referenced this pull request Apr 7, 2026
Add a non-record submission documenting a stack that runs without
Flash Attention 3 (the runpod default pytorch:2.4.0 image lacks
flash_attn_3). 1-seed result: val_bpb 1.1854, beating the OpenAI
baseline (1.2244) by 0.039 BPB.

Stack:
- 11L d=512 SP1024
- XSA-all + BigramHash 3072x112 (from PR openai#1019)
- Parallel Muon (from PR openai#399)
- Step-based warmdown=2000/3500 (documents trigger bug)
- Mixed Q4/Q5/Q6 quantization (Gemma-4 inspired, ~100 LOC pipeline)
- Sliding-window eval stride=32, temperature=0.90

No SLOT, no TTT, no validation data accessed during eval.
Eval: 322s wall on 8xH100 (within 600s budget).

Single seed only (record track requires 3-seed mean).
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 9, 2026
Thesis: the speed path is the most underutilized section of openai/parameter-golf.
The quality path has 170+ PRs; the speed path has maybe 30 and 2-3 genuine novelties.
Our 13x per-GPU gap vs comp records is almost entirely soft — most of it collapses
under free wins + comp ports.

Findings:

TIER 0 FREE WINS (before any kernel work) — ~3x speedup, 2-3 days total:
- Shot 0a: drop grad_accum_steps 8→1. The single biggest easy win hiding in
  plain sight. We're paying 8x kernel-launch overhead because grad_accum was
  inherited from an 8xGPU distributed config. 5 LOC, 30-50% speedup.
- Shot 0b: eval batched + streaming KV cache. Current sliding-window eval is
  625K sequential forwards at B=1 stride=64. 97% of each window's context is
  shared with the previous. Streaming KV (StreamingLLM arXiv 2309.17453) gives
  5-15x eval speedup, saves 3-5 min of the 600s budget.
- Shot 0c: SkyLadder progressive seq_len 256→2048 (NeurIPS 2025 arXiv
  2503.15450). 22% throughput + 1-3.7% quality. Already in Mac SETUP §35
  backlog, never shipped.
- Shot 0d: train-data GPTQ calibration (PR openai#1219, comp-organizer-approved).
  Replaces 220s AR self-gen with 14s. +2000 extra training steps.
- Free: TORCHINDUCTOR_MIX_ORDER_REDUCTION=0 + torch 2.9.1 pin. +8.8% step time.

TIER 2 COMP-PORT WINS we missed in the original Phase 2 plan:
- Shot 9: FA3 varlen + window + mixed seq_len across GPUs (PR openai#1212 holds the
  fastest step in the leaderboard at 69.6 ms/step)
- Shot 10: Parameter Banking + Parallel Muon (PR openai#399): 66 nn.Linear → 4
  contiguous 3D banks → Newton-Schulz becomes one bmm → optimizer time 19.7 ms
  → 1.3 ms (15x). World-novel, NOT in modded-nanogpt.
- Shot 11: CUTLASS EVT backward with the novel `post=0.5·act_grad·pre` identity
  (PRs openai#1105, openai#1420). Identity itself looks world-novel.
- Shots 13-14: eval path wins (Triton KV-cache backend, fused softcap+CE
  megakernel). Combined eval speedup ~5x on top of Shot 0b.

TIER 3 BIG DREAMS (world-first opportunities):
- Megadream 1: **Training megakernel** (fwd+bwd+optim in a single persistent
  SM kernel). HazyResearch / Mirage / MegaQwen have inference megakernels;
  nobody has built one for TRAINING. 1.3us × ~600 launches per step = 16% of
  our step budget is pure launch overhead. 5-7 days, 500-1500 LOC, ThunderKittens
  templates. Potential PhD-defensible mini-paper.
- Megadream 2: **Streaming KV sliding-window eval** (our Shot 0b, also novel)
- Megadream 3: **Fuzzy LR bandit per microbatch** — user's "dial-in" hint
  operationalized. Thompson sampling from {0.5x, 1x, 2x} * base_lr. 80 LOC.
- Megadream 4: **CPU n-gram precompute thread** — user's "CPU while GPU" hint
  operationalized. BG thread pre-computes n-gram hash tensors, 50 LOC.
- Megadream 5: **GPU-resident successive halving** — user's "GPU tests" hint
  operationalized. Run 4 replicas × 100 steps inside the 600s budget, pick
  winner, continue. Online hyperband. 200 LOC.
- Megadream 6: **AOTInductor precompile + binary ship** — kill the 5+ min
  compile cold-start permanently.

Stacked expected impact:
- Phase 1 (now): 180 steps / 600s, val_bpb ~1.4-1.6
- +Tier 0 free wins: ~540 steps, val_bpb ~1.25-1.35
- +Tier 1 kernel work: ~2000 steps, val_bpb ~1.15-1.22
- +Tier 2 comp ports: ~4000 steps, val_bpb ~1.10-1.15
- +Tier 3 Megadream 1 (training megakernel): ~8000 steps, val_bpb ~1.08-1.12
- +Tier 3 all: ~10000 steps, val_bpb ~1.06-1.10 (**ahead of comp on 1xH100**)

10000 steps on 1xH100 = 4x more per-GPU training than the comp's 20000 on 8xH100.
That's where val_bpb drops BELOW comp records.

Key finding: eval path holds the biggest speed wins currently, not training.
Our sliding-window eval eats 10-15 min of the 600s budget. Tier 0b + Tier 2
Shots 13-14 save 5-8 min per eval pass. More than any training-side single
patch would buy at our current rate.

Source reports: /tmp/phase2_comp_speed_audit.md (22 PRs surveyed),
/tmp/phase2_world_speed_research.md (12 research areas surveyed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 9, 2026
Documents what's actually in the repo now:

SHIPPED:
- phase2/README.md, bootstrap.sh, metrics.py, warm_compile_cache.py, run.sh
- submission/run.sh: Inductor patch + CUDA allocator expandable segments
- submission/train.py ShuffledSequenceLoader: prefetch thread + pinned RAM
  + prefill during pretime
- All gated by env vars with sensible defaults on

NOT SHIPPED (future work):
- Shot 2 FA3 sourcing (not on PyPI)
- Shot 9 FA3 varlen + window attention (PR openai#1212)
- Shot 10 Parameter Banking + Parallel Muon (PR openai#399)
- Shot 14 Training megakernel (world-first)
- Shot 0b batched + streaming KV sliding eval
- Shot 17 fuzzy LR bandit
- Shot 19 GPU-resident successive halving

HONEST SKIPS:
- grad_accum 8→1: research agent missed memory math, would OOM
- CPU n-gram precompute: research agent missed GPU HBM is 60× faster than
  CPU→GPU PCIe path for gather ops. Pivoted to prefetch prefill instead.

Tasks 7-12 complete (metrics, free env wins, prefetch loader, compile cache
warmup, prefill during pretime, bootstrap wiring). Phase 2 Tier 0 is
mechanically shipped. Still a plan for the bigger shots.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MatoTeziTanka

Community Review — Record: Parallel Muon + Parameter Banking — 81.87ms/step, val_bpb 1.1247 (3-seed mean)

BPB: 1.1247 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

What I found in the code (head SHA d5340951f214, file records/track_10min_16mb/2026-03-22_ParallelMuon_ParameterBanking_82ms/train_gpt.py):

Static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.04s, dim=512, layers=9, vocab=1024, code=76436 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the AST-based classifier. If there's a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it's factored into a helper file or a non-standard function name, please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.
