| MuonEq-R + Depth Recurrence + WD=0.090 + All-Int6 GPTQ | 1.0912 | dexhunter | On PR #1285 (submitted 2026-04-03 05:34 UTC): added MuonEq-R, repeated layers 4-5, raised weight decay to 0.090 for Brotli headroom, and kept all 66 GPTQ matrix layers at int6 | 2026-04-03 |[info](records/track_10min_16mb/2026-04-03_MuonEqR_DepthRecurrence_WD090_AllInt6/README.md)|
| 4096-Vocab + Larger Model + High WD + Simplifications | 1.0979 | Kevin Clark | On PR #1218 (submitted 2026-04-01 11:55 UTC): switched to SP4096, widened MLP to 4x, increased weight decay, added GPTQ/Brotli, and removed TTT, hash embeddings, SmearGate, value residuals, and other auxiliaries | 2026-04-01 |[info](records/track_10min_16mb/2026-04-01_Vocab4096_MLPMult4_WD085/README.md)|
| Parallel Residuals + Mini Depth Recurrence | 1.1063 | Marko Sisovic | On PR #1204 (submitted 2026-04-01 00:46 UTC): added delayed mini recurrence on layers 4-5, untied repeated MLPs, layer-7+ parallel attention/MLP residual lanes, and AR self-generated GPTQ calibration | 2026-03-31 |[info](records/track_10min_16mb/2026-03-31_ParallelResiduals_MiniDepthRecurrence/README.md)|
| 11L AR Self-Gen GPTQ + XSA | 1.1147 | abaybektursun | On PR #1019: Self-Generated GPTQ Calibration Data + all-layer XSA on the PR #549 stack | 2026-03-25 |[info](records/track_10min_16mb/2026-03-25_ValCalib_GPTQ_XSA_BigramHash3072/README.md)|