HSLM Experiment Results

Auto-generated by tri experiment export. Do not edit manually.

Leaderboard (sorted by best PPL)

| Rank | Directory | Best PPL | Best Loss | Best Step | Max Step | Checkpoints |
|---|---|---|---|---|---|---|
| 1 | data/checkpoints | 2.96 | 1.086 | 100000 | 100000 | 21 |
| 2 | data/checkpoints/real | TBD | TBD | TBD | TBD | 10 |
| 3 | data/checkpoints_v3 | TBD | TBD | TBD | TBD | 10 |
| 4 | data/checkpoints_v13_lamb128 | TBD | TBD | TBD | TBD | 4 |

Rows marked TBD are placeholders. Run tri experiment export to regenerate the table with actual values from the checkpoint headers.
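
The Best PPL and Best Loss columns in the rank 1 row agree with each other if the loss is a mean cross-entropy in nats, in which case perplexity is exp(loss). The source does not state this convention, so the sketch below is only a sanity check under that assumption; `perplexity_from_loss` is a hypothetical helper, not part of tri or HSLM.

```python
import math

def perplexity_from_loss(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats (assumed)."""
    return math.exp(loss_nats)

# Rank 1 row of the leaderboard: Best Loss = 1.086
ppl = perplexity_from_loss(1.086)
print(f"{ppl:.2f}")  # prints 2.96, matching the Best PPL column
```

The same check can be applied to the TBD rows once they are regenerated; a mismatch would suggest the loss is reported in different units or averaged differently.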

Wave 6 Configurations

All runs share: HSLM_OPTIMIZER=lamb, HSLM_LR_SCHEDULE=cosine, HSLM_BATCH=66, HSLM_WARMUP=2000

| ID | LR | Context | Steps | Seed | Special |
|---|---|---|---|---|---|
| W6-1 | 5e-4 | 27 | 100K | 61 | LR sweep |
| W6-2 | 7e-4 | 27 | 100K | 62 | LR sweep |
| W6-3 | 1.5e-3 | 27 | 100K | 63 | LR sweep |
| W6-4 | 2e-3 | 27 | 100K | 64 | LR sweep |
| W6-5 | 1e-3 | 9 | 100K | 65 | Short context |
| W6-6 | 1e-3 | 18 | 100K | 66 | Medium context |
| W6-7 | 1e-3 | 54 | 100K | 67 | Long context |
| W6-8 | 1e-3 | 81 | 100K | 68 | Overfitting test |
| W6-9 | 1e-3 | 27 | 100K | 69 | GRAD_ACCUM=1 |
| W6-10 | 1e-3 | 27 | 100K | 70 | GRAD_ACCUM=4 |
| W6-11 | 1e-3 | 27 | 100K | 71 | GRAD_ACCUM=8 |
| W6-12 | 1e-3 | 27 | 100K | 72 | PHI schedule |
| W6-13 | 1e-3 | 27 | 100K | 73 | Restart period=33K |
| W6-14 | 1e-3 | 27 | 100K | 74 | Warmup=5000 |
| W6-15 | 1e-3 | 27 | 100K | 75 | PHI_SCALE=1 |
| W6-16 | 1e-3 | 27 | 100K | 76 | Adaptive sparsity |
| W6-17 | 1e-3 | 27 | 100K | 77 | PHI+PHI_SCALE |
| W6-18 | 1e-3 | 27 | 100K | 78 | Dropout=0.15 |
| W6-19 | 1e-3 | 27 | 200K | 79 | Extended run |
| W6-20 | 1e-3 | 27 | 200K | 80 | PHI extended |
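
The table above is a baseline (LR 1e-3, context 27, 100K steps) plus one change per run, with seeds running from 61 to 80. A minimal sketch of that structure follows. The four `HSLM_*` names come from the shared settings line; the lowercase keys (`lr`, `context`, `steps`, `seed`) and the `run_config` helper are hypothetical labels for illustration, and treating Warmup=5000 as an override of `HSLM_WARMUP` is an assumption. The other Special settings (GRAD_ACCUM, PHI, restart period, sparsity, dropout) are left out because the source does not give their variable names.

```python
# Shared settings, copied from the "All share" line.
SHARED = {
    "HSLM_OPTIMIZER": "lamb",
    "HSLM_LR_SCHEDULE": "cosine",
    "HSLM_BATCH": "66",
    "HSLM_WARMUP": "2000",
}

# Baseline for the per-run columns. These lowercase keys are hypothetical
# labels for this sketch, not real HSLM variable names.
BASELINE = {"lr": 1e-3, "context": 27, "steps": 100_000}

# Columns that differ from the baseline, keyed by run number (W6-<n>).
OVERRIDES = {
    1: {"lr": 5e-4}, 2: {"lr": 7e-4}, 3: {"lr": 1.5e-3}, 4: {"lr": 2e-3},
    5: {"context": 9}, 6: {"context": 18}, 7: {"context": 54}, 8: {"context": 81},
    14: {"HSLM_WARMUP": "5000"},  # assumption: Warmup=5000 overrides HSLM_WARMUP
    19: {"steps": 200_000}, 20: {"steps": 200_000},
}

def run_config(run_id: str) -> dict:
    """Merge shared settings, baseline, and row overrides for one W6 run."""
    n = int(run_id.split("-")[1])
    if not 1 <= n <= 20:
        raise ValueError(f"unknown Wave 6 run: {run_id}")
    config = {**SHARED, **BASELINE, "seed": 60 + n}  # seeds 61..80 in the table
    config.update(OVERRIDES.get(n, {}))
    return config

print(run_config("W6-7"))
```

Keeping the overrides separate from the baseline makes it easy to see that each run changes one thing at a time, which is what allows results to be attributed to a single setting.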

Key Findings

  • R5 KING: PPL=2.96 (LAMB 1e-3, cosine, ctx=27), the best result to date
  • A cosine LR schedule is essential: the flat schedule dies by step 20K
  • The LAMB optimizer outperforms AdamW for ternary weights
  • Context length 27 appears optimal for the current architecture
  • Batch size 66 with gradient accumulation 2 is the default baseline
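
For reference, the cosine schedule named in the findings can be sketched as below, using the R5 KING peak LR (1e-3), the shared warmup (2000 steps), and a 100K-step run. The linear warmup shape, the decay to a minimum LR of 0, and the absence of restarts are all assumptions; the source does not specify how HSLM implements the schedule, and `cosine_lr` is a hypothetical name.

```python
import math

def cosine_lr(step: int,
              peak_lr: float = 1e-3,
              warmup_steps: int = 2000,
              total_steps: int = 100_000,
              min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr (assumed shape)."""
    if step < warmup_steps:
        # Linear ramp from 0 at step 0 up to peak_lr at warmup_steps.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for step in (0, 1000, 2000, 51_000, 100_000):
    print(step, f"{cosine_lr(step):.6f}")
```

A flat schedule would correspond to returning `peak_lr` for every step after warmup, which is the variant the findings report as failing by step 20K.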

Wave History

| Wave | Date | Focus | Best Result |
|---|---|---|---|
| Wave 2 | 2026-03-12 | Night experiments, optimizer sweep | v4R PPL=125 |
| Wave 3 | 2026-03-12 | STE wiring fix, batch size tuning | v7 avg loss=5.73 |
| Wave 4 | 2026-03-13 | 15 experiments, 3 accounts, all cosine | R5 PPL=2.96 KING |
| Wave 5 | 2026-03-13 | Extended runs, context variations | In progress |
| Wave 6 | 2026-03-13 | 20 R5 KING variations | Planned |