@@ -0,0 +1,20 @@
# 10L MLP3x Int6 Baseline (non-record)

Non-record submission. Local MLX smoke test confirming that the pipeline runs end-to-end.

## Config
- 10 layers, 512 dim, 8 heads, 4 KV heads
- MLP 3x expansion (hidden=1536), relu²
- int6 quantization, zlib-9 compression
- Trained on Apple Silicon (MLX), 200 iterations only
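
To make the "int6 quantization, zlib-9 compression" step concrete, here is a minimal sketch using only the Python standard library. It assumes symmetric per-tensor scaling to integers in [-31, 31] and packs four 6-bit values into three bytes. The function names and the scaling scheme are illustrative assumptions, not necessarily what this submission's pipeline does.

```python
import random
import zlib


def quantize_int6(weights):
    """Symmetric per-tensor quantization to integers in [-31, 31] (assumed scheme)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale for all-zero input
    scale = max_abs / 31.0
    q = [max(-31, min(31, round(w / scale))) for w in weights]
    return q, scale


def pack_int6(q):
    """Pack 6-bit values four per three bytes, offsetting by 32 to make them unsigned."""
    padded = list(q) + [0] * (-len(q) % 4)  # pad to a multiple of four values
    out = bytearray()
    for i in range(0, len(padded), 4):
        a, b, c, d = (v + 32 for v in padded[i:i + 4])
        word = (a << 18) | (b << 12) | (c << 6) | d
        out += word.to_bytes(3, "big")
    return bytes(out)


def unpack_int6(data, count):
    """Inverse of pack_int6; `count` drops the padding values."""
    q = []
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "big")
        q.extend(((word >> shift) & 0x3F) - 32 for shift in (18, 12, 6, 0))
    return q[:count]


# Hypothetical stand-in for a weight tensor.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

q, scale = quantize_int6(weights)
packed = pack_int6(q)              # 6 bits per weight before entropy coding
blob = zlib.compress(packed, 9)    # level 9 is zlib's maximum compression level
restored = unpack_int6(zlib.decompress(blob), len(q))
dequantized = [v * scale for v in restored]
```

Packing before compression guarantees 6 bits per weight even when zlib finds nothing to exploit. Whether packed or byte-aligned values compress better overall is something to measure rather than assume.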

## Score
val_bpb: 2.3517 (200 iterations — not a competitive score)
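
For reference, bits per byte is commonly computed as the summed token negative log-likelihood, converted from nats to bits, divided by the number of UTF-8 bytes in the evaluated text. The sketch below assumes that definition; the function name and arguments are hypothetical and not taken from the evaluation script.

```python
import math


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


# Sanity check: a model that is uniform over all 256 byte values
# pays ln(256) nats per byte, which is 8 bits per byte.
uniform_bpb = bits_per_byte(1000 * math.log(256), 1000)
```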

## Planned improvements
- zstd-22 compression
- Sliding window eval (stride=64)
- Muon WD=0.04
- SmearGate + BigramHash
- SWA over last 40% of warmdown
- Full 10-min run on 8xH100
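
As a rough illustration of the planned sliding window eval, the sketch below enumerates overlapping windows so that every token is scored exactly once, with later tokens seeing up to a full window of preceding context. Only `stride=64` comes from the list above; the context length, token count, and function name are hypothetical.

```python
def sliding_windows(num_tokens, context_len, stride):
    """Return (start, end, score_from) triples.

    The model would see tokens [start, end) and only the tokens in
    [score_from, end) would contribute to the loss, so nothing is counted twice.
    Assumes stride <= context_len.
    """
    windows = []
    prev_end = 0
    for start in range(0, num_tokens, stride):
        end = min(start + context_len, num_tokens)
        windows.append((start, end, prev_end))
        prev_end = end
        if end == num_tokens:
            break
    return windows


# Hypothetical sizes: 1000 validation tokens, 256-token context, stride 64.
windows = sliding_windows(num_tokens=1000, context_len=256, stride=64)
scored = [t for _, end, score_from in windows for t in range(score_from, end)]
```

A smaller stride gives each scored token more context at the cost of more forward passes, which is the trade-off to weigh against the evaluation time budget.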
@@ -0,0 +1,6 @@
{
  "name": "Your Name",
  "github_id": "your_github_username",
  "val_bpb": 2.3517,
  "notes": "Non-record submission. 10-layer, 3x MLP, int6 quant baseline run on Apple Silicon MLX. 200 iterations smoke test only \u2014 full H100 run pending compute grant."
}