
Non-record: 10L MLP3x int6 baseline (MLX smoke test)#404

Open
yashward001 wants to merge 1 commit into openai:main from yashward001:submission/10L-3x-int6-baseline

Conversation

@yashward001

No description provided.

@yashward001
Author

Non-record submission. Local MLX smoke test confirming the full pipeline works end-to-end on Apple Silicon: 10 layers, 3× MLP, int6 quantization, zlib-9 compression. Only 200 iterations (val_bpb 2.3517), so not a competitive score; a full H100 run is planned once the compute grant is approved. Planned improvements: zstd-22, sliding-window eval, Muon WD, SmearGate, BigramHash, SWA.

@yashward001
Author

Approach summary

Architecture: 10 layers, 512 dim, 8 heads (4 KV heads GQA), 3× MLP expansion (hidden=1536), relu² activation, U-Net skip connections, tied embeddings.
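As a rough cross-check of the size estimate, here is a back-of-envelope count of the quantized block parameters. This assumes no biases and head_dim = 512 / 8 = 64; the vocab size is not stated, so the fp16 tied embeddings are excluded:

```python
# Hedged parameter-count sketch for the described stack: 10 layers, dim 512,
# 8 heads / 4 KV heads (GQA), 3x MLP expansion (hidden 1536). Bias-free
# projections are an assumption; embeddings are excluded (vocab size unknown).
dim, n_layers = 512, 10
n_heads, n_kv_heads = 8, 4
head_dim = dim // n_heads                       # 64
kv_dim = n_kv_heads * head_dim                  # 256 with GQA

attn = dim * dim + 2 * dim * kv_dim + dim * dim # Wq, Wk, Wv, Wo
mlp = 2 * dim * (3 * dim)                       # up + down at 3x expansion
per_layer = attn + mlp
total = n_layers * per_layer                    # ~23.6M block params
int6_bytes = total * 6 / 8                      # ~17.7MB before zlib-9
```

At ~17.7MB of raw int6 block weights, the stated ~14.6MB artifact estimate is plausible once zlib-9 compression is applied on top, even with the fp16 embeddings added back in.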

Compression: int6 per-row quantization on all block weights, fp16 tied embeddings, zlib-9. Estimated artifact ~14.6MB.
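A minimal sketch of what int6 per-row quantization plus zlib-9 could look like. The rounding mode, fp16 scale dtype, and storing each int6 value in an int8 container (rather than bit-packing) are assumptions, not the author's exact scheme:

```python
# Hedged sketch: symmetric per-row int6 quantization + zlib-9 packing.
# Values are stored in int8 containers here; a real artifact would bit-pack
# 6 bits per value to get the 6/8 size ratio.
import zlib
import numpy as np

def quantize_int6_per_row(w: np.ndarray):
    """Symmetric per-row quantization to the int6 range [-31, 31]."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 31.0
    scales[scales == 0] = 1.0                    # guard all-zero rows
    q = np.clip(np.round(w / scales), -31, 31).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 1536)).astype(np.float32)  # one MLP up-projection
q, s = quantize_int6_per_row(w)
blob = zlib.compress(q.tobytes(), level=9)               # zlib-9 as in the submission
err = np.abs(dequantize(q, s) - w).max()                 # bounded by ~half a scale step
```

Per-row scaling keeps the worst-case rounding error proportional to each row's own dynamic range, which is why it tends to quantize better than a single per-tensor scale.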

Current result: val_bpb 2.3517 at 200 iterations (MLX smoke test on Apple Silicon — not competitive, pipeline validation only).

Planned improvements in order:

  1. zlib-9 → zstd-22 (~730KB savings, frees headroom)
  2. Sliding window eval stride=64 (~0.034 bpb free)
  3. Muon weight decay WD=0.04 (improves int6 quantization quality)
  4. SmearGate (learned prev/curr token blend, ~512 params)
  5. BigramHash embeddings 4096→10240 buckets (~0.002 bpb)
  6. SWA over last 40% of warmdown every 50 steps
  7. Mixed int5 MLP / int6 attn QAT + 11th layer
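For item 2, a sketch of the sliding-window bookkeeping. The names (`window`, `stride`) and the rule that each window scores only its trailing `stride` tokens (so scored tokens get extra left context for "free") are assumptions about the intended scheme:

```python
# Hedged sketch of sliding-window evaluation index bookkeeping.
def sliding_windows(n_tokens: int, window: int = 1024, stride: int = 64):
    """Yield (ctx_start, ctx_end, score_start) triples so every token is
    scored exactly once, with up to window - stride tokens of left context."""
    spans = []
    pos = 0
    while pos < n_tokens:
        ctx_start = max(0, pos + stride - window)
        ctx_end = min(pos + stride, n_tokens)
        spans.append((ctx_start, ctx_end, pos))  # score tokens [pos, ctx_end)
        pos = ctx_end
    return spans

spans = sliding_windows(300, window=128, stride=64)
```

The cost is roughly `window / stride` forward passes over the eval set instead of one, which is why a small stride like 64 trades compute for bpb.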

Research directions: Asymmetric MLP expansion (early layers 2×, late layers 4×); SWA+QAT interaction (whether averaged checkpoints quantize better than final checkpoint).
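For item 6 and the SWA+QAT question, a minimal sketch of averaging checkpoints collected every 50 steps. The incremental-mean form is an assumption, and NumPy arrays stand in for MLX/torch state dicts:

```python
# Hedged sketch of stochastic weight averaging (SWA) over late checkpoints.
import numpy as np

def swa_average(checkpoints):
    """Running average of parameter dicts collected during warmdown."""
    avg = {k: v.astype(np.float64).copy() for k, v in checkpoints[0].items()}
    for n, ckpt in enumerate(checkpoints[1:], start=2):
        for k, v in ckpt.items():
            avg[k] += (v - avg[k]) / n   # incremental mean, numerically stable
    return {k: v.astype(np.float32) for k, v in avg.items()}

ckpts = [{"w": np.full((2, 2), float(i), dtype=np.float32)} for i in (1, 2, 3)]
avg = swa_average(ckpts)                 # "w" averages to 2.0
```

The SWA+QAT interaction question then reduces to: does quantizing `avg` give lower error than quantizing `ckpts[-1]`, since averaging tends to land in flatter regions of the loss surface.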

Full H100 runs pending compute grant.

@MatoTeziTanka

MatoTeziTanka commented Apr 11, 2026

Community Review — Non-record: 10L MLP3x int6 baseline (MLX smoke test)

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

ModuleNotFoundError: No module named 'mlx'

This matches one of the common patterns from the 2026-04-11 sweep: a dependency that exists in the author's local environment (here mlx, Apple's ML framework) but not in the Linux eval image.

Recommendation: Could you run python3 -c "import py_compile; py_compile.compile('train_gpt.py')" on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time before any of the scored-eval logic runs.

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL, ModuleNotFoundError: No module named 'mlx'. Classification via classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.
