Pull requests: danveloper/flash-moe
experiment: M4 Pro 24GB — 3.50 tok/s at 4-bit, architecture confirmed on 24GB
#21 opened Apr 3, 2026 by JackCid89
fix: 8-bit dequant for MLX mixed-precision gate quantization
#14 opened Mar 23, 2026 by userFRM
4 tasks
feat: cache-aware routing + co-activation expert clustering
#12 opened Mar 23, 2026 by userFRM
6 tasks
perf: 5 pure-win optimizations — zero quality tradeoff
#11 opened Mar 22, 2026 by userFRM
4 tasks
feat: CUDA/NVIDIA port — Qwen3.5-397B on single GPU at 5.35 tok/s (5.86 peak)
#7 opened Mar 22, 2026 by ssubbotin
10 tasks done
feat: live dashboard monitor + serve loop improvements
#5 opened Mar 21, 2026 by msitarzewski
5 tasks
feat: runtime model config from HuggingFace config.json
#3 opened Mar 20, 2026 by Alexintosh
6 tasks
Fix portability: runtime paths, missing setup scripts, vocab format bug
#1 opened Mar 19, 2026 by msitarzewski
5 tasks