experiment: M4 Pro 24GB — 3.50 tok/s at 4-bit, architecture confirmed on 24GB#21

Open
JackCid89 wants to merge 1 commit into danveloper:main from JackCid89:m4-pro-24gb-experiment

Conversation

@JackCid89

What

Ran flash-moe on a MacBook Pro M4 Pro with 24GB unified memory and documented the results.

Results

| Machine | RAM | GPU cores | Bandwidth | tok/s | TTFT |
|---|---|---|---|---|---|
| M3 Max (original) | 48 GB | 40 | ~400 GB/s | 4.36 | n/a |
| M4 Pro (this PR) | 24 GB | 20 | ~273 GB/s | 3.50 | 4613 ms |

Only ~20% slower despite half the RAM and half the GPU cores. The OS page cache with ~14GB available is still effective. No code changes required — the architecture scales down to 24GB unified memory without modification.
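The ~20% figure falls straight out of the two steady-state rates:

```python
# Quick check of the headline numbers above.
m3_max_tps = 4.36  # tok/s, M3 Max 48 GB
m4_pro_tps = 3.50  # tok/s, M4 Pro 24 GB

slowdown = 1 - m4_pro_tps / m3_max_tps
print(f"{slowdown:.0%} slower")  # prints "20% slower"
```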

Also documents: vocab.bin GPT-2 byte decoding bug

export_tokenizer.py must reverse the GPT-2 byte-to-unicode encoding when building vocab.bin. Without this, raw BPE symbols leak into output (Ġ instead of space, Ċ instead of newline). The fix is to apply the bytes_to_unicode() reverse mapping when writing each token string.
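A minimal sketch of that reverse mapping, using the standard `bytes_to_unicode()` table from OpenAI's GPT-2 `encoder.py` (the `decode_token` helper name is illustrative, not part of export_tokenizer.py):

```python
def bytes_to_unicode():
    """GPT-2's byte -> unicode table: printable/latin-1 bytes map to
    themselves, the remaining bytes are shifted to codepoints >= 256."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\xa1"), ord("\xac") + 1))
          + list(range(ord("\xae"), ord("\xff") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

# Invert the table once; apply it per character when writing vocab.bin.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}

def decode_token(token: str) -> bytes:
    """Turn a raw BPE symbol string into its underlying bytes,
    e.g. 'Ġhello' -> b' hello' and 'Ċ' -> b'\\n'."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)
```

Writing `decode_token(tok)` instead of `tok.encode("utf-8")` for each vocabulary entry is what keeps Ġ and Ċ out of the generated text.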

Changes

  • CLAUDE.md / README.md: add M4 Pro 24GB to Hardware section; document export_tokenizer.py in project structure; add vocab.bin fix to What We Tried table
  • results.tsv + metal_infer/results.tsv: add M4 Pro experiment row

🤖 Generated with Claude Code

… on 24GB

Verified flash-moe on MacBook Pro M4 Pro with 24GB unified memory (half the
RAM and GPU cores of the original M3 Max 48GB machine).

Results:
- 4-bit experts: 3.50 tok/s steady-state, TTFT 4613ms
- Only ~20% slower despite halved memory bandwidth (~273 vs ~400 GB/s)
- OS page cache ~14GB (vs ~35GB on 48GB machine) still effective
- No code changes required — architecture scales down without modification

Also documents the GPT-2 byte-to-unicode decoding fix for vocab.bin:
export_tokenizer.py must reverse the BPE encoding (Ġ→space, Ċ→newline)
when building vocab.bin, otherwise raw BPE unicode leaks into output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>