# Issue: Missing complete expert_index.json for Qwen3.5-397B-A17B-4bit model #17
## Problem Description
When trying to run the flash-moe project with the Qwen3.5-397B-A17B-4bit model, I found that the provided `expert_index.json` file only contains expert information for layer 0, while the project requires expert information for all 60 layers.
## Steps to Reproduce
1. Clone the repository and set up the environment
2. Download the Qwen3.5-397B-A17B-4bit model from ModelScope (46 safetensors files, ~224 GB total)
3. Run `extract_weights.py` to create `model_weights.bin` and `model_weights.json` (successful)
4. Run `repack_experts.py` to create packed expert files
5. Try to run `./metal_infer --full` or `./infer` with the model
## Error Message
```
ERROR: Cannot open /path/to/model/packed_experts/layer_01.bin: No such file or directory
```
## Root Cause Analysis
1. **Current `expert_index.json` structure:** only contains expert information for layer 0

   ```json
   {
     "model_path": "...",
     "expert_reads": {
       "0": { ... }  // only layer 0!
     }
   }
   ```

2. **Expected structure:** should contain expert information for all 60 layers (0-59)

   ```json
   {
     "model_path": "...",
     "expert_reads": {
       "0": { ... },
       "1": { ... },
       // ... layers 2-58 ...
       "59": { ... }
     }
   }
   ```

3. **Impact:** the `repack_experts.py` script can only process layer 0, leaving layers 1-59 without packed expert files.
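A quick way to confirm the gap is to list which layers the index actually covers. This is just a sketch with a hypothetical `missing_layers` helper, assuming the `expert_reads` structure shown above:

```python
NUM_LAYERS = 60  # Qwen3.5-397B-A17B has 60 MoE layers

def missing_layers(index: dict, num_layers: int = NUM_LAYERS) -> list[int]:
    """Return the layer indices absent from the index's 'expert_reads' map."""
    present = {int(k) for k in index.get("expert_reads", {})}
    return [layer for layer in range(num_layers) if layer not in present]

# The shipped index only covers layer 0, so this reports layers 1-59 missing:
shipped = {"model_path": "...", "expert_reads": {"0": {}}}
print(missing_layers(shipped))
```

Against the real file, the same check is `missing_layers(json.load(open(path)))`.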
## Workarounds Attempted
- **Modified `expert_index.json`:** tried to manually add layer information, but accurate offsets are needed from all 46 safetensors files
- **Partial testing:** can only test layer 0 performance; cannot run full 60-layer inference
- **Code inspection:** found that `repack_experts.py` expects a complete index, but the provided one is incomplete
## Questions for the Author
1. Is there a script to generate the complete `expert_index.json` for all 60 layers from the 46 safetensors files?
2. Can you provide the complete `expert_index.json` file for the Qwen3.5-397B-A17B-4bit model?
3. What is the intended workflow for users who download the model from ModelScope/Hugging Face?
## Environment
- Project: flash-moe (commit 3601d41)
- Model: mlx-community/Qwen3.5-397B-A17B-4bit from ModelScope
- Hardware: MacBook Pro M4 Max, 128GB RAM
- OS: macOS (Darwin Kernel Version 24.6.0)
## Additional Context
The project is amazing and the performance claims are impressive! I was able to successfully:
- Download the 224GB model
- Extract non-expert weights (5.5 GB `model_weights.bin`)
- Run single-layer MoE benchmarks (showing ~160 tok/s theoretical throughput)
- Build all binaries successfully
The only blocker is the missing expert information for layers 1-59.
## Suggested Solution
- **Option A:** provide a script that analyzes the 46 safetensors files and generates the complete `expert_index.json`
- **Option B:** share the complete `expert_index.json` file in the repository
- **Option C:** document the exact process for users to generate this index themselves
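For Option A, the offsets could plausibly be recovered from the safetensors headers alone: each `.safetensors` file begins with an 8-byte little-endian header length followed by a JSON map from tensor names to `{dtype, shape, data_offsets}`. The sketch below assumes a tensor-naming pattern of `model.layers.<n>....` and a simplified `expert_reads` schema; the real names and the exact schema the project expects may differ:

```python
import json
import re
import struct
from pathlib import Path

# Hypothetical layer-number pattern; the actual tensor names in the
# MLX checkpoint may differ.
LAYER_RE = re.compile(r"model\.layers\.(\d+)\.")

def read_safetensors_header(path: Path) -> dict:
    """Parse only the JSON header of a .safetensors file (8-byte LE length prefix)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

def build_expert_index(model_dir: Path) -> dict:
    """Scan every safetensors file and group per-tensor byte ranges by layer."""
    reads: dict[str, dict] = {}
    for st_file in sorted(model_dir.glob("*.safetensors")):
        header = read_safetensors_header(st_file)
        for name, meta in header.items():
            if name == "__metadata__":
                continue
            match = LAYER_RE.search(name)
            if not match:
                continue  # non-layer tensor (embeddings, final norm, ...)
            reads.setdefault(match.group(1), {})[name] = {
                "file": st_file.name,
                "offsets": meta["data_offsets"],  # byte range within the data section
            }
    return {"model_path": str(model_dir), "expert_reads": reads}
```

Only the headers are read, so scanning all 46 files should take seconds rather than touching the full 224 GB.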
Thank you for creating this incredible project! Looking forward to running the full 397B model on my MacBook Pro.