
Add model support for Phi 3.5 MoE #1948

Draft

svaruag wants to merge 3 commits into base: main

Conversation

@svaruag commented Nov 7, 2024

Motivation

Add support for Phi 3.5 MoE

Modifications

Updated mixtral.py with a custom routing function specific to phi3moe (see the sketch below).
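
For reference, a minimal sketch of the kind of routing difference involved, written against a fused-MoE custom routing hook with a `(hidden_states, gating_output, topk, renormalize)` signature; the function name and the exact Phi-3.5 MoE routing math are illustrative rather than the code in this PR:

```python
# Illustrative sketch only: select the top-k experts from the raw router
# logits and softmax just the selected logits, in contrast to Mixtral,
# which softmaxes over all experts first and then takes the top-k.
import torch


def phi3moe_routing_sketch(
    hidden_states: torch.Tensor,   # [num_tokens, hidden_size], unused here
    gating_output: torch.Tensor,   # [num_tokens, num_experts] raw router logits
    topk: int,
    renormalize: bool,
):
    topk_logits, topk_ids = torch.topk(gating_output, k=topk, dim=-1)
    topk_weights = torch.softmax(topk_logits, dim=-1, dtype=torch.float32)
    if renormalize:
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids.to(torch.int32)
```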

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@svaruag (Author) commented Nov 7, 2024

@merrymercy I'm still making progress on this and would appreciate your input. I'm using vLLM's Phi-3 MoE implementation as a base and adjusting the parts that differ from Mixtral, specifically the custom top-k routing mechanism and the layer norm. However, I'm getting gibberish outputs during generation. Do you have any insights on how to debug this further? Thanks a lot!

@merrymercy
Contributor

I am not super familiar with this model. Can you try the interactive debugging approach below and compare the activations layer by layer? We typically use this method when adding new models:

https://sgl-project.github.io/references/supported_models.html#interactive-debugging
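
For anyone following along, a rough sketch of that layer-by-layer comparison against the Hugging Face reference implementation; the model id and dtype are illustrative, the full MoE checkpoint needs a lot of memory to load, and the statistics printed here would be diffed against values logged from the SGLang forward pass:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reference implementation (illustrative model id; loading the full MoE
# checkpoint may require multiple GPUs or CPU offload).
model_id = "microsoft/Phi-3.5-MoE-instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# One hidden-state tensor per layer (plus the embedding output). Compare
# these summaries against the activations printed from the new SGLang model;
# the first layer where they diverge usually localizes the bug.
for i, h in enumerate(out.hidden_states):
    print(f"layer {i:02d}: mean={h.mean().item():+.5f} std={h.std().item():.5f}")
```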

@merrymercy self-assigned this Nov 8, 2024
@svaruag (Author) commented Nov 8, 2024

Thanks @merrymercy, I was able to fix the problem (the residual connections were missing; see the sketch below). I'll push the fixes shortly, along with the testing code, etc.
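
For anyone who hits the same gibberish-output symptom, a minimal sketch of the residual wiring in question, with generic module names rather than the actual SGLang classes; the norm type here follows the earlier note that the layer norm differs from Mixtral:

```python
import torch
from torch import nn


class DecoderLayerSketch(nn.Module):
    """Illustrative decoder layer showing where the two residual adds live."""

    def __init__(self, hidden_size: int, attn: nn.Module, moe: nn.Module):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(hidden_size)
        self.post_attention_layernorm = nn.LayerNorm(hidden_size)
        self.self_attn = attn
        self.block_sparse_moe = moe

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Attention block: pre-norm, attend, then add the residual back.
        residual = hidden_states
        hidden_states = self.input_layernorm(hidden_states)
        hidden_states = residual + self.self_attn(hidden_states)

        # MoE block: dropping this second residual add is the kind of bug
        # that produces plausible-looking tensors but garbage generations.
        residual = hidden_states
        hidden_states = self.post_attention_layernorm(hidden_states)
        hidden_states = residual + self.block_sparse_moe(hidden_states)
        return hidden_states
```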
