Add support for Molmo-D-7B Model #1542
Conversation
Any updates on this?
@jlia0 I'm working on this slowly since I'm juggling a couple of things. It will probably take me a week.
/ready
def test_molmo_d(self):
    for text, images in CONVS:
        for model_path, torch_dtype, tolerance in MODELS:
            # Arguments inferred from the loop variables; the call was
            # truncated in the quoted diff.
            self.assert_srt_vision_backbone_and_hf_vision_backbone_close(
                text, images, model_path, torch_dtype, tolerance
            )
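For context, the CONVS and MODELS fixtures this loop iterates over would presumably look something like the sketch below; the concrete model path, dtype, tolerance, and prompt are illustrative placeholders, not taken from the PR diff.

```python
import torch

# Hypothetical fixtures for the test loop above; all values are placeholders.
MODELS = [
    # (model_path, torch_dtype, tolerance)
    ("allenai/Molmo-7B-D-0924", torch.float16, 1e-3),
]

CONVS = [
    # (text prompt, list of image paths)
    ("Describe this image.", ["example_image.png"]),
]
```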
Do you have an end-to-end test for the whole model?
Can you add a new test here https://github.com/sgl-project/sglang/blob/main/test/srt/test_vision_openai_server.py?
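A new end-to-end case there could follow the existing pattern in that file, roughly as sketched below. The helpers (popen_launch_server, DEFAULT_URL_FOR_TEST), the launch timeout, and the example image URL are assumptions based on sglang's test utilities, not a verified implementation.

```python
import unittest

import openai

from sglang.test.test_utils import DEFAULT_URL_FOR_TEST, popen_launch_server


class TestMolmoDServer(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.model = "allenai/Molmo-7B-D-0924"
        cls.base_url = DEFAULT_URL_FOR_TEST
        # Launch an sglang server serving the model (timeout is a guess).
        cls.process = popen_launch_server(cls.model, cls.base_url, timeout=300)

    @classmethod
    def tearDownClass(cls):
        cls.process.terminate()

    def test_single_image_chat(self):
        client = openai.Client(base_url=f"{self.base_url}/v1", api_key="EMPTY")
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "user",
                    "content": [
                        # Placeholder image URL; swap in the repo's test image.
                        {"type": "image_url", "image_url": {"url": "https://example.com/test.png"}},
                        {"type": "text", "text": "Describe this image."},
                    ],
                }
            ],
            max_tokens=32,
        )
        self.assertTrue(len(response.choices[0].message.content) > 0)


if __name__ == "__main__":
    unittest.main()
```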
Closed due to inactivity. Feel free to reopen if you have new progress.
Support for https://huggingface.co/allenai/Molmo-7B-D-0924
It's been a while since I've added support for a model, so I figured this would be good practice. The architecture appears to be a CLIP-ViT encoder with some embedding pooling, followed by a Qwen2 LLM backbone. I'm working out the vision backbone first, and then I expect the frozen parts to come together by importing from the existing qwen2.py file.
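For illustration, the composition described above might be wired up along these lines. This is a minimal sketch under the stated assumptions: the pooling is simplified to average pooling over adjacent patch embeddings (Molmo's actual pooling scheme may differ), and all class and parameter names besides the architecture itself are hypothetical.

```python
import torch
from torch import nn


class MolmoVisionBackbone(nn.Module):
    """Hypothetical sketch: CLIP-style ViT -> pooling -> projection to LLM dim."""

    def __init__(self, vision_encoder: nn.Module, vit_dim: int, llm_dim: int, pool_size: int = 2):
        super().__init__()
        self.vision_encoder = vision_encoder             # CLIP-style ViT (assumed given)
        self.pool = nn.AvgPool1d(kernel_size=pool_size)  # pool adjacent patch embeddings
        self.projector = nn.Linear(vit_dim, llm_dim)     # map into the Qwen2 hidden size

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vit_dim) from the ViT encoder
        patches = self.vision_encoder(pixel_values)
        # AvgPool1d pools over the last axis, so move patches to the channel axis
        pooled = self.pool(patches.transpose(1, 2)).transpose(1, 2)
        # (batch, num_patches // pool_size, llm_dim), ready to be spliced into
        # the Qwen2 input embeddings at the image placeholder positions
        return self.projector(pooled)
```

The projected embeddings would then replace the image placeholder tokens in the Qwen2 input embeddings, which is why the frozen LLM parts should come over unchanged from qwen2.py.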