Gemma3 Multimodal optimization #404
base: main
🚧 CI Blocked: The main CI workflow was not started for the following reason:
Signed-off-by: Jimin Ha <[email protected]>
Signed-off-by: Mohit Deopujari <[email protected]>
Reduces memory usage for long sequences by eliminating the creation of two separate attention masks. By avoiding out-of-memory (OOM) errors, this raises capacity from 150 to 400 images with 8K prompts. Limitation: this path is only available when block_list is None.
Signed-off-by: Jimin Ha <[email protected]>
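To give a sense of why building two attention masks hurts at long context, here is a rough back-of-the-envelope sketch. It assumes each mask is a dense [seq_len, seq_len] tensor in bfloat16, one per sequence with no per-head copies, and treats "8K" as 8192 tokens; the PR does not state these details. The helpers dense_mask_bytes and allowed are hypothetical illustrations, not code from vLLM or this PR. The allowed predicate shows one possible way (an assumption, not necessarily the PR's approach) to derive both a causal and a sliding-window view from a single rule instead of storing two full masks.

```python
from typing import Optional

MIB = 1024 * 1024


def dense_mask_bytes(seq_len: int, bytes_per_elem: int, num_masks: int = 1) -> int:
    """Bytes needed to hold `num_masks` dense seq_len x seq_len masks.

    Assumes masks are fully materialized (hypothetical helper).
    """
    return seq_len * seq_len * bytes_per_elem * num_masks


def allowed(q: int, k: int, window: Optional[int] = None) -> bool:
    """Causal visibility rule for query position q and key position k.

    With `window` set, keys are further restricted to the last `window`
    positions (a sliding-window view). Both views come from one predicate,
    so neither has to be stored as a separate full mask (hypothetical helper).
    """
    if k > q:  # causal: no attending to future tokens
        return False
    return window is None or q - k < window


if __name__ == "__main__":
    seq_len = 8192  # assumed to correspond to an "8K" prompt
    bf16 = 2        # bytes per element for a bfloat16 mask
    one = dense_mask_bytes(seq_len, bf16, num_masks=1)
    two = dense_mask_bytes(seq_len, bf16, num_masks=2)
    print(f"one mask:  {one / MIB:.0f} MiB")
    print(f"two masks: {two / MIB:.0f} MiB")
```

Under these assumptions a single 8192 x 8192 bfloat16 mask is 128 MiB per sequence, so materializing a second one doubles that to 256 MiB, and the cost grows quadratically with sequence length.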
Updated from f21b007 to 5fd5a81.
Updated from 5fd5a81 to e2238fc.
@xuechendi Could you review this? It includes changes to the model file and to some utilities that are necessary for the Gemma3 model optimization.
✅ CI Passed: All checks passed successfully against the following vllm commit:
Signed-off-by: Jimin Ha <[email protected]>
This PR optimizes Gemma3 multimodal memory usage and performance.