Support VLM calibration with image-text data #755
base: main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #755      +/-   ##
==========================================
- Coverage   74.66%   73.16%    -1.50%
==========================================
  Files         192      193        +1
  Lines       18975    19346      +371
==========================================
- Hits        14167    14154       -13
- Misses       4808     5192      +384
So, do we only support quantization with image data for Nemotron-VL? If so, why?
    type=str,
    default=None,
)
parser.add_argument(
Are all these added flags necessary? Why can't we just use the calib dataset instead?
+1
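For context, a minimal sketch of what the flags under discussion could look like in the argument parser. The flag names mirror the usage shown further down in this thread (--calib_with_images, --vlm_dataset, --vlm_subsets); the help strings and defaults are assumptions, not the PR's exact code.

    import argparse

    parser = argparse.ArgumentParser()

    # Hypothetical reconstruction of the added VLM calibration flags.
    parser.add_argument(
        "--calib_with_images",
        action="store_true",
        help="Calibrate with image-text pairs instead of text-only data.",
    )
    parser.add_argument(
        "--vlm_dataset",
        type=str,
        default=None,
        help="VLM calibration dataset name, e.g. nemotron_vlm_dataset_v2.",
    )
    parser.add_argument(
        "--vlm_subsets",
        type=str,
        default=None,
        help="Comma-separated subsets, e.g. sparsetables,plotqa_cot,wiki_en.",
    )
    args = parser.parse_args()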
# limitations under the License.

- """Utility functions for getting samples and forward loop function for different vlm datasets."""
+ """Utility functions for getting samples and dataloader for different VLM calibration datasets.
@ajrasane could you review this change?
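For reviewers who have not opened the file, a condensed sketch of what such a dataloader utility could look like. It assumes a Hugging Face processor and records with "image" and "text" fields, which may not match the PR's actual schema.

    from torch.utils.data import DataLoader


    def get_vlm_calib_dataloader(dataset, processor, batch_size=4, max_samples=512):
        """Build a calibration dataloader from image-text pairs (sketch only)."""
        records = [dataset[i] for i in range(min(max_samples, len(dataset)))]

        def collate(batch):
            # Assumes each record carries a PIL image under "image" and a prompt under "text".
            images = [rec["image"] for rec in batch]
            texts = [rec["text"] for rec in batch]
            return processor(images=images, text=texts, padding=True, return_tensors="pt")

        return DataLoader(records, batch_size=batch_size, collate_fn=collate)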
    --trust_remote_code \
    --calib_with_images \
    --vlm_dataset nemotron_vlm_dataset_v2 \
    --vlm_subsets sparsetables,plotqa_cot,wiki_en \
So far for LLMs we just make these the default without introducing flags. Curious to know whether we can follow the same convention here and reduce the number of flags.
    --qformat nvfp4 \
    --export_path <quantized_ckpt_path> \
    --trust_remote_code \
    --calib_with_images \
Can this be the default for VLM?
@Edwardf0t1 do you have experiments evaluating the accuracy impact of using the new dataset?
    --qformat nvfp4 \
    --export_path <quantized_ckpt_path> \
    --trust_remote_code \
    --calib_with_images \
Is there a way to check if the model supports an image as input and enable this automatically?
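One possible heuristic, sketched under the assumption that multimodal Hugging Face configs expose a vision component; it is not guaranteed to cover every architecture.

    from transformers import AutoConfig


    def model_accepts_images(model_path: str, trust_remote_code: bool = True) -> bool:
        """Best-effort check for a vision component in the model config."""
        config = AutoConfig.from_pretrained(model_path, trust_remote_code=trust_remote_code)
        # Multimodal configs commonly nest a vision config or define an image token id.
        return any(
            hasattr(config, attr)
            for attr in ("vision_config", "vision_tower_config", "image_token_id")
        )


    # Hypothetical wiring (argument names are placeholders):
    # if model_accepts_images(args.model_path):
    #     args.calib_with_images = True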
    # Some Nemotron tokenizers may not define pad_token by default, but we use padding=True during calibration.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
Do we need to reverse this change before saving the tokenizer here?
Model-Optimizer/examples/llm_ptq/hf_ptq.py, line 549 in 951c6aa:
    tokenizer.save_pretrained(export_path)
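If restoring it is desired, one way to do it (a sketch only; tokenizer and export_path come from the surrounding script) is to record whether the pad token was added purely for calibration padding and undo it before export:

    # Record whether pad_token was set only to enable padding during calibration.
    pad_token_was_missing = tokenizer.pad_token is None
    if pad_token_was_missing:
        tokenizer.pad_token = tokenizer.eos_token

    # ... calibration / quantization runs here ...

    # Restore the original tokenizer state before exporting it.
    if pad_token_was_missing:
        tokenizer.pad_token = None
    tokenizer.save_pretrained(export_path)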
What does this PR do?
Type of change: New feature
Overview:
The primary goal of this PR is to allow the model optimizer to use image-text pair data during the calibration phase of quantization. Compared to text-only calibration data, this is likely to improve the accuracy of quantized VLMs such as Nemotron VL, particularly on visual understanding tasks. A condensed sketch of the resulting calibration flow is shown after the list below.
- Supports calibration with image-text data from Nemotron-VLM-Dataset-v2.
- Keeps the main quantization script (hf_ptq.py) clean.
- Verified with the Nemotron-Nano-VL-12B-V2 model using image data.
- This PR complements #347; the llm_ptq and vlm_ptq examples will be consolidated in follow-up PRs.
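As a rough illustration (a sketch, not the PR's exact code): a helper yields processor outputs for image-text pairs, and the batches are run through the model inside the calibration forward loop passed to ModelOpt's mtq.quantize. The helper name and the NVFP4 config name are assumptions.

    import torch
    import modelopt.torch.quantization as mtq


    def make_forward_loop(calib_dataloader, device="cuda"):
        """Feed image-text calibration batches (pixel values + token ids) to the model."""

        def forward_loop(model):
            for batch in calib_dataloader:
                batch = {k: v.to(device) for k, v in batch.items()}
                with torch.no_grad():
                    model(**batch)

        return forward_loop


    # Hypothetical usage with an NVFP4 config:
    # model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, make_forward_loop(calib_dataloader))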
Usage
Testing
Before your PR is "Ready for review"
Additional Information