
Conversation

@Edwardf0t1 (Contributor) commented Jan 9, 2026

What does this PR do?

Type of change: New feature

Overview:

The primary goal of this PR is to let the model optimizer use image-text pair data during the calibration phase of quantization. Compared to text-only calibration data, this is likely to improve the accuracy of quantized VLMs such as Nemotron VL, particularly on visual understanding tasks.

  • New Feature: Adds support for VLM calibration using image-text data.
  • Dataset Integration: Introduces support for sampling from the Nemotron-VLM-Dataset-v2.
  • Refactoring: Creates a separate utility for VLM datasets to keep the main Hugging Face PTQ script (hf_ptq.py) clean, simplifies the logic for handling multimodal inputs, and addresses specific issues encountered when calibrating the Nemotron-Nano-VL-12B-V2 model with image data.
  • Documentation: Updates the README to include instructions and examples for VLM calibration.

This PR complements #347; we will consolidate the llm_ptq and vlm_ptq examples in follow-up PRs.
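For context, an image-text calibration pass of this kind typically batches samples through the model's processor and runs forward passes so that quantizers can observe activation statistics. The sketch below is illustrative only and is not the PR's implementation; `model`, `processor`, and the sample dict keys are assumed names following the usual Hugging Face call conventions (`processor(images=..., text=...)` producing model inputs).

```python
def make_vlm_calib_forward_loop(model, processor, samples, batch_size=4):
    """Build a calibration forward loop over image-text pairs.

    Sketch under assumptions: `processor` maps images/text to model inputs,
    `model(**inputs)` runs a forward pass, and each sample is a dict with
    "image" and "text" keys. Outputs are discarded; only the forward pass
    matters for activation calibration.
    """

    def forward_loop(_model=None):
        target = _model if _model is not None else model
        for start in range(0, len(samples), batch_size):
            batch = samples[start : start + batch_size]
            inputs = processor(
                images=[s["image"] for s in batch],
                text=[s["text"] for s in batch],
                padding=True,  # why some tokenizers need a pad_token set
            )
            target(**inputs)  # forward pass only, for calibration statistics

    return forward_loop
```

A quantization API would then receive `forward_loop` as its calibration callback instead of the text-only loop.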

Usage

python3 hf_ptq.py \
  --pyt_ckpt_path /home/scratch.omniml_data_2/models/Nemotron-Nano-VL-12B-V2 \
  --qformat nvfp4 \
  --export_path /home/omniml_data_3/zhiyuc/checkpoints/Nemotron-Nano-VL-12B-V2-NVFP4-doccalib \
  --trust_remote_code \
  --kv_cache_qformat none \
  --calib_with_images \
  --vlm_dataset nemotron_vlm_dataset_v2 \
  --vlm_subsets sparsetables,plotqa_cot \
  --calib_size 512

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: Not yet

Additional Information

@copy-pr-bot bot commented Jan 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@codecov bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 9.84615% with 293 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.16%. Comparing base (307fe71) to head (2a3868a).
⚠️ Report is 14 commits behind head on main.

Files with missing lines                               Patch %   Missing lines
modelopt/torch/utils/vlm_dataset_utils.py              8.37%     175 ⚠️
modelopt/torch/utils/nemotron_vlm_dataset_utils.py     11.94%    118 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #755      +/-   ##
==========================================
- Coverage   74.66%   73.16%   -1.50%     
==========================================
  Files         192      193       +1     
  Lines       18975    19346     +371     
==========================================
- Hits        14167    14154      -13     
- Misses       4808     5192     +384     


@Edwardf0t1 Edwardf0t1 self-assigned this Jan 14, 2026
@Edwardf0t1 Edwardf0t1 marked this pull request as ready for review January 14, 2026 01:16
@Edwardf0t1 Edwardf0t1 requested review from a team as code owners January 14, 2026 01:16
@shengliangxu (Contributor) commented Jan 14, 2026

So, do we only support image calibration for Nemotron-VL? If so, why?

type=str,
default=None,
)
parser.add_argument(
Collaborator commented:

are all these added flags necessary? why cannot we just use calib dataset instead?

Contributor commented:

+1

# limitations under the License.

"""Utility functions for getting samples and forward loop function for different vlm datasets."""
"""Utility functions for getting samples and dataloader for different VLM calibration datasets.
Collaborator commented:

@ajrasane could you review this change?

--trust_remote_code \
--calib_with_images \
--vlm_dataset nemotron_vlm_dataset_v2 \
--vlm_subsets sparsetables,plotqa_cot,wiki_en \
Collaborator commented:

so far for LLM we just make these default without introducing the flag. Curious to know if we can follow the same convention and reduce the flags.

--qformat nvfp4 \
--export_path <quantized_ckpt_path> \
--trust_remote_code \
--calib_with_images \
Collaborator commented:

Can this be the default for VLM?

@cjluo-nv (Collaborator) commented:

@Edwardf0t1 do you have experiments evaluating the accuracy impact of using the new dataset?

--qformat nvfp4 \
--export_path <quantized_ckpt_path> \
--trust_remote_code \
--calib_with_images \
Contributor commented:

Is there a way to check if the model supports an image as input and enable this automatically?
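One possible answer to this question is a config heuristic: treat the model as image-capable if its config exposes a vision sub-config or a multimodal marker field. This is a sketch of an assumption, not something the PR implements, and the field names below are illustrative (they vary across architectures):

```python
def model_accepts_images(config) -> bool:
    """Heuristic check for image-input support via the model config.

    Sketch only: real Hugging Face configs use differing field names
    (e.g. a nested vision config or an image token id), so the marker
    list here is an illustrative assumption, not an exhaustive one.
    """
    vision_markers = ("vision_config", "vision_tower", "image_token_id")
    return any(getattr(config, name, None) is not None for name in vision_markers)
```

A script could then enable `--calib_with_images` automatically when `model_accepts_images(model.config)` is true, with the flag kept only as an override.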


# Some Nemotron tokenizers may not define pad_token by default, but we use padding=True during calibration.
if tokenizer.pad_token is None:
tokenizer.pad_token = tokenizer.eos_token
Contributor commented:

Do we need to reverse this change before saving the tokenizer here?

tokenizer.save_pretrained(export_path)
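One way to address the reviewer's concern is to scope the temporary pad token to the calibration phase only, restoring the original value before export. A minimal sketch, assuming the tokenizer exposes `pad_token`/`eos_token` attributes as Hugging Face tokenizers do (not the PR's actual code):

```python
from contextlib import contextmanager

@contextmanager
def temporary_pad_token(tokenizer):
    """Set pad_token = eos_token only while calibrating, then restore it.

    Sketch only: ensures the exported tokenizer keeps its original
    (possibly absent) pad_token rather than the calibration-time value.
    """
    original = tokenizer.pad_token
    if original is None:
        tokenizer.pad_token = tokenizer.eos_token  # temporary, for padding=True
    try:
        yield tokenizer
    finally:
        tokenizer.pad_token = original  # undo the change before export
```

Calibration would run inside the `with temporary_pad_token(tokenizer):` block, so a later `tokenizer.save_pretrained(export_path)` sees the original pad token.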


6 participants