
Conversation

@Edwardf0t1 (Contributor) commented Jan 9, 2026

What does this PR do?

Type of change: New feature

Overview:

The primary goal of this PR is to let the model optimizer use image-text pair data during the calibration phase of quantization. Compared to text-only calibration data, this is likely to improve the accuracy of quantized VLMs such as Nemotron VL, particularly on visual understanding tasks.

  • New Feature: Adds support for VLM calibration using image-text data.
  • Dataset Integration: Introduces support for sampling from the Nemotron-VLM-Dataset-v2.
  • Refactoring: Creates a separate utility for VLM datasets to keep the main Hugging Face PTQ script (hf_ptq.py) clean, simplifies the logic for handling multimodal inputs, and addresses specific issues encountered when calibrating the Nemotron-Nano-VL-12B-V2 model with image data.
  • Documentation: Updates the README to include instructions and examples for VLM calibration.

This PR complements #347; we will consolidate the llm_ptq and vlm_ptq examples in follow-up PRs.
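For context, an image-text calibration pass of this kind typically batches samples through the model's processor and runs forward passes so that quantizers can observe activation statistics. The sketch below is illustrative only and is not the PR's implementation; `model`, `processor`, and the sample dict keys are assumed names following the usual Hugging Face call conventions (`processor(images=..., text=...)` producing model inputs).

```python
def make_vlm_calib_forward_loop(model, processor, samples, batch_size=4):
    """Build a calibration forward loop over image-text pairs.

    Sketch under assumptions: `processor` maps images/text to model inputs,
    `model(**inputs)` runs a forward pass, and each sample is a dict with
    "image" and "text" keys. Outputs are discarded; only the forward pass
    matters for activation calibration.
    """

    def forward_loop(_model=None):
        target = _model if _model is not None else model
        for start in range(0, len(samples), batch_size):
            batch = samples[start : start + batch_size]
            inputs = processor(
                images=[s["image"] for s in batch],
                text=[s["text"] for s in batch],
                padding=True,  # why some tokenizers need a pad_token set
            )
            target(**inputs)  # forward pass only, for calibration statistics

    return forward_loop
```

A quantization API would then receive `forward_loop` as its calibration callback instead of the text-only loop.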

Usage

python3 hf_ptq.py \
  --pyt_ckpt_path /home/scratch.omniml_data_2/models/Nemotron-Nano-VL-12B-V2 \
  --qformat nvfp4 \
  --export_path /home/omniml_data_3/zhiyuc/checkpoints/Nemotron-Nano-VL-12B-V2-NVFP4-doccalib \
  --trust_remote_code \
  --kv_cache_qformat none \
  --calib_with_images \
  --vlm_dataset nemotron_vlm_dataset_v2 \
  --vlm_subsets sparsetables,plotqa_cot \
  --calib_size 512

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: Not yet

Additional Information

@copy-pr-bot bot commented Jan 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@codecov bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 9.84615% with 293 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.16%. Comparing base (307fe71) to head (2a3868a).
⚠️ Report is 14 commits behind head on main.

Files with missing lines                               Patch %   Missing lines
modelopt/torch/utils/vlm_dataset_utils.py              8.37%     175 ⚠️
modelopt/torch/utils/nemotron_vlm_dataset_utils.py     11.94%    118 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #755      +/-   ##
==========================================
- Coverage   74.66%   73.16%   -1.50%     
==========================================
  Files         192      193       +1     
  Lines       18975    19346     +371     
==========================================
- Hits        14167    14154      -13     
- Misses       4808     5192     +384     


@Edwardf0t1 Edwardf0t1 self-assigned this Jan 14, 2026
@Edwardf0t1 Edwardf0t1 marked this pull request as ready for review January 14, 2026 01:16
@Edwardf0t1 Edwardf0t1 requested review from a team as code owners January 14, 2026 01:16
@shengliangxu (Contributor) commented Jan 14, 2026

So, do we only support image calibration for Nemotron-VL? If so, why?

type=str,
default=None,
)
parser.add_argument(
Collaborator commented:

are all these added flags necessary? why cannot we just use calib dataset instead?

Contributor commented:

+1

# limitations under the License.

"""Utility functions for getting samples and forward loop function for different vlm datasets."""
"""Utility functions for getting samples and dataloader for different VLM calibration datasets.
Collaborator commented:

@ajrasane could you review this change?

--trust_remote_code \
--calib_with_images \
--vlm_dataset nemotron_vlm_dataset_v2 \
--vlm_subsets sparsetables,plotqa_cot,wiki_en \
Collaborator commented:

so far for LLM we just make these default without introducing the flag. Curious to know if we can follow the same convention and reduce the flags.

--qformat nvfp4 \
--export_path <quantized_ckpt_path> \
--trust_remote_code \
--calib_with_images \
Collaborator commented:

Can this be the default for VLM?

@cjluo-nv (Collaborator) commented:

@Edwardf0t1 do you have experiments evaluating the accuracy impact of using the new dataset?

--qformat nvfp4 \
--export_path <quantized_ckpt_path> \
--trust_remote_code \
--calib_with_images \
Contributor commented:

Is there a way to check if the model supports an image as input and enable this automatically?
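One possible answer to this question is a config heuristic: treat the model as image-capable if its config exposes a vision sub-config or a multimodal marker field. This is a sketch of an assumption, not something the PR implements, and the field names below are illustrative (they vary across architectures):

```python
def model_accepts_images(config) -> bool:
    """Heuristic check for image-input support via the model config.

    Sketch only: real Hugging Face configs use differing field names
    (e.g. a nested vision config or an image token id), so the marker
    list here is an illustrative assumption, not an exhaustive one.
    """
    vision_markers = ("vision_config", "vision_tower", "image_token_id")
    return any(getattr(config, name, None) is not None for name in vision_markers)
```

A script could then enable `--calib_with_images` automatically when `model_accepts_images(model.config)` is true, with the flag kept only as an override.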


# Some Nemotron tokenizers may not define pad_token by default, but we use padding=True during calibration.
if tokenizer.pad_token is None:
tokenizer.pad_token = tokenizer.eos_token
Contributor commented:

Do we need to reverse this change before saving the tokenizer here?

tokenizer.save_pretrained(export_path)
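One way to address the reviewer's concern is to scope the temporary pad token to the calibration phase only, restoring the original value before export. A minimal sketch, assuming the tokenizer exposes `pad_token`/`eos_token` attributes as Hugging Face tokenizers do (not the PR's actual code):

```python
from contextlib import contextmanager

@contextmanager
def temporary_pad_token(tokenizer):
    """Set pad_token = eos_token only while calibrating, then restore it.

    Sketch only: ensures the exported tokenizer keeps its original
    (possibly absent) pad_token rather than the calibration-time value.
    """
    original = tokenizer.pad_token
    if original is None:
        tokenizer.pad_token = tokenizer.eos_token  # temporary, for padding=True
    try:
        yield tokenizer
    finally:
        tokenizer.pad_token = original  # undo the change before export
```

Calibration would run inside the `with temporary_pad_token(tokenizer):` block, so a later `tokenizer.save_pretrained(export_path)` sees the original pad token.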


6 participants