
Conversation

@anzr299 (Collaborator) commented Sep 22, 2025

Changes

Introduced a new API that provides a weights compression algorithm for quantizers defined in torch.ao. Currently only the OpenVINO Quantizer is supported.
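
A minimal usage sketch of the new API. This is illustrative only: the import paths, the compress_pt2e signature, and the placeholder model are assumptions based on the discussion in this thread, not the final merged API.

import torch

# Assumed import paths -- the final module layout is debated later in this thread.
from executorch.backends.openvino.quantizer import OpenVINOQuantizer
from nncf.experimental.torch.fx import compress_pt2e

class TinyModel(torch.nn.Module):
    # Placeholder model with a single weighted layer to compress.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

example_args = (torch.randn(1, 16),)
# Export the model to the torch.fx / PT2E representation.
exported_model = torch.export.export(TinyModel().eval(), example_args).module()

# The quantizer carries the weight compression configuration;
# compress_pt2e applies it to the exported model.
quantizer = OpenVINOQuantizer()
compressed_model = compress_pt2e(exported_model, quantizer=quantizer)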

Reason for changes

To support quantizers defined in torch.ao.

Related tickets

169342

WC Conformance Test #167: https://github.com/openvinotoolkit/nncf/actions/runs/19372182852 - Pass

@anzr299 anzr299 requested a review from a team as a code owner September 22, 2025 14:43
@github-actions github-actions bot added the API Public API-impacting changes label Sep 22, 2025
@anzr299 anzr299 marked this pull request as draft September 22, 2025 14:56
@daniil-lyakhov daniil-lyakhov self-requested a review September 22, 2025 15:03
@daniil-lyakhov (Collaborator) left a comment

Can I see the PR with OpenVINOQuantizer?

Comment on lines 34 to 35
) -> torch.fx.GraphModule:
self._quantizer = quantizer

Type hints and a docstring are missing.
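
For illustration only, a hedged sketch of what an annotated constructor could look like here (the parameter list is an assumption; only self._quantizer = quantizer is visible in the diff):

from torch.ao.quantization.quantizer import Quantizer

def __init__(self, quantizer: Quantizer) -> None:
    """
    :param quantizer: torch.ao quantizer that defines the weight
        compression configuration applied to the model.
    """
    self._quantizer = quantizer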

@ljaljushkin (Contributor) left a comment

Awesome feature!

@nikita-savelyevv (Collaborator) left a comment

@daniil-lyakhov (Collaborator) left a comment

Almost there, good job!

@daniil-lyakhov (Collaborator) left a comment

LGTM, minor

Comment on lines +175 to +179
- name: Run PyTorch precommit test scope
run: |
make test-executorch
env:
NUM_WORKERS: 4

Suggested change
- name: Run PyTorch precommit test scope
run: |
make test-executorch
env:
NUM_WORKERS: 4
- name: Run PyTorch precommit test scope
run: |
pytest -ra tests/executorch

NUM_WORKERS does nothing now.
Better to write the command directly here instead of going through the Makefile, and please keep the same pytest arguments.

--extra-index-url https://download.pytorch.org/whl/nightly/cpu

# Pytorch
torch==2.10.0.dev20250922+cpu

PyTorch nightly builds are retained for 60 days, so the 0922 build is no longer available


ExecuTorch installation fails with the latest nightly builds.

env:
NUM_WORKERS: 4

executorch:

call_precommit currently runs weekly tests across multiple Python versions, but ExecuTorch only supports Python 3.10-3.12.

I'm concerned that relying on nightly builds is too unstable, since nightly builds are periodically removed and the custom ExecuTorch commit is not fixed.

I think it would be better to add a new workflow that runs nightly and also runs on pull requests, but only when specific files are modified. This would minimize the impact on unrelated pull requests while still keeping it in experimental.
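
A hedged sketch of what such a workflow trigger could look like (the cron time and path filters are assumptions, not a final configuration):

# Hypothetical trigger block for a dedicated ExecuTorch workflow:
# runs nightly, and on PRs only when ExecuTorch-related files change.
on:
  schedule:
    - cron: "0 2 * * *"        # nightly run; the time is an assumption
  pull_request:
    paths:
      - "tests/executorch/**"  # assumed paths; adjust to the repo layout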

from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression as OriginalWeightCompression


class WeightsCompression(Algorithm):
@AlexanderDokuchaev (Collaborator) commented Nov 21, 2025

Using the same name is confusing.


@AlexanderDokuchaev, what are the benefits of such an approach? I would suggest keeping it as it is, since at least the experimental PTQ is done in the same fashion. If we would like to use inheritance here, I suggest doing it for all the experimental classes in a separate PR.

This PR is essential for the ExecuTorch collaboration; we can create a ticket and discuss it separately.


My suggestion was bad, but I'm still thinking about it.


First, src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py is a backend-specific module that uses quantizers from torchao.
It looks like it would be better to define compress_pt2e and WeightsCompression in src/nncf/experimental/torch/fx/quantization/compress_pt2e.py.

Next, about the implementation of the algorithm:
a class hierarchy like the one below is bad.

Algorithm
|_____ WeightCompression
|_____ TorchAOWeightCompression (contains an instance of WeightCompression as _algo and calls methods on _algo directly)

My main complaint is that the apply method is copy-pasted (identical except for one line), which looks messy and makes future changes more difficult.

I don't see a good option that resolves this and follows SOLID principles.
But to my mind it is easier to inherit TorchAOWeightsCompression from WeightCompression with an extra quantizer argument, overriding get_weight_compression_parameters and available_backends:

class TorchAOWeightsCompression(WeightCompression):

    def __init__(
        self,
        mode: CompressWeightsMode,
        ratio: float,
        group_size: int,
        ignored_scope: IgnoredScope,
        all_layers: bool,
        sensitivity_metric: SensitivityMetric,
        awq: bool,
        subset_size: int,
        scale_estimation: bool,
        gptq: bool,
        lora_correction: bool,
        backup_mode: BackupMode = BackupMode.INT8_ASYM,
        compression_format: CompressionFormat = CompressionFormat.DQ,
        advanced_parameters: Optional[AdvancedCompressionParameters] = None,
        quantizer: Optional[Quantizer] = None,
    ):
        if quantizer is None:
            raise ValueError("Quantizer must be provided for TorchAOWeightsCompression")
        self._quantizer = quantizer
        
        super().__init__(
            mode=mode,
            ratio=ratio,
            group_size=group_size,
            ignored_scope=None,
            all_layers=all_layers,
            sensitivity_metric=sensitivity_metric,
            awq=awq,
            subset_size=subset_size,
            scale_estimation=scale_estimation,
            gptq=gptq,
            lora_correction=lora_correction,
            backup_mode=backup_mode,
            compression_format=compression_format,
            advanced_parameters=advanced_parameters,
        )

    def available_backends(self) -> list[BackendType]:
        return [BackendType.TORCH_FX]

    def get_weight_compression_parameters(self, model: TModel, graph: NNCFGraph) -> tuple[
        list[WeightCompressionParameters],
        list[WeightCompressionParameters],
        list[WeightCompressionParameters],
    ]:
        return self._quantizer.get_weight_compression_parameters(model, graph)

def compress_pt2e(....):
    wc_config = quantizer.get_weight_compression_config()
    mode = wc_config.get("mode", CompressWeightsMode.INT8_ASYM)
    ...
    algorithm = TorchAOWeightsCompression(mode=mode, ...., quantizer=quantizer)
    compressed_model = algorithm.apply(transformed_model, nncf_graph, dataset=dataset)

It doesn't follow LSP, but it looks easier to support without refactoring.
I suggest implementing it as class TorchAOWeightsCompression(WeightCompression):. At least it's shorter.

In general, regarding the structure of the algorithm classes: the Algorithm abstract class is used as an interface and could be implemented as a Protocol.
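
A minimal sketch of that idea, assuming the method set used in this thread (the actual Algorithm interface in NNCF has more members; string annotations stand in for NNCF types):

from typing import Any, Optional, Protocol

class Algorithm(Protocol):
    # Structural typing: any class providing these methods conforms,
    # without inheriting from an abstract base class.
    def available_backends(self) -> list["BackendType"]: ...

    def apply(
        self,
        model: Any,
        graph: "NNCFGraph",
        statistic_points: Optional[Any] = None,
        dataset: Optional[Any] = None,
    ) -> Any: ...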

@daniil-lyakhov (Collaborator) commented Nov 24, 2025

Got you, Alexander. But I don't see how we would plug the quantizer into the apply method in your example. It looks like we would need to refactor the original class somehow to support versions with and without the quantizer.

And regarding the apply method: yes, it is actually a copy-paste, but I don't see a better way to plug in the quantizer. The original WC and the experimental WC are two different algorithms, and with composition we can do whatever we want with great flexibility; perhaps new updates will come when we test external quantizers.

The approach in this PR is to develop an external API for the original WC so that it is possible to plug in the quantizer. Aamir tried several other ways to do it, including inheritance, and in the end we had to redefine a couple of methods and refactor the internal code. These two approaches (inheritance vs. composition) are both viable, but together with Aamir and Nikolay we picked composition and polished it for almost a month; if we wanted to do it the inheritance way, we would need to rethink everything done in this PR.

scale_estimation=scale_estimation,
gptq=gptq,
lora_correction=lora_correction,
backup_mode=wc_config.get("backup_mode", None),

backup_mode expects BackupMode, not str or None.
Better to add logic in __init__ that converts each parameter from wc_config to NNCF-specific values.
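
For example, a hedged sketch of such a conversion (the wc_config key name comes from the snippet above; the import path and fallback default are assumptions):

from nncf import BackupMode  # assumed public import path for the enum

def _resolve_backup_mode(wc_config: dict) -> BackupMode:
    # Convert the raw wc_config entry to the NNCF enum, falling back to
    # an assumed default backup mode when the key is absent.
    raw = wc_config.get("backup_mode")
    if raw is None:
        return BackupMode.INT8_ASYM
    return raw if isinstance(raw, BackupMode) else BackupMode(raw)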

lora_correction: bool,
sensitivity_metric: SensitivityMetric,
compression_format: CompressionFormat,
advanced_parameters: AdvancedCompressionParameters,

Suggested change
advanced_parameters: AdvancedCompressionParameters,
advanced_parameters: Optional[AdvancedCompressionParameters] = None,
