
Conversation

@anzr299 (Collaborator) commented Sep 22, 2025

Changes

Introduced a new API that provides a weights compression algorithm for quantizers defined in torch.ao. Currently only the OpenVINO Quantizer is supported.
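
A minimal usage sketch of the new API. This is illustrative only: the import paths, the compress_pt2e signature, and the placeholder model are assumptions based on the discussion in this thread, not the final merged API.

import torch

# Assumed import paths -- the final module layout is debated later in this thread.
from executorch.backends.openvino.quantizer import OpenVINOQuantizer
from nncf.experimental.torch.fx import compress_pt2e

class TinyModel(torch.nn.Module):
    # Placeholder model with a single weighted layer to compress.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

example_args = (torch.randn(1, 16),)
# Export the model to the torch.fx / PT2E representation.
exported_model = torch.export.export(TinyModel().eval(), example_args).module()

# The quantizer carries the weight compression configuration;
# compress_pt2e applies it to the exported model.
quantizer = OpenVINOQuantizer()
compressed_model = compress_pt2e(exported_model, quantizer=quantizer)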

Reason for changes

To support quantizers defined in torch.ao.

Related tickets

169342

WC Conformance Test #167: https://github.com/openvinotoolkit/nncf/actions/runs/19372182852 - Pass

@anzr299 anzr299 requested a review from a team as a code owner September 22, 2025 14:43
@github-actions github-actions bot added the API Public API-impacting changes label Sep 22, 2025
@anzr299 anzr299 marked this pull request as draft September 22, 2025 14:56
@daniil-lyakhov daniil-lyakhov self-requested a review September 22, 2025 15:03
@daniil-lyakhov (Collaborator) left a comment

Can I see the PR with OpenVINOQuantizer?

Comment on lines 34 to 35
) -> torch.fx.GraphModule:
self._quantizer = quantizer

Type hints and a docstring are missing.
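
For illustration only, a hedged sketch of what an annotated constructor could look like here (the parameter list is an assumption; only self._quantizer = quantizer is visible in the diff):

from torch.ao.quantization.quantizer import Quantizer

def __init__(self, quantizer: Quantizer) -> None:
    """
    :param quantizer: torch.ao quantizer that defines the weight
        compression configuration applied to the model.
    """
    self._quantizer = quantizer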

@ljaljushkin (Contributor) left a comment

Awesome feature!

@nikita-savelyevv (Collaborator) left a comment

@daniil-lyakhov (Collaborator) left a comment

Almost there, good job!

@daniil-lyakhov (Collaborator) left a comment

LGTM, minor

Comment on lines +175 to +179
- name: Run PyTorch precommit test scope
run: |
make test-executorch
env:
NUM_WORKERS: 4

Suggested change
- name: Run PyTorch precommit test scope
run: |
make test-executorch
env:
NUM_WORKERS: 4
- name: Run PyTorch precommit test scope
run: |
pytest -ra tests/executorch

NUM_WORKERS does nothing now.
Better to write the command directly here instead of going through the Makefile, and please keep the same pytest arguments.

--extra-index-url https://download.pytorch.org/whl/nightly/cpu

# Pytorch
torch==2.10.0.dev20250922+cpu

PyTorch nightly builds are retained for 60 days, so the 0922 build is no longer available


ExecuTorch installation fails with the latest nightly builds.

env:
NUM_WORKERS: 4

executorch:

call_precommit currently runs weekly tests across multiple Python versions, but ExecuTorch only supports Python 3.10-3.12.

I'm concerned that relying on nightly builds is too unstable, since nightly builds are periodically removed and the custom ExecuTorch commit is not fixed.

I think it would be better to add a new workflow that runs nightly and also runs on pull requests, but only when specific files are modified. This would minimize the impact on unrelated pull requests while still keeping it in experimental.
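
A hedged sketch of what such a workflow trigger could look like (the cron time and path filters are assumptions, not a final configuration):

# Hypothetical trigger block for a dedicated ExecuTorch workflow:
# runs nightly, and on PRs only when ExecuTorch-related files change.
on:
  schedule:
    - cron: "0 2 * * *"        # nightly run; the time is an assumption
  pull_request:
    paths:
      - "tests/executorch/**"  # assumed paths; adjust to the repo layout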

from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression as OriginalWeightCompression


class WeightsCompression(Algorithm):
@AlexanderDokuchaev (Collaborator) commented Nov 21, 2025

Using the same name is confusing.


@AlexanderDokuchaev, what are the benefits of such an approach? I would suggest keeping it as it is, since at least the experimental PTQ is done in the same fashion. If we would like to use inheritance here, I suggest doing it for all the experimental classes in a separate PR.

This PR is essential for the ExecuTorch collaboration; we can create a ticket and discuss it separately.


My suggestion was bad, but I'm still thinking about it.


First, src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py is a backend-specific module that uses quantizers from torchao.
It looks like it would be better to define compress_pt2e and WeightsCompression in src/nncf/experimental/torch/fx/quantization/compress_pt2e.py.

Next, about the implementation of the algorithm:
a class hierarchy like the one below is bad.

Algorithm
|_____ WeightCompression
|_____ TorchAOWeightCompression (contains an instance of WeightCompression as _algo and calls methods on _algo directly)

My main complaint is that the apply method is copy-pasted (identical except for one line), which looks messy and makes future changes more difficult.

I don't see a good option that resolves this and follows SOLID principles.
But to my mind it is easier to inherit TorchAOWeightsCompression from WeightCompression with an extra quantizer argument, overriding get_weight_compression_parameters and available_backends:

class TorchAOWeightsCompression(WeightCompression):

    def __init__(
        self,
        mode: CompressWeightsMode,
        ratio: float,
        group_size: int,
        ignored_scope: IgnoredScope,
        all_layers: bool,
        sensitivity_metric: SensitivityMetric,
        awq: bool,
        subset_size: int,
        scale_estimation: bool,
        gptq: bool,
        lora_correction: bool,
        backup_mode: BackupMode = BackupMode.INT8_ASYM,
        compression_format: CompressionFormat = CompressionFormat.DQ,
        advanced_parameters: Optional[AdvancedCompressionParameters] = None,
        quantizer: Optional[Quantizer] = None,
    ):
        if quantizer is None:
            raise ValueError("Quantizer must be provided for TorchAOWeightsCompression")
        self._quantizer = quantizer
        
        super().__init__(
            mode=mode,
            ratio=ratio,
            group_size=group_size,
            ignored_scope=None,
            all_layers=all_layers,
            sensitivity_metric=sensitivity_metric,
            awq=awq,
            subset_size=subset_size,
            scale_estimation=scale_estimation,
            gptq=gptq,
            lora_correction=lora_correction,
            backup_mode=backup_mode,
            compression_format=compression_format,
            advanced_parameters=advanced_parameters,
        )

    def available_backends(self) -> list[BackendType]:
        return [BackendType.TORCH_FX]

    def get_weight_compression_parameters(self, model: TModel, graph: NNCFGraph) -> tuple[
        list[WeightCompressionParameters],
        list[WeightCompressionParameters],
        list[WeightCompressionParameters],
    ]:
        return self._quantizer.get_weight_compression_parameters(model, graph)

def compress_pt2e(....):
    wc_config = quantizer.get_weight_compression_config()
    mode = wc_config.get("mode", CompressWeightsMode.INT8_ASYM)
    ...
    algorithm = TorchAOWeightsCompression(mode=mode, ...., quantizer=quantizer)
    compressed_model = algorithm.apply(transformed_model, nncf_graph, dataset=dataset)

It doesn't follow LSP, but it looks easier to support without refactoring.
I suggest implementing it as class TorchAOWeightsCompression(WeightCompression):. At least it's shorter.

In general, regarding the structure of the algorithm classes: the Algorithm abstract class is used as an interface and could be implemented as a Protocol.
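
A minimal sketch of that idea, assuming the method set used in this thread (the actual Algorithm interface in NNCF has more members; string annotations stand in for NNCF types):

from typing import Any, Optional, Protocol

class Algorithm(Protocol):
    # Structural typing: any class providing these methods conforms,
    # without inheriting from an abstract base class.
    def available_backends(self) -> list["BackendType"]: ...

    def apply(
        self,
        model: Any,
        graph: "NNCFGraph",
        statistic_points: Optional[Any] = None,
        dataset: Optional[Any] = None,
    ) -> Any: ...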

@daniil-lyakhov (Collaborator) commented Nov 24, 2025

Got you, Alexander. But I don't see how we would plug the quantizer into the apply method in your example. It looks like we would need to refactor the original class somehow to support versions with and without the quantizer.

And regarding the apply method: yes, it is actually a copy-paste, but I don't see a better way to plug in the quantizer. The original WC and the experimental WC are two different algorithms, and with composition we can do whatever we want with great flexibility; perhaps new updates will come when we test external quantizers.

The approach in this PR is to develop an external API for the original WC so that it is possible to plug in the quantizer. Aamir tried several other ways to do it, including inheritance, and in the end we had to redefine a couple of methods and refactor the internal code. These two approaches (inheritance vs. composition) are both viable, but together with Aamir and Nikolay we picked composition and polished it for almost a month; if we wanted to do it the inheritance way, we would need to rethink everything done in this PR.

scale_estimation=scale_estimation,
gptq=gptq,
lora_correction=lora_correction,
backup_mode=wc_config.get("backup_mode", None),

backup_mode expects BackupMode, not str or None.
Better to add logic in __init__ that converts each parameter from wc_config to NNCF-specific values.
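
For example, a hedged sketch of such a conversion (the wc_config key name comes from the snippet above; the import path and fallback default are assumptions):

from nncf import BackupMode  # assumed public import path for the enum

def _resolve_backup_mode(wc_config: dict) -> BackupMode:
    # Convert the raw wc_config entry to the NNCF enum, falling back to
    # an assumed default backup mode when the key is absent.
    raw = wc_config.get("backup_mode")
    if raw is None:
        return BackupMode.INT8_ASYM
    return raw if isinstance(raw, BackupMode) else BackupMode(raw)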

lora_correction: bool,
sensitivity_metric: SensitivityMetric,
compression_format: CompressionFormat,
advanced_parameters: AdvancedCompressionParameters,

Suggested change
advanced_parameters: AdvancedCompressionParameters,
advanced_parameters: Optional[AdvancedCompressionParameters] = None,
