[Torch FX] Compress PT2E Support #3663
base: develop
Conversation
…match in signatures in prepare_pt2e.
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
src/nncf/experimental/torch/fx/quantization/quantizer/__init__.py
daniil-lyakhov
left a comment
Can I see the PR with OpenVINOQuantizer?
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
) -> torch.fx.GraphModule:
    self._quantizer = quantizer
Type hints and a docstring are missing.
src/nncf/quantization/algorithms/weight_compression/algorithm.py
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
ljaljushkin
left a comment
Awesome feature!
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
Please run https://github.com/openvinotoolkit/nncf/actions/workflows/conformance_weight_compression.yml and attach the link to the PR description.
daniil-lyakhov
left a comment
Almost there, good job!
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
src/nncf/quantization/algorithms/weight_compression/algorithm.py
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py
src/nncf/experimental/torch/fx/quantization/quantizer/torch_ao_adapter.py
daniil-lyakhov
left a comment
LGTM, minor
- name: Run PyTorch precommit test scope
  run: |
    make test-executorch
  env:
    NUM_WORKERS: 4
Suggested change:
- - name: Run PyTorch precommit test scope
-   run: |
-     make test-executorch
-   env:
-     NUM_WORKERS: 4
+ - name: Run PyTorch precommit test scope
+   run: |
+     pytest -ra tests/executorch
NUM_WORKERS does nothing now.
Better to write the command directly here without the makefile, and please keep the same pytest arguments.
tests/executorch/requirements.txt
--extra-index-url https://download.pytorch.org/whl/nightly/cpu

# Pytorch
torch==2.10.0.dev20250922+cpu
PyTorch nightly builds are retained for 60 days, so the 0922 build is no longer available
ExecuTorch installation fails with the latest nightly builds.
  env:
    NUM_WORKERS: 4

executorch:
call_precommit currently runs weekly tests across multiple Python versions, but ExecuTorch only supports Python 3.10-3.12.
I'm concerned that relying on nightly builds is too unstable, since nightly builds are periodically removed and the custom ExecuTorch commit is not pinned.
I think it would be better to add a new workflow that runs nightly and also on pull requests, but only when specific files are modified. This would minimize the impact on unrelated pull requests while still keeping it in experimental.
from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression as OriginalWeightCompression


class WeightsCompression(Algorithm):
Using the same name is confusing.
@AlexanderDokuchaev, what are the benefits of such an approach? I would suggest keeping it as it is, since at least the experimental PTQ is done in the same fashion. If we would like to use inheritance here, I suggest doing it for all experimental classes in a separate PR.
This PR is essential for the executorch collab; we can create a ticket and discuss it separately.
My suggestion was bad, but I'm still thinking about it.
First, src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py is a backend-specific module that uses quantizers from torchao.
It looks like it would be better to define compress_pt2e and WeightsCompression in src/nncf/experimental/torch/fx/quantization/compress_pt2e.py.
Next, about the implementation of the algorithm: a class hierarchy like this is bad:
Algorithm
|_____ WeightCompression
|_____ TorchAOWeightCompression (contains an instance of WeightCompression as _algo and uses methods from _algo directly)
My main complaint is that the apply method is copy-pasted (except for one line), which looks messy and makes future changes more difficult.
I don't see a good option that resolves it while following SOLID principles.
But to my mind it is easier to inherit TorchAOWeightsCompression from WeightCompression with an extended quantizer argument, overriding get_weight_compression_parameters and available_backends.
class TorchAOWeightsCompression(WeightCompression):
    def __init__(
        self,
        mode: CompressWeightsMode,
        ratio: float,
        group_size: int,
        ignored_scope: IgnoredScope,
        all_layers: bool,
        sensitivity_metric: SensitivityMetric,
        awq: bool,
        subset_size: int,
        scale_estimation: bool,
        gptq: bool,
        lora_correction: bool,
        backup_mode: BackupMode = BackupMode.INT8_ASYM,
        compression_format: CompressionFormat = CompressionFormat.DQ,
        advanced_parameters: Optional[AdvancedCompressionParameters] = None,
        quantizer: Optional[Quantizer] = None,
    ):
        if quantizer is None:
            raise ValueError("Quantizer must be provided for TorchAOWeightsCompression")
        self._quantizer = quantizer
        super().__init__(
            mode=mode,
            ratio=ratio,
            group_size=group_size,
            ignored_scope=None,
            all_layers=all_layers,
            sensitivity_metric=sensitivity_metric,
            awq=awq,
            subset_size=subset_size,
            scale_estimation=scale_estimation,
            gptq=gptq,
            lora_correction=lora_correction,
            backup_mode=backup_mode,
            compression_format=compression_format,
            advanced_parameters=advanced_parameters,
        )

    def available_backends(self) -> list[BackendType]:
        return [BackendType.TORCH_FX]

    def get_weight_compression_parameters(self, model: TModel, graph: NNCFGraph) -> tuple[
        list[WeightCompressionParameters],
        list[WeightCompressionParameters],
        list[WeightCompressionParameters],
    ]:
        return self._quantizer.get_weight_compression_parameters(model, graph)


def compress_pt2e(....):
    wc_config = quantizer.get_weight_compression_config()
    mode = wc_config.get("mode", CompressWeightsMode.INT8_ASYM)
    ...
    algorithm = TorchAOWeightsCompression(mode=mode, ...., quantizer=quantizer)
    compressed_model = algorithm.apply(transformed_model, nncf_graph, dataset=dataset)

It doesn't follow LSP, but it looks easier to support without refactoring.
I suggest implementing it as class TorchAOWeightsCompression(WeightCompression):. At least it's shorter.
In general, about the structure of the algorithm classes: the Algorithm abstract class is used as an interface and could be implemented as a Protocol.
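For illustration, a Protocol-based interface could look roughly like the sketch below. The method names and signatures here are simplified assumptions, not the real nncf.Algorithm API:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Algorithm(Protocol):
    """Hypothetical structural interface: any class with these methods
    satisfies it, no inheritance required."""

    def available_backends(self) -> list:
        ...

    def apply(self, model, graph, dataset=None):
        ...


class DummyCompression:
    """Satisfies the Protocol structurally, without subclassing it."""

    def available_backends(self) -> list:
        return ["TORCH_FX"]

    def apply(self, model, graph, dataset=None):
        return model
```

With @runtime_checkable, isinstance(DummyCompression(), Algorithm) holds even though DummyCompression never names the Protocol, which is the main appeal over an abstract base class used purely as an interface.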
Got you, Alexander. But I don't see how we would plug the quantizer into the apply method in your example. It looks like we would need to refactor the original class somehow to support versions both with and without the quantizer.
And regarding the apply method: yes, it is actually a copy-paste, but I don't see a better way to plug in the quantizer. The original WC and the experimental WC are two different algos, and with composition we can do whatever we want with great flexibility; perhaps some new updates will come when we test external quantizers.
The approach in this PR is to develop an external API for the original WC, so that it is possible to plug in the quantizer. Aamir tried several other ways to do it, including inheritance, and in the end we had to redefine a couple of methods and refactor the internal code. Both approaches (inheritance vs. composition) are viable, but together with Aamir and Nikolay we picked composition and polished it for almost a month; if we wanted to switch to inheritance, we would need to rethink everything done in this PR.
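To make the composition approach being discussed concrete, here is a minimal sketch. All class and method names below are simplified stand-ins invented for illustration, not the actual NNCF code:

```python
class WeightCompression:
    """Stand-in for the original algorithm."""

    def get_weight_compression_parameters(self, model, graph):
        return ["params-from-default-heuristics"]

    def apply_compression(self, model, params):
        # Pretend-compression: just pair the model with the chosen params.
        return (model, tuple(params))


class QuantizerAwareWeightCompression:
    """Composition: holds the original algorithm as _algo and delegates to it,
    overriding only where the compression parameters come from."""

    def __init__(self, quantizer):
        if quantizer is None:
            raise ValueError("Quantizer must be provided")
        self._quantizer = quantizer
        self._algo = WeightCompression()

    def apply(self, model, graph):
        # The one line that differs from the original apply():
        # parameters come from the external quantizer, not the heuristics.
        params = self._quantizer.get_weight_compression_parameters(model, graph)
        return self._algo.apply_compression(model, params)


class FakeQuantizer:
    """Illustrative external quantizer."""

    def get_weight_compression_parameters(self, model, graph):
        return ["params-from-quantizer"]
```

The trade-off debated in the thread is visible here: the wrapper must mirror the original apply() flow (the copy-paste complaint), but in exchange the original class stays untouched and any quantizer with the right method can be plugged in.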
scale_estimation=scale_estimation,
gptq=gptq,
lora_correction=lora_correction,
backup_mode=wc_config.get("backup_mode", None),
backup_mode expects a BackupMode, not str or None.
Better to add logic in __init__ to convert each parameter from wc_config to NNCF-specific values.
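The suggested conversion could be sketched as below. The BackupMode member values and the "backup_mode" wc_config key are assumptions for illustration, not the exact NNCF definitions:

```python
from enum import Enum


class BackupMode(Enum):
    """Stand-in for nncf's BackupMode; member values are assumed."""
    NONE = "none"
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"


def resolve_backup_mode(wc_config: dict) -> BackupMode:
    """Normalize the raw wc_config entry (enum, string, or missing key)
    into a BackupMode, defaulting to INT8_ASYM when absent."""
    raw = wc_config.get("backup_mode")
    if raw is None:
        return BackupMode.INT8_ASYM
    if isinstance(raw, BackupMode):
        return raw
    # Accept the string form stored in the config.
    return BackupMode(raw)
```

The same pattern would apply to the other wc_config entries (mode, sensitivity_metric, etc.), so the constructor always receives properly typed values.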
lora_correction: bool,
sensitivity_metric: SensitivityMetric,
compression_format: CompressionFormat,
advanced_parameters: AdvancedCompressionParameters,
Suggested change:
- advanced_parameters: AdvancedCompressionParameters,
+ advanced_parameters: Optional[AdvancedCompressionParameters] = None,
Changes
Introduced a new API offering the weight compression algorithm for quantizers defined in torch.ao.
Currently only OpenVINOQuantizer is supported.
Reason for changes
To support quantizers defined in torch.ao.
Related tickets
169342
WC Conformance Test #167: https://github.com/openvinotoolkit/nncf/actions/runs/19372182852 - Pass