
Intel Neural Compressor Release 3.5

@thuang6 thuang6 released this 10 Sep 03:06
v3.5 (338f933)
  • Highlights
  • Features
  • Improvements
  • Validated Hardware
  • Validated Configurations

Highlights

  • Aligned Gaudi SW Release 1.22 with the improvements on FP8 and INT4 quantization for Intel® Gaudi® AI accelerator
  • Preliminary FP8 Q/DQ quantization support for CPU

Features

  • Support FP8 dynamic quantization on Gaudi, including Linear and FusedMoE modules
  • Support per-node FP8 scale method configuration on Gaudi
  • Enhance the stability of the warmup time optimization (dynamic scale patching)
  • Enable FP8 dynamic quantization of DeepSeek V3/R1 model on Gaudi
  • Support activation ordering in GPTQ INT4 on Gaudi
  • Support FP8 Q/DQ quantization on Gaudi when using Optimum-Habana
  • Support FP8 Q/DQ quantization on CPU (experimental)
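To illustrate the idea behind FP8 dynamic quantization with Q/DQ (quantize/dequantize) ops, here is a minimal pure-Python sketch of per-tensor dynamic scaling against the FP8 E4M3 range. All names (`dynamic_scale`, `quantize`, `dequantize`) are illustrative, not the Neural Compressor API, and real FP8 also rounds values to the reduced mantissa precision, which this sketch omits.

```python
# Illustrative sketch of FP8-style Q/DQ with a dynamic per-tensor scale.
# Function names are hypothetical; real FP8 also rounds mantissa bits.
E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def dynamic_scale(values):
    # Dynamic quantization: the scale is derived from the tensor's
    # observed absolute maximum at runtime, not calibrated offline.
    amax = max(abs(v) for v in values)
    return amax / E4M3_MAX if amax > 0 else 1.0

def quantize(values, scale):
    # Divide by the scale and clamp to the representable FP8 range.
    return [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in values]

def dequantize(qvalues, scale):
    # Multiply back by the scale to recover approximate originals.
    return [q * scale for q in qvalues]

weights = [0.5, -1.25, 900.0, 0.0]
s = dynamic_scale(weights)
qdq = dequantize(quantize(weights, s), s)
```

In a Q/DQ-quantized graph, such quantize/dequantize pairs are inserted around compute ops so the backend can fuse them into true low-precision kernels where supported.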

Improvements

  • Improve saving vLLM-compatible FP8 models in multi-card scenarios on Gaudi
  • Support FP32 Softmax mode in FP8 Fused SDPA on Gaudi
  • Support FP8 GaudiFluxPipeline save and load on Gaudi
  • Improve saving Hugging Face-format checkpoints for Intel CPU/GPU
  • Add new diffusion model examples for INT8 and FP8 quantization

Validated Hardware

  • Intel Gaudi AI Accelerators (Gaudi 2 and 3)
  • Intel Xeon Scalable processors (4th, 5th, 6th Gen)
  • Intel Core Ultra Processors (Series 1 and 2)
  • Intel Data Center GPU Max Series (1550)
  • Intel® Arc™ B-Series Graphics GPU (B580)

Validated Configurations

  • CentOS 8.4 & Ubuntu 24.04 & Windows 11
  • Python 3.9, 3.10, 3.11, 3.12
  • PyTorch/IPEX 2.6, 2.7