
Intel Neural Compressor Release 3.5

@thuang6 thuang6 released this 10 Sep 03:06
v3.5 (338f933)
  • Highlights
  • Features
  • Improvements
  • Validated Hardware
  • Validated Configurations

Highlights

  • Aligned Gaudi SW Release 1.22 with the improvements on FP8 and INT4 quantization for Intel® Gaudi® AI accelerator
  • Preliminary FP8 Q/DQ quantization support for CPU

Features

  • Support FP8 dynamic quantization on Gaudi, including Linear and FusedMoE modules
  • Support per-node FP8 scale method configuration on Gaudi
  • Enhance the stability of the warmup time optimization (dynamic scale patching)
  • Enable FP8 dynamic quantization of DeepSeek V3/R1 model on Gaudi
  • Support activation ordering in GPTQ INT4 on Gaudi
  • Support FP8 Q/DQ quantization on Gaudi when using Optimum-Habana
  • Support FP8 Q/DQ quantization on CPU (experimental)
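To illustrate the idea behind FP8 dynamic quantization with Q/DQ (quantize/dequantize) ops, here is a minimal pure-Python sketch of per-tensor dynamic scaling against the FP8 E4M3 range. All names (`dynamic_scale`, `quantize`, `dequantize`) are illustrative, not the Neural Compressor API, and real FP8 also rounds values to the reduced mantissa precision, which this sketch omits.

```python
# Illustrative sketch of FP8-style Q/DQ with a dynamic per-tensor scale.
# Function names are hypothetical; real FP8 also rounds mantissa bits.
E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def dynamic_scale(values):
    # Dynamic quantization: the scale is derived from the tensor's
    # observed absolute maximum at runtime, not calibrated offline.
    amax = max(abs(v) for v in values)
    return amax / E4M3_MAX if amax > 0 else 1.0

def quantize(values, scale):
    # Divide by the scale and clamp to the representable FP8 range.
    return [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in values]

def dequantize(qvalues, scale):
    # Multiply back by the scale to recover approximate originals.
    return [q * scale for q in qvalues]

weights = [0.5, -1.25, 900.0, 0.0]
s = dynamic_scale(weights)
qdq = dequantize(quantize(weights, s), s)
```

In a Q/DQ-quantized graph, such quantize/dequantize pairs are inserted around compute ops so the backend can fuse them into true low-precision kernels where supported.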

Improvements

  • Improve saving vLLM-compatible FP8 models in multi-card scenarios on Gaudi
  • Support FP32 Softmax mode in FP8 Fused SDPA on Gaudi
  • Support FP8 GaudiFluxPipeline save and load on Gaudi
  • Improve saving Hugging Face-format checkpoints for Intel CPU/GPU
  • Add new diffusion model examples for INT8 and FP8 quantization

Validated Hardware

  • Intel Gaudi AI Accelerators (Gaudi 2 and 3)
  • Intel Xeon Scalable processors (4th, 5th, 6th Gen)
  • Intel Core Ultra Processors (Series 1 and 2)
  • Intel Data Center GPU Max Series (1550)
  • Intel® Arc™ B-Series Graphics GPU (B580)

Validated Configurations

  • CentOS 8.4 & Ubuntu 24.04 & Windows 11
  • Python 3.9, 3.10, 3.11, 3.12
  • PyTorch/IPEX 2.6, 2.7