- Highlights
- Features
- Improvements
- Validated Hardware
- Validated Configurations
Highlights
- Aligned with Gaudi SW Release 1.22, bringing improvements to FP8 and INT4 quantization for Intel® Gaudi® AI accelerators
- Preliminary FP8 Q/DQ quantization support for CPU
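To make the "Q/DQ" (quantize/dequantize) terminology above concrete, here is a minimal pure-Python sketch of per-tensor FP8 E4M3 quantize-dequantize. It only models the scale, saturation, and 3-bit mantissa rounding; the actual CPU support uses optimized framework kernels, and subnormal handling is omitted. All helper names are illustrative, not the library API.

```python
# Minimal sketch of FP8 (E4M3) quantize/dequantize ("Q/DQ").
# Illustrative only: models scale + saturation + 3-bit mantissa rounding,
# and ignores E4M3 subnormals. Not the actual CPU kernel.
import math

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def quantize_fp8(values, scale):
    """Scale into FP8 range, saturate, and round to 3 mantissa bits."""
    out = []
    for v in values:
        x = max(-E4M3_MAX, min(E4M3_MAX, v / scale))  # saturating cast
        if x != 0.0:
            exp = math.floor(math.log2(abs(x)))
            step = 2.0 ** (exp - 3)  # 3 mantissa bits in E4M3
            x = round(x / step) * step
        out.append(x)
    return out


def dequantize_fp8(q_values, scale):
    """Map FP8-grid values back to the original range."""
    return [q * scale for q in q_values]


weights = [0.15, -2.7, 31.0, 0.004]
scale = max(abs(w) for w in weights) / E4M3_MAX  # per-tensor scale
dq = dequantize_fp8(quantize_fp8(weights, scale), scale)
```

The round trip recovers each weight to within the ~2^-3 relative precision of the E4M3 mantissa, which is the error a Q/DQ (fake-quant) pass introduces into the model.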
Features
- Support FP8 dynamic quantization, including Linear and FusedMoE modules, on Gaudi
- Support per-node FP8 scale method configuration on Gaudi
- Enhance the stability of the warmup time optimization (dynamic scale patching)
- Enable FP8 dynamic quantization of DeepSeek V3/R1 model on Gaudi
- Support activation ordering in GPTQ INT4 on Gaudi
- Support FP8 Q/DQ quantization on Gaudi when using Optimum-Habana
- Support FP8 Q/DQ quantization on CPU (experimental)
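The dynamic-quantization features above differ from static FP8 in one key way: the scale is computed from each activation tensor at runtime rather than fixed during calibration. A minimal sketch of that idea, with illustrative helper names (not the Gaudi API) and a saturating cast standing in for the hardware FP8 rounding:

```python
# Sketch of FP8 *dynamic* quantization: the scale is derived from each
# incoming activation tensor (amax / FP8 max) at runtime, instead of
# being frozen during calibration. Illustrative names, not the real API.
E4M3_MAX = 448.0  # largest finite FP8 E4M3 value


def dynamic_fp8_qdq(activations):
    """Quantize-dequantize one activation tensor with a runtime scale."""
    amax = max(abs(a) for a in activations) or 1.0  # avoid div-by-zero
    scale = amax / E4M3_MAX
    # Saturating cast stands in for the hardware FP8 rounding step.
    q = [max(-E4M3_MAX, min(E4M3_MAX, a / scale)) for a in activations]
    return [v * scale for v in q], scale


# Each batch gets its own scale, tracking the range it actually sees.
batch_a, s_a = dynamic_fp8_qdq([0.1, -0.5, 0.25])
batch_b, s_b = dynamic_fp8_qdq([10.0, -80.0, 42.0])
```

Because the scale adapts per tensor, a batch with large outliers (like `batch_b`) gets a proportionally larger scale instead of clipping against a calibration-time range.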
Improvements
- Improve saving vLLM-compatible FP8 models in multi-card scenarios on Gaudi
- Support FP32 Softmax mode in FP8 Fused SDPA on Gaudi
- Support FP8 GaudiFluxPipeline save and load on Gaudi
- Improve saving Hugging Face-format checkpoints for Intel CPU/GPU
- New diffusion model examples for INT8 and FP8 quantization
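The FP32 softmax mode listed above matters because softmax is sensitive to precision loss in its exponentials. A rough sketch of the effect, comparing a full-precision softmax against one whose exponentials are rounded to a coarse grid (a crude stand-in for low-precision accumulation; not the actual Gaudi fused-SDPA kernel):

```python
# Why an FP32 softmax inside an FP8 fused SDPA helps: rounding the
# exponentials perturbs the attention probabilities. The `coarse`
# rounding below is a crude low-precision model, not the Gaudi kernel.
import math


def softmax(xs, round_fn=lambda v: v):
    """Numerically stable softmax; round_fn models intermediate precision."""
    m = max(xs)
    exps = [round_fn(math.exp(x - m)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def coarse(v, bits=8):
    """Round to ~`bits` bits of relative precision."""
    if v == 0.0:
        return 0.0
    step = 2.0 ** (math.floor(math.log2(abs(v))) - bits)
    return round(v / step) * step


scores = [4.0, 3.99, -2.0]       # two nearly-tied attention logits
full = softmax(scores)           # FP32-style softmax
low = softmax(scores, coarse)    # softmax with rounded exponentials
```

Both outputs still sum to one, but the rounded variant shifts probability mass between the two nearly-tied entries; keeping the softmax in FP32 avoids that drift even when the surrounding matmuls run in FP8.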
Validated Hardware
- Intel Gaudi AI Accelerators (Gaudi 2 and 3)
- Intel Xeon Scalable Processors (4th, 5th, and 6th Gen)
- Intel Core Ultra Processors (Series 1 and 2)
- Intel Data Center GPU Max Series (1550)
- Intel® Arc™ B-Series Graphics GPU (B580)
Validated Configurations
- CentOS 8.4 & Ubuntu 24.04 & Windows 11
- Python 3.9, 3.10, 3.11, 3.12
- PyTorch/IPEX 2.6, 2.7