Releases: intel/auto-round

v0.9.2 patch release

04 Dec 05:00
v0.9.2

Remove accelerate version limitation #1090

v0.9.1 patch release

26 Nov 08:15
v0.9.1
6a7dc2b

Fix installation on ARM devices.

v0.9.0

14 Nov 12:32
v0.9.0
8d8a1cd

Highlights

What's Changed

Full Changelog: v0.8.0...v0.9.0

v0.8.0

23 Oct 08:53
v0.8.0
cee6ac3

Highlights

What's Changed

Full Changelog: v0.7.1...v0.8.0

v0.7.1 patch release

23 Sep 04:54
v0.7.1
4d72b45

Fix severe VRAM leak regression in auto-round format packing in #842

v0.7.0

10 Sep 09:12
v0.7.0

🚀 Highlights

  • Enhanced NVFP4 algorithm and added support to export MXFP4/NVFP4 to the llm-compressor format
    by @WeiweiZhang1 and @wenhuach21

  • Improved W2A16 quantization algorithm
    by @wenhuach21

  • Introduced the scheme interface for easier configuration of quantization settings
    by @wenhuach21

  • Added support for using FP8 models as input and for passing a model name string directly as the model input in the API (see the sketch after this list)
    by @wenhuach21 and @n1ck-guo

  • Unified device and device_map arguments and introduced device_map="auto"
    to simplify quantization of extremely large models
    by @Kaihui-intel
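
A minimal sketch of how these pieces fit together, assuming the scheme and device_map keyword names and a quantize_and_save helper that match the descriptions above (illustrative, not a verified signature; see the project README for the authoritative usage):

```python
# Illustrative sketch only: the exact argument names (scheme, device_map) and
# the quantize_and_save helper are assumptions based on the highlights above.
from auto_round import AutoRound

autoround = AutoRound(
    "Qwen/Qwen2-7B-Instruct",  # a model name string can now be passed directly
    scheme="W2A16",            # scheme interface for the quantization settings
    device_map="auto",         # let AutoRound place layers of very large models
)
autoround.quantize_and_save("./Qwen2-7B-W2A16", format="auto_round")
```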

What's Changed

New Contributors

Full Changelog: v0.6.0...v0.7.0

v0.6.0

24 Jul 02:33
v0.6.0
dd95bdb

Highlights

  • Provide experimental support for the GGUF q*_k format and customized mixed-bits settings
  • Support XPU in the Triton backend by @wenhuach21 in #563
  • Add torch backend by @WeiweiZhang1 in #555
  • Provide initial support for the llm-compressor format; only INT8 W8A8 dynamic quantization is supported, by @xin3he in #646 (see the sketch after this list)
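
A hedged export sketch for the new formats; the format strings below ("gguf:q4_k_m", "llm_compressor") are assumptions inferred from the highlights, so check the documentation of your installed version for the exact supported values:

```python
# Illustrative sketch; the format strings are assumptions based on the
# highlights (experimental GGUF q*_k export, initial llm-compressor support).
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./opt-125m-q4km", format="gguf:q4_k_m")
# llm-compressor export is limited to INT8 W8A8 dynamic quantization in this
# release, e.g. an 8-bit configuration saved with format="llm_compressor".
```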

What's Changed

New Contributors

Full Changelog: v0.5.1...v0.6.0

v0.5.1: bug fix release

23 Apr 08:50
v0.5.1
73669aa

What's Changed

Full Changelog: v0.5.0...v0.5.1

v0.5.0

22 Apr 08:05
v0.5.0
e90f991

Highlights

  • Refine auto-round format inference: support 2, 3, 4, and 8 bits and the Marlin kernel, and fix several bugs in the auto-round format
  • Support XPU in tuning and inference by @wenhuach21 in #481
  • Support more VLMs by @n1ck-guo in #390
  • Change the quantization method name and make several refinements by @wenhuach21 in #500
  • Support RTN via iters==0 by @wenhuach21 in #510 (see the sketch after this list)
  • Fix a bug with mixed calibration datasets by @n1ck-guo in #492
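
For the iters==0 path, a minimal sketch (model and other settings are illustrative): with zero tuning iterations, AutoRound skips the learned-rounding loop and falls back to plain round-to-nearest (RTN) quantization:

```python
# Minimal sketch of RTN via iters==0: zero tuning iterations means weights are
# quantized with plain round-to-nearest instead of the learned rounding loop.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=0)
autoround.quantize()
autoround.save_quantized("./opt-125m-rtn-w4", format="auto_round")
```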

What's Changed

Full Changelog: v0.4.7...v0.5.0

v0.4.7

01 Apr 09:50

Highlights

Support W4AFP8 for HPU by @yiliu30 in #467. Please refer to Intel Neural Compressor for guidance on running these models.

Support packing immediately in the new quantization API to reduce RAM usage by @wenhuach21 in #466

20x AWQ and 4x GPTQ packing speedup on CUDA by @wenhuach21 in #459

Support auto-round-light to speed up the tuning process by @WeiweiZhang1 in #454

Fix a critical bug of MXFP4 in tuning by @wenhuach21 in #451

What's Changed

Full Changelog: v0.4.6...v0.4.7