Releases: intel/auto-round

v0.9.2 patch release

04 Dec 05:00
v0.9.2

Remove accelerate version limitation #1090

v0.9.1 patch release

26 Nov 08:15
v0.9.1
6a7dc2b

Fix installation on ARM devices.

v0.9.0

14 Nov 12:32
v0.9.0
8d8a1cd

Highlights

What's Changed

Full Changelog: v0.8.0...v0.9.0

v0.8.0

23 Oct 08:53
v0.8.0
cee6ac3

Highlights

What's Changed

Full Changelog: v0.7.1...v0.8.0

v0.7.1 patch release

23 Sep 04:54
v0.7.1
4d72b45

Fix severe VRAM leak regression in auto-round format packing in #842

v0.7.0

10 Sep 09:12
v0.7.0

🚀 Highlights

  • Enhanced NVFP4 algorithm and added support to export MXFP4/NVFP4 to the llm-compressor format
    by @WeiweiZhang1 and @wenhuach21

  • Improved W2A16 quantization algorithm
    by @wenhuach21

  • Introduced the scheme interface for easier configuration of quantization settings
    by @wenhuach21

  • Added support for using FP8 models as input and for passing a model name string directly as the model input in the API (see the sketch after this list)
    by @wenhuach21 and @n1ck-guo

  • Unified device and device_map arguments and introduced device_map="auto"
    to simplify quantization of extremely large models
    by @Kaihui-intel
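
A minimal sketch of how these pieces fit together, assuming the scheme and device_map keyword names and a quantize_and_save helper that match the descriptions above (illustrative, not a verified signature; see the project README for the authoritative usage):

```python
# Illustrative sketch only: the exact argument names (scheme, device_map) and
# the quantize_and_save helper are assumptions based on the highlights above.
from auto_round import AutoRound

autoround = AutoRound(
    "Qwen/Qwen2-7B-Instruct",  # a model name string can now be passed directly
    scheme="W2A16",            # scheme interface for the quantization settings
    device_map="auto",         # let AutoRound place layers of very large models
)
autoround.quantize_and_save("./Qwen2-7B-W2A16", format="auto_round")
```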

What's Changed

New Contributors

Full Changelog: v0.6.0...v0.7.0

v0.6.0

24 Jul 02:33
v0.6.0
dd95bdb

Highlights

  • Provide experimental support for the GGUF q*_k format and customized mixed-bits settings
  • Support XPU in the Triton backend by @wenhuach21 in #563
  • Add torch backend by @WeiweiZhang1 in #555
  • Provide initial support for the llm-compressor format; only INT8 W8A8 dynamic quantization is supported, by @xin3he in #646 (see the sketch after this list)
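
A hedged export sketch for the new formats; the format strings below ("gguf:q4_k_m", "llm_compressor") are assumptions inferred from the highlights, so check the documentation of your installed version for the exact supported values:

```python
# Illustrative sketch; the format strings are assumptions based on the
# highlights (experimental GGUF q*_k export, initial llm-compressor support).
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./opt-125m-q4km", format="gguf:q4_k_m")
# llm-compressor export is limited to INT8 W8A8 dynamic quantization in this
# release, e.g. an 8-bit configuration saved with format="llm_compressor".
```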

What's Changed

New Contributors

Full Changelog: v0.5.1...v0.6.0

v0.5.1: bug fix release

23 Apr 08:50
v0.5.1
73669aa

What's Changed

Full Changelog: v0.5.0...v0.5.1

v0.5.0

22 Apr 08:05
v0.5.0
e90f991

Highlights

  • Refine auto-round format inference: support 2, 3, 4, and 8 bits and the Marlin kernel, and fix several bugs in the auto-round format
  • Support XPU in tuning and inference by @wenhuach21 in #481
  • Support more VLMs by @n1ck-guo in #390
  • Change the quantization method name and make several refinements by @wenhuach21 in #500
  • Support RTN via iters==0 by @wenhuach21 in #510 (see the sketch after this list)
  • Fix a bug with mixed calibration datasets by @n1ck-guo in #492
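
For the iters==0 path, a minimal sketch (model and other settings are illustrative): with zero tuning iterations, AutoRound skips the learned-rounding loop and falls back to plain round-to-nearest (RTN) quantization:

```python
# Minimal sketch of RTN via iters==0: zero tuning iterations means weights are
# quantized with plain round-to-nearest instead of the learned rounding loop.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=0)
autoround.quantize()
autoround.save_quantized("./opt-125m-rtn-w4", format="auto_round")
```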

What's Changed

Full Changelog: v0.4.7...v0.5.0

v0.4.7

01 Apr 09:50

Highlights

Support W4AFP8 for HPU by @yiliu30 in #467. Please refer to Intel Neural Compressor for guidance on running these models.

Support packing immediately in the new quantization API to reduce RAM usage by @wenhuach21 in #466

20x AWQ and 4x GPTQ packing speedup on CUDA by @wenhuach21 in #459

Support auto-round-light to speed up the tuning process by @WeiweiZhang1 in #454

Fix a critical bug of MXFP4 in tuning by @wenhuach21 in #451

What's Changed

Full Changelog: v0.4.6...v0.4.7