AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models


🎉 Accepted as Oral Paper at EMNLP 2025 Main Conference! 🎉


Authors

Sangjun Lee*, Seung-taek Woo*, Jun-gyu Jin, Changhun Lee, Eunhyeok Park
* Equal contribution

Description

This is the official repository for AMQ.

AMQ is an automated mixed-precision quantization library for Large Language Models (LLMs). It uses multi-objective optimization to search for per-layer bit-width assignments that balance model quality against efficiency.

Key Features

  • Multiple Quantization Methods: support for AWQ, GPTQ, OWQ, and more
  • Multi-objective Optimization: NSGA-II based search algorithm
  • Surrogate Models: efficient exploration through MLP and RBF predictors (see the sketch below)
  • Layer-wise Sensitivity Analysis: measures quantization sensitivity per layer
  • Automated Mixed-precision Search: automatic exploration of optimal bit configurations
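
The surrogate models are what keep the search affordable: instead of quantizing and evaluating every candidate, AMQ fits a cheap predictor from bit-width vectors to measured quality and queries it during exploration. Below is a minimal NumPy sketch of the RBF variant of that idea; the data is synthetic and the helper names (rbf_fit, rbf_predict) are ours, not the library's API.

import numpy as np

def rbf_fit(X, y, gamma=0.5):
    # Gaussian RBF kernel over pairwise squared distances between configs
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    # Small ridge term keeps the solve well-conditioned
    return np.linalg.solve(K + 1e-8 * np.eye(len(X)), y)

def rbf_predict(X_train, w, x, gamma=0.5):
    d2 = ((X_train - x) ** 2).sum(-1)
    return np.exp(-gamma * d2) @ w

# Synthetic data: 32 sampled bit configurations for a 4-layer toy model,
# with a fake "perplexity" that worsens as bit-widths drop below 4
rng = np.random.default_rng(0)
X = rng.integers(2, 5, size=(32, 4)).astype(float)
y = 5.0 + (4.0 - X).sum(-1) + rng.normal(0.0, 0.05, 32)

w = rbf_fit(X, y)
print(rbf_predict(X, w, np.array([4.0, 2.0, 4.0, 3.0])))  # predicted quality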

Installation

Installation via pip

pip install -e .

Installation via requirements

pip install -r requirements.txt

Usage Examples

0. Prepare the Quantization Proxy

bash scripts/amq_quantization_proxy.sh 0
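
Conceptually, the quantization proxy precomputes each layer at every candidate bit-width once, so the later search stages can assemble mixed-precision models without re-running a quantizer. The sketch below illustrates that layout with a toy symmetric uniform quantizer; the function and dict structure are illustrative assumptions, not the repo's actual format.

import torch

def uniform_quantize(w, bits):
    # Toy symmetric per-tensor quantizer, standing in for AWQ/GPTQ/OWQ
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

layers = {f"layer{i}": torch.randn(64, 64) for i in range(4)}  # toy weights
proxy = {
    name: {bits: uniform_quantize(w, bits) for bits in (2, 3, 4)}
    for name, w in layers.items()
}
# proxy["layer0"][3] is layer0 at 3 bits, ready to be swapped in during search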

1. Measure Layer Sensitivity

bash scripts/amq_sensitivity.sh 0
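
The measurement itself amounts to swapping one layer at a time down to a low bit-width and recording how much a calibration loss degrades. A self-contained toy version (linear layers in place of an LLM, MSE in place of perplexity; all names here are ours):

import torch
import torch.nn.functional as F

def uniform_quantize(w, bits):
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def calib_loss(weights, x, target):
    # Toy forward pass: a stack of linear layers standing in for an LLM
    h = x
    for w in weights:
        h = torch.relu(h @ w)
    return F.mse_loss(h, target)

torch.manual_seed(0)
layers = [torch.randn(64, 64) * 0.2 for _ in range(4)]  # toy "model"
x, target = torch.randn(8, 64), torch.randn(8, 64)
base = calib_loss(layers, x, target)

sensitivity = []
for i in range(len(layers)):
    probe = list(layers)
    probe[i] = uniform_quantize(layers[i], 2)  # layer i at 2 bits, rest FP
    sensitivity.append((calib_loss(probe, x, target) - base).item())

print(sensitivity)  # larger delta = more quantization-sensitive layer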

2. Mixed-precision Search

bash scripts/amq_search.sh 0
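
The search stage is NSGA-II based. As a rough illustration, the sketch below runs the off-the-shelf pymoo implementation of NSGA-II over a toy two-objective problem: a sensitivity-weighted quality penalty against the average bit-width. The problem class, objectives, and sensitivity values are our assumptions, not the repo's search code.

import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.optimize import minimize

class BitSearch(ElementwiseProblem):
    def __init__(self, sensitivity):
        self.s = np.asarray(sensitivity)
        super().__init__(n_var=len(sensitivity), n_obj=2, xl=2.0, xu=4.0)

    def _evaluate(self, x, out, *args, **kwargs):
        bits = np.round(x)  # continuous genes, rounded to integer bit-widths
        quality = float((self.s * (4 - bits)).sum())  # stand-in for surrogate
        size = float(bits.mean())                     # average bits per weight
        out["F"] = [quality, size]

sens = [0.9, 0.1, 0.5, 0.2, 0.8, 0.05, 0.3, 0.4]  # made-up sensitivities
res = minimize(BitSearch(sens), NSGA2(pop_size=32), ("n_gen", 40), seed=1)
for x, f in zip(np.atleast_2d(res.X), np.atleast_2d(res.F)):
    print(np.round(x).astype(int), f)  # Pareto front: bit config, [quality, size]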

3. Evaluate Search Results

bash scripts/amq_quantization_proxy.sh 0
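
To sanity-check a single configuration outside the scripts, perplexity on held-out text is the usual metric. A minimal sketch with the transformers API (the checkpoint name and text are placeholders; use the repo's evaluation script for reported numbers):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
).eval()

text = "Quantization trades a small amount of accuracy for large memory savings."
ids = tok(text, return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    loss = model(ids, labels=ids).loss  # mean next-token negative log-likelihood
print("perplexity:", torch.exp(loss).item())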

Speed Benchmark

0. Install kernel environment

bash scripts/amq_install_kernel.sh

1. Measure Speed (requires the quantization proxy prepared in step 0 of Usage Examples)

bash scripts/amq_speed_benchmark.sh 0
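
For reference, GPU decode throughput is typically measured with CUDA events around generation, after a warm-up pass. A minimal sketch, assuming model and ids are a loaded causal LM and tokenized prompt as in the evaluation example above:

import torch

def tokens_per_second(model, input_ids, new_tokens=128, warmup=2, iters=5):
    for _ in range(warmup):  # warm-up: trigger kernel compilation/caching
        model.generate(input_ids, max_new_tokens=new_tokens, do_sample=False)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        model.generate(input_ids, max_new_tokens=new_tokens, do_sample=False)
    end.record()
    torch.cuda.synchronize()
    ms = start.elapsed_time(end) / iters  # average per-run latency in ms
    return new_tokens / (ms / 1000.0)

# print(tokens_per_second(model, ids[:, :16]))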

Supported Models

  • Llama 2 (7B, 13B, 70B)
  • Mistral
  • Qwen2

Dependencies

  • Python >= 3.8
  • PyTorch >= 2.0.0
  • Transformers == 4.45.2
  • HQQ >= 0.2.0
  • See requirements.txt for more

Configuration Files

Model-specific configuration files are located in the configs/ directory:

  • configs/llama.json - Llama model configuration
  • configs/mistral.json - Mistral model configuration
  • configs/qwen2.json - Qwen2 model configuration

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Citation

If you use this work in your research, please cite:

@inproceedings{lee2025amq,
  title={AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models},
  author={Lee, Sangjun and Woo, Seung-taek and Jin, Jun-gyu and Lee, Changhun and Park, Eunhyeok},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={35520--35538},
  year={2025}
}

Contributing

Contributions are welcome! Please submit a Pull Request or open an issue.

Contact

If you have any questions or feedback, please open an issue.
