Sangjun Lee*, Seung-taek Woo*, Jun-gyu Jin, Changhun Lee, Eunhyeok Park
* Equal contribution
This is the official repository for AMQ.
AMQ is an automated mixed-precision quantization library for Large Language Models (LLMs). It uses multi-objective optimization to search for per-layer bit-width configurations that balance model quality against memory and inference efficiency.
- Multiple Quantization Methods: Support for AWQ, GPTQ, OWQ, and more
- Multi-objective Optimization: NSGA-II based search over bit configurations (see the sketch after this list)
- Surrogate Models: Efficient exploration through MLP- and RBF-based predictors
- Layer-wise Sensitivity Analysis: Measure quantization sensitivity per layer
- Automated Mixed-precision Search: Automatic exploration of optimal bit configurations
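The actual search lives in this repository's scripts; purely to illustrate the NSGA-II formulation, here is a minimal, self-contained sketch using the third-party pymoo library. The bit candidates, objectives, and sensitivity table are placeholder assumptions, not AMQ's code:

```python
# Illustrative only: a toy NSGA-II bit-allocation search with pymoo.
# The sensitivity table and objectives below are made-up placeholders.
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.optimize import minimize

BITS = np.array([2, 3, 4])  # candidate bit-widths per layer

class BitAllocation(ElementwiseProblem):
    def __init__(self, sensitivity):
        # sensitivity[l][i] = proxy loss increase when layer l uses BITS[i]
        self.sensitivity = sensitivity
        super().__init__(n_var=len(sensitivity), n_obj=2,
                         xl=0.0, xu=len(BITS) - 1.0)

    def _evaluate(self, x, out, *args, **kwargs):
        idx = np.clip(np.round(x).astype(int), 0, len(BITS) - 1)
        proxy_loss = sum(self.sensitivity[l][i] for l, i in enumerate(idx))
        avg_bits = float(BITS[idx].mean())
        out["F"] = [proxy_loss, avg_bits]  # minimize both objectives

# Toy table: 8 layers, each less damaged at higher bit-widths.
sens = [[1.0, 0.3, 0.1] for _ in range(8)]
res = minimize(BitAllocation(sens), NSGA2(pop_size=32), ("n_gen", 40),
               seed=1, verbose=False)
print(res.F)  # Pareto front of (proxy loss, average bits) trade-offs
```

Each point on the resulting Pareto front is one bit assignment; in the same spirit, the MLP/RBF surrogates listed above would stand in for expensive objective evaluations with cheap predictions.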
Installation:

```bash
pip install -e .
pip install -r requirements.txt
```

Usage:

```bash
bash scripts/amq_quantiztion_proxy.sh 0   # generate the quantization proxy
bash scripts/amq_sensitivity.sh 0         # measure layer-wise sensitivity
bash scripts/amq_search.sh 0              # run the mixed-precision search
bash scripts/amq_quantization_proxy
```

Kernel installation and speed benchmark:

```bash
bash scripts/amq_install_kernel.sh
bash scripts/amq_speed_benchmark.sh 0
```

A minimal illustration of the sensitivity-measurement step appears after the model list below.

Supported models:

- Llama 2 (7B, 13B, 70B)
- Mistral
- Qwen2
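The sensitivity step above (scripts/amq_sensitivity.sh) handles this for the supported models; as a hedged illustration of the general idea only, the sketch below quantizes one linear layer at a time with naive round-to-nearest (not AWQ/GPTQ/OWQ) and records the loss increase on a toy input. The model (facebook/opt-125m), layer filter, and 3-bit setting are placeholders chosen so the example runs without gated weights:

```python
# Hedged sketch: per-layer sensitivity via one-layer-at-a-time quantization.
# Round-to-nearest and facebook/opt-125m are illustrative stand-ins for the
# quantizers and models this repo actually targets.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def quantize_rtn(w: torch.Tensor, bits: int = 3) -> torch.Tensor:
    # Symmetric per-tensor round-to-nearest quantization.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

tok = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").eval()
ids = tok("The quick brown fox jumps over the lazy dog.",
          return_tensors="pt").input_ids

with torch.no_grad():
    base = model(ids, labels=ids).loss.item()
    for name, mod in model.named_modules():
        if isinstance(mod, torch.nn.Linear) and name.endswith("fc2"):
            orig = mod.weight.data.clone()
            mod.weight.data = quantize_rtn(orig)  # quantize this layer only
            delta = model(ids, labels=ids).loss.item() - base
            mod.weight.data = orig                # restore full precision
            print(f"{name}: +{delta:.4f} loss at 3-bit")
```

Layers with large loss deltas would then be kept at higher precision during the search, while insensitive layers can absorb more aggressive quantization.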
Requirements:

- Python >= 3.8
- PyTorch >= 2.0.0
- Transformers == 4.45.2
- HQQ >= 0.2.0
- See requirements.txt for the full list
Model-specific configuration files are located in the configs/ directory:
- `configs/llama.json` - Llama model configuration
- `configs/mistral.json` - Mistral model configuration
- `configs/qwen2.json` - Qwen2 model configuration
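These are plain JSON files; a quick way to inspect one (the schema is repo-specific, and no particular keys are assumed here):

```python
# Load and pretty-print a model-specific configuration.
import json

with open("configs/llama.json") as f:
    cfg = json.load(f)
print(json.dumps(cfg, indent=2))
```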
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
If you use this work in your research, please cite:
```bibtex
@inproceedings{lee2025amq,
  title={{AMQ}: Enabling {AutoML} for Mixed-Precision Weight-Only Quantization of Large Language Models},
  author={Lee, Sangjun and Woo, Seung-taek and Jin, Jun-gyu and Lee, Changhun and Park, Eunhyeok},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={35520--35538},
  year={2025}
}
```
Contributions are welcome! Please submit a Pull Request or open an issue.
If you have any questions or feedback, please open an issue.
