@chensuyue chensuyue commented Jan 5, 2026

This pull request introduces significant improvements to the AutoRound Kernel integration, including enabling and documenting the kernel backend, updating requirements and installation methods, and enhancing test coverage. It also refactors the ARK QLinear implementation for better device and dtype handling. The most important changes are summarized below:

AutoRound Kernel Integration and Documentation:

  • Added a comprehensive README.md for AutoRound Kernel, detailing supported hardware, quantization configurations, versioning, and installation instructions.
  • Introduced a new installation script, install_kernel.py, that automatically detects the installed PyTorch version and installs the matching kernel version, and registered a new CLI command, auto-round-kernel-install.
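
The version detection described above could look roughly like the sketch below. It is not taken from install_kernel.py: the `KERNEL_PINS` table, its placeholder version numbers, the helper names, and the `auto-round-kernel` package name are all assumptions made for illustration.

```python
import re

# HYPOTHETICAL mapping from a PyTorch (major, minor) release to a kernel pin.
# The values are placeholders; the real table in install_kernel.py may differ.
KERNEL_PINS = {
    (2, 8): "0.0.1",
    (2, 9): "0.0.2",
}


def parse_major_minor(torch_version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.8.0+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {torch_version!r}")
    return int(match.group(1)), int(match.group(2))


def select_kernel_requirement(
    torch_version: str, package: str = "auto-round-kernel"  # assumed name
) -> str:
    """Return a pip requirement string for the given PyTorch version."""
    key = parse_major_minor(torch_version)
    if key not in KERNEL_PINS:
        raise RuntimeError(f"no kernel build listed for torch {key[0]}.{key[1]}")
    return f"{package}=={KERNEL_PINS[key]}"
```

In a real installer the version string would come from `torch.__version__`, and the returned requirement would be handed to pip, for example with `subprocess.run([sys.executable, "-m", "pip", "install", requirement], check=True)`.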

Backend Configuration and Requirements:

  • Enabled and registered the auto_round_kernel, auto_round_kernel_zp, and auto_round_kernel_awq backends for CPU (previously commented out), and updated their PyTorch version requirement to torch>=2.8.0 for broader compatibility.
  • Removed the "kernel" extra from setup.py to simplify dependency management.
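
To make the torch>=2.8.0 requirement concrete, here is a minimal sketch of how a `name>=X.Y.Z` requirement string can be checked against an installed version. The function names are hypothetical, and this is not the check that auto-round itself performs; it only handles the `>=` operator and ignores local suffixes such as `+cpu`.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '2.8.0+cpu' into (2, 8, 0), keeping only the leading release digits."""
    release = re.match(r"\d+(?:\.\d+)*", version.strip())
    if release is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in release.group(0).split("."))


def satisfies_minimum(installed: str, requirement: str) -> bool:
    """Check an installed version against a 'name>=X.Y.Z' requirement."""
    _, sep, minimum = requirement.partition(">=")
    if not sep:
        raise ValueError("only '>=' requirements are handled in this sketch")
    have, need = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so that '2.8' compares equal to '2.8.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

Because the sketch drops everything after the release digits, a pre-release such as `2.8.0.dev...` is treated like `2.8.0`. Production code would more likely rely on `packaging.version.Version`, which orders pre-releases correctly.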

ARK QLinear Refactoring:

  • Refactored ark/qlinear.py to instantiate ARK via auto_round_kernel.ARK(), improved dtype/device handling, unified the bias and input dtype logic, and switched to woqgemm for computation.
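
For readers unfamiliar with weight-only quantized (WOQ) GEMM, the plain-Python reference below shows what a kernel of this kind typically computes: integer weights are dequantized per group with a scale and zero point, then multiplied with the floating-point input. This is a generic illustration, not ARK's kernel and not the signature of woqgemm; the group-wise layout and every name in it are assumptions.

```python
def dequantize_row(q_row, scales, zero_points, group_size):
    """Recover float weights as w = (q - zp) * scale, one scale/zp per group."""
    return [
        (q - zero_points[i // group_size]) * scales[i // group_size]
        for i, q in enumerate(q_row)
    ]


def woq_linear(x, q_weight, scales, zero_points, group_size, bias=None):
    """Compute y[j] = sum_i x[i] * w[j][i] (+ bias[j]) for one input vector x.

    q_weight:    list of rows of integer codes, one row per output feature
    scales:      per-row list with one float scale per group
    zero_points: per-row list with one integer zero point per group
    """
    out = []
    for j, q_row in enumerate(q_weight):
        w_row = dequantize_row(q_row, scales[j], zero_points[j], group_size)
        acc = sum(xi * wi for xi, wi in zip(x, w_row))
        if bias is not None:
            acc += bias[j]
        out.append(acc)
    return out
```

A fused kernel avoids materializing the dequantized weights, but its result should match this reference up to floating-point rounding.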

Testing Improvements:

  • Expanded test coverage in test_model.py to include CPU devices for all relevant test cases, and re-enabled tests for additional quantization configurations.

Signed-off-by: chensuyue <[email protected]>
@chensuyue chensuyue marked this pull request as ready for review January 5, 2026 14:27
@chensuyue chensuyue added this to the 0.9.5 milestone Jan 8, 2026
Copilot AI review requested due to automatic review settings January 12, 2026 08:56
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@chensuyue

  • CPU tests must be run with OMP_NUM_THREADS set and with cores pinned via numactl, e.g. OMP_NUM_THREADS=32 numactl -C "0-31" pytest -v test_ark/test_model.py.
  • Tests were verified locally on GNR and BMG.
  • The ARK CI stays disabled for now because the tests fail on the CI machine (EMR); debugging is in progress.
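
The pinned invocation above can also be assembled programmatically, for instance when the core count varies between machines. The helper below is a hypothetical sketch (its name and parameters are not part of this PR); it only builds the environment and argument list and assumes a contiguous core range.

```python
import os


def pinned_pytest_invocation(num_threads: int, test_path: str, first_core: int = 0):
    """Return (env, argv) for running pytest pinned to a contiguous core range."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    last_core = first_core + num_threads - 1
    # Copy the current environment and set the OpenMP thread count.
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    # numactl -C binds the process to the listed CPU cores.
    argv = ["numactl", "-C", f"{first_core}-{last_core}", "pytest", "-v", test_path]
    return env, argv
```

The result can be passed to `subprocess.run(argv, env=env, check=True)`; with `num_threads=32` it reproduces the command quoted in the first bullet.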

@chensuyue chensuyue merged commit ba48dd7 into main Jan 15, 2026
28 checks passed
@chensuyue chensuyue deleted the suyue/ark_install branch January 15, 2026 06:54