Releases: KernelTuner/kernel_tuner
Version 1.0
Finally, the Version 1.0 release is here! The software has been stable and ready for production use for quite some time now, and after about half a year in beta, we are confident that the current version deserves to mark the first major release of Kernel Tuner.
Version 1.0 integrates a lot of new functionality, including blazing fast search space construction, support for tuning HIP kernels on AMD GPUs, new functionality for mixed precision and accuracy tuning, experimental support for tuning OpenACC programs, a conda package installer for Kernel Tuner, and many more changes and additions.
I would like to thank everyone involved in the development of Kernel Tuner over the past years! Special thanks to the Kernel Tuner developers team for their continued support of the project!
From the Changelog
- HIP backend to support tuning HIP kernels on AMD GPUs
- Experimental features for mixed-precision and accuracy tuning
- Experimental features for OpenACC tuning
- Major speedup of searchspace building due to a new parser and the revamped python-constraint
- Implemented the ability to use PySMT and ATF for searchspace building
- Added Poetry for dependency and build management
- Switched from setup.py and setup.cfg to pyproject.toml for centralized metadata, added relevant tests
- Updated GitHub Action workflows to use Poetry
- Updated dependencies, most notably NumPy is no longer version-locked as scikit-opt is no longer a dependency
- Documentation now uses pyproject.toml metadata, minor fixes and changes to be compatible with updated dependencies
- Set up Nox for testing on all supported Python versions in isolated environments
- Added linting information, VS Code settings and recommendations
- Discontinued use of OrderedDict, as all dictionaries in the Python versions used are already ordered
- Dropped Python 3.7 support
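The following brute-force sketch in plain Python shows what searchspace building computes. It enumerates every combination of parameter values and keeps those that satisfy all restrictions.

This is not how python-constraint, PySMT or ATF work internally. The faster approaches exist precisely to avoid enumerating every combination the way this sketch does. The parameter names, values and restriction strings below are illustrative assumptions.

The sketch also relies on a plain dict keeping its parameters in insertion order. That is the reason OrderedDict is no longer needed.

```python
from itertools import product

# Illustrative tunable parameters (assumed names/values). A plain dict keeps
# insertion order in all supported Python versions, so no OrderedDict is needed.
tune_params = {
    "block_size_x": [32, 64, 128, 256],
    "block_size_y": [1, 2, 4, 8],
    "tile_size": [1, 2],
}

# Illustrative restrictions written as Python expression strings (assumption).
restrictions = [
    "block_size_x * block_size_y <= 256",
    "tile_size == 1 or block_size_y >= 2",
]


def build_searchspace(tune_params, restrictions):
    """Brute force: enumerate the Cartesian product and keep valid configurations."""
    names = list(tune_params)
    # Compile each restriction once instead of re-parsing it for every configuration.
    compiled = [compile(r, "<restriction>", "eval") for r in restrictions]
    valid = []
    for values in product(*tune_params.values()):
        config = dict(zip(names, values))
        # Empty builtins so only parameter names resolve (not a security sandbox).
        if all(eval(code, {"__builtins__": {}}, config) for code in compiled):
            valid.append(values)
    return valid
```

With these inputs, 16 of the 32 combinations survive. Real searchspaces with many parameters grow far too large for this approach, which is where the new parser and solver-based backends pay off.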
Merged Pull Requests
- HIP Backend by @MiloLurati in #199
- Accuracy tuning by @stijnh in #189
- Fix issue where HIP backend fails due to invalid arguments type by @stijnh in #216
- Searchspace improvements and project meta modernization by @fjwillemsen in #214
- Minor bugfix by @isazi in #219
- OpenACC support by @isazi in #197
- Fixed broken tests as per issue #217 by @fjwillemsen in #220
- Fix snap_to_nearest on non-numeric parameters by @stijnh in #221
- expand documentation on backends by @benvanwerkhoven in #213
- Add support for passing cupy arrays to "C" lang by @bouweandela in #226
- improve code quality of cache file related functions by @benvanwerkhoven in #240
- New readme by @benvanwerkhoven in #231
New Contributors
- @MiloLurati made their first contribution in #199
- @dependabot made their first contribution in #222
- @bouweandela made their first contribution in #226
Full Changelog: 0.4.5...1.0
Version 1.0.0b6
This is a beta release for early access to the new features. Not intended for production use.
The release contains:
- Inclusion of tests in the source package, as requested in #225
- Updated dependencies
Version 1.0.0b5
This is a beta release for early access to the new features. Not intended for production use.
The release contains:
- Expanded documentation on backends by @benvanwerkhoven in #213
- A fix for an issue that could cause incorrect conversion to Constraint
- Extended tests to detect this
- Bump urllib3 from 2.0.6 to 2.0.7 by @dependabot in #222
- Updated dependencies
Full Changelog: 1.0.0b4...1.0.0b5
Version 1.0.0b4
This is a beta release for early access to the new features. Not intended for production use.
This release contains several improvements:
- nvidia-ml-py added to the tutorial extra dependencies.
- Additional checks for a coherent Poetry configuration and a warning in case of an outdated development environment.
- Updated dependencies.
Version 1.0.0b3
This is a beta release for early access to the new features. Not intended for production use.
This version contains several bugfixes:
- Fix snap_to_nearest on non-numeric parameters by @stijnh in #221
- Fixed an issue where some restrictions would not be recognized by the old check_restrictions function.
- Fixed an issue where bayes_opt would not handle pruned parameters correctly.
Full Changelog: 1.0.0b2...1.0.0b3
Version 1.0.0b2
This is a beta release for early access to the new features. Not intended for production use.
Full Changelog: 1.0.0b1...1.0.0b2
Version 1.0.0 beta 1
This is a beta release for early access to the new features. Not intended for production use.
What's Changed
- HIP Backend by @MiloLurati in #199
- Accuracy tuning by @stijnh in #189
- Fix issue where HIP backend fails due to invalid arguments type by @stijnh in #216
- Searchspace improvements and project meta modernization by @fjwillemsen in #214
- Minor bugfix by @isazi in #219
- OpenACC support by @isazi in #197
- Fixed broken tests as per issue #217 by @fjwillemsen in #220
New Contributors
- @MiloLurati made their first contribution in #199
Full Changelog: 0.4.5...1.0.0b1
Version 0.4.5
Version 0.4.5 adds support for using PMT in combination with Kernel Tuner, enabling power and energy measurements on a wide range of devices. In addition, we have worked extensively on the internals of Kernel Tuner and on the interfaces between the separate components that together make up Kernel Tuner. The release also includes a few bugfixes and fixes for small errors in the examples and documentation.
[0.4.5] - 2023-06-01
Added
- PMTObserver to measure power and energy on various platforms
Changed
- Improved functionality for storing output and metadata files
- Updated PowerSensorObserver to support PowerSensor3
- Refactored internal interfaces of runners and backends
- Bugfix in interface to set objective and optimization direction
Version 0.4.4
Version 0.4.4 extends the support for energy efficiency tuning, in particular with the new capability to fit a performance model to the target GPU's power-frequency curve. How to use these features is demonstrated in:
https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/going_green_performance_model.py
And described in the paper:
Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning
R. Schoonhoven, B. Veenboer, B. van Werkhoven, K. J. Batenburg
International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) at Supercomputing (SC22) 2022
https://arxiv.org/abs/2211.07260
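Energy efficiency tuning rests on some simple arithmetic: energy is average power multiplied by run time. Energy efficiency is usually expressed as throughput per watt.

The sketch below illustrates only that arithmetic, plus a selection of the most energy-efficient clock frequency from per-frequency measurements. It is not the power-frequency model from the paper or the helper functions shipped with Kernel Tuner. The function names and the measurement format (a dict mapping frequency in MHz to a (time_ms, avg_power_watts) pair) are hypothetical.

```python
def energy_joules(time_ms, avg_power_watts):
    # Energy [J] = average power [W] * duration [s]
    return avg_power_watts * (time_ms / 1000.0)


def gflops_per_watt(flop_count, time_ms, avg_power_watts):
    # Throughput in GFLOP/s divided by power in W, i.e. GFLOP per joule
    gflops = flop_count / (time_ms / 1000.0) / 1e9
    return gflops / avg_power_watts


def most_energy_efficient(measurements):
    """Return the frequency whose (time_ms, avg_power_watts) pair uses the least energy."""
    return min(measurements, key=lambda freq: energy_joules(*measurements[freq]))
```

A lower clock makes a kernel run longer, but it can still save energy when power drops faster than run time grows. This is the trade-off that energy tuning explores.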
Other than that, we've implemented a new output and metadata JSON format that adheres to the 'T4' auto-tuning schema created by the auto-tuning community at the Lorentz Center workshop in March 2022.
From the changelog:
[0.4.4] - 2023-03-09
Added
- Support for using time_limit in simulation mode
- Helper functions for energy tuning
- Example to show ridge frequency and power-frequency model
- Functions to store tuning output and metadata
Changed
- Changed what timings are stored in cache files
- No longer inserting partial loop unrolling factor of 0 in CUDA
Version 0.4.3
The version 0.4.3 release consists of a large number of changes to the internals of Kernel Tuner. These include a new backend based on Nvidia's official Python bindings for CUDA and improved functionality for tuning energy efficiency, such as measuring core voltages. The measurement of power and the interface with NVML have also improved considerably.
Some of the changes also concern the "externals" of Kernel Tuner, in the sense that we have migrated from https://github.com/benvanwerkhoven/ to https://github.com/KernelTuner. The goal of this move is to bring the collection of repositories belonging to the larger Kernel Tuner project under one organization.
From the Changelog:
[0.4.3] - 2022-10-19
Added
- A new backend that uses Nvidia cuda-python
- Support for locked clocks in NVMLObserver
- Support for measuring core voltages using NVML
- Support for custom preprocessor definitions
- Support for boolean scalar arguments in PyCUDA backend
Changed
- Migrated from github.com/benvanwerkhoven to github.com/KernelTuner
- Significant update to the documentation pages
- Unified benchmarking loops across backends
- Backends are no longer context managers
- Replaced the method for measuring power consumption using NVML
- Improved NVML measurements of temperature and clock frequencies
- Bugfix in parse_restrictions when using and/or in expressions
- Bugfix in GreedyILS when using neighbor method "adjacent"
- Bugfix in Bayesian Optimization for small problems
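Kernel Tuner makes the values of tunable parameters available to kernel code as compile-time preprocessor definitions. The "custom preprocessor definitions" entry above extends this to user-supplied macros.

The sketch below illustrates the general idea of prepending #define lines to a kernel source. The helper names, the bool-to-1/0 conversion and the rule that custom definitions override same-named parameters are assumptions for illustration, not Kernel Tuner's actual implementation.

```python
def make_define_preamble(definitions):
    """Render a dict of macro name -> value as '#define' lines (hypothetical helper)."""
    lines = []
    for name, value in definitions.items():
        if isinstance(value, bool):
            # The C preprocessor has no True/False literals; emit 1/0 instead.
            value = int(value)
        lines.append(f"#define {name} {value}")
    return "\n".join(lines) + "\n" if lines else ""


def prepare_source(kernel_source, config, custom_defines=None):
    """Prepend macros for one configuration (hypothetical helper).

    Assumed policy: custom definitions override same-named tunable parameters.
    """
    definitions = dict(config)
    definitions.update(custom_defines or {})
    return make_define_preamble(definitions) + kernel_source
```

Because dicts keep insertion order, the generated #define lines appear in the same order as the parameters were declared. This keeps the generated source deterministic across runs.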