Releases: KernelTuner/kernel_tuner

Version 0.3.1

11 Jun 14:50

A small release with two new features and a bugfix for older GPUs.

[0.3.1] - 2020-06-11

Added

  • kernelbuilder functionality for including kernels in Python applications
  • smem_args option for dynamically allocated shared memory in CUDA kernels

Changed

  • bugfix for NVML Error on Nvidia devices without internal current sensor

Version 0.3.0

20 Dec 16:40

This is the release of version 0.3.0 of Kernel Tuner. We have done a lot of work on the internals of Kernel Tuner. This release fixes several issues, adds new features, extends existing ones, and simplifies the user interface.

[0.3.0] - 2019-12-20

Changed

  • fix for output checking: custom verify functions are now called just once
  • benchmarking now returns multiple results, not only time
  • more sophisticated implementation of genetic algorithm strategy
  • the "method" option is now passed through strategy_options

Added

  • Bayesian Optimization strategy, use strategy="bayes_opt"
  • support for kernels that use texture memory in CUDA
  • support for measuring energy consumption of CUDA kernels
  • option to set strategy_options to pass strategy specific options
  • option to cache tuned kernel configurations and restart from a cachefile

Removed

  • Python 2 support; it may still work, but we no longer test for it
  • Noodles parallel runner
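
As a rough illustration of the strategy_options and cache options listed above, the sketch below tunes a simple vector addition kernel with the minimize strategy. The kernel, the helper names tuning_options and tune_vector_add, the "Nelder-Mead" method, the cache file name, and the exact spelling of the cache keyword are assumptions made for illustration; check the documentation for the options your version supports. Calling tune_vector_add requires a CUDA-capable GPU with PyCUDA installed.

```python
def tuning_options(method="Nelder-Mead", cache_path="vector_add_cache.json"):
    """Collect the strategy-related keyword arguments in one place.

    The keyword names follow the release notes (strategy, strategy_options);
    the "cache" keyword and the default values are assumptions.
    """
    return {
        "strategy": "minimize",
        # Strategy-specific settings go in strategy_options instead of
        # being passed as separate top-level options.
        "strategy_options": {"method": method},
        # File used to store benchmarked configurations so that an
        # interrupted tuning run can be restarted.
        "cache": cache_path,
    }


def tune_vector_add(n=1_000_000):
    """Tune a toy vector_add kernel; needs kernel_tuner, NumPy and a GPU."""
    # Imported here so the helper above can be used without a GPU setup.
    import numpy as np
    from kernel_tuner import tune_kernel

    kernel_source = """
    __global__ void vector_add(float *c, float *a, float *b, int n) {
        int i = blockIdx.x * block_size_x + threadIdx.x;
        if (i < n) {
            c[i] = a[i] + b[i];
        }
    }
    """

    a = np.random.randn(n).astype(np.float32)
    b = np.random.randn(n).astype(np.float32)
    c = np.zeros_like(a)
    args = [c, a, b, np.int32(n)]

    # block_size_x is inserted into the kernel source at compile time.
    tune_params = {"block_size_x": [128, 256, 512]}

    return tune_kernel("vector_add", kernel_source, n, args, tune_params,
                       **tuning_options())
```

Keeping the options in a separate dictionary makes it easy to switch strategies or cache files without touching the tuning call itself.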

Version 0.2.0

16 Nov 16:07

Version 0.2.0 adds a large number of search optimization algorithms and basic support for testing and tuning Fortran kernels.

Changed

  • no longer replacing kernel names with instance strings during tuning
  • bugfix in tempfile creation that led to a "too many open files" error

Added

  • A minimal Fortran example and basic Fortran support
  • Particle Swarm Optimization strategy, use strategy="pso"
  • Simulated Annealing strategy, use strategy="simulated_annealing"
  • Firefly Algorithm strategy, use strategy="firefly_algorithm"
  • Genetic Algorithm strategy, use strategy="genetic_algorithm"

Version 0.1.9

18 Apr 10:10

[0.1.9] - 2018-04-18

Changed

  • bugfix for C backend for byte array arguments
  • argument type mismatches throw warning instead of exception

Added

  • wrapper functionality to wrap C++ functions
  • citation file and zenodo doi generation for releases

Version 0.1.8

23 Nov 21:01

Version 0.1.8 brings many improvements, mostly focused on user friendliness. Installing optional dependencies is simpler now that you can use extras with pip. For example, pip install kernel_tuner[cuda] installs both Kernel Tuner and the optional dependency PyCUDA. In addition, version 0.1.8 introduces many more checks on the input that you pass to tune_kernel and run_kernel. For example, the kernel source code is parsed to see if its signature matches the argument list. These additional checks should make it easier to write and debug programs that use Kernel Tuner. For a more detailed overview of the changes, see below:

[0.1.8] - 2017-11-23

Changed

  • bugfix for when using iterations smaller than 3
  • the install procedure now uses extras, e.g. [cuda,opencl]
  • option quiet makes tune_kernel completely quiet
  • extensive updates to documentation

Added

  • type checking for kernel arguments and answers lists
  • checks for reserved keywords in tunable parameters
  • checks for whether thread block dimensions are specified
  • printing units for measured time with CUDA and OpenCL
  • option to print all measured execution times

Version 0.1.7

10 Nov 14:49

[0.1.7] - 2017-10-11

Changed

  • bugfix for install when scipy is not present
  • bugfix for GPU cleanup when using Noodles runner
  • reworked the way strings are handled internally

Added

  • option to set compiler name, when using C backend

Version 0.1.6

24 Aug 13:23

Version 0.1.6 brings a few bugfixes but mostly extends the existing functionality of the tuner. Three new search strategies have been added and are now ready to use: minimize, basinhopping, and diff_evo. For more information on what these strategies do and which solvers and methods they support, please see the documentation pages.

From the CHANGELOG:

[0.1.6] - 2017-08-17

Changed

  • actively freeing GPU memory after tuning
  • bugfix for 3D grids when using OpenCL

Added

  • support for dynamic parallelism when using PyCUDA
  • option to use differential evolution optimization
  • global optimization strategies basinhopping, minimize

Version 0.1.5

21 Jul 13:51

Version 0.1.5 brings more flexibility: you can now pass code-generating functions, supply your own functions for verifying kernel output correctness, and use your own names for the thread block dimensions.

Internally, quite a lot has changed in this version. What used to be the runners has been split into strategies and runners, and the way that options are passed around within Kernel Tuner has changed dramatically.

From the CHANGELOG:

[0.1.5] - 2017-07-21

Changed

  • option to pass a fraction to the sample runner
  • fixed a bug in memset for OpenCL backend

Added

  • parallel tuning on single node using Noodles runner
  • option to pass new defaults for block dimensions
  • option to pass a Python function as code generator
  • option to pass custom function for output verification
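
The code generator and custom verification options can be pictured with the minimal sketch below. Everything in it is illustrative: vector_add_source, verify_output, and the work_per_thread parameter are hypothetical names, and the calling conventions (a generator that receives a dict of parameters and returns source code, a verify function that receives the expected answers, the results, and a tolerance) are assumptions that may differ between versions.

```python
import math


def vector_add_source(params):
    """Generate CUDA source for a vector add, unrolled work_per_thread times."""
    wpt = params["work_per_thread"]
    # One guarded statement per element handled by each thread.
    body = "\n".join(
        f"    if (i + {k} < n) c[i + {k}] = a[i + {k}] + b[i + {k}];"
        for k in range(wpt)
    )
    return (
        "__global__ void vector_add(float *c, const float *a, "
        "const float *b, int n) {\n"
        f"    int i = (blockIdx.x * blockDim.x + threadIdx.x) * {wpt};\n"
        f"{body}\n"
        "}\n"
    )


def verify_output(answer, result_host, atol=1e-6):
    """Compare each expected output with the result, skipping None entries."""
    for expected, actual in zip(answer, result_host):
        if expected is None:
            # None marks an argument whose output should not be checked.
            continue
        if len(expected) != len(actual):
            return False
        if not all(math.isclose(e, r, abs_tol=atol)
                   for e, r in zip(expected, actual)):
            return False
    return True


print(vector_add_source({"work_per_thread": 2}))

# Assumed usage, not executed here because it needs a GPU:
# tune_kernel("vector_add", vector_add_source, size, args, tune_params,
#             answer=answer, verify=verify_output)
```

Generating the source from Python is useful when a tunable parameter changes the structure of the kernel in ways that are awkward to express with preprocessor macros alone.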

Version 0.1.4

14 Jun 09:24

With this release, tune_kernel also returns a dictionary containing information about the environment in which the kernel was benchmarked. This is very useful for understanding how and under what circumstances certain measurement results were obtained.

In addition, there were some very minor changes in the way C functions are compiled and called.

Version 0.1.3

06 Apr 13:59

Bugfixes for handling scalar arguments and a documentation update.