Halide 10.0.0

@alexreinking alexreinking released this 16 Sep 20:17
· 35 commits to release/10.x since this release

We are pleased to announce the release of Halide 10.0.0!

This is a major update over the previous version, Halide 8.0.0, and contains many new features and a few breaking changes.

What happened to version 9?

Halide's major version number now matches the LLVM version it ships with, which is why this release jumps from 8 straight to 10. We aim to release new versions of Halide at the same cadence as LLVM (every six months or so).

Autoschedulers

  • There are now multiple autoschedulers, and they have been reworked as plugins. They are each named for the research paper that produced them. The existing autoscheduler is now Mullapudi2016. See the generator documentation for more details.
  • The Adams2019 autoscheduler has been added. It is optimized for x86 CPUs and includes an autotuning mode.
  • The Li2018 autoscheduler has been added and generates CUDA schedules. It is optimized for pipelines using gradient descent features.

Build

  • The CMake build has been rewritten. See README_cmake.md for details.
  • The minimum CMake version is now 3.16.
  • The old halide.cmake module has been removed in favor of find_package(Halide).
  • We no longer support the MinGW toolchain.

Language features

  • Added the atomic scheduling directive, which gives you another way to parallelize associative reductions (e.g. histograms or summations) by emitting atomic instructions when available (and compare-and-swap loops or locks when not).
  • Added support for horizontal vector reduction instructions, including dot-product instructions useful in machine learning, by combining the vectorize and atomic directives.
  • Integer division or mod by zero now returns zero instead of being undefined behavior.
  • The simplifier is now formally verified.
  • You can now store Funcs that are compute_at GPU blocks in global memory, which is useful if they won't fit in shared memory.
  • Allocation size inference is more precise in a variety of cases.
  • Various bugfixes for compute_with.
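
To make the atomic directive above more concrete, here is a minimal sketch in plain standard C++ (not Halide code, and not what Halide literally emits) of the two lowering strategies the note mentions for a histogram update: a native atomic add, and a compare-and-swap retry loop for when no suitable atomic instruction exists. The names kBins, Histogram, clear, accumulate_fetch_add, and accumulate_cas are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; none of this is Halide API.
constexpr int kBins = 256;
using Histogram = std::array<std::atomic<int>, kBins>;

// Set every bin to zero.
void clear(Histogram &hist) {
    for (auto &bin : hist) {
        bin.store(0, std::memory_order_relaxed);
    }
}

// Strategy 1: the target has an atomic add, so each update is a
// single read-modify-write operation.
void accumulate_fetch_add(Histogram &hist, const std::vector<uint8_t> &pixels) {
    for (uint8_t p : pixels) {
        hist[p].fetch_add(1, std::memory_order_relaxed);
    }
}

// Strategy 2: no native atomic for the operation, so retry with
// compare-and-swap until no other thread has changed the bin in between.
void accumulate_cas(Histogram &hist, const std::vector<uint8_t> &pixels) {
    for (uint8_t p : pixels) {
        int seen = hist[p].load(std::memory_order_relaxed);
        // On failure, compare_exchange_weak reloads the current value
        // into `seen`, so the next attempt uses a fresh `seen + 1`.
        while (!hist[p].compare_exchange_weak(seen, seen + 1,
                                              std::memory_order_relaxed)) {
        }
    }
}
```

Both functions may be called concurrently from several threads on the same Histogram, each with its own slice of the input, without losing increments. This is the property that makes it possible to parallelize an associative reduction.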

Backends and targets

  • Improved Direct3D 12 support.
  • Added support for macOS and Windows on ARM.
  • We no longer support the legacy buffer_t type.
  • Added explicit support for Volta, Turing, and Ampere GPUs.