Halide 10.0.0
alexreinking released this 16 Sep 20:17
We are pleased to announce the release of Halide 10.0.0!
This is a major update over the previous version, Halide 8.0.0, and contains many new features and a few breaking changes.
What happened to version 9?
For major version numbers, we now use the included LLVM version. We aim to release new versions of Halide at the same cadence as LLVM (every six months or so).
Autoschedulers
- There are now multiple autoschedulers, and they have been reworked as plugins. They are each named for the research paper that produced them. The existing autoscheduler is now Mullapudi2016. See the generator documentation for more details.
- The Adams2019 autoscheduler has been added. It is optimized for x86 CPUs and includes an autotuning mode.
- The Li2018 autoscheduler has been added and generates CUDA schedules. It is optimized for pipelines using gradient descent features.
Build
- The CMake build has been rewritten. See `README_cmake.md` for details.
- The minimum CMake version is now 3.16.
- The old `halide.cmake` module has been removed in favor of `find_package(Halide)`.
- We no longer support the MinGW toolchain.
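As a rough illustration of the `find_package(Halide)` workflow, a downstream project's `CMakeLists.txt` might look like the sketch below. The project name, target name, and source file are hypothetical, and the sketch assumes Halide is installed somewhere CMake can find it. `README_cmake.md` remains the authoritative reference.

```cmake
cmake_minimum_required(VERSION 3.16)  # matches the new minimum noted above
project(my_halide_app)                # hypothetical project name

# Locate an installed Halide package (replaces including halide.cmake).
find_package(Halide REQUIRED)

# Hypothetical target and source file, for illustration only.
add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE Halide::Halide)
```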
Language features
- The new `atomic` scheduling directive gives you another way to parallelize associative reductions (e.g. histograms or summations) by emitting atomic instructions when available, and compare-and-swap loops or locks when not.
- Support for horizontal vector reduction instructions, including dot-product instructions useful in machine learning, by combining the `vectorize` and `atomic` directives.
- Integer division or mod by zero now returns zero instead of being undefined behavior.
- The simplifier is now formally verified.
- You can now store Funcs that are compute_at GPU blocks in global memory, which is useful if they won't fit in shared memory.
- Allocation size inference is more precise in a variety of cases.
- Various bugfixes for `compute_with`.
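To make the idea behind the `atomic` directive more concrete, here is a minimal sketch in plain C++ of the two lock-free strategies mentioned above for a parallel histogram: a single atomic add, and a compare-and-swap retry loop. This is neither Halide code nor the code Halide generates. The function name `parallel_histogram` and its parameters are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Illustrative only: plain C++, not Halide and not Halide's generated code.
// Builds a 256-bin histogram of byte values using several threads at once.
// use_cas selects between an atomic add and a compare-and-swap loop.
std::vector<uint32_t> parallel_histogram(const std::vector<uint8_t>& data,
                                         unsigned num_threads,
                                         bool use_cas) {
    assert(num_threads > 0);
    std::vector<std::atomic<uint32_t>> bins(256);
    for (auto& b : bins) b.store(0);

    auto worker = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            std::atomic<uint32_t>& bin = bins[data[i]];
            if (!use_cas) {
                // Strategy 1: one atomic read-modify-write operation.
                bin.fetch_add(1, std::memory_order_relaxed);
            } else {
                // Strategy 2: compare-and-swap loop. On failure, `seen` is
                // refreshed with the current value and the update is retried.
                uint32_t seen = bin.load(std::memory_order_relaxed);
                while (!bin.compare_exchange_weak(seen, seen + 1,
                                                  std::memory_order_relaxed)) {
                }
            }
        }
    };

    // Split the input into contiguous chunks, one per thread.
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        std::size_t begin = std::min(data.size(), t * chunk);
        std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back(worker, begin, end);
    }
    for (auto& th : threads) th.join();

    std::vector<uint32_t> result(256);
    for (std::size_t i = 0; i < 256; ++i) result[i] = bins[i].load();
    return result;
}
```

Both strategies give the same counts because integer addition is associative and commutative, so the order in which threads update a bin does not matter. That property is what makes a reduction a candidate for this kind of parallelization.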
Backends and targets
- Better Direct3D 12 support
- Added support for macOS and Windows on ARM.
- We no longer support the legacy `buffer_t` type.
- Explicit support for Volta, Turing, and Ampere GPUs.