diff --git a/spec-constant/index.md b/spec-constant/index.md index 5ed351e..f604a51 100644 --- a/spec-constant/index.md +++ b/spec-constant/index.md @@ -2,7 +2,8 @@ |-------------|--------| | Name | SYCL Specialization Constant | | Date of Creation | 18 April 2018 | -| Target | SYCL 1.2.1 extension / SYCL 2.2 | +| Target | SYCL 1.2.1 extension | +| Depends on | `kernel_handler` | | Current Status | _Work in progress_ | | Reply-to | Victor Lomüller | | Original author | Victor Lomüller , Toomas Remmelg | @@ -13,13 +14,13 @@ ## Motivation Many applications use runtime known constants to adapt their behaviors to their runtime environment. -Such constants are unknown when the developer compiles the application but will remain invariant through-out the application execution. -This is especially true for highly tuned software that requires information about the hardware on which the the application is running. +Such constants are unknown when the developer compiles the application but will remain invariant throughout the application execution. +This is especially true for highly tuned software that requires information about the hardware on which the application is running. Since OpenCL C kernels are being fully compiled at runtime, those constants are usually expressed as macro and the value is passed to online compiler when the kernel is being compiled. However, SYCL being statically compiled, it is not possible to use this approach. Template based techniques might not be possible or come at the price of code size explosion. -SPIR-V, the standard intermediate representation for shader and compute kernels, introduced "specialization constants" as a way to replace this macro usage in statically compiled kernels. +SPIR-V, the standard intermediate representation for shader and compute kernels, introduced *specialization constants* as a way to replace this macro usage in statically compiled kernels. Specialization constants in SPIR-V are treated as constants whose value is not known at the time of the SPIR-V module generation. Providing these constants before building the module for the actual target provides the compiler with the opportunity to further optimize the program. @@ -28,7 +29,55 @@ Even if the motivation is derived from a SPIR-V concept, its usage is not limite ## Specialization Constant Overview -The following SYCL program present a specialization constant are expressed. +The following present a minimal SYCL 1.2.1 program using specialization constant written with C++17: + +```cpp +#include +#include + +class specialized_kernel; + +// Fetch a value at runtime. +float get_value(); + +// Declare a specialization constant id. +// The variable `runtime_const` will be used as the id. +cl::sycl::specialization_id runtime_const(42.f); + +int main() { + cl::sycl::queue queue; + std::vector vec(1); + + { + cl::sycl::buffer buffer(vec.data(), vec.size()); + + queue.submit([&](cl::sycl::handler &cgh) { + auto acc = cgh.get_access(buffer); + + // Set a runtime as a JIT constant. + // This may force the runtime to compile `specialized_kernel` + // using the value returned by get_value. + cgh.set_specialization_constant(get_value()); + + cgh.single_task([=](cl::sycl::kernel_handler h) { + // The value returned by get_specialization_constant is the value + // returned by get_value(). + acc[0] = h.get_specialization_constant(); + }); + }); + } + return 0; +} +``` + +The global variable `runtime_const` declare a specialization constant id with the default value `42.0`. +This variable acts as an identifier for SYCL to refer to a constant whose value may only be set at runtime. +In the main function, a SYCL kernel is enqueued. Before it is enqueued, the call `cgh.set_specialization_constant(get_value())` is binding the value returned by `get_value()` to the current module and the specialization id `runtime_const`. +In the kernel, the value return by `h.get_specialization_constant()` will be the value returned by `get_value()` before the enqueue. +If the target natively supports specialization constant, this value will be known by the underlying consumer when it builds the kernel. +Without the call to `set_specialization_constant`, the call `get_specialization_constant` would still be valid and would have return the default value `42.0`. + +The following present an equivalent program as above but in C++11 and with a prebuild kernel: ```cpp #include @@ -40,36 +89,42 @@ class runtime_const; // Fetch a value at runtime. float get_value(); +// Declare a specialization constant id. +// The variable `runtime_const` will be used as the id. +cl::sycl::specialization_id runtime_const(42.f); + int main() { - cl::sycl::queue queue; - cl::sycl::program program(queue.get_context()); - - // Create a specialization constant. - cl::sycl::experimental::spec_constant my_constant = - program.set_spec_constant(get_value()); - program.build_with_kernel_type(); - - std::vector vec(1); - { - cl::sycl::buffer buffer(vec.data(), vec.size()); - - queue.submit([&](cl::sycl::handler& cgh) { - auto acc = cgh.get_access(buffer); - cgh.single_task( - program.get_kernel(), - [=]() { acc[0] = my_constant.get(); }); - }); - } + cl::sycl::queue queue; + cl::sycl::program program(queue.get_context()); + + // Set the value of the specialization constant. + program.set_specialization_constant(get_value()); + + // Build the program, the value set by set_spec_constant + // will be used as a constant by the underlying JIT + // if it has native support for specialization constant. + program.build_with_kernel_type(); + + std::vector vec(1); + { + cl::sycl::buffer buffer(vec.data(), vec.size()); + + queue.submit([&](cl::sycl::handler &cgh) { + auto acc = cgh.get_access(buffer); + + cgh.single_task( + program.get_kernel(), + [=](cl::sycl::kernel_handler h) { + // This should become a constant. + acc[0] = h.get_specialization_constant(); + }); + }); + } + return 0; } ``` -In this example, the call to `set_spec_constant` binds the value returned by the call to `get_value` to the SYCL `program`. -At static compilation time, the value is unknown to the SYCL device compiler, thus cannot be used by the optimizations. -At runtime, `get_value` is evaluated and bond to the SYCL `program`, giving the opportunity for the underlying OpenCL runtime to use it during the kernel build. -The function `set_spec_constant` returns a `spec_constant` object allowing the user to use the value inside the kernel. -After all runtime values are bounded to the program, the program is built. -The specialization constant `my_constant` is later used inside `specialized_kernel` and the expression `my_constant.get()` returns the value returned by the call to `get_value()`. -If the target natively supports specialization constant, this value will be known by the underlying OpenCL consumer when it builds the kernel. +This code is doing the same as previously expect that the kernel is precompiled and compatible with C++11. A more concrete example would be a blocked matrix-matrix multiply. In this version of the algorithm, threads in a work-group collectively load elements from global to local memory and then perform part of the operation on the block. @@ -150,27 +205,28 @@ void mat_multiply(cl::sycl::queue& q, T* MA, T* MB, T* MC, int matSize) { } } ``` -In this example, `blockSize` depends on a runtime feature that is a hardware constant for a given device, thus never changes once known. + +In this example, `blockSize` depends on a runtime feature that is a hardware constant depending on the device, thus never changes once known. The main issue is that this value is treated as a constant variable, and the compiler is unable to use it to perform optimizations like constant propagation or loop unrolling. It can even have an adverse effect as the value will use a register and the compiler may be forced to spill or reload the value multiple times. Using specialization constants, the routine can be rewritten as: + ```cpp +using namespace cl::sycl; + +specialization_id block_size{1}; + + template -void mat_multiply(cl::sycl::queue& q, T* MA, T* MB, T* MC, int matSize) { +void mat_multiply(queue& q, T* MA, T* MB, T* MC, int matSize) { auto device = q.get_device(); // Choose a block size based on some information about the device. auto maxBlockSize = - device.get_info(); + device.get_info(); auto blockSizeCst = prevPowerOfTwo(std::sqrt(maxBlockSize)); blockSizeCst = std::min(matSize, blockSize); - cl::sycl::program program(queue.get_context()); - - // Create a specialization constant to encapsulate blockSize. - auto blockSize = program.set_spec_constant(blockSizeCst); - program.build_with_kernel_type>(); - { range<1> dimensions(matSize * matSize); buffer bA(MA, dimensions); @@ -181,18 +237,19 @@ void mat_multiply(cl::sycl::queue& q, T* MA, T* MB, T* MC, int matSize) { auto pA = bA.template get_access(cgh); auto pB = bB.template get_access(cgh); auto pC = bC.template get_access(cgh); - auto localRange = range<1>(blockSize * blockSize); + auto localRange = range<1>(blockSizeCst * blockSizeCst); accessor pBA( localRange, cgh); accessor pBB( localRange, cgh); + cgh.set_spec_constant(blockSizeCst); cgh.parallel_for>( program.get_kernel>(), nd_range<2>{range<2>(matSize, matSize), - range<2>(blockSize, blockSize)}, - [=](nd_item<2> it) { + range<2>(blockSizeCst, blockSizeCst)}, + [=](nd_item<2> it, kernel_handler handler) { // Current block int blockX = it.get_group(0); int blockY = it.get_group(1); @@ -201,6 +258,7 @@ void mat_multiply(cl::sycl::queue& q, T* MA, T* MB, T* MC, int matSize) { int localX = it.get_local(0); int localY = it.get_local(1); + std::size_t blockSize = h.get_specialization_constant(); // Start in the A matrix int a_start = matSize * blockSize * blockY; // End in the b matrix @@ -238,60 +296,53 @@ void mat_multiply(cl::sycl::queue& q, T* MA, T* MB, T* MC, int matSize) { } } ``` -In this example, `blockSize` is now a specialization constant holding the value same value as before, meaning that the value is now injected inside the module, allow the OpenCL consumer to use the value in the optimizations. -Note that the specialization constant ID is independent from the template parameter `T` from which the kernel depends on. This means that all kernel instances will share the same value. +In this example, `blockSize` is now depending on a specialization constant, meaning that the value can be now injected inside the module. The consumer can freely use the value to perform optimizations (loop unrolling for instance). +Note that the specialization constant ID is independent of the template parameter `T` from which the kernel depends on. This means that all kernel instances will share the same value. ## Specialization Constant Representation -Specialization constants are encapsulated into a `cl::sycl::experimental::spec_constant` immutable object which can be passed to a SYCL kernel as a parameter. -This object can only be constructed by the SYCL runtime. -Accessing the value is done either explicitly via a `get` function or an implicit conversion. +Specialization constants are represented by instances of `cl::sycl::specialization_id` in the namespace scope. +Each instance of `cl::sycl::specialization_id` holds a default value that can be set using the constexpr constructor. +Note that the device compiler must be able to evaluate the default value at compile time to establish a valid module. -The `cl::sycl::experimental::spec_constant` interface is defined as follows: +The `cl::sycl::specialization_id` interface is defined as follows: ```cpp namespace cl { namespace sycl { -namespace experimental { -template -class spec_constant { +template +class specialization_id { private: // Implementation defined constructor. - spec_constant(/* Implementation defined */); + specialization_id(const specialization_id&) = delete; + specialization_id(specialization_id&&) = delete; + + public: - spec_constant(); + static_assert(std::is_trivially_copyable::value, "T must be trivially copyable"); - T get() const; // explicit access. - operator T() const; // implicit conversion. + using type = T; + + // Argument `Args` are forwarded to an underlying T Ctor. + // This allow the user to setup a default value for the specialization_id instance. + // The initialization of T must be evaluated at compile time to be valid. + template + explicit constexpr specialization_id(Args&&...); }; -} // namespace experimental } // namespace sycl } // namespace cl ``` -Where `T` is the type of the constant. To be valid, the type `T` must be standard layout and trivially copyable. -The template parameter `ID` is a unique name to designate the specialization constant. -The name follows the same requirement and restrictions as the SYCL kernel names. -It is valid for a program to reuse a kernel name for a specialization constant name and vice versa. - -There is no guarantees about the size of the object, whether or not the constant is stored in memory is left as an implementation detail. - -A `cl::sycl::experimental::spec_constant` object is considered initialized once the result of a `cl::sycl::program::set_spec_constant` is assigned to it. - -Once initialized, `cl::sycl::experimental::spec_constant` objects are immutable, attempts to circumvent this property produces undefined behavior. - -`cl::sycl::experimental::spec_constant` is default constructible, although the object is not considered initialized until the result of the call to `cl::sycl::program::set_spec_constant` is assigned to it. - -Attempts to use an uninitialized `cl::sycl::experimental::spec_constant` produces undefined behavior. +Where `T` is the type of the constant. To be valid, the type `T` must be trivially copyable. +Instances of `cl::sycl::specialization_id` must be forward declarable. ## Building Programs with Specialization Constants -SYCL program requiring a specialization constant value to be built must first set them before building. - -The program interface is extended to include a mechanism to set the constant. +SYCL program using specialization constants can be set before being built. +The `program` interface is extended to include a mechanism to set the constant. ```cpp namespace cl { @@ -300,8 +351,25 @@ namespace sycl { class program { // ... public: - template - spec_constant set_spec_constant(T cst); + + /** + * Returns true if the current program can support specialization constants natively. + * + */ + bool has_native_spec_constant() const noexcept; + + /** + * Set a new the value for the specialization constant represented identified by the specialization_id instance. + */ + template &> + void set_specialization_constant(T cst); + + /** + * Set a new the value for the specialization constant represented identified by the specialization_id instance `s`. + */ + template + void set_specialization_constant(typename std::remove_reference_t::type); + // ... }; @@ -309,65 +377,65 @@ public: } // namespace cl ``` -The templated member function `set_spec_constant` takes a runtime value of type `T` that will be used to set the specialization constant named `ID`. -Multiple specialization constants can be set for the same program by calling `set_spec_constant` multiple times. -Previously created `cl::sycl::experimental::spec_constant` objects becomes invalids and any usage of invalided objects produce undefined behavior. - -A specialization constant value can be overwritten if the program was not built before by recalling `set_spec_constant` with the same `ID` and the new value. -Although the type `T` of the specialization constant must remain the same. +The templated member function `set_specialization_constant` binds a runtime value of type `T` to `specialization_id` id and the program. +If a value were already set for a given `specialization_id` id then the value is overwritten by the new value. Once all specialization constants are set, the program can be compile/built using program's function `compile_with_kernel_type`/`build_with_kernel_type`. +Once the program is in a build state, the specialization constant can no longer be changed for the program and call to `set_specialization_constant` will throw a `cl::sycl::spec_const_error` exception. -If a required specialization constant is not set before calling `compile_with_kernel_type` / `build_with_kernel_type`, a `cl::sycl::experimental::spec_const_error` is thrown and the build of the kernel fails. +## Getting Specialization Constants via the command group handler -For a same kernel, it is valid to set different specialization constants to different `cl::sycl::program` that builds it. +The handler interface is extended to include a mechanism to get a specialization constant. -After the kernel is built, it is no longer possible to set new specialization constants. -A `cl::sycl::experimental::spec_const_error` exception will be thrown if the user attempt change it after the kernel has been built. - -## Build issue caused by Specialization Constants - -The following error class is added: ```cpp namespace cl { namespace sycl { -namespace experimental { -class spec_const_error : public compile_program_error; +class handler { +// ... +public: + // Set a value for the specialization constant represented by `s` + // and return the associated spec_constant. + // Note, this call may require the underlying program to be rebuilt. + template & s> + void set_spec_constant(T cst); + + // Set a value for the specialization constant represented by `s` + // and return the associated spec_constant. + // Note, this call may require the underlying program to be rebuilt. + // Only if compiling with C++17 or above. + template + void set_specialization_constant(typename std::remove_reference_t::type); + +// ... +}; -} // namespace experimental } // namespace sycl } // namespace cl ``` -This error can be thrown if a specialization constant compilation error occurs. +Upon invocation of a `single_task`/`parallel_for`/`parallel_for_work_group` construct, the runtime will build the appropriate kernel if it has never been built for the set of specialization constant passed to the kernel. +The SYCL device compiler and runtime are responsible to make sure that it is valid to build the module in which the invoked kernel is defined using only the provided specialization constants. + +It is illegal to use this interface in conjunction with the `cl::sycl::program` interface. -## OpenCL Interoperability +It must be noted that setting a specialization constant has an underlying cost and that changing a constant value will force the OpenCL runtime to build a new kernel. -In SYCL, specialization constants use typenames to identify them rather than using the SPIR-V/OpenCL numerical identifiers. +## Build issue caused by Specialization Constants -To allow interoperability with OpenCL, uses can use this a special templated type as the SYCL specialization constant identifier to specify the numerical identifier of a specialization constant inside the module: +The following error class is added: ```cpp namespace cl { namespace sycl { -namespace experimental { -template -struct spec_constant_id { - static constexpr unsigned id = NID; -}; +class spec_const_error : public compile_program_error; -} // namespace experimental } // namespace sycl } // namespace cl ``` -The runtime will use the value `NID` provided by the template parameter to set the specialization constant. -If the specified identifier does not exist in the module, a `cl::sycl::experimental::spec_const_error` error is thrown. +This error can be thrown if a specialization constant compilation error occurs. -For example: -```cpp - // Create a specialization constant. - auto my_constant = program.set_spec_constant>(get_value()); -``` -In this call, the runtime will bind the value with the specialization constant with the identifier `42`. +## In kernel access to specialization constant + +Specialization constants are accessed in kernel using the API provided by the `kernel_handler`. diff --git a/spec-constant/sycl_kernel_handler.md b/spec-constant/sycl_kernel_handler.md new file mode 100644 index 0000000..7b9ea90 --- /dev/null +++ b/spec-constant/sycl_kernel_handler.md @@ -0,0 +1,94 @@ +# SYCL kernel_handle + +| | | +| ---------------- | ---------------------------------------| +| Name | kernel_handle | +| Date of creation | 17th Feb 2020 | +| Last updated | 17th Feb 2020 | +| Status | WIP | +| Current revision | 1 | +| Available | N/A | +| Reply-to | Victor Lomuller | +| Original author | Victor Lomuller | +| Contributors | TBD | + +## Overview + +The library implementation of certain device features require the use of a device side handler. +This proposal introduces a `kernel_handler` to provide access to extra device capabilities implementable as a library. + +## Motivation + +The initial proposal of specialization constant forced the user to explicitly get individual `specialization_constant` objects that needed to be propagated in the final program. + +Using a handler, the user has only 1 object to carry to access `specialization_constant` objects thus simplifying the interface. + +For now, the proposal is limited to support specialization constant but can be extended to handle barriers or other functionalities. + +## Revisions + +v1: + + * Initial proposal + +## `sycl::kernel_handler` + +The `sycl::kernel_handler` is a non-user constructible class only passed to user as an argument of the functor passed to `handler::parallel_for` and `handler::parallel_for_work_group`. + +```cpp +namespace sycl { +class kernel_handler { +private: + kernel_handler(); + +public: + + // Return the value associate with the specialization constant id `s`. + // The value returned is either the + template& s> + T get_specialization_constant(); + + template + typename std::remove_reference_t::type get_specialization_constant(); + +}; +} +``` + +## Update `sycl::handler` class definition + +Functor passed to `sycl::handler::single_task`, `sycl::handler::parallel_for` and `sycl::handler::parallel_for_work_group` can take an extra `sycl::kernel_handler` as extra by-value argument. + +Below is an example of invoking a SYCL kernel function with `single_task`: + +```cpp +myQueue.submit([&](handler & cgh) { + cgh.single_task([=] () {}); +}); +``` + +or + +```cpp +myQueue.submit([&](handler & cgh) { + cgh.single_task([=] (sycl::kernel_handler h) {}); +}); +``` + +Below is an example of invoking a SYCL kernel function with `parallel_for`: + +```cpp +myQueue.submit([&](handler & cgh) { + cgh.parallel_for(range<1>(numWorkItems), + [=] (id<1> index) {}); +}); +``` + +or + +```cpp +myQueue.submit([&](handler & cgh) { + cgh.parallel_for(range<1>(numWorkItems), + [=] (id<1> index, sycl::kernel_handler h) {}); +}); +```