diff --git a/extensions/cl_exp_defined_builtin_kernels.asciidoc b/extensions/cl_exp_defined_builtin_kernels.asciidoc new file mode 100644 index 000000000..4ce4f52a1 --- /dev/null +++ b/extensions/cl_exp_defined_builtin_kernels.asciidoc @@ -0,0 +1,937 @@ +// Copyright 2018-2022 The Khronos Group. This work is licensed under a +// Creative Commons Attribution 4.0 International License; see +// http://creativecommons.org/licenses/by/4.0/ + +:data-uri: +:icons: font +include::../config/attribs.txt[] +:source-highlighter: coderay +:stem: + += cl_exp_defined_builtin_kernels + +The purpose of this extension is to provide a standardized set of built-in +kernels with well-defined semantics useful for accelerating applications +from various domains. The extension specification is designed to rapidly +expand and "live" via addition of new well-defined built-in kernel +definitions and updating of previously defined ones. + +[float] +== XXX - Not complete yet!!! + + +== Name Strings + +`cl_exp_defined_builtin_kernels` + +== Contact + +TODO + +== Contributors + +Pekka Jääskeläinen, Intel and Tampere University. + +Topi Leppänen, Tampere University. + +Jan Solanti, Tampere University. + +Ben Ashbaugh, Intel. + +Henry Linjamäki, Intel. + + +== Notice + +TODO + +== Status + +Draft spec, NOT APPROVED!! + +== Version +Built On: {docdate} + +Version: 0.3.0 + +== Dependencies + +This extension is written against the OpenCL Specification version 3.0.12. + +This extension requires OpenCL 1.2 or later. + +This extension requires cl_exp_tensor. + +== Overview + +OpenCL 1.2 specifies a built-in kernel as a kernel that is executed on +an OpenCL device or custom device by fixed-function hardware or in firmware. +Applications can query the built-in kernels supported by a device or custom +device. + +Built-in kernels are referred to by a name (a C string) without any +semantics attached to the functionality. 
The semantics behind the name
+are completely device specific, typically documented in vendor-specific
+extension specifications.
+
+The goal of this extension is to lower the bar for utilizing hardware
+accelerated functions in drivers by providing a library of
+well-defined built-in kernels with good coverage of common acceleration needs,
+designed to easily evolve over time.
+
+The device drivers that implement this extension can freely choose which
+subset of defined built-in kernels (DBKs) they implement and advertise to the
+clients. The clients can use the DBKs to accelerate their applications by
+manually invoking the DBKs. The extension is designed to also support
+automated task graph lowering tooling later.
+
+=== Background
+
+ASIC-based coarse-grained hardware accelerators are specialized logic meant to
+speed up execution of workloads of interest, or to provide improvements in
+energy efficiency. Examples of contemporary workloads that benefit from hardware
+acceleration over software-based implementations include video coding, deep learning,
+cryptography, software-defined radio and graphics rendering.
+
+FPGAs form a special case somewhere between instruction-set architectures and
+fixed-function hardware accelerators. While advances in high-level synthesis tools
+have attempted to bridge the programmability gap between GPU and FPGA programming,
+FPGAs are still considered devices with which it is challenging to achieve
+efficient implementations. Due to the extensive manual optimization work required
+for efficient implementations of the accelerated functionality, defining FPGA
+designs as a system of "hardware accelerator IPs" is still a widely used
+"application abstraction". FPGAs can thus be seen as a platform that can realize
+and integrate any hardware accelerator implementable with the programmable fabric.
+
+The means to utilize hardware accelerators have typically been
+vendor-specific and abstracted behind domain-specific libraries.
+The overhead of this "bunch of libraries" approach is seen at the lowest level
+of integration: the libraries utilize a low-level library (typically
+vendor-specific) to interface with the actual hardware, and thus do not
+integrate efficiently with other libraries or software-programmable processors
+that might be available on the same chip.
+
+=== Rationale
+
+OpenCL's built-in kernel abstraction allows pushing both hardware
+accelerated and software-defined kernels to the same command-queues,
+providing a powerful means for asynchronous execution of heterogeneous
+task graphs on diverse heterogeneous platforms. The ability to invoke hardware
+accelerators while being able to synchronize and optimize data transfers at
+the lowest levels of the driver stack can provide significant latency benefits,
+especially when combined with the command-buffering mechanism.
+
+However, the built-in kernel abstraction works well only when it is widely adopted by
+vendors, and when multiple vendors implement the same definitions. Otherwise,
+each vendor specifies and implements its own built-in kernels closely matching its
+own hardware accelerator properties, resulting in a lack of cross-vendor
+portability in the API abstraction presented to the upper layers of
+heterogeneous computing software stacks.
+
+This extension standardizes a set of well-defined built-in kernels that
+clients can call from higher-level programming stacks built with
+different languages and multiple libraries, possibly mixing accelerator
+calls with calls to software kernel commands, and relying on the driver
+stack to optimize the execution (especially the synchronization and
+communication) as a low level heterogeneous task graph.
The
+heterogeneous task graph can be described using multiple
+command-queues and optionally cached using the command-buffer
+extension (cl_khr_command_buffer). The extension aims to promote the use of
+built-in kernels as a programming model for hardware accelerated
+functionality, to improve cross-vendor portability of hardware
+accelerated computing.
+
+
+== New API Functions
+
+[source,c]
+----
+#define CL_MAX_DBK_PROPERTIES 16
+
+cl_program clCreateProgramWithDefinedBuiltInKernels(
+    cl_context context,
+    cl_uint num_devices,
+    const cl_device_id* device_list,
+    cl_uint num_kernels,
+    const cl_dbk_id_exp* kernel_ids,
+    const char** kernel_names,
+    const void** kernel_attributes,
+    cl_int* device_errcode_ret,
+    cl_int* errcode_ret);
+----
+
+== New API Types
+
+[source,c]
+----
+typedef cl_uint cl_dbk_id_exp;
+typedef cl_properties cl_dbk_properties_exp;
+
+typedef union {
+  cl_char sc;
+  cl_uchar uc;
+  cl_short ss;
+  cl_ushort us;
+  cl_int si;
+  cl_uint ui;
+  cl_long sl;
+  cl_ulong ul;
+  cl_half fh;
+  cl_float ff;
+  cl_double fd;
+  void* raw;
+} cl_tensor_datatype_union_exp;
+
+typedef struct cl_dbk_attributes_matmul_exp {
+  cl_tensor_desc_exp a;
+  cl_tensor_desc_exp b;
+  cl_tensor_desc_exp c;
+  cl_bool trans_a;
+  cl_bool trans_b;
+  cl_dbk_properties_exp kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_matmul_exp;
+
+typedef struct cl_dbk_attributes_gemm_exp {
+  cl_tensor_desc_exp a;
+  cl_tensor_desc_exp b;
+  cl_tensor_desc_exp c_in;
+  cl_tensor_desc_exp c_out;
+  cl_bool trans_a;
+  cl_bool trans_b;
+  cl_tensor_datatype_union_exp alpha;
+  cl_tensor_datatype_union_exp beta;
+  cl_dbk_properties_exp kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_gemm_exp;
+
+typedef struct cl_dbk_attributes_leaky_relu_exp {
+  cl_tensor_datatype_union_exp coefficient;
+  cl_dbk_properties_exp kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_leaky_relu_exp;
+----
+
+== New API Enums
+
+
+Accepted values for *cl_dbk_id_exp*:
+[source,c]
+----
+CL_DBK_MATMUL_EXP                          0x????
+CL_DBK_GEMM_EXP                            0x????
+CL_DBK_LEAKY_RELU_EXP                      0x????
+----
+
+Accepted values for *cl_dbk_properties_exp*:
+
+[source,c]
+----
+CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP     0x????
+CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP      0x????
+----
+
+New error codes:
+
+[source,c]
+----
+CL_DBK_INVALID_ID_EXP                      0x????
+CL_DBK_UNSUPPORTED_EXP                     0x????
+CL_DBK_UNSUPPORTED_PROPERTY_EXP            0x????
+CL_DBK_INVALID_ATTRIBUTE_EXP               0x????
+CL_DBK_UNMET_MAX_RELATIVE_ERROR_EXP        0x????
+----
+
+== Modifications to the OpenCL Specification
+
+(Add the following to section 5.8.1, *Creating Program Objects*) ::
++
+--
+
+To create a program object for a context and to load the information
+related to the defined built-in kernels into that object, call the
+function:
+
+[source,c]
+----
+cl_program clCreateProgramWithDefinedBuiltInKernels(
+    cl_context context,
+    cl_uint num_devices,
+    const cl_device_id* device_list,
+    cl_uint num_kernels,
+    const cl_dbk_id_exp* kernel_ids,
+    const char** kernel_names,
+    const void** kernel_attributes,
+    cl_int* device_errcode_ret,
+    cl_int* errcode_ret);
+----
+
+* _context_ must be a valid OpenCL context.
+
+* _num_devices_ is the number of elements in the _device_list_ and
+  _device_errcode_ret_ lists.
+
+* _device_list_ is a pointer to a list of devices that are in
+  _context_. _device_list_ must be a non-NULL value. The defined built-in
+  kernels are loaded for the devices specified in this list.
+
+* _num_kernels_ is the number of elements in the _kernel_ids_,
+  _kernel_names_ and _kernel_attributes_ lists.
+
+* _kernel_ids_ is the list of defined built-in kernels to
+  be loaded into the program.
+
+* _kernel_names_ is a list of names given for each kernel listed in
+  _kernel_ids_. Each string in the list must be non-NULL and unique.
+
+* _kernel_attributes_ is a list of pointers that point to the
+  respective attribute structure of each defined built-in kernel in
+  the _kernel_ids_ list. The respective attribute structure for each
+  kernel identifier is listed in <<dbk-description-table>>.
+
+* _device_errcode_ret_ will return an appropriate error code per
+  device. If _device_errcode_ret_ is NULL, no per-device error codes
+  are returned.
+
+* _errcode_ret_ will return an appropriate error code. If
+  _errcode_ret_ is NULL, no error code is returned.
+
+The devices associated with the program object will be the list of
+devices specified by _device_list_ or a subset of it. The list of
+devices specified by _device_list_ must be devices associated with
+_context_.
+
+*clCreateProgramWithDefinedBuiltInKernels* returns a valid non-zero
+program object and _errcode_ret_ is set to *CL_SUCCESS* if the program
+object is created successfully. The returned program is created for
+the devices that support the requested built-in kernels, indicated by
+*CL_SUCCESS* in the _device_errcode_ret_ list. In case of a failure to
+create the program for a device, one of the following error codes is set
+in the _device_errcode_ret_ list for the respective device:
+
+* *CL_DBK_UNSUPPORTED_EXP* if the device does not support one of the
+  built-in kernels listed in _kernel_ids_.
+
+* *CL_INVALID_PROPERTY* if a property list for a defined built-in
+  kernel description is invalid.
+
+* *CL_DBK_UNMET_MAX_RELATIVE_ERROR_EXP* if a defined built-in kernel
+  does not meet the requested precision.
+
+* *CL_OUT_OF_RESOURCES* if there is a failure to allocate resources
+  required by the OpenCL implementation on the device.
+
+// TODO: if _device_errcode_ret_ is NULL, should an error be
+// returned in _errcode_ret_ if a kernel is not supported in any
+// device?
+
+If a program object is not created,
+*clCreateProgramWithDefinedBuiltInKernels* returns a NULL value with
+one of the following error codes returned in _errcode_ret_:
+
+* *CL_INVALID_CONTEXT* if _context_ is not a valid context.
+
+* *CL_INVALID_VALUE* if _device_list_ is NULL or _num_devices_ is zero.
+
+* *CL_INVALID_VALUE* if a kernel name is not unique within _kernel_names_.
+
+* *CL_INVALID_VALUE* if there is a NULL value in _kernel_names_.
+
+* *CL_DBK_INVALID_ID_EXP* if any value in _kernel_ids_ is not a known
+  identifier for a defined built-in kernel.
+
+* *CL_DBK_INVALID_ATTRIBUTE_EXP* if a kernel attribute structure is
+  invalid for a built-in kernel.
+
+* *CL_DBK_UNSUPPORTED_EXP* if _device_errcode_ret_ is NULL and any
+  device in _device_list_ does not support a defined built-in kernel.
+
+* *CL_DBK_UNSUPPORTED_EXP* if _device_errcode_ret_ is non-NULL and none
+  of the devices in _device_list_ support a defined built-in kernel.
+
+* *CL_DBK_UNSUPPORTED_PROPERTY_EXP* if a kernel does not accept a
+  valid kernel property.
+
+* *CL_INVALID_DEVICE* if any device in _device_list_ is not in the list of
+  devices associated with _context_.
+
+* *CL_OUT_OF_RESOURCES* if there is a failure to allocate resources
+  required by the OpenCL implementation on the device.
+
+* *CL_OUT_OF_HOST_MEMORY* if there is a failure to allocate resources
+  required by the OpenCL implementation on the host.
+
+--
+// End (Add the following to section 5.8.1, *Creating Program Objects*)
+
+(Modify section 5.10, *Executing Kernels*) ::
++
+--
+
+(Add the following to *clEnqueueNDRangeKernel*) ::
++
+--
+For defined built-in kernels, the _work_dim_, _global_work_offset_,
+_global_work_size_ and _local_work_size_ parameters are meaningless
+and must be set to zero and NULL, respectively. OpenCL implementations
+decide how they distribute the workloads of the defined built-in
+kernels.
+--
+
+(Add the following to the list of error codes returned by *clEnqueueNDRangeKernel*) ::
++
+--
+
+* *CL_INVALID_GLOBAL_WORK_SIZE* if the _kernel_ is a defined built-in
+  kernel and _global_work_size_ is not NULL.
+
+* *CL_INVALID_GLOBAL_WORK_OFFSET* if the _kernel_ is a defined built-in
+  kernel and _global_work_offset_ is not NULL.
+
+* *CL_INVALID_LOCAL_WORK_SIZE* if the _kernel_ is a defined built-in
+  kernel and _local_work_size_ is not NULL.
+--
+--
+// End (Modify section 5.10, *Executing Kernels*)
+
+
+[[appendix-dbk]]
+=== Add new appendix "Defined Built-in Kernels" to OpenCL API Specification
+
+This chapter describes standard defined built-in kernels (DBKs) with
+well-defined semantics. They are loaded into a program using
+*clCreateProgramWithDefinedBuiltInKernels* and the kernels in it are
+launched using *clEnqueueNDRangeKernel* with _work_dim_ set to zero
+and _global_work_offset_, _global_work_size_ and _local_work_size_ set
+to NULL.
+
+The general client-side abstraction of the DBKs is similar to a call
+to a C function whose implementation is hidden. The device drivers
+are free to implement a DBK by invoking one or more coarse- and
+fine-grained hardware accelerators combined with firmware to implement
+the semantics as efficiently as possible.
+
+It is the driver's responsibility to handle efficient synchronization and communication
+with the hardware accelerator, the internal accelerator state management and resource sharing
+across multiple OpenCL contexts.
+
+==== Reproducibility
+
+Identical DBKs, or the same DBKs executed repeatedly with identical inputs, are
+guaranteed to produce identical results, unless otherwise stated in
+the DBK's description, when:
+
+* enqueued to the same device,
+
+* on the same platform,
+
+* with the same vendor driver version, and
+
+* the CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP property is not set.
+
+In other cases, the DBKs may produce different results. Two DBKs for a
+device are considered identical if they are created using identical
+kernel identifiers, kernel attributes and kernel properties. The result
+difference may occur because of different algorithms being used across
+devices, for example.
+
+DBKs may produce approximated results, and the error, with respect to the
+infinitely precise result, can be optionally controlled by
+CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP when the property name is listed in
+the DBK's description.
When the precision is not controlled by the
+application using CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP, the
+precision of the results is
+
+* chosen by the implementation for floating-point based tasks.
+
+* exact for integer based tasks.
+
+==== Kernel Interface
+
+DBKs operate on tensor objects, created with
+*clCreateBufferWithProperties* using the `CL_MEM_TENSOR` property,
+generally in static single assignment fashion. The kernel arguments
+used for reading and writing tensors may not reference the same tensor
+object unless otherwise stated in <<dbk-description-table>>.
+
+==== The Defined Built-in Kernels
+
+The recognized defined built-in kernels are listed in
+<<dbk-description-table>>. The list is expected to be
+expanded and updated over the versions of this extension, while
+preserving backwards compatibility.
+
+Each defined built-in kernel entry is organized as follows:
+
+* *Name*: Name of the defined built-in kernel (an enumeration).
+
+* *Kernel attributes*: The kernel attributes required for creating the
+  defined built-in kernel via
+  *clCreateProgramWithDefinedBuiltInKernels*. Attribute values are
+  immutable.
+
+* *Kernel arguments*: The kernel arguments.
+
+* *Description*: The description of the kernel in detail.
+
+* *Attribute validation rules*: Conditions on the kernel attributes of
+  the kernel. The implementation must return CL_DBK_INVALID_ATTRIBUTE_EXP
+  from the *clCreateProgramWithDefinedBuiltInKernels* call if any of the
+  conditions are violated.
+
+* *Kernel mode properties*: List of <<dbk-propery-table,kernel properties>>
+  (`cl_dbk_properties_exp`) the kernel may accept. The properties can
+  be used to tweak certain implementation details and behaviors in
+  the kernel execution. If a property not listed in the DBK
+  description is fed to the *clCreateProgramWithDefinedBuiltInKernels*
+  call, then the implementation must return
+  `CL_DBK_UNSUPPORTED_PROPERTY_EXP`.
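The property-list handling described above can be sketched in plain C. The type and enum stand-ins below (an integer `cl_dbk_properties_exp`, placeholder property values, and local success/error codes) are hypothetical and exist only to make the sketch self-contained; real code would use the definitions from the extension header:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the extension's types and enum values. */
typedef unsigned long cl_dbk_properties_exp;
enum {
  CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP = 0x4001, /* placeholder */
  CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP = 0x4002   /* placeholder */
};
#define DBK_SUCCESS 0
#define DBK_UNSUPPORTED_PROPERTY -1 /* stands for CL_DBK_UNSUPPORTED_PROPERTY_EXP */

/* Scan a zero-terminated (key, value) property list, as stored in a
 * kernel_props array, and reject any property key that is not in the
 * DBK's accepted set. This mirrors the rule that feeding an unlisted
 * property must yield CL_DBK_UNSUPPORTED_PROPERTY_EXP. */
static int dbk_check_props(const cl_dbk_properties_exp *props,
                           const cl_dbk_properties_exp *accepted,
                           size_t num_accepted) {
  for (size_t i = 0; props[i] != 0; i += 2) { /* key at i, value at i + 1 */
    int found = 0;
    for (size_t j = 0; j < num_accepted; ++j) {
      if (props[i] == accepted[j]) {
        found = 1;
        break;
      }
    }
    if (!found)
      return DBK_UNSUPPORTED_PROPERTY;
  }
  return DBK_SUCCESS;
}
```

For example, since CL_DBK_MATMUL_EXP lists only CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP, a `kernel_props` list carrying CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP would be rejected for it.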
+
+[[dbk-propery-table]]
+.Table of defined built-in kernel properties
+[cols="2,1,2",stripes=odd]
+|===
+| *DBK Mode Property* | *Property Value* | *Description*
+
+| CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP | float
+
+a| Require that the DBK produces results which do not deviate more
+than the given number of ULPs (units in the last place) with respect
+to the infinitely precise result.
+
+| CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP | cl_bool
+
+a| Allow results of the kernel to be non-reproducible. This allows the
+implementation to switch the algorithm of the kernel on each launch for
+possibly better performance.
+// Idea from https://pytorch.org/docs/stable/notes/randomness.html#cuda-convolution-benchmarking
+
+|===
+
+
+[[dbk-description-table]]
+.Standard Built-in Kernels and Their Semantics. *The table has been populated with a small set of non-trivial example entries which are subject to change and the list to expand during drafting.*
+|===
+| Name: *CL_DBK_GEMM_EXP*
+| *Kernel Attributes*
+a|
+
+[source,c]
+----
+typedef struct cl_dbk_attributes_gemm_exp {
+  cl_tensor_desc_exp a;
+  cl_tensor_desc_exp b;
+  cl_tensor_desc_exp c_in;
+  cl_tensor_desc_exp c_out;
+  cl_bool trans_a;
+  cl_bool trans_b;
+  cl_tensor_datatype_union_exp alpha;
+  cl_tensor_datatype_union_exp beta;
+  cl_dbk_properties_exp kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_gemm_exp;
+----
+
+* _a_ is a tensor description for input matrix A.
+
+* _b_ is a tensor description for input matrix B.
+
+* _c_in_ is a tensor description for input matrix CIN.
+
+* _c_out_ is a tensor description for output matrix COUT.
+
+* _trans_a_ instructs the implementation to transpose the A matrix if the value is CL_TRUE.
+
+* _trans_b_ instructs the implementation to transpose the B matrix if the value is CL_TRUE.
+
+* _alpha_ is a value, or a pointer to a value, corresponding to the
+  element type of _c_out_.
+
+* _beta_ is a value, or a pointer to a value, corresponding to the
+  element type of _c_out_.
+
+* _kernel_props_ defines additional kernel properties.
+
+| *Kernel Arguments*
+a|
+. cl_mem: a tensor object for matrix A (read only).
+. cl_mem: a tensor object for matrix B (read only).
+. cl_mem: a tensor object for matrix CIN (read only).
+. cl_mem: a tensor object for matrix COUT (write only).
+
+| *Description* a| Performs (batched) general matrix multiplication:
+
+[stem]
+++++
+bb"COUT"_(b,m,n) = "beta" * bb"CIN"_(b,m,n) + "alpha" * sum_(k)trans(bb"A", "trans_a")_(b,m,k)trans(bb"B", "trans_b") _(b,k,n)
+++++
+
+Where:
+
+[stem]
+++++
+trans(X_(b,i,j), tr) = {(X_(b,j,i), "if tr" = "CL_TRUE"), (X_(b,i,j), "otherwise") :}
+++++
+
+Second degree tensors of shape `(a, b)` are treated as third degree
+tensors of shape `(1, a, b)`.
+
+Operations of the matrix multiplication are performed in the precision
+of `elementof(COUT)`.
+
+If an overflow occurs in the accumulation of the products, then the
+`COUT` tensor's result will be undefined.
+
+`CIN` and `COUT` tensors may be the same object.
+
+| *Attribute validation rules*
+a|
+
+* `rankof(A) == rankof(B) == rankof(CIN) == rankof(COUT)`.
+* Let `shapeof(A~t~) == (b..., m, k)` and `shapeof(B~t~) == (b..., k,
+  n)` of tensors `A` and `B`, respectively, after possible transposing.
+  `shapeof(COUT)` must be `(b..., m, n)`.
+* `shapeof(CIN) == shapeof(COUT)`.
+* `elementof(A) == elementof(B)`.
+* `elemkindof(COUT) == elemkindof(A)`.
+* `elementof(COUT) == elementof(A)` or `elementof(A)` is promotable to
+  `elementof(COUT)` without a loss of meaning.
+// E.g. cl_int -> cl_uint: loses meaning of negative values.
+| *Kernel mode properties*
+a|
+This DBK accepts the following kernel properties:
+
+* CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP
+* CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP
+|
+
+| Name: *CL_DBK_MATMUL_EXP*
+| *Kernel Attributes*
+a|
+
+[source,c]
+----
+typedef struct cl_dbk_attributes_matmul_exp {
+  cl_tensor_desc_exp a;
+  cl_tensor_desc_exp b;
+  cl_tensor_desc_exp c;
+  cl_bool trans_a;
+  cl_bool trans_b;
+  cl_dbk_properties_exp kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_matmul_exp;
+----
+
+* _a_ is a tensor description for input matrix A.
+
+* _b_ is a tensor description for input matrix B.
+
+* _c_ is a tensor description for output matrix C.
+
+* _trans_a_ instructs the implementation to transpose the A matrix if the value is CL_TRUE.
+
+* _trans_b_ instructs the implementation to transpose the B matrix if the value is CL_TRUE.
+
+* _kernel_props_ defines additional kernel properties.
+
+| *Kernel Arguments*
+a|
+. cl_mem: a tensor object for matrix A (read only).
+. cl_mem: a tensor object for matrix B (read only).
+. cl_mem: a tensor object for matrix C (write only).
+
+| *Description* a| Performs (batched) matrix multiplication:
+
+[stem]
+++++
+bb"C"_(b,m,n) = sum_(k)trans(bb"A", "trans_a")_(b,m,k)trans(bb"B", "trans_b") _(b,k,n)
+++++
+
+Where:
+
+[stem]
+++++
+trans(X_(b,i,j), tr) = {(X_(b,j,i), "if tr" = "CL_TRUE"), (X_(b,i,j), "otherwise") :}
+++++
+
+Second degree tensors of shape `(a, b)` are treated as third degree
+tensors of shape `(1, a, b)`.
+
+Operations of the matrix multiplication are performed in the precision
+of `elementof\(C)`.
+
+If an overflow occurs in the accumulation of the products, then the
+`C` tensor's result will be undefined.
+
+| *Attribute validation rules*
+a|
+
+* `rankof(A) == rankof(B) == rankof\(C)`.
+* Let `shapeof(A~t~) == (b..., m, k)` and `shapeof(B~t~) == (b..., k,
+  n)` of tensors `A` and `B`, respectively, after possible transposing.
+  `shapeof\(C)` must be `(b..., m, n)`.
+* `elementof(A) == elementof(B)`.
+* `elemkindof\(C) == elemkindof(A)`.
+* `elementof\(C) == elementof(A)` or `elementof(A)` is promotable to
+  `elementof\(C)` without a loss of meaning.
+// E.g. cl_int -> cl_uint: loses meaning of negative values.
+| *Kernel mode properties*
+a|
+This DBK accepts the following kernel properties:
+
+* CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP
+|
+
+| Name: *CL_DBK_LEAKY_RELU_EXP*
+| *Kernel Attributes*
+a|
+
+[source,c]
+----
+typedef struct cl_dbk_attributes_leaky_relu_exp {
+  cl_tensor_datatype_union_exp coefficient;
+  cl_dbk_properties_exp kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_leaky_relu_exp;
+----
+* _coefficient_ is the coefficient of leakage, a positive value.
+| *Kernel arguments*
+a|
+. cl_mem: a tensor object IN for input values.
+. cl_mem: a tensor object OUT for output values.
+| *Description*
+a|
+
+This element-wise built-in kernel performs a leaky ReLU operation as follows:
+
+[stem]
+++++
+"OUT"_(i) = {( "coefficient" * "IN"_(i), "if IN"_(i) \lt 0), ("IN"_(i), " otherwise") :}
+++++
+
+If the target device does not support denormals, then the `coefficient`
+value is flushed to zero before the operation is applied. This DBK accepts
+tensors of arbitrary rank.
+
+The `IN` and `OUT` tensors may be the same object.
+
+| *Kernel mode properties*
+a| This DBK accepts the following kernel properties:
+
+* CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP
+* CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP
+
+| *Attribute validation rules*
+a|
+* `shapeof(IN) == shapeof(OUT)`.
+* `elementof(IN) == elementof(OUT)`.
+* `coefficient` must be a positive, finite value.
+|===
+
+==== Launching DBKs from the Device Side
+
+DBKs are primarily meant to be launched as kernel commands via
+host-side command queues. Optionally, they can be callable from
+the device side via `enqueue_kernel`:
+
+TBC. This probably needs a device-side function corresponding to
+*clCreateProgramWithDefinedBuiltInKernels*.
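To make the batched GEMM and leaky ReLU semantics concrete, the following plain-C software sketch computes the same results for fp32 tensors in row-major `(batch, rows, cols)` layout. The helper names (`tensor_at`, `gemm_ref`, `leaky_relu_ref`) are hypothetical and for illustration only; implementations may realize the DBKs in hardware or firmware in any way that matches the semantics:

```c
#include <stddef.h>

/* Read logical element (i, j) of the (rows x cols) matrix in batch b.
 * If trans is nonzero, the tensor is stored as (cols x rows) and the
 * logical element maps to storage element X[b, j, i]. */
static float tensor_at(const float *t, size_t rows, size_t cols,
                       size_t b, size_t i, size_t j, int trans) {
  return trans ? t[(b * cols + j) * rows + i]
               : t[(b * rows + i) * cols + j];
}

/* Reference semantics of CL_DBK_GEMM_EXP for fp32:
 * COUT[b,m,n] = beta * CIN[b,m,n] + alpha * sum_k A'[b,m,k] * B'[b,k,n],
 * where A' and B' are the optionally transposed views of A and B. */
static void gemm_ref(const float *A, const float *B, const float *CIN,
                     float *COUT, size_t batch, size_t m, size_t n, size_t k,
                     int trans_a, int trans_b, float alpha, float beta) {
  for (size_t b = 0; b < batch; ++b)
    for (size_t i = 0; i < m; ++i)
      for (size_t j = 0; j < n; ++j) {
        float acc = 0.0f;
        for (size_t kk = 0; kk < k; ++kk)
          acc += tensor_at(A, m, k, b, i, kk, trans_a) *
                 tensor_at(B, k, n, b, kk, j, trans_b);
        COUT[(b * m + i) * n + j] =
            beta * CIN[(b * m + i) * n + j] + alpha * acc;
      }
}

/* Reference semantics of the leaky ReLU DBK (conventional definition):
 * OUT[i] = coefficient * IN[i] if IN[i] < 0, else IN[i]. */
static void leaky_relu_ref(const float *in, float *out, size_t count,
                           float coefficient) {
  for (size_t i = 0; i < count; ++i)
    out[i] = in[i] < 0.0f ? coefficient * in[i] : in[i];
}
```

CL_DBK_MATMUL_EXP corresponds to `gemm_ref` with `alpha == 1` and `beta == 0` (so `CIN` is ignored), matching its description above.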
+
+== Sample Code
+
+[source,c]
+----
+constexpr size_t b = 64, m = 100, n = 200, k = 50;
+cl_int err;
+
+std::vector<float> lhs_data = ...;
+std::vector<float> rhs_data = ...;
+std::vector<float> bias_data = ...;
+std::vector<float> out_data(b * m * n);
+
+cl_tensor_layout_blas_exp row_major;
+row_major.leading_dims[0] = 2;
+row_major.leading_dims[1] = 1;
+
+cl_tensor_desc_exp lhs_desc;
+lhs_desc.rank = 3;
+lhs_desc.dtype = CL_TENSOR_FP32_EXP;
+lhs_desc.properties[0] = 0;
+lhs_desc.shape[0] = b;
+lhs_desc.shape[1] = m;
+lhs_desc.shape[2] = k;
+lhs_desc.layout_type = CL_TENSOR_LAYOUT_BLAS_EXP;
+lhs_desc.layout = &row_major;
+
+cl_tensor_desc_exp rhs_desc;
+rhs_desc.rank = 3;
+rhs_desc.dtype = CL_TENSOR_FP32_EXP;
+rhs_desc.properties[0] = 0;
+rhs_desc.shape[0] = b;
+rhs_desc.shape[1] = k;
+rhs_desc.shape[2] = n;
+rhs_desc.layout_type = CL_TENSOR_LAYOUT_BLAS_EXP;
+rhs_desc.layout = &row_major;
+
+cl_tensor_desc_exp out_desc;
+out_desc.rank = 3;
+out_desc.dtype = CL_TENSOR_FP32_EXP;
+out_desc.properties[0] = 0;
+out_desc.shape[0] = b;
+out_desc.shape[1] = m;
+out_desc.shape[2] = n;
+out_desc.layout_type = CL_TENSOR_LAYOUT_BLAS_EXP;
+out_desc.layout = &row_major;
+
+cl_mem_properties lhs_props[] = {
+  CL_MEM_TENSOR_EXP, (cl_mem_properties)&lhs_desc, 0};
+cl_mem lhs_tensor = clCreateBufferWithProperties(
+  ctx, lhs_props, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
+  0, lhs_data.data(), &err);
+cl_mem_properties rhs_props[] = {
+  CL_MEM_TENSOR_EXP, (cl_mem_properties)&rhs_desc, 0};
+cl_mem rhs_tensor = clCreateBufferWithProperties(
+  ctx, rhs_props, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
+  0, rhs_data.data(), &err);
+cl_mem_properties bias_props[] = {
+  CL_MEM_TENSOR_EXP, (cl_mem_properties)&out_desc, 0};
+cl_mem bias_tensor = clCreateBufferWithProperties(
+  ctx, bias_props, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
+  0, bias_data.data(), &err);
+cl_mem_properties out_props[] = {
+  CL_MEM_TENSOR_EXP, (cl_mem_properties)&out_desc, 0};
+cl_mem out_tensor = clCreateBufferWithProperties(
+  ctx, out_props, CL_MEM_USE_HOST_PTR | CL_MEM_READ_WRITE,
+  0, out_data.data(), &err);
+
+cl_tensor_datatype_union_exp alpha, beta, relu_coeff;
+alpha.ff = 2.0f;
+beta.ff = -1.0f;
+relu_coeff.ff = 0.01f;
+
+cl_dbk_attributes_gemm_exp gemm_attrs = {
+  lhs_desc, rhs_desc, out_desc,
out_desc, CL_FALSE, CL_FALSE, alpha, beta, {}
+};
+gemm_attrs.kernel_props[0] = CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP;
+gemm_attrs.kernel_props[1] = 100; // in ULPs
+gemm_attrs.kernel_props[2] = 0;
+
+cl_dbk_attributes_leaky_relu_exp relu_attrs = {
+  relu_coeff, {0}
+};
+
+cl_device_id target_devices[2] = {dev1, dev2};
+cl_int device_errcodes[2];
+cl_dbk_id_exp kernel_ids[2] = {CL_DBK_GEMM_EXP, CL_DBK_LEAKY_RELU_EXP};
+const char* kernel_names[2] = {"my_gemm", "my_relu"};
+const void* kernel_attrs[2] = {&gemm_attrs, &relu_attrs};
+auto prog = clCreateProgramWithDefinedBuiltInKernels(
+  ctx, 2, target_devices, 2,
+  kernel_ids, kernel_names, kernel_attrs, device_errcodes, &err);
+
+std::vector<cl_device_id> supported_devs;
+for (unsigned i = 0; i < 2; i++) {
+  if (device_errcodes[i] == CL_SUCCESS) {
+    supported_devs.push_back(target_devices[i]);
+  } else {
+    // Handle errors. Possible error cases (non-exhaustive):
+    //
+    // * CL_DBK_UNSUPPORTED_EXP: The DBK is not supported on the device.
+    // * CL_DBK_UNMET_MAX_RELATIVE_ERROR_EXP: The DBK implementation does
+    //   not meet the requested precision.
+  }
+}
+
+err = clBuildProgram(
+  prog, supported_devs.size(), supported_devs.data(), "", nullptr, nullptr);
+
+auto gemm_kernel = clCreateKernel(prog, "my_gemm", &err);
+clSetKernelArg(gemm_kernel, 0, sizeof(cl_mem), &lhs_tensor);
+clSetKernelArg(gemm_kernel, 1, sizeof(cl_mem), &rhs_tensor);
+clSetKernelArg(gemm_kernel, 2, sizeof(cl_mem), &bias_tensor);
+clSetKernelArg(gemm_kernel, 3, sizeof(cl_mem), &out_tensor);
+
+auto relu_kernel = clCreateKernel(prog, "my_relu", &err);
+clSetKernelArg(relu_kernel, 0, sizeof(cl_mem), &out_tensor);
+clSetKernelArg(relu_kernel, 1, sizeof(cl_mem), &out_tensor);
+
+cl_command_queue cmd_q = /* Create an in-order command queue. */;
+
+clEnqueueNDRangeKernel(
+  cmd_q, gemm_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+clEnqueueNDRangeKernel(
+  cmd_q, relu_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+clEnqueueMapBuffer(
+  cmd_q, out_tensor, CL_TRUE, CL_MAP_READ, 0, b * m * n * sizeof(float),
+  0, nullptr, nullptr, &err);
+----
+
+=== Open questions
+
+. 
Should we enable launching DBKs from the device side without requiring device-side enqueue? The main problem is DBKs with an NDRange, as they are not simple single work-item helper functions.
++
+--
+*UNRESOLVED*
+
+--
+
+. Should the NDRange be used at all in DBKs? It feels somewhat unnatural, as the NDRange is typically used to imply SPMD parallelism while the hardware/firmware is free to choose whatever parallelization strategy to implement the function. On the other hand, something similar applies to software kernel launches, as NDRange-launched work-items can be executed serially when adhering to barrier semantics.
++
+--
+*RESOLVED*. Decided to go forward without NDRange (and global offset
+  as a consequence), as there are currently no known uses for the
+  NDRange, and let OpenCL implementations decide the parallelization
+  strategy.
+
+--
+
+. Different accelerators prefer different channel orders (NHWC vs. NCHW, ...) for the processed data. Should the channel order be passed as a DBK argument (like in the example GEMM's row/column order) or is it better to have different DBK variations for each?
++
+--
+*RESOLVED*. The memory layout information is a property of the tensors,
+  so there is no need for DBK arguments for the layout or for DBK
+  variants.
+
+--
+
+. How to denote tensors' memory layout preference? Some of the DBKs are more efficient on a given device as they map more naturally to the underlying HW accelerator, but the slower variations (for example, with a suboptimal channel order in NN accelerators) might still be beneficially accelerated.
++
+--
+*UNRESOLVED*.
+
+--
+
+. Since the defined built-in kernel concept is basically just a C-like API inside another API, should it be made more generic and thus directly usable for SYCL and Vulkan as well?
++
+--
+*UNRESOLVED*
+
+--
+
+. What other DBK mode properties should we have? Here are some ideas:
+** Perform accumulation with saturation.
+** Finite math only.
+** Flush denormals to zero.
+
++
+--
+*UNRESOLVED*
+--
+
+. 
Should we reuse (and remove the "deprecation" status of) clEnqueueTask
+for launching DBKs, since DBKs make no use of the global offset, global
+size and local size parameters?
++
+--
+*UNRESOLVED*
+--
+
+== Version History
+
+[cols="5,10,15,40",options="header",grid="rows"]
+|====
+| *Version* | *Date* | *Author* | *Description*
+| 0.1.0 | 2022-12-13 |
+Pekka Jääskeläinen +
+Ben Ashbaugh a|
+First formulation as an extension specification, as proposed by Ben Ashbaugh.
+
+| 0.2.0 | 2023-11-23 |
+Henry Linjamäki +
+Pekka Jääskeläinen +
+Ben Ashbaugh
+a|
+Add APIs for defined built-in kernel (DBK) creation. Model DBKs on
+the tensor type. Add sample code.
+
+| 0.3.0 | 2024-08-20 |
+Henry Linjamäki +
+Pekka Jääskeläinen +
+Freddie Witherden a|
+* Rework the document structure to match the cl_exp_extension_template.
+* Reflect changes of the `cl_exp_tensor` extension here.
+* Add "Kernel Interface" section into the DBK Appendix.
+* Add GEMM DBK.
+* Change the DBK creation interface.
+
+| 0.3.1 | 2024-08-22 |
+Henry Linjamäki +
+Pekka Jääskeläinen +
+RABijl (@GitHub) a|
+* Rename the extension suffix from 'khr' to 'exp'.
+* Resolve two open questions.
+* Small fixes.
+|====
+
+
+
+

The purpose of this extension is to provide a standardized set of built-in +kernels with well-defined semantics useful for accelerating applications +from various domains. The extension specification is designed to rapidly +expand and "live" via addition of new well-defined built-in kernel +definitions and updating of previously defined ones.

+
+

XXX - Not complete yet!!!

+
+
+
+

Name Strings

+
+
+

cl_exp_defined_builtin_kernels

+
+
+
+
+

Contact

+
+
+

TODO

+
+
+
+
+

Contributors

+
+
+

Pekka Jääskeläinen, Intel and Tampere University.
+Topi Leppänen, Tampere University.
+Jan Solanti, Tampere University.
+Ben Ashbaugh, Intel.
+Henry Linjamäki, Intel.

+
+
+
+
+

Notice

+
+
+

TODO

+
+
+
+
+

Status

+
+
+

Draft spec, NOT APPROVED!!

+
+
+
+
+

Version

+
+
+

Built On: 2024-08-22
+Version: 0.3.0

+
+
+
+
+

Dependencies

+
+
+

This extension is written against the OpenCL Specification version 3.0.12.

+
+
+

This extension requires OpenCL 1.2 or later.

+
+
+

This extension requires cl_exp_tensor.

+
+
+
+
+

Overview

+
+
+

OpenCL 1.2 specifies a built-in kernel as a kernel that is executed on +an OpenCL device or custom device by fixed-function hardware or in firmware. +Applications can query the built-in kernels supported by a device or custom +device.

+
+
+

Built-in kernels are referred to by a name (a C string) without any +semantics attached to the functionality. The semantics behind the name +is completely device specific, typically documented in vendor-specific +extension specifications.

+
+
+

The goal of this extension is to lower the bar for utilizing hardware-accelerated functions in drivers by providing a library of well-defined built-in kernels with good coverage for common acceleration needs, designed to easily evolve over time.

+
+
+

The device drivers that implement this extension can freely choose which subset of the defined built-in kernels (DBKs) they implement and advertise to clients. Clients can use the DBKs to accelerate their applications by manually invoking the DBKs. The extension is designed to also support automated task graph lowering tooling later.

+
+
+

Background

+
+

ASIC-based coarse-grained hardware accelerators are specialized logic meant to +speed up execution of workloads of interest, or to provide improvements in +energy-efficiency. Examples of contemporary workloads that are beneficially hardware +accelerated over software-based implementations include video coding, deep learning, +cryptography, software-defined radio and graphics rendering.

+
+
+

FPGAs form a special case somewhere between instruction-set architectures and fixed-function hardware accelerators. While advances in high-level synthesis tools have attempted to bridge the programmability gap between GPU and FPGA programming, FPGAs are still considered devices with which it is challenging to achieve efficient implementations. Due to the extensive manual optimization work required for efficient implementations of the accelerated functionality, defining FPGA designs as a system of "hardware accelerator IPs" is still a widely used "application abstraction". FPGAs can thus be seen as a platform that can realize and integrate any hardware accelerator implementable with the programmable fabric.

+
+
+

The means to utilize hardware accelerators have typically been vendor-specific and abstracted behind domain-specific libraries. The overhead of this "bunch of libraries" approach is seen at the lowest level of integration: the libraries utilize a low-level (typically vendor-specific) library to interface with the actual hardware, and thus do not integrate efficiently with other libraries or software-programmable processors that might be available on the same chip.

+
+
+
+

Rationale

+
+

OpenCL’s built-in kernel abstraction allows pushing both hardware +accelerated and software defined kernels to the same command-queues, +providing a powerful means for asynchronous execution of heterogeneous +task graphs on diverse heterogeneous platforms. The ability to invoke hardware +accelerators while being able to synchronize and optimize data transfers at +the lowest levels of the driver stack can provide significant latency benefits, +especially when combined with the command-buffering mechanism.

+
+
+

However, the built-in kernel abstraction works well only when it is widely adopted by vendors, and when multiple vendors implement the same definitions. Otherwise, each vendor specifies and implements their own built-in kernels closely matching their own hardware accelerator properties, resulting in a lack of cross-vendor portability in the API abstraction presented to the upper layers of heterogeneous computing software stacks.

+
+
+

This extension standardizes a set of well-defined built-in kernels that clients can call from higher-level programming stacks built with different languages and multiple libraries, possibly mixing accelerator calls with software kernel commands, while relying on the driver stack to optimize the execution (especially the synchronization and communication) as a low-level heterogeneous task graph. The heterogeneous task graph can be described using multiple command-queues and optionally cached using the command buffer extension (cl_khr_command_buffer). This extension aims to promote the use of built-in kernels as a programming model for hardware-accelerated functionality, improving cross-vendor portability of hardware-accelerated computing.

+
+
+
+
+
+

New API Functions

+
+
+
+
#define CL_MAX_DBK_PROPERTIES 16
+
+cl_program clCreateProgramWithDefinedBuiltInKernels(
+    cl_context           context,
+    cl_uint              num_devices,
+    const cl_device_id*  device_list,
+    cl_uint              num_kernels,
+    const cl_dbk_id_exp* kernel_ids,
+    const char**         kernel_names,
+    const void**         kernel_attributes,
+    cl_int*              device_errcode_ret,
+    cl_int*              errcode_ret);
+
+
+
+
+
+

New API Types

+
+
+
+
typedef cl_uint       cl_dbk_id_exp;
+typedef cl_properties cl_dbk_properties_exp;
+
+typedef union {
+    cl_char    sc;
+    cl_uchar   uc;
+    cl_short   ss;
+    cl_ushort  us;
+    cl_int     si;
+    cl_uint    ui;
+    cl_long    sl;
+    cl_ulong   ul;
+    cl_half    fh;
+    cl_float   ff;
+    cl_double  fd;
+    void*      raw;
+} cl_tensor_datatype_union_exp;
+
+typedef struct cl_dbk_attributes_matmul_exp {
+    cl_tensor_desc_exp            a;
+    cl_tensor_desc_exp            b;
+    cl_tensor_desc_exp            c;
+    cl_bool                       trans_a;
+    cl_bool                       trans_b;
+    cl_dbk_properties_exp         kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_matmul_exp;
+
+typedef struct cl_dbk_attributes_gemm_exp {
+    cl_tensor_desc_exp            a;
+    cl_tensor_desc_exp            b;
+    cl_tensor_desc_exp            c_in;
+    cl_tensor_desc_exp            c_out;
+    cl_bool                       trans_a;
+    cl_bool                       trans_b;
+    cl_tensor_datatype_union_exp  alpha;
+    cl_tensor_datatype_union_exp  beta;
+    cl_dbk_properties_exp         kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_gemm_exp;
+
+typedef struct cl_dbk_attributes_leaky_relu_exp {
+   cl_tensor_desc_exp             in;
+   cl_tensor_desc_exp             out;
+   cl_tensor_datatype_union_exp   coefficient;
+   cl_dbk_properties_exp          kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_leaky_relu_exp;
+
+
+
+
+
+

New API Enums

+
+
+

Accepted values for cl_dbk_id_exp:

+
+
+
+
CL_DBK_MATMUL_EXP      0x????
+CL_DBK_GEMM_EXP        0x????
+CL_DBK_LEAKY_RELU_EXP  0x????
+
+
+
+

Accepted values for cl_dbk_properties_exp:

+
+
+
+
CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP  0x????
+CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP   0x????
+
+
+
+

New error codes:

+
+
+
+
CL_DBK_UNSUPPORTED_EXP                0x????
+CL_DBK_UNSUPPORTED_PROPERTY_EXP       0x????
+CL_DBK_INVALID_ID_EXP                 0x????
+CL_DBK_INVALID_ATTRIBUTE_EXP          0x????
+CL_DBK_UNMET_MAX_RELATIVE_ERROR_EXP   0x????
+
+
+
+
+
+

Modifications to the OpenCL Specification

+
+
+
+
(Add the following to section 5.8.1, Creating Program Objects)
+
+
+
+
+

To create a program object for a context and to load the information +related to the defined built-in kernels into that object, call the +function:

+
+
+
+
clCreateProgramWithDefinedBuiltInKernels(
+    cl_context          context,
+    cl_uint             num_devices,
+    const cl_device_id* device_list,
+    cl_uint             num_kernels,
+    const cl_dbk_id*    kernel_ids,
+    const char**        kernel_names,
+    const void**        kernel_attributes,
+    cl_int*             device_errcode_ret,
+    cl_int*             errcode_ret);
+
+
+
+
    +
  • +

    context must be a valid OpenCL context.

    +
  • +
  • +

    num_devices is the number of elements in device_list and +device_errcode_ret lists.

    +
  • +
  • +

    device_list is a pointer to a list of devices that are in +context. device_list must be a non-NULL value. The defined built-in +kernels are loaded for devices specified in this list.

    +
  • +
  • +

    num_kernels is the number of elements in the kernel_ids, kernel_names and kernel_attributes lists.

    +
  • +
  • +

    kernel_ids is the list of defined built-in kernels to +be loaded into the program.

    +
  • +
  • +

    kernel_names is a list of names given for each kernel listed in +kernel_ids. Each string in the list must be non-NULL and unique.

    +
  • +
  • +

    kernel_attributes is a list of pointers that point to the respective attribute structure of each defined built-in kernel in the kernel_ids list. The respective attribute structure for each kernel identifier is listed in Appendix TODO.

    +
  • +
  • +

    device_errcode_ret will return an appropriate error code per device. If device_errcode_ret is NULL, no error code is returned.

    +
  • +
  • +

    errcode_ret will return an appropriate error code. If +errcode_ret is NULL, no error code is returned.

    +
  • +
+
+
+

The devices associated with the program object will be the list of devices specified by device_list or a subset of it. The devices specified by device_list must be associated with context.

+
+
+

clCreateProgramWithDefinedBuiltInKernels returns a valid non-zero program object and errcode_ret is set to CL_SUCCESS if the program object is created successfully. The returned program is created for the devices that support the requested built-in kernels, as indicated by CL_SUCCESS in the device_errcode_ret list. In case of a failure to create the program for a device, one of the following error codes is set in the device_errcode_ret list for the respective device:

+
+
+
    +
  • +

    CL_DBK_UNSUPPORTED_EXP if the device does not support one of the +built-in kernels listed in kernel_ids.

    +
  • +
  • +

    CL_INVALID_PROPERTY if a property list for a defined built-in +kernel description is invalid.

    +
  • +
  • +

    CL_DBK_UNMET_MAX_RELATIVE_ERROR_EXP if a defined built-in kernel +does not meet the requested precision.

    +
  • +
  • +

    CL_OUT_OF_RESOURCES if there is a failure to allocate resources +required by the OpenCL implementation on the device.

    +
  • +
+
+
+

If a program object is not created, +clCreateProgramWithDefinedBuiltInKernels returns a NULL value with +one of the following error codes returned in errcode_ret:

+
+
+
    +
  • +

    CL_INVALID_CONTEXT if context is not a valid context.

    +
  • +
  • +

    CL_INVALID_VALUE if device_list is NULL or num_devices is zero.

    +
  • +
  • +

    CL_INVALID_VALUE if a kernel name is not unique within kernel_names.

    +
  • +
  • +

    CL_INVALID_VALUE if there is a NULL value in kernel_names.

    +
  • +
  • +

    CL_DBK_INVALID_ID_EXP if any value in kernel_ids is not a known identifier for a defined built-in kernel.

    +
  • +
  • +

    CL_DBK_INVALID_ATTRIBUTE_EXP if a kernel attribute structure is invalid for a defined built-in kernel.

    +
  • +
  • +

    CL_DBK_UNSUPPORTED_EXP if device_errcode_ret is NULL and any +device in device_list does not support a defined built-in kernel.

    +
  • +
  • +

    CL_DBK_UNSUPPORTED_EXP if device_errcode_ret is non-NULL and none of the devices in device_list support the requested defined built-in kernels.

    +
  • +
  • +

    CL_DBK_UNSUPPORTED_PROPERTY_EXP if a kernel does not accept a given kernel property.

    +
  • +
  • +

    CL_INVALID_DEVICE if any device in device_list is not in the list of +devices associated with context.

    +
  • +
  • +

    CL_OUT_OF_RESOURCES if there is a failure to allocate resources +required by the OpenCL implementation on the device.

    +
  • +
  • +

    CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources +required by the OpenCL implementation on the host.

    +
  • +
+
+
+
+
+
(Modify section 5.10, Executing Kernels)
+
+
+
+
+
+
(Add following to clEnqueueNDRangeKernel)
+
+
+
+
+
+

For defined built-in kernels, the work_dim, global_work_offset, global_work_size and local_work_size parameters are meaningless and must be set to zero and NULL, respectively. OpenCL implementations decide how they distribute the workloads of the defined built-in kernels.

+
+
+
+
+
+
+
+
+
(Add the following to the list of error codes returned by clEnqueueNDRangeKernel)
+
+
+
+
+
+
    +
  • +

    CL_INVALID_GLOBAL_WORK_SIZE if the kernel is a defined built-in +kernel and global_work_size is not NULL.

    +
  • +
  • +

    CL_INVALID_GLOBAL_WORK_OFFSET if the kernel is a defined built-in +kernel and global_work_offset is not NULL.

    +
  • +
  • +

    CL_INVALID_LOCAL_WORK_SIZE if the kernel is a defined built-in +kernel and local_work_size is not NULL.

    +
  • +
+
+
+
+ +
+
+
+

Add new appendix "Defined Built-in Kernels" to OpenCL API Specification

+
+

This chapter describes standard defined built-in kernels (DBKs) with well-defined semantics. They are loaded into a program using clCreateProgramWithDefinedBuiltInKernels, and the kernels in it are launched using clEnqueueNDRangeKernel with work_dim set to zero and global_work_offset, global_work_size and local_work_size set to NULL.

+
+
+

The general client-side abstraction of a DBK is similar to a call to a C function whose implementation is hidden. Device drivers are free to implement a DBK by invoking one or more coarse- and fine-grained hardware accelerators combined with firmware to implement the semantics as efficiently as possible.

+
+
+

It is the driver's responsibility to handle efficient synchronization and communication with the hardware accelerator, internal accelerator state management, and resource sharing across multiple OpenCL contexts.

+
+
+

Reproducibility

+
+

Identical DBKs, or the same DBK executed repeatedly with identical inputs, are guaranteed to produce identical results, unless otherwise stated in the DBK's description, when:

+
+
+
    +
  • +

    enqueued to the same device,

    +
  • +
  • +

    on the same platform,

    +
  • +
  • +

    using the same vendor driver with the same driver version, and

    +
  • +
  • +

    the CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP property is not set.

    +
  • +
+
+
+

In other cases, the DBKs may produce different results; differences may occur, for example, because different algorithms are used across devices. Two DBKs for a device are considered identical if they are created using the same kernel identifier, kernel attributes and kernel properties.

+
+
+

DBKs may produce approximate results, and the error with respect to the infinitely precise result can optionally be controlled with CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP when that property is listed in the DBK's description. When the application does not control the precision using CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP, the precision of results is

+
+
+
    +
  • +

    chosen by the implementation for floating-point based tasks.

    +
  • +
  • +

    exact for integer based tasks.

    +
  • +
+
+
+
+

Kernel Interface

+
+

DBKs operate on tensor objects, created with clCreateBufferWithProperties using the CL_MEM_TENSOR property, generally in static single-assignment fashion. The kernel arguments used for reading and writing tensors may not reference the same tensor object unless otherwise stated in the DBK descriptions.

+
+
+
+

The Defined Built-in Kernels

+
+

The recognized defined built-in kernels are listed in the following table. The table is expected to be expanded and updated over versions of this extension, while preserving backwards compatibility.

+
+
+

Each defined built-in kernel entry is organized as follows:

+
+
+
    +
  • +

    Name: Name of the defined built-in kernel (an enumeration).

    +
  • +
  • +

    Kernel attributes: The kernel attributes required for creating the defined built-in kernel via clCreateProgramWithDefinedBuiltInKernels. Attribute values are immutable.

    +
  • +
  • +

    Kernel arguments: The kernel arguments.

    +
  • +
  • +

    Description: The description of the kernel in detail.

    +
  • +
  • +

    Attribute validation rules: Conditions on the kernel attributes. The implementation must return CL_DBK_INVALID_ATTRIBUTE_EXP from the clCreateProgramWithDefinedBuiltInKernels call if any of the conditions are violated.

    +
  • +
  • +

    Kernel mode properties: List of kernel properties (cl_dbk_properties_exp) the kernel may accept. The properties can be used to tweak certain implementation details and behaviors of the kernel execution. If a property not listed in the DBK description is passed to the clCreateProgramWithDefinedBuiltInKernels call, the implementation must return CL_DBK_UNSUPPORTED_PROPERTY_EXP.

    +
  • +
+
+ + +++++ + + + + + + + + + + + + + + + + + + + +
Table 1. Table of defined built-in kernel properties
DBK Mode PropertyProperty ValueDescription

CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP

float

+

Require that the DBK produces results which do not deviate by more than the given number of ULPs (units in the last place) with respect to the infinitely precise result.

+

CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP

cl_bool

+

Allow results of the kernel to be non-reproducible. This allows the implementation to switch the kernel's algorithm on each launch for possibly better performance.

+
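To make the ULP-based tolerance of CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP concrete, the following host-side sketch counts the number of representable float values between a DBK result and a reference result. It is an illustration only; `ordered` and `ulp_distance` are hypothetical helper names, not part of this extension.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float's sign-magnitude bit pattern onto a monotonically
// increasing integer so that adjacent floats differ by exactly 1.
static int64_t ordered(float x) {
    int32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits >= 0 ? static_cast<int64_t>(bits)
                     : -static_cast<int64_t>(bits & 0x7fffffff);
}

// Distance between two floats in units in the last place (ULPs).
static int64_t ulp_distance(float a, float b) {
    int64_t d = ordered(a) - ordered(b);
    return d < 0 ? -d : d;
}
```

With such a helper a client could, for instance, verify that a returned result stays within the tolerance it requested from the DBK.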
+ + +++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Table 2. Standard Built-in Kernels and Their Semantics. The table has been populated with a small set of non-trivial example entries which are subject to change; the list is expected to expand during drafting.

Name: CL_DBK_GEMM_EXP

Kernel Attributes

+
+
typedef struct cl_dbk_attributes_gemm_exp {
+    cl_tensor_desc_exp a;
+    cl_tensor_desc_exp b;
+    cl_tensor_desc_exp c_in;
+    cl_tensor_desc_exp c_out;
+    cl_bool trans_a;
+    cl_bool trans_b;
+    cl_tensor_datatype_union_exp alpha;
+    cl_tensor_datatype_union_exp beta;
+    cl_dbk_properties_exp kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_gemm_exp;
+
+
+
+
    +
  • +

    a is a tensor description for input matrix A.

    +
  • +
  • +

    b is a tensor description for input matrix B.

    +
  • +
  • +

    c_in is a tensor description for input matrix CIN.

    +
  • +
  • +

    c_out is a tensor description for output matrix COUT.

    +
  • +
  • +

    trans_a instructs the kernel to transpose the A matrix if the value is CL_TRUE.

    +
  • +
  • +

    trans_b instructs the kernel to transpose the B matrix if the value is CL_TRUE.

    +
  • +
  • +

    alpha is a value (or a pointer to a value) corresponding to the element type of c_out.

    +
  • +
  • +

    beta is a value (or a pointer to a value) corresponding to the element type of c_out.

    +
  • +
  • +

    kernel_props defines additional kernel properties.

    +
  • +
+

Kernel Arguments

+
    +
  1. +

    cl_mem: a tensor object for matrix A (read only).

    +
  2. +
  3. +

    cl_mem: a tensor object for matrix B (read only).

    +
  4. +
  5. +

    cl_mem: a tensor object for matrix C_IN (read only).

    +
  6. +
  7. +

    cl_mem: a tensor object for matrix C_OUT (write only).

    +
  8. +
+

Description

+

Performs (batched) general matrix multiplication:

+
+
+
+\$bb"COUT"_(b,m,n) = "beta" * bb"CIN"_(b,m,n) + "alpha" * sum_(k)trans(bb"A", "trans_a")_(b,m,k)trans(bb"B", "trans_b")_(b,k,n)\$
+
+
+

Where:

+
+
+
+\$trans(X_(b,i,j), tr) = {(X_(b,j,i), "if tr" = "CL_TRUE"), (X_(b,i,j), "otherwise") :}\$ +
+
+
+

Second degree tensors of shape (a, b) are treated as third degree +tensors of shape (1, a, b).

+
+
+

Operations of the matrix multiplication are performed in the precision of elementof(COUT).

+
+
+

If an overflow occurs in the accumulation of the products, the COUT tensor's result is undefined.

+
+
+

CIN and COUT tensors may be the same object.
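As a concrete model of the formula above, the following host-side sketch implements the batched GEMM semantics for row-major float tensors with trans_a = trans_b = CL_FALSE. It is an illustrative reference, not the accelerated kernel; `gemm_ref` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// COUT[b][i][j] = beta * CIN[b][i][j] + alpha * sum_l A[b][i][l] * B[b][l][j]
// for tensors of shape (batch, m, k), (batch, k, n) and (batch, m, n).
std::vector<float> gemm_ref(const std::vector<float>& A,
                            const std::vector<float>& B,
                            const std::vector<float>& CIn,
                            float alpha, float beta,
                            std::size_t batch, std::size_t m,
                            std::size_t k, std::size_t n) {
    std::vector<float> COut(batch * m * n);
    for (std::size_t b = 0; b < batch; ++b)
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = 0; j < n; ++j) {
                float acc = 0.0f;
                for (std::size_t l = 0; l < k; ++l)
                    acc += A[(b * m + i) * k + l] * B[(b * k + l) * n + j];
                COut[(b * m + i) * n + j] =
                    beta * CIn[(b * m + i) * n + j] + alpha * acc;
            }
    return COut;
}
```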

+

Attribute validation rules

+
    +
  • +

    rankof(A) == rankof(B) == rankof(CIN) == rankof(COUT).

    +
  • +
  • +

    Let shapeof(At) == (b…​, m, k) and shapeof(Bt) == (b…​, k, n) be the shapes of tensors A and B, respectively, after possible transposing. shapeof(COUT) must be (b…​, m, n).

    +
  • +
  • +

    shapeof(CIN) == shapeof(COUT).

    +
  • +
  • +

    elementof(A) == elementof(B).

    +
  • +
  • +

    elemkindof(COUT) == elemkindof(A).

    +
  • +
  • +

    elementof(COUT) == elementof(A) or elementof(A) is promotable to +elementof(COUT) without a loss of meaning.

    +
  • +
+
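The shape rules above can be sketched for rank-3 tensors (batch dimensions collapsed to one) as follows. `gemm_shapes_valid` is a hypothetical helper illustrating the checks; real code would read the shapes from the cl_tensor_desc structures of the cl_exp_tensor extension.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// A rank-3 shape: (batch, rows, cols).
using Shape3 = std::array<std::size_t, 3>;

static Shape3 transposed(Shape3 s, bool trans) {
    if (trans) std::swap(s[1], s[2]);  // swap the two matrix dimensions
    return s;
}

static bool gemm_shapes_valid(Shape3 a, Shape3 b, Shape3 c_in, Shape3 c_out,
                              bool trans_a, bool trans_b) {
    Shape3 at = transposed(a, trans_a);  // (batch, m, k)
    Shape3 bt = transposed(b, trans_b);  // (batch, k, n)
    if (at[0] != bt[0] || at[2] != bt[1]) return false;  // batch, k must agree
    Shape3 expected = {at[0], at[1], bt[2]};             // (batch, m, n)
    return c_out == expected && c_in == c_out;
}
```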

Kernel mode properties

+

This DBK accepts the following kernel properties:

+
+
+
    +
  • +

    CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP

    +
  • +
  • +

    CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP

    +
  • +
+

Name: CL_DBK_MATMUL_EXP

Kernel Attributes

+
+
typedef struct cl_dbk_attributes_matmul_exp {
+    cl_tensor_desc_exp a;
+    cl_tensor_desc_exp b;
+    cl_tensor_desc_exp c;
+    cl_bool trans_a;
+    cl_bool trans_b;
+    cl_dbk_properties_exp kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_matmul_exp;
+
+
+
+
    +
  • +

    a is a tensor description for input matrix A.

    +
  • +
  • +

    b is a tensor description for input matrix B.

    +
  • +
  • +

    c is a tensor description for output matrix C.

    +
  • +
  • +

    trans_a instructs the kernel to transpose the A matrix if the value is CL_TRUE.

    +
  • +
  • +

    trans_b instructs the kernel to transpose the B matrix if the value is CL_TRUE.

    +
  • +
  • +

    kernel_props defines additional kernel properties.

    +
  • +
+

Kernel Arguments

+
    +
  1. +

    cl_mem: a tensor object for matrix A (read only).

    +
  2. +
  3. +

    cl_mem: a tensor object for matrix B (read only).

    +
  4. +
  5. +

    cl_mem: a tensor object for matrix C (write only).

    +
  6. +
+

Description

+

Performs (batched) matrix multiplication:

+
+
+
+\$bb"C"_(b,m,n) = sum_(k)trans(bb"A", "trans_a")_(b,m,k)trans(bb"B", "trans_b")_(b,k,n)\$
+
+
+

Where:

+
+
+
+\$trans(X_(b,i,j), tr) = {(X_(b,j,i), "if tr" = "CL_TRUE"), (X_(b,i,j), "otherwise") :}\$ +
+
+
+

Second degree tensors of shape (a, b) are treated as third degree +tensors of shape (1, a, b).

+
+
+

Operations of the matrix multiplication are performed in the precision of elementof(C).

+
+
+

If an overflow occurs in the accumulation of the products, the C tensor's result is undefined.

+
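To make the trans() convention concrete, the following host-side sketch models the batched multiplication with optional transposes on row-major float data. It is illustrative only; `matmul_ref` and `at` are hypothetical helper names, not extension API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Read element (b, i, j) of a row-major (batch, rows, cols) tensor,
// optionally as if the matrix dimensions were transposed.
static float at(const std::vector<float>& t, std::size_t rows, std::size_t cols,
                std::size_t b, std::size_t i, std::size_t j, bool trans) {
    return trans ? t[(b * rows + j) * cols + i] : t[(b * rows + i) * cols + j];
}

// C[b][i][j] = sum_l At[b][i][l] * Bt[b][l][j], where At/Bt are A/B with
// their matrix dimensions swapped when trans_a/trans_b is true.
std::vector<float> matmul_ref(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::size_t batch, std::size_t m, std::size_t k,
                              std::size_t n, bool trans_a, bool trans_b) {
    // With trans_a, A is stored as (batch, k, m); with trans_b, B as (batch, n, k).
    std::vector<float> C(batch * m * n, 0.0f);
    for (std::size_t b = 0; b < batch; ++b)
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = 0; j < n; ++j)
                for (std::size_t l = 0; l < k; ++l)
                    C[(b * m + i) * n + j] +=
                        at(A, trans_a ? k : m, trans_a ? m : k, b, i, l, trans_a) *
                        at(B, trans_b ? n : k, trans_b ? k : n, b, l, j, trans_b);
    return C;
}
```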

Attribute validation rules

+
    +
  • +

    rankof(A) == rankof(B) == rankof(C).

    +
  • +
  • +

    Let shapeof(At) == (b…​, m, k) and shapeof(Bt) == (b…​, k, n) be the shapes of tensors A and B, respectively, after possible transposing. shapeof(C) must be (b…​, m, n).

    +
  • +
  • +

    elementof(A) == elementof(B).

    +
  • +
  • +

    elemkindof(C) == elemkindof(A).

    +
  • +
  • +

    elementof(C) == elementof(A) or elementof(A) is promotable to +elementof(C) without a loss of meaning.

    +
  • +
+

Kernel mode properties

+

This DBK accepts the following kernel properties:

+
+
+
    +
  • +

    CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP

    +
  • +
+

Name: CL_DBK_LEAKY_RELU_EXP

Kernel Attributes

+
+
typedef struct cl_dbk_attributes_leaky_relu_exp {
+   cl_tensor_desc_exp in;
+   cl_tensor_desc_exp out;
+   cl_tensor_datatype_union_exp coefficient;
+   cl_dbk_properties_exp kernel_props[CL_MAX_DBK_PROPERTIES];
+} cl_dbk_attributes_leaky_relu_exp;
+
+
+
+
    +
  • +

    coefficient is the coefficient of leakage (alpha in the formula below), a positive value.

    +
  • +
+

Kernel arguments

+
    +
  1. +

    cl_mem: a tensor object IN for input values.

    +
  2. +
  3. +

    cl_mem: a tensor object OUT for output value.

    +
  4. +
+

Description

+

This element-wise built-in kernel performs a leaky ReLU operation as follows:

+
+
+
+\$"OUT"_(i) = {( "alpha" * "IN"_(i), "if IN"_(i) \lt 0), ("IN"_(i), " otherwise") :}\$
+
+
+

If the target device does not support denormals, the alpha value is flushed to zero before the operation is applied. This DBK accepts tensors of arbitrary rank.

+
+
+

The IN and OUT tensors may be the same object.

+
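As an illustration, the element-wise semantics can be modeled on the host as follows, assuming the conventional leaky ReLU definition f(x) = coefficient * x for x < 0. `leaky_relu_ref` is a hypothetical helper, not part of the extension.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise reference model of the leaky ReLU DBK on float data;
// `coefficient` corresponds to the DBK's coefficient attribute.
std::vector<float> leaky_relu_ref(const std::vector<float>& in,
                                  float coefficient) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] < 0.0f ? coefficient * in[i] : in[i];
    return out;
}
```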

Kernel mode properties

This DBK accepts the following kernel properties:

+

CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP

CL_DBK_PROPERTY_NON_DETERMINISTIC_EXP

Attribute validation rules

+
    +
  • +

    shapeof(in) == shapeof(out).

    +
  • +
  • +

    elementof(in) == elementof(out).

    +
  • +
  • +

    coefficient must be a positive, finite value.

    +
  • +
+
+
+
+

Launching DBKs from the Device Side

+
+

DBKs are primarily meant to be launched as kernel commands via host-side command queues. Optionally, they can be called from the device side via enqueue_kernel:

+
+
+

TBC. This probably needs a device-side function corresponding to clCreateProgramWithDefinedBuiltInKernels.

+
+
+
+
+
+
+

Sample Code

+
+
+
+
constexpr size_t b = 64, m = 100, n = 200, k = 50;
+cl_int err;
+
+std::vector<float> lhs_data = ...;
+std::vector<float> rhs_data = ...;
+std::vector<float> bias_data = ...;
+std::vector<float> out_data(b * m * n);
+
+cl_tensor_layout_blas_exp row_major;
+row_major.leading_dims[0] = 2;
+row_major.leading_dims[1] = 1;
+
+cl_tensor_desc_exp lhs_desc;
+lhs_desc.rank = 3;
+lhs_desc.dtype = CL_TENSOR_FP32_EXP;
+lhs_desc.properties[0] = 0;
+lhs_desc.shape[0] = b;
+lhs_desc.shape[1] = m;
+lhs_desc.shape[2] = k;
+lhs_desc.layout_type = CL_TENSOR_LAYOUT_BLAS_EXP;
+lhs_desc.layout = &row_major;
+
+cl_tensor_desc_exp rhs_desc;
+rhs_desc.rank = 3;
+rhs_desc.dtype = CL_TENSOR_FP32_EXP;
+rhs_desc.properties[0] = 0;
+rhs_desc.shape[0] = b;
+rhs_desc.shape[1] = k;
+rhs_desc.shape[2] = n;
+rhs_desc.layout_type = CL_TENSOR_LAYOUT_BLAS_EXP;
+rhs_desc.layout = &row_major;
+
+cl_tensor_desc_exp out_desc;
+out_desc.rank = 3;
+out_desc.dtype = CL_TENSOR_FP32_EXP;
+out_desc.properties[0] = 0;
+out_desc.shape[0] = b;
+out_desc.shape[1] = m;
+out_desc.shape[2] = n;
+out_desc.layout_type = CL_TENSOR_LAYOUT_BLAS_EXP;
+out_desc.layout = &row_major;
+
+cl_mem lhs_tensor = clCreateBufferWithProperties(
+  ctx, {CL_MEM_TENSOR_EXP, lhs_desc, 0},
+  CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 0, lhs_data.data(), &err);
+cl_mem rhs_tensor = clCreateBufferWithProperties(
+  ctx, {CL_MEM_TENSOR_EXP, rhs_desc, 0},
+  CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 0, rhs_data.data(), &err);
+cl_mem bias_tensor = clCreateBufferWithProperties(
+  ctx, {CL_MEM_TENSOR_EXP, out_desc, 0},
+  CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 0, bias_data.data(), &err);
+cl_mem out_tensor = clCreateBufferWithProperties(
+  ctx, {CL_MEM_TENSOR_EXP, out_desc, 0},
+  CL_MEM_USE_HOST_PTR | CL_MEM_READ_WRITE, 0, out_data.data(), &err);
+
+cl_tensor_datatype_union_exp alpha, beta, relu_coeff;
+alpha.ff = 2.0f;
+beta.ff = -1.0f;
+relu_coeff.ff = 0.01f;
+
+cl_dbk_attributes_gemm_exp gemm_attrs = {
+  lhs_desc, rhs_desc, out_desc, out_desc, CL_FALSE, CL_FALSE, alpha, beta, {}
+};
+gemm_attrs.kernel_props[0] = CL_DBK_PROPERTY_MAX_RELATIVE_ERROR_EXP;
+gemm_attrs.kernel_props[1] = 100; // in ULPs
+gemm_attrs.kernel_props[2] = 0;
+
+cl_dbk_attributes_leaky_relu_exp relu_attrs = {
+  out_desc, out_desc, relu_coeff, {0}
+};
+
+cl_device_id target_devices[2] = {dev1, dev2};
+cl_int device_errcodes[2];
+cl_dbk_id_exp kernel_ids[2] = {CL_DBK_GEMM_EXP, CL_DBK_LEAKY_RELU_EXP};
+const char* kernel_names[2] = {"my_gemm", "my_relu"};
+const void* kernel_attrs[2] = {&gemm_attrs, &relu_attrs};
+auto prog = clCreateProgramWithDefinedBuiltInKernels(
+  ctx, 2, target_devices, 2, kernel_ids, kernel_names,
+  kernel_attrs, device_errcodes, &err);
+
+std::vector<cl_device_id> supported_devs;
+for (unsigned i = 0; i < 2; i++) {
+  if (device_errcodes[i] == CL_SUCCESS) {
+    supported_devs.push_back(target_devices[i]);
+  } else {
+     // Handle errors. Possible error cases (non-exhaustive):
+     //
+     // * CL_DBK_UNSUPPORTED_EXP: The DBK is not supported on the device.
+     // * CL_DBK_UNMET_MAX_RELATIVE_ERROR_EXP The DBK implementation does not
+     //   meet the requested precision.
+  }
+}
+
+err = clBuildProgram(
+  prog, supported_devs.size(), supported_devs.data(), "", nullptr, nullptr);
+
+auto gemm_kernel = clCreateKernel(prog, "my_gemm", &err);
+clSetKernelArg(gemm_kernel, 0, sizeof(cl_mem), &lhs_tensor);
+clSetKernelArg(gemm_kernel, 1, sizeof(cl_mem), &rhs_tensor);
+clSetKernelArg(gemm_kernel, 2, sizeof(cl_mem), &bias_tensor);
+clSetKernelArg(gemm_kernel, 3, sizeof(cl_mem), &out_tensor);
+
+auto relu_kernel = clCreateKernel(prog, "my_relu", &err);
+clSetKernelArg(relu_kernel, 0, sizeof(cl_mem), &out_tensor);
+clSetKernelArg(relu_kernel, 1, sizeof(cl_mem), &out_tensor);
+
+cl_command_queue cmd_q = /* Create an in-order command queue. */;
+
+clEnqueueNDRangeKernel(
+  cmd_q, gemm_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+clEnqueueNDRangeKernel(
+  cmd_q, relu_kernel, 0, nullptr, nullptr, nullptr, 0, nullptr, nullptr);
+clEnqueueMapBuffer(
+  cmd_q, out_tensor, CL_TRUE, CL_MAP_READ, 0, b * m * n * sizeof(float),
+  0, nullptr, nullptr, &err);
+
+
+
+

Open questions

+
+
    +
  1. +

    Should we enable launching DBKs from the device side without requiring device-side enqueue? The main problem is those with NDRange as they are not simple single-WI helper functions.

    +
    +
    +
    +

    UNRESOLVED

    +
    +
    +
    +
  2. +
  3. +

    Should the NDRange be used at all in DBKs? It feels sort of unnatural as typically the NDRange is used to imply SPMD parallelism while the hardware/firmware is free to choose whatever parallelization strategy to implement the function. On the other hand, similar applies to software kernel launches as the NDRange-launched work-items can be executed serially if adhering to barrier semantics.

    +
    +
    +
    +

    RESOLVED. Decided to go forward without NDRange (and global offset + as consequence), as there are currently no known uses for the + NDRange, and let OpenCL implementations decide the parallelization + strategy.

    +
    +
    +
    +
  4. +
  5. +

    Different accelerators prefer different channel orders (NHWC vs. NCHW…​) for the processed data. Should the channel order be passed as a DBK argument (like in the example GEMM’s row/column order) or is it better to have different DBK variations for each?

    +
    +
    +
    +

    RESOLVED. The memory layout information is a property of the tensors so + there is no need for DBK arguments for the layout or DBK variants.

    +
    +
    +
    +
  6. +
  7. +

    How to denote tensors' memory layout preference? Some of the DBKs are more efficient on a given device as they map more naturally to the underlying HW accelerator, but the slower variations (for example, with suboptimal channel order in NN accelerators) might still be beneficially accelerated.

    +
    +
    +
    +

    UNRESOLVED.

    +
    +
    +
    +
  8. +
  9. +

    Since the defined built-in kernel concept is basically just a C-like API inside another API, should it be made more generic and thus directly usable for SYCL and Vulkan as well?

    +
    +
    +
    +

    UNRESOLVED

    +
    +
    +
    +
  10. +
  11. +

    What other DBK mode properties should we have? Here are some ideas:

    +
    +
      +
    • +

      Perform accumulation with saturation.

      +
    • +
    • +

      Finite math only

      +
    • +
    • +

      Flush denormals to zero.

      +
    • +
    +
    +
    +
    +
    +

    UNRESOLVED

    +
    +
    +
    +
  12. +
  13. +

    Should we reuse (and remove "deprecation" status on) clEnqueueTask +for launching DBKs as DBKs don’t make use of global offset and size +and local size parameters?

    +
    +
    +
    +

    UNRESOLVED

    +
    +
    +
    +
  14. +
+
+
+
+
+
+

Version History

+
+ ++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
VersionDateAuthorDescription

0.1.0

2022-12-13

Pekka Jääskeläinen
+Ben Ashbaugh

+

First formulation as an extension specification like proposed by Ben Ashbaugh.

+

0.2.0

2023-11-23

Henry Linjamäki
+Pekka Jääskeläinen
+Ben Ashbaugh

+

Add APIs for defined built-in kernel (DBK) creation. Model DBKs on +tensor type. Add sample code.

+

0.3.0

2024-8-20

Henry Linjamäki
+Pekka Jääskeläinen
+Freddie Witherden

+
    +
  • +

    Rework document structure to match the cl_exp_extension_template.

    +
  • +
  • +

    Reflect changes of the cl_exp_tensor extension here.

    +
  • +
  • +

    Add "Kernel Interface" section into the DBK Appendix.

    +
  • +
  • +

    Add GEMM DBK.

    +
  • +
  • +

    Change DBK creation interface.

    +
  • +
+

0.3.1

2024-8-22

Henry Linjamäki
+Pekka Jääskeläinen
+RABijl (@GitHub)

+
    +
  • +

    Rename extension name from 'khr' to 'exp'.

    +
  • +
  • +

    Resolve two open questions.

    +
  • +
  • +

    Small fixes.

    +
  • +
+
+
+
+
+ + + + + \ No newline at end of file