cl_intel_unified_shared_memory version 1.1 #905

Draft
wants to merge 7 commits into
base: main
65 changes: 38 additions & 27 deletions extensions/cl_intel_unified_shared_memory.asciidoc
@@ -52,7 +52,7 @@ Shipping
== Version

Built On: {docdate} +
Revision: 1.0.0
Revision: 1.1.0

== Dependencies

@@ -743,9 +743,19 @@ Arguments to the kernel are referred to by indices that go from 0 for the leftmo

_arg_value_ is the pointer value that should be used as the argument specified by _arg_index_.
The pointer value will be used as the argument by all API calls that enqueue a kernel until the argument value is set to a different pointer value by a subsequent call.
A pointer into Unified Shared Memory allocation may only be set as an argument value for an argument declared to be a pointer to `global` or `constant` memory.
A pointer may only be set as an argument value for an argument declared to be a pointer to `global` or `constant` memory.

[[valid-usm-pointer-argument-definition]]
The definition of a valid pointer value was changed in extension version 1.1.0:

* For extension versions prior to version 1.1.0:
For devices supporting shared system allocations, any pointer value is valid.
Otherwise, the pointer value must be `NULL` or must point into a Unified Shared Memory allocation returned by *clHostMemAllocINTEL*, *clDeviceMemAllocINTEL*, or *clSharedMemAllocINTEL*.
* For extension versions 1.1.0 and newer:
For all devices, any pointer value is valid and may be set as an argument to a kernel.

In this definition, a valid pointer value means that the function will not return an error.
It still may not be valid to dereference the pointer inside of a kernel if the memory that the pointer points to is not accessible on the device.
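
As an informal illustration of the two definitions above, the rule can be modeled as a small decision function. This is only a sketch under stated assumptions: `model_status`, `model_query`, and `model_set_arg_pointer` are hypothetical names invented for this example, the status values are placeholders rather than real OpenCL error codes, and the single `device_has_system_usm` flag is a simplification of how device support would actually be queried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are not real
 * OpenCL error codes and are not part of the extension API. */
enum model_status { MODEL_SUCCESS = 0, MODEL_INVALID_POINTER = 1 };

/* Hypothetical summary of what is known about the pointer and the device. */
struct model_query {
    const void *ptr;            /* pointer value passed as the argument */
    bool is_usm_allocation;     /* points into a host, device, or shared USM allocation */
    bool device_has_system_usm; /* device supports shared system allocations */
};

/* Models whether setting the pointer as a kernel argument is accepted.
 * ext_major and ext_minor identify the extension version being modeled. */
static enum model_status model_set_arg_pointer(int ext_major, int ext_minor,
                                               struct model_query q)
{
    bool is_1_1_or_newer = ext_major > 1 || (ext_major == 1 && ext_minor >= 1);

    /* Version 1.1.0 and newer: any pointer value is valid for all devices. */
    if (is_1_1_or_newer)
        return MODEL_SUCCESS;

    /* Prior versions: any pointer value is valid when the device
     * supports shared system allocations. */
    if (q.device_has_system_usm)
        return MODEL_SUCCESS;

    /* Otherwise the pointer must be NULL or point into a USM allocation. */
    if (q.ptr == NULL || q.is_usm_allocation)
        return MODEL_SUCCESS;

    return MODEL_INVALID_POINTER;
}
```

In this model, an unknown host pointer on a device without shared system allocations is rejected for version 1.0 and accepted for version 1.1. Acceptance only means that no error is returned when the argument is set; it says nothing about whether the kernel may dereference the pointer.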

*clSetKernelArgMemPointerINTEL* returns `CL_SUCCESS` if the function is executed successfully.
Otherwise, it will return one of the following errors:
@@ -795,6 +805,8 @@ The following errors may be returned by *clSetKernelExecInfo* for these new _par
* `CL_INVALID_OPERATION` if _param_name_ is `CL_KERNEL_EXEC_INFO_INDIRECT_DEVICE_ACCESS_INTEL` and no devices in the context associated with _kernel_ support device Unified Shared Memory allocations.
* `CL_INVALID_OPERATION` if _param_name_ is `CL_KERNEL_EXEC_INFO_INDIRECT_SHARED_ACCESS_INTEL` and no devices in the context associated with _kernel_ support shared Unified Shared Memory allocations.

The <<valid-usm-pointer-argument-definition,definition of a valid pointer value>> specified using `CL_KERNEL_EXEC_INFO_USM_PTRS_INTEL` was changed in extension version 1.1.0.

==== Filling and Copying Unified Shared Memory

The function
@@ -1243,21 +1255,27 @@ Note that some flags will not be valid, such as `CL_MEM_USE_HOST_PTR`.
. Should it be an error to set an unknown pointer as a kernel argument using *clSetKernelArgMemPointerINTEL* if no devices support shared system allocations?
+
--
*UNRESOLVED*:
Returning an error for an unknown pointer is helpful to identify and diagnose possible programming errors sooner, but passing a pointer to arbitrary memory to a function on the host is not an error until the pointer is dereferenced.
`RESOLVED`:
The behavior of *clSetKernelArgMemPointerINTEL* was changed in version 1.1.0 of this extension.

Prior to version 1.1.0, it was considered an error to set an arbitrary pointer value as an argument to a kernel if no devices support system USM.
This was helpful to identify possible programming errors; however, it did not match the behavior of passing a pointer to a function on the host, where it is only a programming error if an invalid pointer is dereferenced.
To provide a similar programming experience, the error condition was relaxed in version 1.1.0, and any arbitrary pointer value may be passed to a kernel.

If we relax the error condition for *clSetKernelArgMemPointerINTEL* then we could also consider relaxing the error condition for *clSetKernelExecInfo*(`CL_KERNEL_EXEC_INFO_USM_PTRS_INTEL`) similarly.
The behavior of *clSetKernelExecInfo*(`CL_KERNEL_EXEC_INFO_USM_PTRS_INTEL`) was changed in the same way.

Note that if the error condition is removed we can still check for possible programming errors via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer].
If desired, additional checks to identify possible programming errors may still be provided via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer].
--

. Should we support a "rect" memcpy similar to *clEnqueueCopyBufferRect*?
. Should we support a 2D "rect" memcpy similar to *clEnqueueCopyBufferRect*?
+
--
*UNRESOLVED*:
This would be a fairly straightforward addition if it is useful.

Note that there is no similar SVM "rect" memcpy.
Note that there is no similar 2D "rect" memcpy for SVM.

We could also support a 2D "rect" fill or memset, though there are no similar functions for `cl_mem` buffers or SVM.
--

. Should there be an upper limit on the size of an allocation using *clHostMemAllocINTEL*?
@@ -1301,35 +1319,28 @@ Possible resolutions:
* Do nothing and keep the existing error behavior.
--

. Can a device USM allocation for a parent device be accessed by its sub-devices?
Can a single device shared USM allocation associated with a parent device be accessed by its sub-devices?
+
--
*UNRESOLVED*:
Since a sub-device is a partition of a parent device, a USM allocation against a parent device should be accessible by its sub-devices.
We could document this expectation explicitly in this extension if it is not already covered by the main OpenCL specification.

Note that a USM allocation against a sub-device need not be accessible by its parent device or by other sibling sub-devices, though some implementations may support this, just like some implementations optionally support access to USM allocations from other devices.
--

== Revision History

[cols="5,15,15,70"]
[grid="rows"]
[options="header"]
|========================================
|Rev|Date|Author|Changes
|A|2019-01-18|Ben Ashbaugh|*Initial revision*
|B|2019-03-25|Ben Ashbaugh|Minor name changes.
|C|2019-06-18|Ben Ashbaugh|Moved flags argument into properties.
|D|2019-07-19|Ben Ashbaugh|Editorial fixes.
|E|2019-07-22|Ben Ashbaugh|Allocation properties should be const.
|F|2019-07-26|Ben Ashbaugh|Removed DEFAULT mem alloc flag.
|G|2019-08-23|Ben Ashbaugh|Added mem alloc query for associated device.
|H|2019-10-11|Ben Ashbaugh|Added initial list and description of error codes.
|I|2019-11-14|Ben Ashbaugh|Switched from a memset to a memfill API.
|J|2019-11-18|Ben Ashbaugh|Updated a few more error conditions.
|K|2019-12-18|Krzysztof Gibala|Updated write combine description.
|L|2020-01-15|Ben Ashbaugh|Added invalid arg case to setkernelarg API.
|M|2020-01-17|Ben Ashbaugh|Minor name changes, removed const from memfree API.
|N|2020-01-22|Ben Ashbaugh|Updated write combine description.
|O|2020-01-23|Ben Ashbaugh|Added aliases for USM migration flags.
|P|2020-02-28|Ben Ashbaugh|Added blocking memfree API.
|Q|2020-03-12|Ben Ashbaugh|Name tweak for blocking memfree API, added comparison to SVM, allow zero memory advice.
|R|2020-08-21|Ben Ashbaugh|Fixed enum name typo in table.
|S|2020-08-26|Maciej Dziuban|Added initial placement flags for shared allocations.
|1.0.0|2021-11-07|Ben Ashbaugh|Added version and other minor updates prior to posting on the OpenCL registry.
|1.0.0|2022-11-08|Ben Ashbaugh|Added new issues regarding error behavior for clSetKernelArgMemPointerINTEL and rect copies.
|1.0.1|2023-08-28|Ben Ashbaugh|Documented error conditions for clSetKernelExecInfo.
|1.1.0|2024-07-30|Ben Ashbaugh|Modified error behavior for clSetKernelArgMemPointerINTEL and clSetKernelExecInfo.
|========================================

//************************************************************************