Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clEnqueueSVMUnmap #890

Open
bananAshkar opened this issue Mar 9, 2023 · 14 comments · May be fixed by #979
Open

clEnqueueSVMUnmap #890

bananAshkar opened this issue Mar 9, 2023 · 14 comments · May be fixed by #979
Labels
needs-cts-coverage The CTS needs to be extended OpenCL API Spec Issues related to the OpenCL API specification.

Comments

@bananAshkar
Copy link

about the function clEnqueueSVMUnmap , what happens if we call this function without calling clEnqueueSVMMap first? this is not well-defined in openCL 3.0 Reference Page (spec)

@bashbaug
Copy link
Contributor

bashbaug commented Mar 9, 2023

Interesting question.

We currently say for the description of the svm_ptr argument to clEnqueueSVMUnmap:

svm_ptr is a pointer that was specified in a previous call to clEnqueueSVMMap. If svm_ptr is allocated using clSVMAlloc then it must be allocated from the same context from which command_queue was created. Otherwise the behavior is undefined.

IMHO this is ambiguous whether "otherwise the behavior is undefined" applies to just the sentence about being allocated from the same context as the command queue or whether it applies to svm_ptr being a pointer previously used with clEnqueueSVMMap also.

The only relevant error condition is:

CL_INVALID_VALUE if svm_ptr is NULL.

Do you have a preference how you would like this to behave?

It would also be interesting to know how different implementations behave in this case.

@aharon-abramson
Copy link
Contributor

In our implementation, we don't check if the buffer has been mapped or not, and we perform the unmapping operation anyway, which may be redundant, but harmless.
I believe that returning an error would be better in this case.

@kpet
Copy link
Contributor

kpet commented Aug 1, 2023

We (Arm) return CL_INVALID_VALUE in that case. We'd support this becoming the specified behaviour.

bashbaug added a commit to bashbaug/SimpleOpenCLSamples that referenced this issue Aug 2, 2023
bashbaug added a commit to bashbaug/SimpleOpenCLSamples that referenced this issue Aug 2, 2023
bashbaug added a commit to bashbaug/SimpleOpenCLSamples that referenced this issue Aug 2, 2023
@bashbaug
Copy link
Contributor

bashbaug commented Aug 2, 2023

I tried a couple of Intel OpenCL implementations and it looks like our CPU implementation returns CL_INVALID_VALUE for this case and our GPU implementation returns CL_SUCCESS. I saw similar mixed behavior in a couple of the non-Intel OpenCL implementations I tried, too.

I don't think it would be too hard for our GPU implementation to return an error if this is what we decide to do. Theoretically this could cause an app to fail, since we would be returning an error (and not generating an event) when we weren't previously, but this probably won't be a problem, especially given that other OpenCL implementations are generating an error in this case.

In case it's helpful: https://github.com/bashbaug/SimpleOpenCLSamples/blob/svm-unmap-test/samples/99_svmunmap/main.cpp

@lakshmih
Copy link
Contributor

Our (Qualcomm) implementation returns CL_INVALID_VALUE for this situation.

@karolherbst
Copy link
Contributor

We don't return an error in Mesa for this, but it would be consistent with clEnqueueUnmapMemObject to return one.

@bashbaug
Copy link
Contributor

Discussed in the August 22nd teleconference:

  • Implementations should return an error in this case - CL_INVALID_VALUE.
  • We don't currently have a CTS test for this case, but we'll look into adding one (pending a negative testing discussion).

@bashbaug bashbaug added OpenCL API Spec Issues related to the OpenCL API specification. needs-cts-coverage The CTS needs to be extended labels Oct 16, 2023
@bashbaug
Copy link
Contributor

I have a (draft) PR written up that adds this error condition in #979, but I'm starting to have second thoughts whether it is actually possible to reliably return an error in this case. Consider a tricky case like the following (pseudocode):

event1 = clCreateUserEvent();
clEnqueueSVMUnmap(queue1, ptr, depends on event1);

event0 = clCreateUserEvent();
clEnqueueSVMMap(queue0, ptr, depends on event0);

Observations are:

  1. We can't determine at the call to clEnqueueSVMUnmap whether the ptr is mapped or not - if event0 is set to complete first then the map will execute first, and if event1 is set to complete first then the unmap will execute first.
  2. There are other cases without using user events where an SVM map may execute before an SVM unmap but where there are no guarantees.
  3. Buffer and image memory objects don't have a similar problem for two reasons: the pointer to unmap is (usually) unknown until the memory object is mapped, therefore as a practical matter the call to map the memory object must (usually) occur before the call to unmap the memory object. It is also possible to query the memory object's map count, which provides another mechanism to determine if the memory object has been mapped or not.

Given (3), is the expectation that implementations track which SVM regions have been mapped similarly, even though there is (currently) no queryable map count? Should the example above generate an error because clEnqueueSVMUnmap was called before clEnqueueSVMMap, even though the map may actually execute before the unmap?

@aharon-abramson
Copy link
Contributor

Given your interesting examples, I believe we can live without this error case. However, maybe we need a note in the spec to explain why we don't handle this case.

@bashbaug
Copy link
Contributor

We observed in the teleconference on October 17th that there could be a similar issue with buffer and image memory objects, also. Specifically, when is the map count updated, and is it used to identify the (synchronous) runtime error?

Consider this example:

ptr = clEnqueueMapBuffer(queue0, buffer, blocking_map = CL_TRUE);
// map was blocking, so now MAP_COUNT is one

event0 = clCreateUserEvent();
clEnqueueUnmapMemObject(queue0, ptr, depends on event0);
// what is the MAP_COUNT now?

event1 = clCreateUserEvent();
clEnqueueUnmapMemObject(queue1, ptr, depends on even1); // is this an error?
// what is the MAP_COUNT now?

I haven't written code to try this case specifically, but I do have code in the OpenCL Intercept Layer to display the queryed map count after mapping and unmapping memory objects and empirically it seems like most implementations upate the map count as soon as the host API executes.

@karolherbst
Copy link
Contributor

I just noticed that I never implemented CL_MEM_MAP_COUNT in rusticl... Maybe we should have some CTS for it? Though given that the spec isn't really giving a lot of details on CL_MEM_MAP_COUNT that might be a bit challenging to do.

@bashbaug
Copy link
Contributor

I made a quick tester to try this out:
https://github.com/bashbaug/SimpleOpenCLSamples/compare/map-unmap-tester

Results were pretty similar across all of the implementations I've tried, though there were a few minor differences. I agree it would be good to have some CTS tests to exercise this, at least for the common cases.

In summary:

  • The implementations I tested seemed to increase the map count immediately after calling clEnqueueMapBuffer, regardless whether the call was blocking or non-blocking. Note though, I didn't test holding a call back with a user event dependency (yet).
  • Most of the implementations I tested also seemed to decrease the map count immediately after calling clEnqueueUnmapMemObject, even when the call was held back with a user event dependency, though this isn't universal.
  • Regardless of the map count, all implementations I tested returned an error when the number of unmap calls exceeds the number of map calls, even if some of the unmap calls are held back with a user event dependency.

Example output:

Running on platform: Intel(R) OpenCL Graphics
Running on device: Intel(R) Arc(TM) A750 Graphics
Initially: map count is 0
After blocking map: map count is 1
After non-blocking map: map count is 2
After another non-blocking map: map count is 3
After finish: map count is 3
After unmap: map count is 2
After finish: map count is 2
After unmap with event dependency: map count is 1
After unmap with event dependency: map count is 0
Error: commandQueue.enqueueUnmapMemObject(buf, ptr2, &deps2) returned -30
After duplicate unmap with event dependency: map count is 0
After setting event status to complete: map count is 0
After finish: map count is 0

@bashbaug
Copy link
Contributor

bashbaug commented Mar 5, 2024

Consider separating into multiple issues:

  • Do we want to return an error from clEnqueueSVMUnmap for an arbitrary pointer?
  • Do we want to return an error from clEnqueueSVMUnamp if it is a valid SVM pointer but it has not been mapped?
  • When should (must?) the map count on a memory object update?
  • When is it an error to unmap a memory object multiple times?

All of these scenarios need CTS tests.

@aharon-abramson
Copy link
Contributor

  • We should return an error when calling clEnqueueSVMUnmap for an arbitrary pointer.
  • Delay the check if the pointer has been mapped to the unmap command's execution, i.e. the API should return CL_SUCCESS, but the event should have a negative status.
  • The count should be updated when the map/unmap commands have been completed.
  • It is an error to unmap a memory object with a map counter that equals 0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-cts-coverage The CTS needs to be extended OpenCL API Spec Issues related to the OpenCL API specification.
Projects
Status: Needs WG discussion
Development

Successfully merging a pull request may close this issue.

6 participants