Open
Description
I implemented DLPack v1 for CuPy (see cupy/cupy#8683), and there are two choices that are important for other implementations and maybe the spec:
- We chose to export the `cudaManaged` device when possible even if `dl_device=(CPU, 0)` was requested. I.e. we promise that the data can be used on the `CPU` device, but CuPy currently will still give you the actual (compatible) device!
  - Note: NumPy is OK with this in the case of CUDA managed memory. But it may not yet be OK with it in the case of future/other similar devices. (I.e. NumPy may need to trust the producer in this case, or we just keep it a bit of a fuzzy thing where we assume the consumer should know the device, possibly based on version.)
- If the user passes `dl_device=(CPU, 0), stream=...`. We had discussed that the semantics must be related to the device that the data is on, I think. CuPy supports this:
  - `stream=None` (or nothing passed) will synchronize the device-to-host copy (i.e. wait until the data is available on the CPU).
  - `stream=consumer_stream` will not synchronize. The user could in theory work with the data (e.g. another `cudaMemcpyAsync`) on `consumer_stream`, or synchronize themselves (e.g. if multiple copies are needed).
  - REASON: One reason is that synchronizing in the second case would achieve nothing that `stream=None` doesn't already achieve. It would effectively do the same as `stream=None` and also synchronize the `consumer_stream`. (But that stream does not need to be synchronized!)
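To make the two choices above concrete, here is a minimal pure-Python sketch of the decision logic as I read it from this description. It is not CuPy's actual code: `plan_cpu_export` and `ExportPlan` are hypothetical names made up for illustration, and applying the same `stream` rule to managed memory is my assumption. Only the `DLDeviceType` values are taken from `dlpack.h`.

```python
from enum import IntEnum
from typing import NamedTuple, Optional, Tuple


class DLDeviceType(IntEnum):
    """Subset of the device type codes defined in dlpack.h."""
    kDLCPU = 1
    kDLCUDA = 2
    kDLCUDAManaged = 13


Device = Tuple[DLDeviceType, int]
CPU: Device = (DLDeviceType.kDLCPU, 0)


class ExportPlan(NamedTuple):
    """Hypothetical summary of one export requested with dl_device=(CPU, 0)."""
    device: Device        # device reported in the exported tensor
    copies_to_host: bool  # whether a device-to-host copy is needed
    producer_syncs: bool  # whether the producer waits until the data is CPU-ready


def plan_cpu_export(data_device: Device,
                    stream: Optional[int] = None) -> ExportPlan:
    """Model what a producer does when dl_device=(CPU, 0) is requested.

    `stream` stands in for the consumer stream handle; None means the
    caller passed nothing.
    """
    kind = data_device[0]
    if kind not in (DLDeviceType.kDLCUDA, DLDeviceType.kDLCUDAManaged):
        raise ValueError("this sketch only models CUDA and CUDA managed data")

    managed = kind == DLDeviceType.kDLCUDAManaged
    # Choice 1: managed memory is already usable from the CPU, so the
    # actual (compatible) device is reported and no copy is made.
    reported = data_device if managed else CPU
    # Choice 2: synchronize only when no consumer stream was given.
    # (Assumption: the same rule is applied to managed memory.)
    return ExportPlan(reported, not managed, stream is None)


# Managed memory: the consumer asked for CPU but sees kDLCUDAManaged.
print(plan_cpu_export((DLDeviceType.kDLCUDAManaged, 0)))
# Plain device memory, consumer stream given: copy is enqueued, no sync.
print(plan_cpu_export((DLDeviceType.kDLCUDA, 0), stream=1234))
```

The point of returning `producer_syncs=False` when a stream is given is the one from the REASON bullet: the consumer is then responsible for ordering its own work on `consumer_stream`.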
CC @leofang.