Implementation in CuPy with stream= and dl_device=cpu choices #152

@seberg

I implemented DLPack v1 for CuPy (see cupy/cupy#8683) and made two choices that matter for other implementations and possibly for the spec:

  1. We chose to export the CUDA managed device (kDLCUDAManaged) when possible, even if dl_device=(CPU, 0) was requested. That is, we promise that the data can be used on the CPU, but CuPy currently still reports the actual (compatible) device!
    • Note: NumPy accepts this for CUDA managed memory, but it may not (yet) accept it for other or future devices with similar properties. Either NumPy will need to trust the producer in such cases, or we keep this somewhat fuzzy and assume the consumer knows the device, possibly based on the DLPack version.
  2. The user passes both dl_device=(CPU, 0) and stream=.... I think we had discussed that the stream semantics must refer to the device the data is on. CuPy supports the following:
    • stream=None (or no stream passed) synchronizes the device-to-host copy, i.e. waits until the data is available on the CPU.
    • stream=consumer_stream does not synchronize. The user can then work with the data on consumer_stream (e.g. enqueue another cudaMemcpyAsync) or synchronize themselves (e.g. when several copies are needed, so that one synchronization covers all of them).
    • REASON: Synchronizing in the second case would achieve nothing that stream=None does not already achieve. It would effectively do the same as stream=None and additionally synchronize consumer_stream, which does not need to be synchronized!
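
To make the two choices concrete, here is a small pure-Python model of the producer-side decision. It is a sketch under stated assumptions, not CuPy's code: `plan_export` and `ExportPlan` are hypothetical names, the device-type values are a subset of the `DLDeviceType` enum in dlpack.h, and a consumer stream is represented by any non-None value (e.g. a raw CUDA stream pointer). The synchronization behaviour for the managed-memory case is not described above, so the sketch leaves it as `None`.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class DLDeviceType(IntEnum):
    # Subset of the DLDeviceType enum from dlpack.h
    kDLCPU = 1
    kDLCUDA = 2
    kDLCUDAManaged = 13


Device = Tuple[int, int]  # (device_type, device_id), as passed via dl_device=


@dataclass
class ExportPlan:
    device: Device                  # device reported in the exported DLTensor
    copy_to_host: bool              # True if a device-to-host copy is made
    producer_syncs: Optional[bool]  # True: producer waits; None: not modelled here


def plan_export(data_device: Device, dl_device: Optional[Device] = None,
                stream: Optional[int] = None) -> ExportPlan:
    """Hypothetical model of the two choices described above."""
    if dl_device is None or tuple(dl_device) == tuple(data_device):
        # No cross-device request: no host copy; stream handling not modelled.
        return ExportPlan(device=data_device, copy_to_host=False, producer_syncs=None)
    if dl_device[0] != DLDeviceType.kDLCPU:
        # This sketch only models requests for the CPU.
        raise BufferError(f"cannot export to {dl_device}")
    if data_device[0] == DLDeviceType.kDLCUDAManaged:
        # Choice 1: managed memory is CPU-accessible, so report the actual
        # (compatible) device instead of the requested (CPU, 0).
        return ExportPlan(device=data_device, copy_to_host=False, producer_syncs=None)
    # Choice 2: copy to host; only stream=None makes the producer wait.
    # With a consumer stream, the copy is assumed to be ordered on that stream.
    return ExportPlan(device=(DLDeviceType.kDLCPU, 0), copy_to_host=True,
                      producer_syncs=stream is None)
```

For example, `plan_export((DLDeviceType.kDLCUDA, 0), dl_device=(1, 0), stream=ptr)` yields a host copy that the producer does not wait on; ordering further work on `ptr`, or synchronizing, is then up to the consumer.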

CC @leofang.
