[BUG]: Ensure cudaMemcpy is called by thrust::copy #210

gonzalobg · 2023-07-12T11:25:43Z

We should ensure that this:

```cpp
#include <cstddef>
#include <thrust/copy.h>
#include <thrust/execution_policy.h>

void test(double* in, double* out, std::size_t n) {
    thrust::copy(thrust::device, in, in + n, out);
}
```

calls cudaMemcpy or cudaMemcpyAsync with cudaMemcpyDefault.

Right now it does not seem to be happening. @jrhemstad

miscco · 2023-07-12T11:30:53Z

Thanks a lot for raising this potential performance issue. We have moved to our new monorepo; I will duplicate the issue there.

jrhemstad · 2023-07-12T14:00:57Z

> Thanks a lot for raising this potential performance issue. We have moved to our new monorepo; I will duplicate the issue there.

I'll just transfer it.

jrhemstad · 2023-07-12T14:05:30Z

So it seems that what we're missing here is that when the input/output iterators satisfy is_contiguous_iterator, we should just use a memcpy?

miscco · 2023-07-12T14:43:53Z

We are currently in the process of merging the first part of <ranges>, so detecting contiguous_iterator should be easy enough soon™.

jrhemstad · 2023-07-12T14:44:46Z

We may want to land this change sooner, and Thrust already has an is_contiguous_iterator trait, so we can use that in the meantime.
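Until `<ranges>` is available, a trait in the spirit of Thrust's `is_contiguous_iterator` can be sketched in C++17 as a conservative opt-in. The trait below and the `int_ptr_iterator` type are hypothetical illustrations, not Thrust's actual definitions.

```cpp
#include <list>
#include <type_traits>

// Hypothetical trait: conservative by default, so unknown iterators are
// treated as non-contiguous and take the slow (but correct) path.
template <class It>
struct is_contiguous_iterator : std::false_type {};

// Raw pointers always address contiguous storage.
template <class T>
struct is_contiguous_iterator<T*> : std::true_type {};

template <class It>
inline constexpr bool is_contiguous_iterator_v = is_contiguous_iterator<It>::value;

// A hypothetical user-defined iterator that wraps a raw pointer.
struct int_ptr_iterator {
  int* p;
  int& operator*() const { return *p; }
  int_ptr_iterator& operator++() { ++p; return *this; }
};

// Its author opts in, because it walks memory exactly like an int*.
template <>
struct is_contiguous_iterator<int_ptr_iterator> : std::true_type {};

static_assert(!is_contiguous_iterator_v<std::list<int>::iterator>);
```

In this sketch `std::vector<int>::iterator` also reports `false` unless it gets its own specialization, since its type is implementation-defined; that false negative is safe but slow.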

gevtushenko · 2023-07-12T14:53:03Z

> So it seems that what we're missing here is that when the input/output iterators satisfy is_contiguous_iterator, we should just use a memcpy?

@jrhemstad we already identify when it's safe to use memcpy: 1, 2. To my understanding, the issue is that we are using explicit cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost instead of the cudaMemcpyDefault requested in this issue. @gonzalobg to verify.

gevtushenko · 2023-07-18T07:53:36Z

The issue was accidentally closed, reopening it.

jrhemstad · 2023-07-19T16:31:42Z

@gonzalobg could you provide a reproducer for this so we can be sure it is addressed correctly?

gonzalobg · 2023-07-19T16:35:05Z

The only way I could imagine testing this is with benchmarks: benchmarking memory BW and making sure it matches cudaMemcpy. Is that what you are looking for?
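Such a benchmark boils down to effective bandwidth = bytes copied / elapsed time, compared against a `cudaMemcpy` baseline. A minimal host-side sketch, assuming `std::memcpy` as a stand-in for the device copy and an arbitrary 90% pass threshold; all function names are hypothetical. For asynchronous copies such as `cudaMemcpyAsync`, the stream would have to be synchronized (or CUDA events used) before reading the clock, otherwise only the launch is timed.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <limits>
#include <vector>

// Effective bandwidth in GB/s (1 GB = 1e9 bytes): bytes moved / elapsed seconds.
double effective_bandwidth_gbps(std::size_t bytes, double seconds) {
  if (seconds <= 0.0) return std::numeric_limits<double>::infinity();  // below clock resolution
  return static_cast<double>(bytes) / seconds / 1e9;
}

// Times `reps` calls of `copy`, each moving `bytes` bytes; returns the average bandwidth.
template <class CopyFn>
double measure_bandwidth_gbps(CopyFn copy, std::size_t bytes, int reps) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < reps; ++i) copy();
  const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  return effective_bandwidth_gbps(bytes * static_cast<std::size_t>(reps), elapsed.count());
}

// Passes if `measured` reaches at least `fraction` of the `reference` bandwidth.
bool matches_reference(double measured, double reference, double fraction = 0.9) {
  return measured >= fraction * reference;
}

// Host-only stand-in for the cudaMemcpy baseline: times std::memcpy.
double host_memcpy_bandwidth_gbps(std::size_t bytes, int reps) {
  std::vector<char> src(bytes, 'x'), dst(bytes);
  return measure_bandwidth_gbps(
      [&] { std::memcpy(dst.data(), src.data(), bytes); }, bytes, reps);
}
```

In a real test, the `thrust::copy` bandwidth would be the `measured` value and the `cudaMemcpy` bandwidth on the same buffers the `reference`.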

miscco · 2023-07-19T17:08:54Z

I believe we were talking about potential bugs where managed memory is involved: we call cudaMemcpyAsync with cudaMemcpyDeviceToDevice and it crashes.

gevtushenko · 2023-07-19T17:11:14Z

> @jrhemstad we already identify when it's safe to use memcpy: 1, 2. To my understanding, the issue is that we are using explicit cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost instead of the cudaMemcpyDefault requested in this issue. @gonzalobg to verify.

@gonzalobg I thought the issue was about using cudaMemcpyDefault. When you specify the device execution policy and pass raw pointers, we currently crash because cudaMemcpyAsync is called with an explicit direction:

```cpp
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

int main() {
  constexpr int n = 10;
  thrust::device_vector<int> src(n);  // device allocation
  int dst[n];                         // pageable host stack array

  int *src_ptr = thrust::raw_pointer_cast(src.data());
  int *dst_ptr = dst;

  // Device policy + raw pointers: both sides are treated as device memory.
  thrust::copy_n(thrust::device, src_ptr, n, dst_ptr);
}

// terminate called after throwing an instance of 'thrust::system::system_error'
// what():  __copy:: D->D: failed: cudaErrorInvalidValue: invalid argument
```
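The failure can be modelled in a few lines. With `thrust::device` and raw pointers, the library has no information about where the memory lives, so it assumes both sides are device memory and picks D->D, which is what the log reports. `cudaMemcpyDefault` instead lets the runtime infer the direction from the pointer values (this requires unified virtual addressing). The enums and functions below are a hypothetical toy model, not CUDA runtime API, and they ignore managed memory.

```cpp
// Toy model only: these names are hypothetical, not the CUDA runtime API.
enum class memory_space { host, device };
enum class copy_kind { host_to_host, host_to_device, device_to_host, device_to_device, inferred };

// Explicit direction derived from the memory spaces the library *assumes*.
constexpr copy_kind explicit_kind(memory_space src, memory_space dst) {
  if (src == memory_space::host) {
    return dst == memory_space::host ? copy_kind::host_to_host : copy_kind::host_to_device;
  }
  return dst == memory_space::host ? copy_kind::device_to_host : copy_kind::device_to_device;
}

// Device policy + raw pointers: both sides are assumed to be device memory,
// unless the direction is left for the runtime to infer (cudaMemcpyDefault).
constexpr copy_kind kind_for_device_policy(bool use_default) {
  return use_default ? copy_kind::inferred
                     : explicit_kind(memory_space::device, memory_space::device);
}

// An explicit kind is only consistent if it matches where the memory actually
// lives; an inferred kind is resolved from the actual pointers.
constexpr bool kind_matches_memory(copy_kind kind, memory_space actual_src,
                                   memory_space actual_dst) {
  return kind == copy_kind::inferred || kind == explicit_kind(actual_src, actual_dst);
}

// The reproducer: src is a device_vector, dst is a host stack array.
static_assert(!kind_matches_memory(kind_for_device_policy(false),
                                   memory_space::device, memory_space::host));
static_assert(kind_matches_memory(kind_for_device_policy(true),
                                  memory_space::device, memory_space::host));
```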

gonzalobg · 2023-07-19T19:19:22Z

The issue I am running into is that H2D copies are slow for iterators of raw pointer type (T*) because, instead of using cudaMemcpy with cudaMemcpyDefault, we do something else in those cases.

These pointers are allocated with cudaMallocManaged or malloc.

jrhemstad transferred this issue from NVIDIA/thrust Jul 12, 2023

miscco changed the title ~~Ensure cudaMemcpy is called by thrust::copy~~ [FEA]: Ensure cudaMemcpy is called by thrust::copy Jul 12, 2023

miscco added the nvbug, thrust, libcu++ and bug labels Jul 12, 2023

miscco changed the title ~~[FEA]: Ensure cudaMemcpy is called by thrust::copy~~ [BUG]: Ensure cudaMemcpy is called by thrust::copy Jul 12, 2023

miscco added a commit to miscco/cccl that referenced this issue Jul 12, 2023

Enable use of cudaMemcpyAsync for thrust::copy

de055e0

In case of contiguous ranges of trivially relocatable types we can directly utilize `cudaMemcpyAsync` instead of going through transform. Fixes NVIDIA#210
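The commit's condition (contiguous ranges of trivially relocatable types) relies on a type trait the C++ standard library does not provide. A minimal sketch of one, assuming it defaults to `std::is_trivially_copyable` and lets type authors opt in; the trait and the example types are hypothetical, and Thrust's own trait may be defined differently.

```cpp
#include <string>
#include <type_traits>

// Hypothetical trait: by default a type is treated as trivially relocatable
// exactly when it is trivially copyable; authors may opt in explicitly.
template <class T>
struct is_trivially_relocatable : std::is_trivially_copyable<T> {};

template <class T>
inline constexpr bool is_trivially_relocatable_v = is_trivially_relocatable<T>::value;

// Plain aggregate: trivially copyable, so relocatable by default.
struct float3_like { float x, y, z; };

// The user-provided copy constructor makes this type non-trivially copyable,
// but moving its bytes is still equivalent to copying it, so its author opts in.
struct counted {
  int value;
  explicit counted(int v) : value(v) {}
  counted(const counted& other) : value(other.value) {}
};

template <>
struct is_trivially_relocatable<counted> : std::true_type {};

// std::string is not trivially copyable and is not opted in: slow path.
static_assert(!is_trivially_relocatable_v<std::string>);
```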

miscco added a commit to miscco/cccl that referenced this issue Jul 12, 2023

Enable use of cudaMemcpyAsync for thrust::copy

2ae6b5b

miscco added a commit to miscco/cccl that referenced this issue Jul 12, 2023

Enable use of cudaMemcpyAsync for thrust::copy

3f8dedd

miscco mentioned this issue Jul 12, 2023

Enable use of cudaMemcpyAsync for thrust::copy #211

Merged

miscco added a commit to miscco/cccl that referenced this issue Jul 14, 2023

Enable use of cudaMemcpyAsync for thrust::copy

c659362

miscco closed this as completed in ce6a462 Jul 18, 2023

gevtushenko reopened this Jul 18, 2023
