[BUG]: extra unnecessary cudaStreamSynchronize #261

gonzalobg · 2023-07-24T14:02:37Z

I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct

Type of bug: Performance

Component: Thrust

The flow of thrust::transform_reduce(thrust::device, ... is the following:

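The cudaStreamSynchronize in step 3 is unnecessary, because get_value subsequently performs a stream-ordered operation and then synchronizes the stream. That operation is ordered after the reduction on the same stream, so the result is already complete once get_value's own synchronization returns.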
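The redundancy can be illustrated outside Thrust with a minimal CUDA sketch. It assumes a hypothetical single-block kernel, sum_kernel, and a hypothetical helper, reduce_on_stream; neither is Thrust's actual implementation. It only shows that a device-to-host copy enqueued on the same stream is already ordered after the kernel, so a single synchronization before reading the result is enough.

```cuda
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // block size; must be a power of two for the tree reduction

// Hypothetical single-block sum kernel, for illustration only (not Thrust's code).
__global__ void sum_kernel(const float* in, int n, float* out) {
    __shared__ float partial[kThreads];
    float acc = 0.0f;
    for (int i = threadIdx.x; i < n; i += kThreads) acc += in[i];
    partial[threadIdx.x] = acc;
    __syncthreads();
    // Tree reduction in shared memory.
    for (int s = kThreads / 2; s > 0; s /= 2) {
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) *out = partial[0];
}

// Hypothetical helper: reduce n floats on `stream` and return the result on the host.
float reduce_on_stream(const float* d_in, int n, float* d_out, cudaStream_t stream) {
    sum_kernel<<<1, kThreads, 0, stream>>>(d_in, n, d_out);
    // A cudaStreamSynchronize(stream) here would be redundant: the copy below is
    // enqueued on the same stream, so it cannot start before the kernel finishes.
    float h_out = 0.0f;
    cudaMemcpyAsync(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);  // the only synchronization needed before reading h_out
    return h_out;
}
```

Error checking of the CUDA API calls is omitted to keep the sketch short.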

This was discovered in TeaLeaf and impacts std::transform_reduce performance.

Call thrust::transform_reduce with the thrust::device execution policy on trivial types stored in device memory or unified memory.
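A minimal reproduction sketch, assuming a sum of squares over floats in a thrust::device_vector as the workload; the functor square and the helper sum_of_squares are hypothetical names chosen for illustration. Profiling the call, for example with Nsight Systems, is one way to count the cudaStreamSynchronize calls it issues.

```cuda
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/functional.h>
#include <thrust/transform_reduce.h>

// Hypothetical unary transform applied to each element before the reduction.
struct square {
    __host__ __device__ float operator()(float x) const { return x * x; }
};

// Hypothetical helper: sum of squares of n copies of `value`, computed on the device.
float sum_of_squares(int n, float value) {
    thrust::device_vector<float> v(n, value);  // trivial type in device memory
    return thrust::transform_reduce(thrust::device, v.begin(), v.end(),
                                    square{}, 0.0f, thrust::plus<float>());
}
```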

No unnecessary cudaStreamSynchronize: the only synchronization should be the one get_value performs when it reads back the result.

