Is your feature request related to a problem? Please describe.
Optimize. CUB thread-level reduction is a purely sequential operation, which leaves room for ILP optimizations with a tree-reduction strategy. The optimization is not straightforward because some instructions provide a 3-way version depending on the GPU architecture.
Cleanup. There are several function overloads, and the call sequence is not easy to follow. In addition, the prefix version could inhibit vector optimizations.
Expose. Add thread-level reduction to the cub:: namespace. Provide documentation and add related tests.
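To illustrate the ILP point, here is a minimal host-side sketch contrasting a sequential reduction with a pairwise tree reduction over a fixed-size, per-thread array. The names `sequential_reduce` and `tree_reduce` are hypothetical and are not CUB's API; the sketch assumes an associative operator and ignores the architecture-specific 3-way instructions mentioned above.

```cpp
#include <array>
#include <cstddef>
#include <functional>

// Hypothetical helpers for illustration only; not part of CUB.

// Sequential form: every step depends on the previous accumulator,
// so the dependency chain has length N - 1.
template <typename T, std::size_t N, typename Op>
T sequential_reduce(const std::array<T, N>& in, Op op)
{
    static_assert(N > 0, "at least one element is required");
    T acc = in[0];
    for (std::size_t i = 1; i < N; ++i)
    {
        acc = op(acc, in[i]);
    }
    return acc;
}

// Tree form: adjacent partial results are combined at doubling strides.
// Operations within one level are independent of each other, so the
// dependency chain shrinks to about log2(N) levels.
// Assumes `op` is associative; operand order is preserved.
template <typename T, std::size_t N, typename Op>
T tree_reduce(std::array<T, N> v, Op op) // by value: used as scratch space
{
    static_assert(N > 0, "at least one element is required");
    for (std::size_t stride = 1; stride < N; stride *= 2)
    {
        for (std::size_t i = 0; i + stride < N; i += 2 * stride)
        {
            v[i] = op(v[i], v[i + stride]);
        }
    }
    return v[0];
}
```

Both functions return the same result for an associative operator such as `std::plus<int>`. For floating-point types, the tree form changes the order of evaluation, so results may differ in the last bits.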
Describe the solution you'd like
Optimize, Cleanup, and Expose CUB Thread-Level Reduction
Describe alternatives you've considered
No response
Additional context
No response
Area
CUB