Introduce CUB transform reduce #1091
Looks great! 👍
Q: Would it make sense to also offer a block-level TransformReduce? I wonder whether it could help reduce register usage.
I think this is a good idea! @leofang could you please file an issue with this request so we can prioritize it?
Yup done 🙂 #1121
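On the block-level question: the thread suggests CUB had no block-level TransformReduce at the time, which is what #1121 tracks. The sketch below shows one common workaround, assuming a fixed 128-thread block with 4 bytes per thread. Each thread applies the transform in registers, and only the per-thread partial sums are passed to cub::BlockReduce::Sum. The kernel, square_op, the tile sizes, and the atomicAdd used to combine blocks are hypothetical choices for illustration, not a proposed API. The sketch also says nothing about whether a dedicated primitive would lower register usage.

```cuda
#include <cub/block/block_reduce.cuh>
#include <thrust/device_vector.h>

#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr int kBlockThreads   = 128; // threads per block (illustrative)
constexpr int kItemsPerThread = 4;   // bytes each thread transforms (illustrative)

// Hypothetical transform: square a byte into a 32-bit value.
struct square_op
{
  __host__ __device__ unsigned int operator()(std::uint8_t x) const
  {
    return static_cast<unsigned int>(x) * x;
  }
};

// Transform-then-reduce within one block: the transform runs in registers,
// and only the per-thread partial sums go through cub::BlockReduce.
template <typename TransformOp>
__global__ void block_transform_reduce(const std::uint8_t* in, int n, unsigned int* out, TransformOp op)
{
  using BlockReduceT = cub::BlockReduce<unsigned int, kBlockThreads>;
  __shared__ typename BlockReduceT::TempStorage temp_storage;

  const int tile_offset = static_cast<int>(blockIdx.x) * kBlockThreads * kItemsPerThread;
  unsigned int thread_sum = 0;
  for (int i = 0; i < kItemsPerThread; ++i)
  {
    // Strided indexing so that consecutive threads read consecutive bytes.
    const int idx = tile_offset + i * kBlockThreads + static_cast<int>(threadIdx.x);
    if (idx < n)
    {
      thread_sum += op(in[idx]);
    }
  }

  // The block-wide aggregate is only valid in thread 0.
  const unsigned int block_sum = BlockReduceT(temp_storage).Sum(thread_sum);
  if (threadIdx.x == 0)
  {
    atomicAdd(out, block_sum); // combine blocks (simplification for this sketch)
  }
}

unsigned int block_sum_of_squares(const std::vector<std::uint8_t>& h_in)
{
  thrust::device_vector<std::uint8_t> d_in(h_in.begin(), h_in.end());
  thrust::device_vector<unsigned int> d_out(1, 0u);
  const int n    = static_cast<int>(h_in.size());
  const int tile = kBlockThreads * kItemsPerThread;
  const int grid = (n + tile - 1) / tile;
  if (grid > 0)
  {
    block_transform_reduce<<<grid, kBlockThreads>>>(
      thrust::raw_pointer_cast(d_in.data()), n, thrust::raw_pointer_cast(d_out.data()), square_op{});
    const cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
    {
      throw std::runtime_error(cudaGetErrorString(err));
    }
  }
  return d_out[0]; // reading the device_reference copies the result to the host
}
```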
Description
closes #435
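The sketch below contrasts the two routes compared in the description that follows. Both versions sum the squares of std::uint8_t values. This is not code from this PR, and several parts of it are assumptions:

- The TransformReduce argument order (temporary storage, its size, input, output, item count, reduction op, transform op, initial value) comes from current CUB documentation. It is assumed to match what is merged here.
- thrust::make_transform_iterator stands in for whichever transform iterator the comparison used.
- square_t, check, sum_squares_reduce and sum_squares_transform_reduce are hypothetical names.

```cuda
#include <cub/device/device_reduce.cuh>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/iterator/transform_iterator.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical transform: square a byte into a 32-bit value.
struct square_t
{
  __host__ __device__ unsigned int operator()(std::uint8_t x) const
  {
    return static_cast<unsigned int>(x) * x;
  }
};

// Turn a CUDA error code into an exception.
inline void check(cudaError_t err)
{
  if (err != cudaSuccess)
  {
    throw std::runtime_error(cudaGetErrorString(err));
  }
}

// (a) Transform iterator + Reduce: CUB sees an iterator, not a raw byte pointer.
unsigned int sum_squares_reduce(const std::vector<std::uint8_t>& h_in)
{
  thrust::device_vector<std::uint8_t> d_in(h_in.begin(), h_in.end());
  thrust::device_vector<unsigned int> d_out(1);
  auto d_it         = thrust::make_transform_iterator(thrust::raw_pointer_cast(d_in.data()), square_t{});
  unsigned int* out = thrust::raw_pointer_cast(d_out.data());
  const int n       = static_cast<int>(h_in.size());

  std::size_t temp_bytes = 0;
  // First call with a null temp pointer only queries the required storage size.
  check(cub::DeviceReduce::Reduce(nullptr, temp_bytes, d_it, out, n, thrust::plus<unsigned int>{}, 0u));
  thrust::device_vector<unsigned char> temp(temp_bytes);
  check(cub::DeviceReduce::Reduce(
    thrust::raw_pointer_cast(temp.data()), temp_bytes, d_it, out, n, thrust::plus<unsigned int>{}, 0u));
  return d_out[0];
}

// (b) TransformReduce: CUB receives the raw const std::uint8_t* plus the transform.
unsigned int sum_squares_transform_reduce(const std::vector<std::uint8_t>& h_in)
{
  thrust::device_vector<std::uint8_t> d_in(h_in.begin(), h_in.end());
  thrust::device_vector<unsigned int> d_out(1);
  const std::uint8_t* in = thrust::raw_pointer_cast(d_in.data());
  unsigned int* out      = thrust::raw_pointer_cast(d_out.data());
  const int n            = static_cast<int>(h_in.size());

  std::size_t temp_bytes = 0;
  check(cub::DeviceReduce::TransformReduce(
    nullptr, temp_bytes, in, out, n, thrust::plus<unsigned int>{}, square_t{}, 0u));
  thrust::device_vector<unsigned char> temp(temp_bytes);
  check(cub::DeviceReduce::TransformReduce(
    thrust::raw_pointer_cast(temp.data()), temp_bytes, in, out, n, thrust::plus<unsigned int>{}, square_t{}, 0u));
  return d_out[0];
}
```

Both functions should return the same value (30 for the input {1, 2, 3, 4}). The difference the description reports is in how the kernel loads the bytes. With TransformReduce, CUB still sees a raw const std::uint8_t*, and according to the PR it keeps its vectorized loads. This sketch does not measure that.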
This PR introduces cub::DeviceReduce::TransformReduce. Compared to using a transform iterator with cub::DeviceReduce::Reduce, transform reduce preserves vectorized loads, providing about a 10% performance improvement on std::uint8_t. This PR introduces no SASS differences in cub::DeviceReduce::Reduce. The PR also uses the new cub::DeviceReduce::TransformReduce in Thrust.

Checklist