Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roll out address stability optimization #2403

Open
1 of 3 tasks
Tracked by #1947
bernhardmgruber opened this issue Sep 11, 2024 · 2 comments · May be fixed by #2404
Open
1 of 3 tasks
Tracked by #1947

Roll out address stability optimization #2403

bernhardmgruber opened this issue Sep 11, 2024 · 2 comments · May be fixed by #2404
Assignees
Labels
thrust For all items related to Thrust.

Comments

@bernhardmgruber
Copy link
Contributor

bernhardmgruber commented Sep 11, 2024

Once #2389 is merged, thrust::transform queries a trait in libcu++ whether it is safe to make copies of the arguments to the transformation function before passing them. This allows various optimizations in cub::DeviceTransform.

For now, the user has to explicitly opt into this optimization by annotating their function objects or specializing the trait. We should introduce some automatic detection of whether copying of arguments is possible.

This detection can/must consider:

  • known function objects (such as thrust::plus or cuda::std::plus) for which we (or the C++ standard) defines the behavior
  • whether the parameter types are value types (copying creates independendent objects) or (proxy) reference types (copying retains the reference to the original object)

Tasks

  1. cub libcu++ thrust
    bernhardmgruber
  2. bernhardmgruber
bernhardmgruber added a commit to bernhardmgruber/cccl that referenced this issue Sep 11, 2024
@bernhardmgruber bernhardmgruber self-assigned this Sep 11, 2024
@bernhardmgruber bernhardmgruber added the thrust For all items related to Thrust. label Sep 11, 2024
bernhardmgruber added a commit to bernhardmgruber/cccl that referenced this issue Nov 4, 2024
bernhardmgruber added a commit to bernhardmgruber/cccl that referenced this issue Nov 6, 2024
bernhardmgruber added a commit to bernhardmgruber/cccl that referenced this issue Nov 6, 2024
@bernhardmgruber
Copy link
Contributor Author

bernhardmgruber commented Nov 6, 2024

Here is a list of what I think we can proclaim to allow copying arguments before calling. A ✔ indicates that the optimization was implemented and merged.

Proclaiming certain categories of types as allowing copies of their arguments:

  1. ✔ WORKS: known functors wrapping other functors can forwarded to their inner functor, e.g. thrust::zip_function, cub::InequalityWrapper, cuda::std::__not_fn_t
  2. ✔ WORKS: standard functors with fundamental types, e.g. thrust::plus<int>, cuda::std::plus<float*>
  3. WORKS: standard functors with known types, e.g. thrust::plus<::cuda::std::complex<float>>
  4. FAILS: standard functors with any unknown type: thrust::plus<MyInt> or thrust::plus<::cuda::std::complex<MyCustomFloat>>
    • even though we know what thrust::plus does, we don't know what MyInt::operator+() or MyCustomFloat::operator+() does
  5. WORKS: transparent standard functors, e.g. thrust::plus<void>
    • their operator()(T&& a, T&& b) { return forward(a) + forward(b); } can later be called with any kind of type and we don't know what T::operator+() does
    • however, if we know the argument types (which we do inside thrust::transform) we can allow the optimization.
  6. WORKS: unknown functors with a single operator()(int...) with known/fundamental argument types all taken by value
  7. FAILS: unknown functors with a single operator()(T&...) where any type is a reference
  8. FAILS: unknown functors with a single operator()(T...) where any type is unknown (even if taken by value)
    • the unknown type could have reference semantics, e.g. a type like std::reference_wrapper<int>. In a transform implementation calling the functor like functor(*it), where it is an iterator into global memory containing just ints, a by-value std::reference_wrapper<int> argument causes a temporary std::reference_wrapper<int> to be constructed which binds to the address of the int in global memory, and is then copied into the functor, retaining the address.
    • we cannot build an exhaustive list of all copyable types with reference semantics
  9. WORKS: unknown functors with a single operator()(T...) , where each T is taken by value and a deduced template parameter.
    • this may be hard to detect
    • requires again to know the argument types (which we do inside thrust::transform)
  10. FAILS: unknown functors with multiple operator()(...)s * we cannot find out which overload would be called inside the transformation implementation (C++ does not allow that)
    • we cannot inspect the signatures of the overload sets

In #2389 we made a late design choice to simplify the trait to proclaims_copyable_arguments<Functor> from proclaims_copyable_arguments<Functor, Args...>, where Args... would be the arguments that the functor will be called with (the iterator value types). The latter version would have given us a few more detection possibilities if operator() was a template, but we deemed the usability improvements of not having the variadic template parameter as more important.

This really leaves us in a pitiful situation where we can barely allow copying for any function object that we don't know all types involved (i.e. they are thrust functors + fundamental types or thrust types). In addition, hardcoding a set of blessed thrust types (e.g. as template argument to thrust::plus<T>, is also cumbersome.

I therefore suggest we should consider giving up stable address guarantees in some situations. E.g.:

  • Allow to optimize case 4 and 5. If users implement an operator+ (for thrust::plus) or operator== (for thrust::equal_to) that relies on the argument address, we won't have them covered.
  • Allow to optimize case 8. when all types are taken by value. If a user uses a type by value with reference semantics (like std::reference_wrapper, we won't have them covered.

We could also consider documenting that users are not allowed to rely on the address of arguments to their functors. We cannot give them a warning when they do though. Further down the road, e.g. at a major library release, we could consider enabling the optimization for all types in the end, which would cause undefined behavior in user programs (a bad breaking change). However, I have observed that in such cases the program crashes due to pointer arithmetic being performed between different state spaces. I don't know whether this is deterministic, but it could be a cost we may deem worth it at some point.

bernhardmgruber added a commit to bernhardmgruber/cccl that referenced this issue Nov 12, 2024
bernhardmgruber added a commit to bernhardmgruber/cccl that referenced this issue Nov 12, 2024
@bernhardmgruber bernhardmgruber changed the title Automatic address stability detection Roll out address stability optimization Nov 12, 2024
@bernhardmgruber
Copy link
Contributor Author

I discussed this with @gevtushenko today and we concluded to focus on the low-hanging fruits for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
thrust For all items related to Thrust.
Projects
Status: Todo
Development

Successfully merging a pull request may close this issue.

1 participant