Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Latest CCCL breaks CuPy #2109

Open
1 task done
leofang opened this issue Jul 30, 2024 · 7 comments
Open
1 task done

[BUG]: Latest CCCL breaks CuPy #2109

leofang opened this issue Jul 30, 2024 · 7 comments
Assignees
Labels
bug Something isn't working right. thrust For all items related to Thrust.

Comments

@leofang
Copy link
Member

leofang commented Jul 30, 2024

Is this a duplicate?

Type of Bug

Compile-time Error

Component

Thrust

Describe the bug

CCCL diff (between yesterday and today): cupy/cccl@2246a87...d4f928e

Yesterday it built fine. Today we see this (from https://ci.preferred.jp/cupy.linux.cuda-build/166014/):

[00:03:14.784005 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5136)	  cupy/cuda/cupy_thrust.cu(156): error: inherited member is not allowed	
[00:03:14.784282 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5137)	    bool thrust::less<complex<float>>::operator() (	
[00:03:14.784834 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5138)	                                       ^	
[00:03:14.785276 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5139)		
[00:03:14.785932 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5140)	  cupy/cuda/cupy_thrust.cu(156): error: declaration is incompatible with "bool cuda::std::__4::less<_Tp>::operator()(const _Tp &, const _Tp &) const [with _Tp=thrust::THRUST_200600_700_NS::complex<float>]" (declared at line 350 of /tmp/tmp.8dn9ZcMlIG/cupy/_core/include/cupy/_cccl/libcudacxx/cuda/std/__functional/operations.h)	
[00:03:14.786311 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5143)	    bool thrust::less<complex<float>>::operator() (	
[00:03:14.786743 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5144)	                                       ^	
[00:03:14.787154 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5145)		
[00:03:14.787625 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5146)	  cupy/cuda/cupy_thrust.cu(154): error: "constexpr" is not valid here	
[00:03:14.788046 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5147)	    constexpr	
[00:03:14.788484 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5148)	    ^	
[00:03:14.788905 STDOUT 1352]](https://ci.preferred.jp/cupy.linux.cuda-build/166014/#L5149)

How to Reproduce

Build cupy/cupy#8412 from source

Expected behavior

No compile-time errors.

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

@leofang leofang added the bug Something isn't working right. label Jul 30, 2024
@leofang
Copy link
Member Author

leofang commented Jul 30, 2024

Build cupy/cupy#8412 from source

To unblock myself I've force-pushed back to the snapshot from yesterday, but starting from this commit cupy/cupy@08e6a3c it should be reproducible. By tracing the code change following the compiler error, I believe the culprit is #1872, in particular #1872 (review) (cc: @bernhardmgruber for vis).

@bernhardmgruber
Copy link
Contributor

@leofang thank you for reporting this and I am sorry for the breakage!

The specializations of Thrust's API that cupy performs looks quite extraordinary to me (I actually had to educate myself on the syntax) and I would like to suggest a better way.

In your current implementation, you specialize a member of thrust::less directly:

bool thrust::less<float>::operator() (
    const float& lhs, const float& rhs) const {
    return _real_less<float>(lhs, rhs);
}

which relies on thrust::less having an actual operator(). We changed this in #1872, since we now pull in the operator from ::cuda::std::less. If you specialize the entire functor instead, your code should work before and after the changes of #1872:

template <>
struct thrust::less<float> {
    bool operator() (float lhs, float rhs) const {
        return _real_less<float>(lhs, rhs);
    }
};

This is arguably safer and will also be forward compatible when we eventually alias thust::less to cuda::std::less entirely (we cannot do this yet, because we need to give some notice with the deprecation warnings).

As a general remark: I think redefining the meaning of thrust::less<float> is quite dangerous, since some algorithms depend on the exact meaning of this function object to provide more efficient implementations. E.g., thrust::sort detects whether the comparator is thrust::less<T> to branch into radix sort, and then takes the mathematical properties of "less than" as given, independently of how thrust::less<T> is defined.

@leofang
Copy link
Member Author

leofang commented Jul 30, 2024

Hi Bernhard, thanks for your reply.

which relies on thrust::less having an actual operator(). We changed this in #1872
...
(we cannot do this yet, because we need to give some notice with the deprecation warnings).

To me this change of behavior requires a deprecation cycle, though I do have a hard time to see where compile-time warnings can be injected for this particular case.

As a general remark: I think redefining the meaning of thrust::less<float> is quite dangerous,
...
E.g., thrust::sort detects whether the comparator is thrust::less<T> to branch into radix sort, and then takes the mathematical properties of "less than" as given, independently of how thrust::less<T> is defined.

Is this expectation documented anywhere? Would it be a concern if the comparator fulfills this "less than" property?

To give some context, CuPy had to define a custom comparator in order to follow NumPy behavior, where NaNs are sorted to the end of an array (see code comments here).

An extra note: we cannot keep breaking critical users like CuPy just casually (#1494). I've been following @jrhemstad's advertisement and encouraging the CuPy team to stay at CCCL head, but my push would be questionable if they keep hitting maintenance issues.

@jrhemstad
Copy link
Collaborator

To me this change of behavior requires a deprecation cycle

It does not. The CuPy code is incorrect in how it provides a custom comparison operator. The proper way to provide a custom comparison operator is to pass one as an argument, not reach into internal bits of the library and modify them at will.

This is covered under our compatibility guidelines: https://github.com/NVIDIA/cccl?tab=readme-ov-file#compatibility-guidelines

Do not add any declarations to the thrust::, cub::, nv::, or cuda:: namespaces unless an exception is noted for a specific symbol, e.g., specializing a type trait.

I'll grant that this could be improved to say "Do not add any declarations or template specializations...", but the point remains the same.

All that said, we don't like breaking people (even when it wasn't our fault), so we will be helping to contribute a fix to CuPy.

I look forward to when you have the time to help us set up #1494!

@bernhardmgruber bernhardmgruber self-assigned this Jul 31, 2024
@bernhardmgruber
Copy link
Contributor

PR for CuPi: cupy/cupy#8446

@leofang
Copy link
Member Author

leofang commented Jul 31, 2024

I had a somewhat lengthy discussion with Jake offline. Below is a summary of what I asked regarding specializing thrust::less vs thrust::less::operator noted above (and the offline thread), for posterity:

At one point we were even encouraged to specialize a CUB internal template (in the context of 64-bit support), so you must forgive CuPy developers to have the impression that everything template is customizable at will.

Perhaps let me ask this way: Why is specializing thrust::less ok (as suggested above and done in cupy/cupy#8446) but specializing thrust::less::operator is not? As per the (being updated) rule number 1, CCCL discourages users from specializing all things under thrust:: anyway. But as a user, all we see in the public API of thrust::less is that there’s one and only one member function we need to define, so it’s unclear where the line is between the two approaches from user perspective.

@jrhemstad
Copy link
Collaborator

everything template is customizable at will

The standard practice in C++ is that you should not specialize a template that does not belong to you unless you've been explicitly granted permission to do so. Everyone is at a different place with their experience and expertise in C++, so it's not unreasonable that someone may not be familiar with this yet.

Being given permission to specialize one template in one library does not grant permission to specialize another template in a different library 😅 .

Why is specializing thrust::less ok (as suggested above and done in cupy/cupy#8446) but specializing thrust::less::operator is not?

I explained in cupy/cupy#8446 (comment) that I don't think either specialization is correct. I think @bernhardmgruber was trying to be courteous and do the minimal possible set of changes to unblock cupy, but ultimately, the specialization is still ill-formed.

The good news is that it's very easy to fix by just using a custom comparator type instead of using thrust::less directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right. thrust For all items related to Thrust.
Projects
Status: In Review
Development

No branches or pull requests

3 participants