Port to Catch2 and rework device histogram test #1695
Conversation
Force-pushed from 199a3c6 to c148fc5 (Compare)
🟨 CI Results [ Failed: 81 | Passed: 117 | Total: 198 ]

| # | Runner |
|---|---|
| 154 | linux-amd64-cpu16 |
| 16 | linux-arm64-cpu16 |
| 16 | linux-amd64-gpu-v100-latest-1 |
| 12 | windows-amd64-cpu16 |

👃 Inspect Changes

Modifications in project?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
🟨 CI Results [ Failed: 11 | Passed: 187 | Total: 198 ]
Force-pushed from 4adb374 to f970f23 (Compare)
🟨 CI Results [ Failed: 1 | Passed: 197 | Total: 198 ]
🟩 CI Results [ Failed: 0 | Passed: 198 | Total: 198 ]
Force-pushed from ee4f41d to 5b25b78 (Compare)
🟩 CI Results [ Failed: 0 | Passed: 198 | Total: 198 ]
Force-pushed from a0f759a to 1da22e6 (Compare)
🟨 CI Results [ Failed: 8 | Passed: 190 | Total: 198 ]
Force-pushed from 1da22e6 to e735b74 (Compare)
🟩 CI Results [ Failed: 0 | Passed: 198 | Total: 198 ]
I discussed this PR with @gevtushenko last week and we concluded that no separate review of a behavior-retaining refactoring is necessary. I shall just move ahead with reorganizing the histogram tests as described in #1381.
Force-pushed from e735b74 to ab42590 (Compare)
🟩 CI Results [ Failed: 0 | Passed: 198 | Total: 198 ]
| # | Runner |
|---|---|
| 154 | linux-amd64-cpu16 |
| 16 | linux-arm64-cpu16 |
| 16 | linux-amd64-gpu-v100-latest-1 |
| 12 | windows-amd64-cpu16 |

👃 Inspect Changes

Modifications in project?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
Force-pushed from ab42590 to bc66ab9 (Compare)
🟩 CI Results [ Failed: 0 | Passed: 198 | Total: 198 ]
// TODO(bgruber): what numbers to put here? radix_sort_keys has GENERATE(1, 3, 9, 15);
const int entropy_reduction = GENERATE(-1, 5); // entropy_reduction = -1 -> all samples == 0
@gevtushenko you probably know best what entropy reductions to apply :)
Entropy reduction here corresponds to the number of iterations. Zero iterations means the entropy is 1.0. Five iterations lead to an entropy between 0.201 and 0.0. I don't think testing with zero iterations adds anything to the test, because we would just be testing regular input. I'd test 3 (entropy=0.337) and 5.
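For readers unfamiliar with the scheme, here is a minimal sketch of the entropy-reduction idea as described above (illustrative only, not the actual c2h generator): each iteration ANDs another independent uniform random word onto the sample, so set bits become rarer and the entropy of the bit distribution drops, while -1 is used as a sentinel for all-zero input.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper illustrating entropy reduction by repeated AND-ing:
// 0 iterations -> uniform bits (entropy 1.0); each further iteration lowers
// P(bit == 1) from 1/2 towards 0; entropy_reduction == -1 -> all samples are 0.
std::uint32_t low_entropy_sample(std::mt19937& rng, int entropy_reduction)
{
  if (entropy_reduction < 0)
  {
    return 0; // constant input, degenerate histogram
  }
  std::uniform_int_distribution<std::uint32_t> bits;
  std::uint32_t sample = bits(rng);
  for (int i = 0; i < entropy_reduction; ++i)
  {
    sample &= bits(rng); // each AND makes set bits rarer
  }
  return sample;
}
```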
> I'd test 3 (entropy=0.337) and 5.

Does that mean we drop entropy -1 (all input values are zero) or not?
Force-pushed from e967636 to 9deace0 (Compare)
🟨 CI Results: Pass: 90%/198 | Total Time: 1d 12h | Avg Time: 10m 57s | Hits: 91%/106803
Force-pushed from 9deace0 to ad9a928 (Compare)
🟨 CI Results: Pass: 90%/198 | Total Time: 2d 06h | Avg Time: 16m 22s | Hits: 86%/118548
Force-pushed from 6c9d5af to 5de9f56 (Compare)
Force-pushed from 5de9f56 to 59f8316 (Compare)
🟨 CI Results: Pass: 96%/198 | Total Time: 1d 04h | Avg Time: 8m 33s | Hits: 98%/128459
Force-pushed from 4f0d8c4 to 643e3d4 (Compare)
{
  const auto* sample_ptr = cast_if_half_pointer(thrust::raw_pointer_cast(d_samples.data()));
  auto d_levels = array<c2h::device_vector<LevelT>, ActiveChannels>{};
  std::copy(h_levels.begin(), h_levels.end(), d_levels.begin());
We should use `cuda::std::copy`. Please audit the uses of `std::` and replace them with `cuda::std::` where possible.
> We should use `cuda::std::copy`

I wonder why? I understand it's needed for device code, but in ordinary host code that just sets up some testing data? Is it so we get more test coverage of libcudacxx, or so we eat our own dogfood? I am curious!

I will of course apply the requested change!

Changing all uses of `std::` seems excessive, though. What's wrong with `std::size_t`? Some Catch2 facilities only work with `std::vector`, not `cuda::std::vector`. `std::declval` should also be fine.

I know there is an argument that we should be able to compile CUB without a standard library, but does that apply to the unit tests? How could we compile Catch2 without a stdlib?
I would like to have @gevtushenko chime in, but IMHO we should use a `cuda::std::` feature when it is available to us.
@bernhardmgruber @miscco the initial requirement for `::cuda::std::*` usage is related to production headers. Apart from device annotations, this requirement is caused by the fact that NVRTC doesn't provide a standard library.

Regarding tests, the motivation for using `std::` algorithms was that we needed a third party for verification. Since `thrust::` algorithms are implemented using `cub::` algorithms, we can't compute the reference output using `thrust::`. There's no such concern with `cuda::` algorithms, because those are going to be serial. I can see @miscco's point for using `cuda::` in our tests for dogfooding purposes or when we need a feature that's backported to older C++ dialects, but I wouldn't say that it's a requirement at this point.
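To make the verification argument concrete, here is a hedged sketch (simplified, with made-up sizes and levels, not the actual test code): the reference histogram is computed on the host with a plain serial loop, independent of CUB and Thrust, and compared against the `cub::DeviceHistogram` result.

```cuda
#include <cub/device/device_histogram.cuh>

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

#include <algorithm>
#include <cstdio>
#include <vector>

int main()
{
  const int num_levels = 5; // 4 evenly spaced bins covering [0, 4)
  const std::vector<float> h_samples{0.1f, 1.5f, 1.7f, 3.2f, 3.9f, 2.5f};

  // Host reference: straightforward serial binning, no CUB/Thrust involved.
  std::vector<int> reference(num_levels - 1, 0);
  for (const float s : h_samples)
  {
    ++reference[static_cast<int>(s)]; // bin width is 1 in this example
  }

  // Device result via CUB (two-phase call: query temp storage, then run).
  thrust::device_vector<float> d_samples(h_samples.begin(), h_samples.end());
  thrust::device_vector<int> d_histogram(num_levels - 1);
  std::size_t temp_bytes = 0;
  cub::DeviceHistogram::HistogramEven(
    nullptr, temp_bytes,
    thrust::raw_pointer_cast(d_samples.data()),
    thrust::raw_pointer_cast(d_histogram.data()),
    num_levels, 0.0f, 4.0f, static_cast<int>(d_samples.size()));
  thrust::device_vector<char> d_temp(temp_bytes);
  cub::DeviceHistogram::HistogramEven(
    thrust::raw_pointer_cast(d_temp.data()), temp_bytes,
    thrust::raw_pointer_cast(d_samples.data()),
    thrust::raw_pointer_cast(d_histogram.data()),
    num_levels, 0.0f, 4.0f, static_cast<int>(d_samples.size()));

  // Compare device output against the host reference.
  const thrust::host_vector<int> h_result = d_histogram;
  const bool ok = std::equal(reference.begin(), reference.end(), h_result.begin());
  std::printf("%s\n", ok ? "match" : "MISMATCH");
  return ok ? 0 : 1;
}
```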
@miscco I cannot use `cuda::std::copy` because of the execution-space check:

/home/bgruber/dev/cccl/libcudacxx/lib/cmake/libcudacxx/../../../include/cuda/std/__algorithm/copy.h(42): error #20011-D: calling a __host__ function("thrust::THRUST_200500_860_NS::detail::vector_base<unsigned int, ::c2h::checked_cuda_allocator<unsigned int> > & thrust::THRUST_200500_860_NS::detail::vector_base<unsigned int, ::c2h::checked_cuda_allocator<unsigned int> > ::operator =<unsigned int, ::thrust::THRUST_200500_860_NS::mr::stateless_resource_allocator<unsigned int, ::c2h::checked_host_memory_resource> > (const ::thrust::THRUST_200500_860_NS::detail::vector_base<T1, T2> &)") from a __host__ __device__ function("cuda::std::__4::__copy< ::cuda::std::__4::_ClassicAlgPolicy, const ::thrust::THRUST_200500_860_NS::detail::vector_base<unsigned int, ::thrust::THRUST_200500_860_NS::mr::stateless_resource_allocator<unsigned int, ::c2h::checked_host_memory_resource> > *, ::thrust::THRUST_200500_860_NS::detail::vector_base<unsigned int, ::c2h::checked_cuda_allocator<unsigned int> > *> ") is not allowed

`cuda::std::copy` is `__host__ __device__`, but `thrust::device_vector<T>::operator=` is host-only.
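A hedged, stripped-down reproduction of that constraint (plain Thrust types instead of the c2h wrappers; not the actual test code): the element-wise copy assigns a host vector to a device vector, which is a host-only H2D transfer, so the surrounding copy algorithm must itself be host-only.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

#include <algorithm>
#include <array>

int main()
{
  std::array<thrust::host_vector<float>, 2> h_levels = {
    thrust::host_vector<float>(3, 1.0f), thrust::host_vector<float>(3, 2.0f)};
  std::array<thrust::device_vector<float>, 2> d_levels;

  // OK: std::copy is a host function, so calling thrust::device_vector's
  // host-only copy assignment (an H2D transfer) from inside it is allowed.
  std::copy(h_levels.begin(), h_levels.end(), d_levels.begin());

  // Not OK: cuda::std::copy is __host__ __device__, so nvcc rejects the call
  // to the __host__-only assignment operator (error #20011-D quoted above).
  // cuda::std::copy(h_levels.begin(), h_levels.end(), d_levels.begin());

  return 0;
}
```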
🟨 CI Results: Pass: 96%/198 | Total Time: 4d 04h | Avg Time: 30m 20s | Hits: 22%/128459
🟩 CI Results: Pass: 100%/198 | Total Time: 1d 09h | Avg Time: 10m 02s | Hits: 98%/132629
The new test looks great! A few minor comments below.
* Unify generic and half overloads
* Drop backend selection, since only the CUB backend is used
* Refactor transform ops
* Avoid naked new
* Drop unused include
* Remove array extents for function parameters that decay to pointers
* Fix warnings
* Remove CDP temporary storage allocation
* Reorganize dispatch
* Simplify InitializeBinsWithSolution
* Refactor channel params selection
* Replace caching allocator by thrust vectors
* Refactor InitializeBinsWithSolution
* Other stuff
Force-pushed from 5be148c to a323296 (Compare)
// Compute reference result
auto fp_scales = array<LevelT, ActiveChannels>{}; // only used when LevelT is floating point
cs::ignore = fp_scales; // casting to void was insufficient. TODO(bgruber): use [[maybe_unused]] in C++17
@miscco using `cuda::std::ignore =` does produce the following error on CTK 11.1:

/home/coder/cccl/cub/test/catch2_test_device_histogram.cu(281): error #174-D: expression has no effect

Using `std::ignore =` produces no such warning.
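For context, a minimal sketch (illustrative names, not the actual test code) of the two workarounds being weighed here: discarding via `std::ignore` in C++14, versus the `[[maybe_unused]]` attribute from the TODO once C++17 is available.

```cpp
#include <array>
#include <cstddef>
#include <tuple> // std::ignore

template <class LevelT, std::size_t ActiveChannels>
void compute_reference()
{
  // C++14-compatible: declare, then discard, to silence unused-variable
  // diagnostics. cuda::std::ignore triggered "expression has no effect"
  // (#174-D) on CTK 11.1, while std::ignore did not.
  auto fp_scales = std::array<LevelT, ActiveChannels>{}; // only used for floating-point LevelT
  std::ignore = fp_scales;

  // C++17 alternative from the TODO: annotate the declaration instead.
  [[maybe_unused]] auto fp_scales17 = std::array<LevelT, ActiveChannels>{};
}
```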
Force-pushed from a323296 to bb77b5a (Compare)
Force-pushed from bb77b5a to 890bec4 (Compare)
* Remove icc test disable
* Drop verbose
* Drop perf benchmark
* Optimize includes
* Rename template args
* Use more Catch2 generators
* Drop comments
* Use cuda::std::array for num_levels
* Simplify template parameter lists
* Rework sample generation
* More test variations
* Refactor
* More refactoring and fixes
* Drop canary checks and refactor
* Improve diagnostics of instantiating too large CubVector
* Improve diagnostics of building large histograms
* Adjust some namespace qualifiers

Fixes: NVIDIA#1381
Force-pushed from 890bec4 to 8953ae5 (Compare)
🟩 CI Results: Pass: 100%/233 | Total Time: 1d 10h | Avg Time: 8m 49s | Hits: 98%/232058
👃 Inspect Changes

Modifications in project?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |

🏃 Runner counts (total jobs: 233)

| # | Runner |
|---|---|
| 162 | linux-amd64-cpu16 |
| 40 | linux-amd64-gpu-v100-latest-1 |
| 16 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
This PR refactors and ports the CUB device histogram unit tests to Catch2. Almost all of the previous test coverage has been retained, and several new tests have been added, especially for more histogram sample data types. Test execution time has been optimized by testing certain features and edge cases individually under common settings instead of running the entire cartesian product. The current test suite is a compromise, and I am open to suggestions on what to add or omit.
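For illustration of that trade-off, a hedged Catch2 sketch (test names and parameters are made up, not taken from the actual test file): broad coverage runs over a product of sample types and entropies, while an edge case is checked once under a single fixed configuration.

```cpp
#include <catch2/catch_template_test_macros.hpp>
#include <catch2/catch_test_macros.hpp>
#include <catch2/generators/catch_generators.hpp>

#include <cstdint>

// Broad coverage: sample type x entropy, a (small) cartesian product.
TEMPLATE_TEST_CASE("DeviceHistogram::HistogramEven works", "[histogram][device]",
                   std::uint8_t, std::uint16_t, float)
{
  const int entropy_reduction = GENERATE(3, 5);
  // ... generate samples of type TestType with this entropy, run the
  //     histogram, compare against a host reference ...
  (void) entropy_reduction;
}

// Edge case: exercised once under one common configuration instead of being
// multiplied into the full matrix above.
TEST_CASE("DeviceHistogram::HistogramEven handles empty input", "[histogram][device]")
{
  // ... run with zero samples for a single, fixed sample type and verify an
  //     all-zero histogram ...
}
```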
Before this PR:
After this PR:
Fixes: #1381.
This PR requires (and includes) the following changes proposed separately:

* `const` of the device histogram API: Const-qualify histogram pointer input parameters #1762