@bernhardmgruber bernhardmgruber commented Jan 14, 2026

#6204 changed cub::DeviceTransform APIs from taking cudaStream_t to environments. Special handling preserved support for cudaStream_t. However, user-provided stream types with conversion operators to cudaStream_t were now queried as environments, failing to return a stream.

This PR no longer treats a type that is convertible to cudaStream_t as an environment. Instead, it extracts the underlying stream from such a type.
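To illustrate the failure mode described above, here is a minimal sketch. It uses stand-in types so that it compiles without the CUDA toolkit: `stream_handle` plays the role of `cudaStream_t`, and `stream_env`, `query_stream`, `extract_stream_before` and `extract_stream_after` are hypothetical names that do not exist in CUB or libcu++. It only mirrors the idea of the fix, not the actual implementation.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Stand-ins (assumptions, not CCCL code): stream_handle plays the role of cudaStream_t.
struct fake_stream_impl {};
using stream_handle = fake_stream_impl*;

// A user-provided stream wrapper with a conversion operator to the raw handle.
struct custom_stream
{
  stream_handle handle;
  operator stream_handle() const { return handle; }
};

// Hypothetical environment that answers a stream query through a member function.
struct stream_env
{
  stream_handle handle;
  stream_handle query_stream() const { return handle; }
};

// Detects whether T offers the hypothetical query_stream() member.
template <class T, class = void>
struct has_stream_query : std::false_type {};
template <class T>
struct has_stream_query<T, std::void_t<decltype(std::declval<const T&>().query_stream())>> : std::true_type {};

// Before: only the exact handle type is special-cased. Anything else is
// queried as an environment and silently falls back to the default stream.
template <class T>
stream_handle extract_stream_before(const T& t)
{
  if constexpr (std::is_same_v<T, stream_handle>)
    return t;
  else if constexpr (has_stream_query<T>::value)
    return t.query_stream();
  else
    return stream_handle{}; // default (null) stream
}

// After: any type convertible to the handle type is treated as a stream.
template <class T>
stream_handle extract_stream_after(const T& t)
{
  if constexpr (std::is_convertible_v<T, stream_handle>)
    return t; // uses the conversion operator for wrapper types
  else if constexpr (has_stream_query<T>::value)
    return t.query_stream();
  else
    return stream_handle{};
}

// Small self-check of the behavior described in the comments above.
inline void example_usage()
{
  fake_stream_impl s;
  custom_stream cs{&s};
  assert(extract_stream_before(cs) == nullptr); // the bug: stream is lost
  assert(extract_stream_after(cs) == &s);       // the fix: stream is extracted
}
```

In this sketch the wrapper is not an environment and not exactly the handle type, so the "before" version ends up on the default stream without any diagnostic.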

Fixes NVBug 5813928

@bernhardmgruber bernhardmgruber requested a review from a team as a code owner January 14, 2026 23:13
@github-project-automation github-project-automation bot moved this to Todo in CCCL Jan 14, 2026
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Jan 14, 2026
num_items,
::cuda::std::move(transform_op),
get_stream(env));
::cuda::std::move(env));
Contributor Author

Drive-by fix

Comment on lines +44 to +53
auto run = [&](auto streamish) {
cub::DeviceTransform::Transform(cuda::std::make_tuple(a, b), result.begin(), num_items, _1 + _2, streamish);
};
SECTION("raw stream")
{
cub::DeviceTransform::Transform(cuda::std::make_tuple(a, b), result.begin(), num_items, _1 + _2, stream);
run(stream);
}
SECTION("custom stream")
{
run(custom_stream{stream});
Contributor Author

I verified in the debugger that the stream is extracted, but I wonder whether I could write the unit test so that it detects if the default stream was used anywhere. Does anybody know whether I can query a stream to check that something was really enqueued on it?

Contributor

One solution would be to start a graph capture on a stream and see if anything was captured. That might have some limitations, though, and I am not sure if it's applicable here.

Contributor

@miscco miscco left a comment

I would love to better understand where the issue with the conversions from cudaStream_t lies so that we can extend our get_stream CPO.

What was the exact case that failed? From what I can see, the second and third branches of the if constexpr should already work.

Maybe we just need another clause for get_stream

}
else
{
return ::cuda::std::execution::__query_or(env, ::cuda::get_stream, ::cuda::stream_ref{cudaStream_t{}}).get();
Contributor

This is very close to what we have in get_stream

I believe the issue is that we are checking whether it is convertible to stream_ref and not cudaStream_t

But at the same time cudaStream_t should be convertible to stream_ref
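One likely explanation for the observation above is that C++ allows at most one user-defined conversion in an implicit conversion sequence. The sketch below checks this with stand-in types: `stream_handle` stands in for `cudaStream_t`, and `stream_ref_like` stands in for `cuda::stream_ref`, assuming the latter is implicitly constructible from the raw handle. Whether this is exactly what happened inside the get_stream CPO is an assumption.

```cpp
#include <cassert>
#include <type_traits>

// Stand-ins (assumptions, not CCCL code).
struct fake_stream_impl {};
using stream_handle = fake_stream_impl*; // plays the role of cudaStream_t

// Plays the role of cuda::stream_ref: implicitly constructible from the raw handle.
struct stream_ref_like
{
  stream_handle handle;
  stream_ref_like(stream_handle h) : handle(h) {}
};

// User-provided wrapper with a conversion operator to the raw handle.
struct custom_stream
{
  stream_handle handle;
  operator stream_handle() const { return handle; }
};

// Each single step is an implicit conversion ...
static_assert(std::is_convertible_v<stream_handle, stream_ref_like>);
static_assert(std::is_convertible_v<custom_stream, stream_handle>);
// ... but chaining both would need two user-defined conversions, which is not allowed.
static_assert(!std::is_convertible_v<custom_stream, stream_ref_like>);

// Making the first step explicit leaves only one implicit user-defined conversion.
inline stream_ref_like to_stream_ref(const custom_stream& s)
{
  return stream_ref_like{static_cast<stream_handle>(s)};
}

// Small self-check of the explicit two-step conversion.
inline void example_usage()
{
  fake_stream_impl s;
  assert(to_stream_ref(custom_stream{&s}).handle == &s);
}
```

So a check for "convertible to stream_ref" can reject a wrapper type even though the raw handle itself converts fine.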

@bernhardmgruber bernhardmgruber marked this pull request as draft January 15, 2026 08:52
copy-pr-bot bot commented Jan 15, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from In Review to In Progress in CCCL Jan 15, 2026
@bernhardmgruber
Contributor Author

I discussed this PR with @miscco and we concluded that the environment query for cuda::get_stream is missing a check for whether the environment is convertible to a cudaStream_t. He will add this, and then we can replace the get_stream in this PR with just the environment query.

miscco commented Jan 15, 2026

/ok to test cc8f29e

@miscco miscco marked this pull request as ready for review January 15, 2026 09:26
@miscco miscco requested a review from a team as a code owner January 15, 2026 09:26
@miscco miscco requested a review from griwes January 15, 2026 09:26
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Jan 15, 2026
miscco commented Jan 15, 2026

I have updated our get_stream CPO to ensure that it also works with types that are convertible to ::cudaStream_t as opposed to ::cuda::stream_ref
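A rough sketch of what such an extra clause in a get_stream-style customization point object could look like. All names here (`get_stream_fn`, `get_stream_sketch`, `stream_ref_like`, `my_env`) are hypothetical stand-ins, and the real CPO in libcu++ has different and additional clauses. The sketch only shows the ordering idea: stream_ref-like types first, then types convertible to the raw handle, then environment queries.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Stand-ins (assumptions, not CCCL code).
struct fake_stream_impl {};
using stream_handle = fake_stream_impl*; // plays the role of ::cudaStream_t

// Plays the role of ::cuda::stream_ref.
struct stream_ref_like
{
  stream_handle handle;
  stream_ref_like(stream_handle h) : handle(h) {}
  stream_handle get() const { return handle; }
};

struct get_stream_fn;

// Detects environments that answer env.query(get_stream_fn).
template <class Env, class = void>
struct is_queryable : std::false_type {};
template <class Env>
struct is_queryable<Env,
                    std::void_t<decltype(std::declval<const Env&>().query(std::declval<const get_stream_fn&>()))>>
    : std::true_type {};

struct get_stream_fn
{
  template <class T>
  stream_ref_like operator()(const T& t) const
  {
    if constexpr (std::is_convertible_v<T, stream_ref_like>)
      return t; // raw handles and stream_ref-like types
    else if constexpr (std::is_convertible_v<T, stream_handle>)
      return stream_ref_like{static_cast<stream_handle>(t)}; // the added clause
    else
    {
      static_assert(is_queryable<T>::value, "type is neither a stream nor a queryable environment");
      return t.query(*this);
    }
  }
};
inline constexpr get_stream_fn get_stream_sketch{};

// User-provided wrapper and a hypothetical environment for the demo.
struct custom_stream
{
  stream_handle handle;
  operator stream_handle() const { return handle; }
};
struct my_env
{
  stream_handle handle;
  stream_ref_like query(const get_stream_fn&) const { return handle; }
};

// Small self-check: wrapper and environment both yield the same stream.
inline void example_usage()
{
  fake_stream_impl s;
  assert(get_stream_sketch(custom_stream{&s}).get() == &s);
  assert(get_stream_sketch(my_env{&s}).get() == &s);
}
```

With a clause like this in the CPO, an algorithm can call the query directly and does not need its own special-case overload for stream-like types.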

@bernhardmgruber bernhardmgruber enabled auto-merge (squash) January 15, 2026 12:36
@bernhardmgruber
Contributor Author

I already started the backport from the current state: #7263. I need to rebase and change this PR again, because #6914 was merged and we now have conflicts.

bernhardmgruber and others added 2 commits January 16, 2026 00:28
NVIDIA#6204 changed cub::DeviceTransform APIs from taking cudaStream_t to environments. Special handling preserved support for cudaStream_t. However, user-provided stream types with conversion operators to cudaStream_t were now queried as environments, failing to return a stream.
We should not have a special `get_stream` overload, but should ensure that we can pass types that are convertible to `::cudaStream_t` and extract that stream

I have expanded the get_stream CPO to accept a `::cudaStream_t __stream` so that we can work with those types
@bernhardmgruber bernhardmgruber force-pushed the fix_custom_stream_transform branch from cc8f29e to 8ff0cdd Compare January 15, 2026 23:29
@github-actions

🥳 CI Workflow Results

🟩 Finished in 6h 06m: Pass: 100%/126 | Total: 5d 13h | Max: 5h 42m | Hits: 89%/250239

See results here.

@bernhardmgruber bernhardmgruber merged commit 31f8a13 into NVIDIA:main Jan 16, 2026
139 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Jan 16, 2026
@github-actions

Successfully created backport PR for branch/3.2.x:

github-actions bot pushed a commit that referenced this pull request Jan 16, 2026
* Fix extracting CUDA stream in cub::DeviceTransform

#6204 changed cub::DeviceTransform APIs from taking cudaStream_t to environments. Special handling preserved support for cudaStream_t. However, user-provided stream types with conversion operators to cudaStream_t were now queried as environments, failing to return a stream.

* Properly use `get_stream` in device transform

We should not have a special `get_stream` overload, but should ensure that we can pass types that are convertible to `::cudaStream_t` and extract that stream

I have expanded the get_stream CPO to accept a `::cudaStream_t __stream` so that we can work with those types

Co-authored-by: Michael Schellenberger Costa <[email protected]>
(cherry picked from commit 31f8a13)
@bernhardmgruber bernhardmgruber deleted the fix_custom_stream_transform branch January 16, 2026 17:46