Fix atomic_ref narrow-type memcheck false positive #6442
Closed
Summary
- Adds `__atomic_ref_small_tag` and `reference_small.h` so `cuda::atomic_ref` on sub-4-byte types performs byte-granular locking instead of widened RMWs (a dispatch sketch follows this list)
- Adds an `atomic_ref<int8_t>` regression test covering host and device access patterns (sanitizer-friendly)
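For orientation, a minimal sketch of the dispatch idea: a trait routes sub-4-byte types to the small-storage tag. Only `__atomic_ref_small_tag` is named by this PR; the trait and the second tag below are hypothetical placeholders, not the contents of `reference_small.h`.

```cpp
// Illustrative sketch only; the PR's actual trait and tag names may differ.
#include <cstdint>
#include <type_traits>

struct __atomic_ref_small_tag {};  // tag named in this PR
struct __atomic_ref_word_tag {};   // hypothetical tag for the existing >=4-byte path

// Hypothetical selector: sub-4-byte types take the byte-granular-lock path.
template <class _Tp>
using __atomic_ref_storage_tag_t =
    std::conditional_t<(sizeof(_Tp) < 4), __atomic_ref_small_tag, __atomic_ref_word_tag>;

static_assert(std::is_same_v<__atomic_ref_storage_tag_t<std::int8_t>, __atomic_ref_small_tag>);
static_assert(std::is_same_v<__atomic_ref_storage_tag_t<std::int32_t>, __atomic_ref_word_tag>);
```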
Motivation

#6430 reports that `cuda::atomic_ref` on narrow types trips compute-sanitizer memcheck because the implementation promotes byte writes to 32-bit RMWs. The sanitizer flags that widened access as an invalid global read, which blocks users (e.g., libcudf) from running their pipelines under memcheck.
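For context, the kind of word-widened emulation that memcheck objects to looks roughly like this. It is a generic illustration, not the library's actual code; `store_byte_widened` and its internals are made up for exposition.

```cpp
// Generic illustration of a word-widened byte store (not the library's code).
#include <cstdint>

__device__ void store_byte_widened(std::int8_t* addr, std::int8_t value) {
  // Locate the 4-byte word containing the byte and the byte's position in it.
  auto base      = reinterpret_cast<std::uintptr_t>(addr);
  auto* word_ptr = reinterpret_cast<unsigned int*>(base & ~std::uintptr_t{3});
  unsigned shift = static_cast<unsigned>(base & 3u) * 8u;
  unsigned mask  = 0xFFu << shift;

  // The 32-bit read/CAS can touch bytes outside a 1-byte allocation,
  // which is exactly what memcheck reports as an invalid global read.
  unsigned old_word = *word_ptr;
  unsigned assumed;
  do {
    assumed = old_word;
    unsigned new_word =
        (assumed & ~mask) |
        (static_cast<unsigned>(static_cast<std::uint8_t>(value)) << shift);
    old_word = atomicCAS(word_ptr, assumed, new_word);
  } while (assumed != old_word);
}
```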
Explanation

The patch introduces a new storage tag for narrow `atomic_ref` instances and backs it with a small device-side lock table. Each operation acquires a byte-granular lock, performs the necessary load/store/update, and releases the lock. Host execution still relies on the existing libatomic wrappers. This approach keeps behavior identical while preventing memcheck from seeing out-of-bounds accesses.
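A condensed sketch of the device-side path described above, assuming the lock table is a hashed array of word-sized spinlocks. The table size, hashing, and names are assumptions (the real code lives in `reference_small.h`), and warp-level lock contention is not handled here.

```cpp
// Illustrative sketch of the byte-granular locking scheme; names are hypothetical.
#include <cstdint>

__device__ unsigned int __byte_lock_table[1024];  // hypothetical hashed lock table

__device__ unsigned int* __lock_for(const void* ptr) {
  return &__byte_lock_table[reinterpret_cast<std::uintptr_t>(ptr) % 1024];
}

__device__ std::int8_t __locked_fetch_add(std::int8_t* ptr, std::int8_t arg) {
  unsigned int* lock = __lock_for(ptr);
  while (atomicCAS(lock, 0u, 1u) != 0u) {}     // acquire the byte's lock
  __threadfence();                             // order the critical section
  std::int8_t old = *ptr;                      // only 1-byte loads/stores touch
  *ptr = static_cast<std::int8_t>(old + arg);  // the referenced object, so no
  __threadfence();                             // widened access for memcheck
  atomicExch(lock, 0u);                        // release
  return old;
}
```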
Rationale

- Confines the change to the `atomic_ref` storage layer; existing owning atomics and the dispatch macros stay untouched.
- Exercises `atomic_ref<int8_t>` on both host and device, proving the narrow path works end-to-end while staying fast under the sanitizer (a rough sketch of the test follows this list).
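The rough shape of that regression test, mixing device- and host-side increments on a 1-byte managed allocation. This is a paraphrase, not the actual contents of `8b_integral_ref.pass.cpp`, and error checking is omitted for brevity.

```cpp
// Sketch of an atomic_ref<int8_t> host/device regression test (illustrative only).
#include <cuda/atomic>
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>

__global__ void device_increments(std::int8_t* p) {
  cuda::atomic_ref<std::int8_t, cuda::thread_scope_device> ref(*p);
  ref.fetch_add(std::int8_t{1});
}

int main() {
  std::int8_t* value = nullptr;
  cudaMallocManaged(&value, sizeof(std::int8_t));  // a 1-byte allocation, the
  *value = 0;                                      // case memcheck used to flag

  device_increments<<<1, 32>>>(value);             // 32 device-side increments
  cudaDeviceSynchronize();

  cuda::atomic_ref<std::int8_t, cuda::thread_scope_system> ref(*value);
  ref.fetch_add(std::int8_t{1});                   // one host-side increment

  assert(*value == 33);
  cudaFree(value);
  return 0;
}
```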
Testing

- `nvcc -std=c++20 -x cu libcudacxx/test/libcudacxx/std/atomics/atomics.types.generic/integral/8b_integral_ref.pass.cpp -Ilibcudacxx/include -Ilibcudacxx/test/support -o /tmp/atomic_ref_small_test`
- `compute-sanitizer --tool memcheck /tmp/atomic_ref_small_test`

@miscco @griwes