
Conversation

@Aminsed Aminsed commented Nov 2, 2025

Summary

  • introduce __atomic_ref_small_tag and reference_small.h so cuda::atomic_ref on sub-4-byte types performs byte-granular locking instead of widened RMWs
  • update the atomic dispatch plumbing to route narrow references through the new implementation without disturbing the existing owning atomics (a sketch of the routing idea follows this list)
  • add a focused atomic_ref<int8_t> regression test covering host and device access patterns (sanitizer-friendly)
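
The routing idea is sketched below. Only __atomic_ref_small_tag is taken from the patch; the other identifiers and the exact selection condition are illustrative assumptions, not the reference_small.h code.

    #include <cuda/std/type_traits>
    #include <cstdint>

    // Illustrative sketch; apart from __atomic_ref_small_tag, these names and the
    // selection condition are assumptions, not the actual reference_small.h code.
    struct __atomic_ref_small_tag {}; // sub-4-byte referents: byte-granular lock path
    struct __atomic_ref_wide_tag  {}; // 4-byte-and-wider referents: existing path

    template <class _Tp>
    using __atomic_ref_storage_tag_t =
      cuda::std::conditional_t<(sizeof(_Tp) < 4), __atomic_ref_small_tag, __atomic_ref_wide_tag>;

    // Narrow types pick the small tag; wider types keep the current storage.
    static_assert(cuda::std::is_same_v<__atomic_ref_storage_tag_t<std::int8_t>,
                                       __atomic_ref_small_tag>, "");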

Motivation

#6430 reports that cuda::atomic_ref on narrow types trips compute-sanitizer memcheck because the implementation promotes byte writes to 32-bit RMWs. The sanitizer flags that widened access as an invalid global read, which blocks users (e.g., libcudf) from running their pipelines under memcheck.
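
For context, the problematic pattern looks roughly like the following. This is an illustration of the widened-RMW idea, not the actual libcu++ code: the byte is updated by CAS-ing the enclosing aligned 32-bit word, so the hardware access footprint is 4 bytes even though the user's allocation may be only 1.

    #include <cstdint>

    // Illustration only. Emulating a 1-byte store with a 32-bit CAS over the
    // enclosing aligned word: memcheck reports a 4-byte access at `word`, which
    // can extend past a 1-byte allocation.
    __device__ void widened_byte_store(unsigned char* p, unsigned char v)
    {
      const auto addr      = reinterpret_cast<std::uintptr_t>(p);
      auto* word           = reinterpret_cast<unsigned int*>(addr & ~std::uintptr_t{3});
      const unsigned shift = static_cast<unsigned>(addr & 3u) * 8u;
      const unsigned mask  = 0xFFu << shift;
      unsigned assumed, old = *word;
      do {
        assumed = old;
        old = atomicCAS(word, assumed, (assumed & ~mask) | (static_cast<unsigned>(v) << shift));
      } while (assumed != old);
    }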

Explanation

The patch introduces a new storage tag for narrow atomic_ref instances and backs it with a small device-side lock table. Each operation acquires a byte-granular lock, performs the necessary load/store/update, and releases the lock. Host execution still relies on the existing libatomic wrappers. This approach keeps behavior identical while preventing memcheck from seeing out-of-bounds accesses.
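
A minimal device-side sketch of that scheme, assuming a simple address-hashed spinlock table; the table size, hash, and names are illustrative rather than the reference_small.h implementation.

    #include <cstdint>

    // Assumed shape of the byte-granular lock path; names and sizes are illustrative.
    __device__ unsigned int __byte_lock_table[256]; // one spinlock per hash bucket

    __device__ unsigned int* __lock_for(const void* __ptr)
    {
      const auto __a = reinterpret_cast<std::uintptr_t>(__ptr);
      return &__byte_lock_table[__a % 256]; // hash the referent's address into a bucket
    }

    __device__ unsigned char __small_exchange(unsigned char* __ptr, unsigned char __desired)
    {
      unsigned int* __lock = __lock_for(__ptr);
      // Naive spin; assumes Volta+ independent thread scheduling.
      while (atomicCAS(__lock, 0u, 1u) != 0u) {} // acquire the byte's lock
      __threadfence();
      const unsigned char __old = *__ptr; // plain 1-byte load under the lock
      *__ptr = __desired;                 // plain 1-byte store: exactly the footprint memcheck expects
      __threadfence();
      atomicExch(__lock, 0u);             // release
      return __old;
    }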

Rationale

  • Minimally invasive: the change only touches the atomic_ref storage layer; existing owning atomics and the dispatch macros stay untouched.
  • Deterministic under sanitizer: the lock path guarantees single-byte reads/writes, so memcheck sees the exact footprint users expect.
  • Regression safeguards: the new test exercises atomic_ref<int8_t> on both host and device, proving the narrow path works end-to-end while keeping sanitizer runtime low.

Testing

  • nvcc -std=c++20 -x cu libcudacxx/test/libcudacxx/std/atomics/atomics.types.generic/integral/8b_integral_ref.pass.cpp -Ilibcudacxx/include -Ilibcudacxx/test/support -o /tmp/atomic_ref_small_test
  • compute-sanitizer --tool memcheck /tmp/atomic_ref_small_test
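
For reference, the host/device access pattern such a test exercises looks roughly like this; it is a sketch of the shape of the test, not the contents of 8b_integral_ref.pass.cpp.

    #include <cuda/atomic>
    #include <cassert>
    #include <cstdint>

    __global__ void bump(std::int8_t* p)
    {
      cuda::atomic_ref<std::int8_t, cuda::thread_scope_device> ref{*p};
      ref.fetch_add(1); // device-side narrow RMW
    }

    int main()
    {
      std::int8_t* p = nullptr;
      cudaMallocManaged(&p, sizeof(*p)); // 1-byte allocation: a widened access would be out of bounds
      *p = 0;
      bump<<<1, 32>>>(p);
      cudaDeviceSynchronize();
      assert(*p == 32);

      cuda::atomic_ref<std::int8_t, cuda::thread_scope_system> host_ref{*p};
      host_ref.fetch_add(1); // host-side path
      assert(*p == 33);

      cudaFree(p);
      return 0;
    }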

@miscco @griwes

@Aminsed Aminsed requested a review from a team as a code owner November 2, 2025 16:25
@Aminsed Aminsed requested a review from griwes November 2, 2025 16:25
@github-project-automation github-project-automation bot moved this to Todo in CCCL Nov 2, 2025

copy-pr-bot bot commented Nov 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Nov 2, 2025

miscco commented Nov 3, 2025

Thanks @Aminsed for looking into this.

We need to discuss this internally. Atomics are highly performance-sensitive and already a compile-time sink. Given that this is something we, as the implementation, know works properly, we might want to eat / disable the warning instead of adding complexity.

@wmaxey please give this a look and investigate any performance / code size issues


Aminsed commented Nov 3, 2025

@miscco Thanks for raising the concern. I agree we shouldn't make the byte-wide path the default. I've reworked the series so the lock-backed implementation only comes into play when _LIBCUDACXX_ATOMIC_REF_ENABLE_MEMCHECK_SAFE is defined. Without that opt-in the codegen stays identical to the existing 4-byte CAS path, so there shouldn't be any runtime or compile-time impact on normal builds. Defining the macro lets us flip the byte-granular path on for targeted memcheck runs; I re-ran the regression test + memcheck with it enabled to make sure the warning still disappears. Let me know if you'd prefer a different toggle shape or additional measurements.
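
Concretely, enabling the opt-in for a memcheck run would look something like this (paths mirror the Testing section; the macro is the only new knob):

    nvcc -std=c++20 -x cu -D_LIBCUDACXX_ATOMIC_REF_ENABLE_MEMCHECK_SAFE \
      libcudacxx/test/libcudacxx/std/atomics/atomics.types.generic/integral/8b_integral_ref.pass.cpp \
      -Ilibcudacxx/include -Ilibcudacxx/test/support -o /tmp/atomic_ref_small_test
    compute-sanitizer --tool memcheck /tmp/atomic_ref_small_test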


Aminsed commented Nov 3, 2025

After benchmarking the opt-in macro and thinking it over, I agree with @miscco that this hauls in extra code for minimal gain. I'm going to close this out; the added complexity isn't justified here.

@Aminsed Aminsed closed this Nov 3, 2025
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Nov 3, 2025