device global variables are a mess #2441
Bottom line, I think we should explore ways to simplify the syntax to declare and use global variables, constant memory variables, and …
Updated 2024.12.16 to support uninitialised variables (to support …).

Here is one idea to simplify the syntax:

```cpp
#include <cstddef>
#include <type_traits>

namespace {
template <typename... Args>
consteval std::size_t count_arguments(Args&&...) {
    return sizeof...(Args);
}
} // namespace

#if defined(__CUDA_ARCH__) or defined(__HIP_DEVICE_COMPILE__)
// CUDA and HIP: Use __device__ and initialize directly
#define DEVICE_GLOBAL(type, name, ...) __device__ std::type_identity_t<type> name __VA_OPT__({__VA_ARGS__})
#elif defined(__SYCL_DEVICE_ONLY__)
// SYCL: Use device_global and initialize via brace-enclosed list
#define DEVICE_GLOBAL(type, name, ...) \
    sycl::ext::oneapi::experimental::device_global< \
        std::conditional_t<std::is_unbounded_array_v<type>, std::remove_extent_t<type>[count_arguments(__VA_ARGS__)], type>> \
    name __VA_OPT__({__VA_ARGS__})
#else
// CPU backend
#define DEVICE_GLOBAL(type, name, ...) std::type_identity_t<type> name __VA_OPT__({__VA_ARGS__})
#endif
```

It can be used as

```cpp
// global device simple variables
DEVICE_GLOBAL(float, uninitialised_variable);
DEVICE_GLOBAL(float, initialised_variable, 99.);
// global device arrays
DEVICE_GLOBAL(float[4], uninitialised_array);
DEVICE_GLOBAL(float[4], initialised_array, 9.2, -0.1, 5.7, -4.3);
DEVICE_GLOBAL(float[], auto_sized_array, 9.2, -0.1, 5.7, -4.3);
// global device aggregates
struct Dim{ float x; float y; float z; float w; };
DEVICE_GLOBAL(Dim, uninitialised_aggregate);
DEVICE_GLOBAL(Dim, initialised_aggregate, 9.2, -0.1, 5.7, -4.3);
// extern, static, inline, const, and other attributes
[[maybe_unused]] extern DEVICE_GLOBAL(float, uninitialised_extern);
[[maybe_unused]] static DEVICE_GLOBAL(float, initialised_static, 99.);
[[maybe_unused]] inline DEVICE_GLOBAL(float, initialised_inline, 99.);
[[maybe_unused]] const DEVICE_GLOBAL(float, initialised_const, 99.);
```

or, if one prefers, also as

```cpp
DEVICE_GLOBAL(float, initialised_variable){ 99. };
DEVICE_GLOBAL(float[4], initialised_array){ 9.2, -0.1, 5.7, -4.3 };
```
And also I did not try to …
Playing a bit more with the implementation at #2441 (comment), the first issue is with the access to the members of a `struct`:

```cpp
struct Dim{ float x; float y; float z; float w; };
DEVICE_GLOBAL(Dim, aggregate){ 9.2, -0.1, 5.7, -4.3 };
```

With the CUDA, HIP and CPU back-ends we can simply do

```cpp
printf(".x = %0.1f, .y = %0.1f, .z = %0.1f, .w = %0.1f\n",
       aggregate.x,
       aggregate.y,
       aggregate.z,
       aggregate.w);
```

However, the SYCL back-end requires either

```cpp
printf(".x = %0.1f, .y = %0.1f, .z = %0.1f, .w = %0.1f\n",
       aggregate.get().x,
       aggregate.get().y,
       aggregate.get().z,
       aggregate.get().w);
```

or an explicit cast:

```cpp
printf(".x = %0.1f, .y = %0.1f, .z = %0.1f, .w = %0.1f\n",
       static_cast<Dim&>(aggregate).x,
       static_cast<Dim&>(aggregate).y,
       static_cast<Dim&>(aggregate).z,
       static_cast<Dim&>(aggregate).w);
```
So the options seem to be:
And then we need to decide if we want to let the developers:
Update: during the Zoom call on 17/12/2024 we agreed to let the simple syntax fail in all cases, and always require …
inspired by PMacc CONST_VECTOR and alpaka-group/alpaka#2441 (comment)
I guess I'm partially to blame, because I didn't dedicate enough time to them, but with alpaka 1.2.0 the device global variables are now a mess to use.
On the CPU we can do
With CUDA/HIP we can do
With oneAPI we can do
With alpaka 1.2.0 we have to do
In particular:
- … (`ALPAKA_STATIC_ACC_MEM_GLOBAL`) and a template wrapper (`alpaka::DevGlobal<TAcc, T>`)
- … `= 99.` or `= { 99.}`
- … `constexpr` constant memory, that does not need support for host-to-device memory copies
- … `<Acc1D>` template, even if the symbol may already be unambiguous (e.g. using a `namespace`, or compiling for a single target)
- … `.get()`