Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

device global variables are a mess #2441

Open
fwyzard opened this issue Dec 14, 2024 · 5 comments
Open

device global variables are a mess #2441

fwyzard opened this issue Dec 14, 2024 · 5 comments

Comments

@fwyzard
Copy link
Contributor

fwyzard commented Dec 14, 2024

I guess I'm partially to blame, because I didn't dedicate enough time to them, but with alpaka 1.2.0 the device global variables are now a mess to use.

On the CPU we can do

// "global" variables
float value = 99.;
float data[] = { 9.2, -0.1, 5.7, -4.3 };

// "global" constants
constexpr float constant = 42.;
constexpr float bins[] = { 0.5, 1.5, 2.5, 3.5 };

void function() {
    // use by reference
    printf("%f\n", std::min(value, constant));

    // use by pointer
    for (int i = 0; i < 3; ++i) {
        printf("%f, %f\n", bins[i], data[i]);
    }
}

With CUDA/HIP we can do

// "global" variables
__device__ float value = 99.;
__device__ float data[] = { 9.2, -0.1, 5.7, -4.3 };

// "global" constants
__constant__ constexpr float constant = 42.;
__constant__ constexpr float bins[] = { 0.5, 1.5, 2.5, 3.5 };

__global__ void kernel() {
    // use by reference
    printf("%f\n", std::min(value, constant));

    // use by pointer
    for (int i = 0; i < 3; ++i) {
        printf("%f, %f\n", bins[i], data[i]);
    }
}

With oneAPI we can do

// "global" variables
sycl::ext::oneapi::experimental::device_global<float> value{99.};
sycl::ext::oneapi::experimental::device_global<float[4]> data{ 9.2, -0.1, 5.7, -4.3 };

// "global" constants
constexpr float constant = 42.; 
constexpr float bins[] = { 0.5, 1.5, 2.5, 3.5 };

void kernel() {
    // use by reference
    printf("%f\n", (sycl::min<float>(value, constant)));
    
    // use by pointer 
    for (int i = 0; i < 3; ++i) {
        printf("%f, %f\n", bins[i], data[i]);
    }
}

With alpaka 1.2.0 we have to do

// "global" variables
ALPAKA_STATIC_ACC_MEM_GLOBAL alpaka::DevGlobal<Acc1D, float> value{99.};
ALPAKA_STATIC_ACC_MEM_GLOBAL alpaka::DevGlobal<Acc1D, float[4]> data{9.2, -0.1, 5.7, -4.3};

// "global" constants
ALPAKA_STATIC_ACC_MEM_CONSTANT alpaka::DevGlobal<Acc1D, float> constant{42.};
ALPAKA_STATIC_ACC_MEM_CONSTANT alpaka::DevGlobal<Acc1D, float[4]> bins{0.5, 1.5, 2.5, 3.5};

struct Kernel {
  ALPAKA_FN_ACC void operator()(Acc1D const& acc) const {
    // use by reference
    printf("%f\n", std::min(value<Acc1D>.get(), constant<Acc1D>.get()));

    // use by pointer
    for (int i = 0; i < 3; ++i) {
      printf("%f, %f\n", bins<Acc1D>.get()[i], data<Acc1D>.get()[i]);
    }
  }
};

In particular:

  • we need both a macro (ALPAKA_STATIC_ACC_MEM_GLOBAL) and a template wrapper (alpaka::DevGlobal<TAcc, T>)
  • only explicit brace initialisation is supported, not = 99. or = { 99.}
  • we could simplify the syntax for constexpr constant memory, that does not need support for host-to-device memory copies
  • we always need the <Acc1D> template, even if the symbol may already be unambiguous (e.g. using a namespace, or compiling for a single target)
  • we (almost ?) always need .get()
@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2024

Bottom line, I think we should explore ways to simplify the syntax to declare and use global variables, constant memory variables, and constexpr global variables.

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 16, 2024

Updated 2024.12.16 to support uninitialised variables (to support extern), and automatic size (e.g. float[]) for initialised arrays.


Here is one idea to simplify the syntax:

namespace {

  template <typename... Args>
  consteval size_t count_arguments(Args&&...) {
    return sizeof...(Args);
  }

}  // namespace

#if defined(__CUDA_ARCH__) or defined(__HIP_DEVICE_COMPILE__)

// CUDA and HIP: Use __device__ and initialize directly
#define DEVICE_GLOBAL(type, name, ...) __device__ std::type_identity_t<type> name __VA_OPT__({__VA_ARGS__})

#elif defined(__SYCL_DEVICE_ONLY__)

// SYCL: Use device_global and initialize via brace-enclosed list
#define DEVICE_GLOBAL(type, name, ...)                                                                                     \
  sycl::ext::oneapi::experimental::device_global<                                                                          \
      std::conditional_t<std::is_unbounded_array_v<type>, std::remove_extent_t<type>[count_arguments(__VA_ARGS__)], type>> \
      name __VA_OPT__({__VA_ARGS__})

#else

// CPU backend
#define DEVICE_GLOBAL(type, name, ...) std::type_identity_t<type> name __VA_OPT__({__VA_ARGS__})

#endif

It can be used as

// global device simple variables
DEVICE_GLOBAL(float, uninitialised_variable);
DEVICE_GLOBAL(float, initialised_variable, 99.);

// global device arrays
DEVICE_GLOBAL(float[4], uninitialised_array);
DEVICE_GLOBAL(float[4], initialised_array, 9.2, -0.1, 5.7, -4.3);
DEVICE_GLOBAL(float[], auto_sized_array, 9.2, -0.1, 5.7, -4.3);

// global device aggregates
struct Dim{ float x; float y; float z; float w; };
DEVICE_GLOBAL(Dim, uninitialised_aggregate);
DEVICE_GLOBAL(Dim, initialised_aggregate, 9.2, -0.1, 5.7, -4.3);

// extern, static, inline, const, and other attributes
[[maybe_unused]] extern DEVICE_GLOBAL(float, uninitialised_extern;
[[maybe_unused]] static DEVICE_GLOBAL(float, initialised_static, 99.);
[[maybe_unused]] inline DEVICE_GLOBAL(float, initialised_inline, 99.);
[[maybe_unused]] const DEVICE_GLOBAL(float, initialised_const, 99.);

or, if one prefers, also as

DEVICE_GLOBAL(float, initialised_variable){ 99. };
DEVICE_GLOBAL(float[4], initialised_array){ 9.2, -0.1, 5.7, -4.3 };

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 16, 2024

And also I did not try to memcpy values from the host to the global device variables yet.

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 17, 2024

Playing a bit more with the implementation at #2441 (comment), the first issue is with the access to the members of a class or struct:

struct Dim{ float x; float y; float z; float w; };
DEVICE_GLOBAL(Dim, aggregate){ 9.2, -0.1, 5.7, -4.3 };

With the CUDA, HIP and CPU back-ends we can do simply

printf(".x = %0.1f, .y = %0.1f, .z = %0.1f, .w = %0.1f\n",
       aggregate.x,
       aggregate.y,
       aggregate.z,
       aggregate.w);

However the SYCL back-end requires either .get()

printf(".x = %0.1f, .y = %0.1f, .z = %0.1f, .w = %0.1f\n",
       aggregate.get().x,
       aggregate.get().y,
       aggregate.get().z,
       aggregate.get().w);

or an explicit cast:

printf(".x = %0.1f, .y = %0.1f, .z = %0.1f, .w = %0.1f\n",
       static_cast<Dim&>(aggregate).x,
       static_cast<Dim&>(aggregate).y,
       static_cast<Dim&>(aggregate).z,
       static_cast<Dim&>(aggregate).w);

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 17, 2024

So the options seem to be:

  • let the simple syntax work for the CUDA/HIP/CPU cases, and fail for the SYCL case, or
  • let the simple syntax fail in all cases
  • (wait and see if reflection and token injection land in c++26 and let us implement a better proxy type)

And then we need to decide if we want to let the developers:

  • use the explicit cast (works in both cases)
  • use .get() (works only in the second case)
  • use some new function like alpaka::get(global_variable) (can work in both cases)

Update: during the Zoom call on 17/12/2024 we agreed to let the simple syntax fail in all cases, and always require .get() for data member access. The explicit cast should also work, anyway.

psychocoderHPC added a commit to psychocoderHPC/alpaka2 that referenced this issue Dec 17, 2024
inspired by PMacc CONST_VECTOR and alpaka-group/alpaka#2441 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant