[WIP] implement memory visibility #2180

Draft: wants to merge 19 commits into base branch develop
4 changes: 2 additions & 2 deletions benchmarks/babelstream/src/AlpakaStream.h
@@ -42,8 +42,8 @@ struct AlpakaStream : Stream<T>
using DevHost = alpaka::Dev<PlatformHost>;
using PlatformAcc = alpaka::Platform<Acc>;
using DevAcc = alpaka::Dev<Acc>;
-using BufHost = alpaka::Buf<alpaka::DevCpu, T, Dim, Idx>;
-using BufAcc = alpaka::Buf<Acc, T, Dim, Idx>;
+using BufHost = alpaka::Buf<alpaka::DevCpu, T, Dim, Idx, alpaka::MemVisibilityTypeList<alpaka::DevCpu>>;
Member Author commented:
Defining the buffer type looks ugly now. @chillenzer suggested using a default template parameter: typename TMemVisibility = alpaka::MemVisibilityTypeList<TDev>. This is possible, but it also means that by default the memory is only visible from the platform of the given device type.

If we want to use, for example, managed memory, we still need to define the visibility manually. It could also be confusing if the user defines an alpaka buffer with the default visibility as a member variable and then tries to use alpakaMemMapped to allocate the memory.

+using BufAcc = alpaka::Buf<Acc, T, Dim, Idx, alpaka::MemVisibilityTypeList<Acc>>;
using Queue = alpaka::Queue<Acc, alpaka::Blocking>;

using WorkDiv = alpaka::WorkDivMembers<Dim, Idx>;
5 changes: 2 additions & 3 deletions example/bufferCopy/src/bufferCopy.cpp
@@ -148,8 +148,7 @@ auto main() -> int
//
// The `alloc` method returns a reference counted buffer handle.
// When the last such handle is destroyed, the memory is freed automatically.
-using BufHost = alpaka::Buf<Host, Data, Dim, Idx>;
-BufHost hostBuffer(alpaka::allocBuf<Data, Idx>(devHost, extents));
+auto hostBuffer(alpaka::allocBuf<Data, Idx>(devHost, extents));
// You can also use already allocated memory and wrap it within a view (irrespective of the device type).
// The view does not own the underlying memory. So you have to make sure that
// the view does not outlive its underlying memory.
@@ -159,7 +158,7 @@ auto main() -> int
// Allocate accelerator memory buffers
//
// The interface to allocate a buffer is the same on the host and on the device.
-using BufAcc = alpaka::Buf<Acc, Data, Dim, Idx>;
+using BufAcc = alpaka::Buf<Acc, Data, Dim, Idx, alpaka::MemVisibilityTypeList<Acc>>;
BufAcc deviceBuffer1(alpaka::allocBuf<Data, Idx>(devAcc, extents));
BufAcc deviceBuffer2(alpaka::allocBuf<Data, Idx>(devAcc, extents));

2 changes: 1 addition & 1 deletion example/convolution1D/src/convolution1D.cpp
@@ -83,7 +83,7 @@ auto main() -> int
using DevAcc = alpaka::ExampleDefaultAcc<Dim, Idx>;
using QueueProperty = alpaka::Blocking;
using QueueAcc = alpaka::Queue<DevAcc, QueueProperty>;
-using BufAcc = alpaka::Buf<DevAcc, DataType, Dim, Idx>;
+using BufAcc = alpaka::Buf<DevAcc, DataType, Dim, Idx, alpaka::MemVisibilityTypeList<DevAcc>>;

std::cout << "Using alpaka accelerator: " << alpaka::getAccName<DevAcc>() << '\n';

3 changes: 1 addition & 2 deletions example/counterBasedRng/src/counterBasedRng.cpp
@@ -162,8 +162,7 @@ auto main() -> int
CounterBasedRngKernel::Key key = {rd(), rd()};

// Allocate buffer on the accelerator
-using BufAcc = alpaka::Buf<Acc, Data, Dim, Idx>;
-BufAcc bufAcc(alpaka::allocBuf<Data, Idx>(devAcc, extent));
+auto bufAcc(alpaka::allocBuf<Data, Idx>(devAcc, extent));
Contributor commented:
Personally, I prefer the syntax

Suggested change:
-auto bufAcc(alpaka::allocBuf<Data, Idx>(devAcc, extent));
+auto bufAcc = alpaka::allocBuf<Data, Idx>(devAcc, extent);

It's just a preference, not a request to make any changes.


// Create the kernel execution task.
auto const taskKernelAcc = alpaka::createTaskKernel<Acc>(
5 changes: 2 additions & 3 deletions example/heatEquation/src/heatEquation.cpp
@@ -121,9 +121,8 @@ auto main() -> int
double* const pNextHost = std::data(uNextBufHost);

// Accelerator buffer
-using BufAcc = alpaka::Buf<Acc, double, Dim, Idx>;
-auto uNextBufAcc = BufAcc{alpaka::allocBuf<double, Idx>(devAcc, extent)};
-auto uCurrBufAcc = BufAcc{alpaka::allocBuf<double, Idx>(devAcc, extent)};
+auto uNextBufAcc{alpaka::allocBuf<double, Idx>(devAcc, extent)};
+auto uCurrBufAcc{alpaka::allocBuf<double, Idx>(devAcc, extent)};

double* pCurrAcc = std::data(uCurrBufAcc);
double* pNextAcc = std::data(uNextBufAcc);
6 changes: 2 additions & 4 deletions example/monteCarloIntegration/src/monteCarloIntegration.cpp
@@ -88,8 +88,6 @@ auto main() -> int
using QueueAcc = alpaka::Queue<Acc, QueueProperty>;
QueueAcc queue{devAcc};

-using BufHost = alpaka::Buf<Host, uint32_t, Dim, Idx>;
-using BufAcc = alpaka::Buf<Acc, uint32_t, Dim, Idx>;
using WorkDiv = alpaka::WorkDivMembers<Dim, Idx>;
// Problem parameter.
constexpr size_t numPoints = 1'000'000u;
@@ -104,8 +102,8 @@
alpaka::GridBlockExtentSubDivRestrictions::Unrestricted)};

// Setup buffer.
-BufHost bufHost{alpaka::allocBuf<uint32_t, Idx>(devHost, extent)};
-BufAcc bufAcc{alpaka::allocBuf<uint32_t, Idx>(devAcc, extent)};
+auto bufHost{alpaka::allocBuf<uint32_t, Idx>(devHost, extent)};
+auto bufAcc{alpaka::allocBuf<uint32_t, Idx>(devAcc, extent)};
uint32_t* const ptrBufAcc{std::data(bufAcc)};

// Initialize the global count to 0.
12 changes: 6 additions & 6 deletions example/randomCells2D/src/randomCells2D.cpp
@@ -156,12 +156,12 @@ auto main() -> int
using QueueAcc = alpaka::Queue<Acc, QueueProperty>;
QueueAcc queue{devAcc};

-using BufHost = alpaka::Buf<Host, float, Dim, Idx>;
-using BufAcc = alpaka::Buf<Acc, float, Dim, Idx>;
-using BufHostRand = alpaka::Buf<Host, RandomEngineSingle, Dim, Idx>;
-using BufAccRand = alpaka::Buf<Acc, RandomEngineSingle, Dim, Idx>;
-using BufHostRandVec = alpaka::Buf<Host, RandomEngineVector, Dim, Idx>;
-using BufAccRandVec = alpaka::Buf<Acc, RandomEngineVector, Dim, Idx>;
+using BufHost = alpaka::Buf<Host, float, Dim, Idx, alpaka::MemVisibilityTypeList<Host>>;
+using BufAcc = alpaka::Buf<Acc, float, Dim, Idx, alpaka::MemVisibilityTypeList<Acc>>;
+using BufHostRand = alpaka::Buf<Host, RandomEngineSingle, Dim, Idx, alpaka::MemVisibilityTypeList<Host>>;
+using BufAccRand = alpaka::Buf<Acc, RandomEngineSingle, Dim, Idx, alpaka::MemVisibilityTypeList<Acc>>;
+using BufHostRandVec = alpaka::Buf<Host, RandomEngineVector, Dim, Idx, alpaka::MemVisibilityTypeList<Host>>;
+using BufAccRandVec = alpaka::Buf<Acc, RandomEngineVector, Dim, Idx, alpaka::MemVisibilityTypeList<Acc>>;
using WorkDiv = alpaka::WorkDivMembers<Dim, Idx>;

constexpr Idx numX = NUM_X;
8 changes: 4 additions & 4 deletions example/randomStrategies/src/randomStrategies.cpp
@@ -44,17 +44,17 @@ struct Box
QueueAcc queue; ///< default accelerator queue

// buffers holding the PRNG states
-using BufHostRand = alpaka::Buf<Host, RandomEngine, Dim, Idx>;
-using BufAccRand = alpaka::Buf<Acc, RandomEngine, Dim, Idx>;
+using BufHostRand = alpaka::Buf<Host, RandomEngine, Dim, Idx, alpaka::MemVisibilityTypeList<PlatformHost>>;
+using BufAccRand = alpaka::Buf<Acc, RandomEngine, Dim, Idx, alpaka::MemVisibilityTypeList<PlatformAcc>>;

Vec const extentRand; ///< size of the buffer of PRNG states
WorkDiv workdivRand; ///< work division for PRNG buffer initialization
BufHostRand bufHostRand; ///< host side PRNG states buffer (can be used to check the state of the states)
BufAccRand bufAccRand; ///< device side PRNG states buffer

// buffers holding the "simulation" results
-using BufHost = alpaka::Buf<Host, float, Dim, Idx>;
-using BufAcc = alpaka::Buf<Acc, float, Dim, Idx>;
+using BufHost = alpaka::Buf<Host, float, Dim, Idx, alpaka::MemVisibilityTypeList<PlatformHost>>;
+using BufAcc = alpaka::Buf<Acc, float, Dim, Idx, alpaka::MemVisibilityTypeList<PlatformAcc>>;

Vec const extentResult; ///< size of the results buffer
WorkDiv workdivResult; ///< work division of the result calculation
9 changes: 5 additions & 4 deletions example/reduce/src/reduce.cpp
@@ -49,7 +49,7 @@ auto reduce(
DevAcc devAcc,
QueueAcc queue,
uint64_t n,
-alpaka::Buf<DevHost, T, Dim, Idx> hostMemory,
+alpaka::Buf<DevHost, T, Dim, Idx, alpaka::MemVisibilityTypeList<DevHost>> hostMemory,
TFunc func) -> T
{
static constexpr uint64_t blockSize = getMaxBlockSize<Accelerator, 256>();
@@ -62,10 +62,11 @@
if(blockCount > maxBlockCount)
blockCount = maxBlockCount;

-alpaka::Buf<DevAcc, T, Dim, Extent> sourceDeviceMemory = alpaka::allocBuf<T, Idx>(devAcc, n);
+using DevBuf = alpaka::Buf<DevAcc, T, Dim, Extent, alpaka::MemVisibilityTypeList<DevAcc>>;

-alpaka::Buf<DevAcc, T, Dim, Extent> destinationDeviceMemory
-    = alpaka::allocBuf<T, Idx>(devAcc, static_cast<Extent>(blockCount));
+DevBuf sourceDeviceMemory = alpaka::allocBuf<T, Idx>(devAcc, n);

+DevBuf destinationDeviceMemory = alpaka::allocBuf<T, Idx>(devAcc, static_cast<Extent>(blockCount));

// copy the data to the GPU
alpaka::memcpy(queue, sourceDeviceMemory, hostMemory, n);
8 changes: 4 additions & 4 deletions example/vectorAdd/src/vectorAdd.cpp
@@ -111,8 +111,8 @@ auto main() -> int
auto const devHost = alpaka::getDevByIdx(platformHost, 0);

// Allocate 3 host memory buffers
-using BufHost = alpaka::Buf<DevHost, Data, Dim, Idx>;
-BufHost bufHostA(alpaka::allocBuf<Data, Idx>(devHost, extent));
+auto bufHostA(alpaka::allocBuf<Data, Idx>(devHost, extent));
+using BufHost = decltype(bufHostA);
BufHost bufHostB(alpaka::allocBuf<Data, Idx>(devHost, extent));
BufHost bufHostC(alpaka::allocBuf<Data, Idx>(devHost, extent));

@@ -129,8 +129,8 @@
}

// Allocate 3 buffers on the accelerator
-using BufAcc = alpaka::Buf<DevAcc, Data, Dim, Idx>;
-BufAcc bufAccA(alpaka::allocBuf<Data, Idx>(devAcc, extent));
+auto bufAccA(alpaka::allocBuf<Data, Idx>(devAcc, extent));
+using BufAcc = decltype(bufAccA);
BufAcc bufAccB(alpaka::allocBuf<Data, Idx>(devAcc, extent));
BufAcc bufAccC(alpaka::allocBuf<Data, Idx>(devAcc, extent));

5 changes: 5 additions & 0 deletions include/alpaka/acc/AccGenericSycl.hpp
@@ -118,6 +118,11 @@ namespace alpaka::trait
{
};

+struct MemVisibility<alpaka::AccCpuSerial<TDim, TIdx>>
Contributor commented:

Why AccCpuSerial?

Member Author commented:

Looks like a copy-paste error.

+{
+    using type = alpaka::MemVisibleGenericSycl;
+};

//! The SYCL accelerator device properties get trait specialization.
template<template<typename, typename> typename TAcc, typename TDim, typename TIdx>
struct GetAccDevProps<
1 change: 1 addition & 0 deletions include/alpaka/acc/Tag.hpp
@@ -31,6 +31,7 @@ namespace alpaka
CREATE_ACC_TAG(TagGpuCudaRt);
CREATE_ACC_TAG(TagGpuHipRt);
CREATE_ACC_TAG(TagGpuSyclIntel);
+#undef CREATE_ACC_TAG

namespace trait
{
1 change: 1 addition & 0 deletions include/alpaka/alpaka.hpp
@@ -130,6 +130,7 @@
#include "alpaka/math/MathStdLib.hpp"
#include "alpaka/math/MathUniformCudaHipBuiltIn.hpp"
// mem
+#include "alpaka/mem/Visibility.hpp"
#include "alpaka/mem/alloc/AllocCpuAligned.hpp"
#include "alpaka/mem/alloc/AllocCpuNew.hpp"
#include "alpaka/mem/alloc/Traits.hpp"
8 changes: 4 additions & 4 deletions include/alpaka/dev/DevCpu.hpp
@@ -167,16 +167,16 @@ namespace alpaka
};
} // namespace trait

-template<typename TElem, typename TDim, typename TIdx>
+template<typename TElem, typename TDim, typename TIdx, typename TMemVisibility>
class BufCpu;

namespace trait
{
//! The CPU device memory buffer type trait specialization.
-template<typename TElem, typename TDim, typename TIdx>
-struct BufType<DevCpu, TElem, TDim, TIdx>
+template<typename TElem, typename TDim, typename TIdx, typename TMemVisibility>
+struct BufType<DevCpu, TElem, TDim, TIdx, TMemVisibility>
{
-using type = BufCpu<TElem, TDim, TIdx>;
+using type = BufCpu<TElem, TDim, TIdx, TMemVisibility>;
};

//! The CPU device platform type trait specialization.
8 changes: 4 additions & 4 deletions include/alpaka/dev/DevGenericSycl.hpp
@@ -32,7 +32,7 @@

namespace alpaka
{
-template<typename TElem, typename TDim, typename TIdx, typename TDev>
+template<typename TElem, typename TDim, typename TIdx, typename TPlatform, typename TMemVisibility>
class BufGenericSycl;

namespace detail
@@ -219,10 +219,10 @@ namespace alpaka::trait
};

//! The SYCL device memory buffer type trait specialization.
-template<typename TElem, typename TDim, typename TIdx, typename TPlatform>
-struct BufType<DevGenericSycl<TPlatform>, TElem, TDim, TIdx>
+template<typename TElem, typename TDim, typename TIdx, typename TPlatform, typename TMemVisibility>
+struct BufType<DevGenericSycl<TPlatform>, TElem, TDim, TIdx, TMemVisibility>
{
-using type = BufGenericSycl<TElem, TDim, TIdx, TPlatform>;
+using type = BufGenericSycl<TElem, TDim, TIdx, TPlatform, TMemVisibility>;
};

//! The SYCL device platform type trait specialization.
8 changes: 4 additions & 4 deletions include/alpaka/dev/DevUniformCudaHipRt.hpp
@@ -48,7 +48,7 @@ namespace alpaka
template<typename TApi>
struct PlatformUniformCudaHipRt;

-template<typename TApi, typename TElem, typename TDim, typename TIdx>
+template<typename TApi, typename TElem, typename TDim, typename TIdx, typename TMemVisibility>
struct BufUniformCudaHipRt;

//! The CUDA/HIP RT device handle.
@@ -222,10 +222,10 @@ namespace alpaka
};

//! The CUDA/HIP RT device memory buffer type trait specialization.
-template<typename TApi, typename TElem, typename TDim, typename TIdx>
-struct BufType<DevUniformCudaHipRt<TApi>, TElem, TDim, TIdx>
+template<typename TApi, typename TElem, typename TDim, typename TIdx, typename TMemVisibility>
+struct BufType<DevUniformCudaHipRt<TApi>, TElem, TDim, TIdx, TMemVisibility>
{
-using type = BufUniformCudaHipRt<TApi, TElem, TDim, TIdx>;
+using type = BufUniformCudaHipRt<TApi, TElem, TDim, TIdx, TMemVisibility>;
};

//! The CUDA/HIP RT device platform type trait specialization.