OpenCL/clfft Integration including CI #26

tdd11235813 · 2017-08-08T09:57:56Z

Hi,

the liFFT interface has been extended to support contexts, including queue management, with an option for asynchronous execution.
This is implemented by the clfft client, and there are several ways to use liFFT together with clfft. Here are some examples that show the API changes and the usage of clfft.

use-case 1: the default liFFT interface for clfft, where a global OpenCL context is generated in the backend

using TestLibrary = LiFFT::libraries::clFFT::ClFFTNoContextAPI;
using FFT_TYPE = LiFFT::FFT_2D_R2C<TestPrecision>;
auto inWrapped = FFT_TYPE::wrapInput(
                    LiFFT::mem::wrapPtr<false>(input.get(),
                                               TestExtents(testSize, testSize)));
auto outWrapped = FFT_TYPE::wrapOutput(
                    LiFFT::mem::wrapPtr<true>(output.get(),
                                              TestExtents(testSize, testSize / 2 + 1)));
auto fft = LiFFT::makeFFT<TestLibrary>(inWrapped, outWrapped);
fft(inWrapped, outWrapped);
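
The output extents above follow from the Hermitian symmetry of a real-input FFT: only testSize / 2 + 1 complex values need to be stored along the last dimension. A minimal sketch of that rule; r2cOutputExtents is a hypothetical helper for illustration and not part of liFFT:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a liFFT function): extents of the complex output
// of a 2D real-to-complex FFT for a given real input shape.
// Only the last (fastest-varying) dimension is reduced to n / 2 + 1.
inline std::array<std::size_t, 2> r2cOutputExtents(const std::array<std::size_t, 2>& in)
{
    assert(in[0] > 0 && in[1] > 0);
    return {{in[0], in[1] / 2 + 1}};
}
```

For an 8 x 8 real input this gives an 8 x 5 complex output.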

use-case 2: the user provides a context/queue object to the liFFT API. The ClFFT client offers 3 classes which encapsulate both the context/device and the queue.
- context classes are: ContextLocal (RAII), ContextGlobal (Singleton) and ContextWrapper (wrap raw OpenCL context, device and queue).
- other clients like CUDA could provide similar types for CUDA streams
- makeFFTInQueue is added to the API, otherwise there would be ambiguous overloads

using TestLibrary = LiFFT::libraries::clFFT::ClFFTContextAPI;
using Context = LiFFT::libraries::clFFT::policies::ContextLocal<>;
Context context; // context object that is passed to liFFT
auto fft = LiFFT::makeFFTInQueue<TestLibrary>(inWrapped,
                                              outWrapped,
                                              context);
fft(inWrapped, outWrapped, context);
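
To illustrate the difference between the three ownership models named above (RAII, singleton, non-owning wrapper), here is a rough sketch. It is not the clFFT client code: FakeQueue stands in for the OpenCL context/device/queue triple, the classes are non-template simplifications, and everything in namespace sketch is an assumption made for illustration.

```cpp
#include <cassert>

namespace sketch {

// Stand-in for the OpenCL handles a real context class would manage.
struct FakeQueue { int id; };

// Number of queues currently alive, to make ownership observable.
inline int& liveQueues() { static int n = 0; return n; }

inline FakeQueue* createQueue() { ++liveQueues(); return new FakeQueue{liveQueues()}; }
inline void releaseQueue(FakeQueue* q) { assert(q != nullptr); --liveQueues(); delete q; }

// RAII: creates the queue on construction and releases it on destruction.
class ContextLocal {
public:
    ContextLocal() : m_queue(createQueue()) {}
    ~ContextLocal() { releaseQueue(m_queue); }
    ContextLocal(const ContextLocal&) = delete;
    ContextLocal& operator=(const ContextLocal&) = delete;
    FakeQueue* queue() const { return m_queue; }
private:
    FakeQueue* m_queue;
};

// Singleton: one shared context, created on first use.
class ContextGlobal {
public:
    static ContextGlobal& instance() { static ContextGlobal ctx; return ctx; }
    FakeQueue* queue() const { return m_local.queue(); }
private:
    ContextGlobal() = default;
    ContextLocal m_local;
};

// Wrapper: borrows a queue created elsewhere and never releases it.
class ContextWrapper {
public:
    explicit ContextWrapper(FakeQueue* q) : m_queue(q) {}
    FakeQueue* queue() const { return m_queue; }
private:
    FakeQueue* m_queue;
};

} // namespace sketch
```

The wrapper variant is the one to pick when the application already owns its OpenCL objects and only wants liFFT to use them.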

use-case 3: the user also wants to pass OpenCL memory objects to liFFT
- FFT_LibPtrWrapper is added to liFFT to handle device/library pointers that are not accessible from the host; it is an FFT_DataWrapperBase
- such a lib pointer is flagged as device memory, so the user takes care of memory allocation
- because a lib pointer is not host-accessible, you cannot use generators to fill data, nor the liFFT copy policy
  - the API could be extended again to support lib pointers along with copy functors (with host2device, ...)

cl_mem dat1 = clCreateBuffer(...);
cl_mem dat2 = clCreateBuffer(...);
// ... data sent to dat1 ...
// wrap OpenCL device pointer 
auto inWrapped = FFT::wrapInputLibPtr(dat1, TestExtents(testSize, testSize));
auto outWrapped = FFT::wrapOutputLibPtr(dat2, TestExtents(testSize, testSize));
auto fft = LiFFT::makeFFTInQueue<ClFFTContextAPI>(inWrapped, outWrapped, context);
fft(inWrapped, outWrapped, context);

use-case 4: asynchronous clfft/liFFT (also see testOpenCL.cpp)

using Context = LiFFT::libraries::clFFT::policies::ContextLocal<true>; // enable async context
// ...
{
  Context context;
  using FFT_TYPE = LiFFT::FFT_2D_R2C<TestPrecision>;
  auto inWrapped = FFT_TYPE::wrapInput(
                    LiFFT::mem::wrapPtr<false>(input.get(),
                                               TestExtents(testSize, testSize)));
  auto outWrapped = FFT_TYPE::wrapOutput(
                    LiFFT::mem::wrapPtr<true>(output.get(),
                                              TestExtents(testSize, testSize / 2 + 1)));
  LiFFT::policies::copy(inWrapped, baseR2CInput);

  auto fft = LiFFT::makeFFTInQueue<ClFFTContextAPI>(inWrapped,
                                                     outWrapped,
                                                     context);
  fft(inWrapped, outWrapped, context);

  context.sync_queue(); // to wait until host data with result is present
}
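
The important point in use-case 4 is that the result is only guaranteed to be in host memory after sync_queue(). The following sketch models just that contract with a deferred work queue. It is single-threaded and therefore not truly asynchronous, and DeferredQueueSketch is a hypothetical class, not the clFFT context:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of an in-order queue: enqueue() only records work,
// sync_queue() runs everything recorded so far, in submission order.
class DeferredQueueSketch {
public:
    void enqueue(std::function<void()> work) {
        assert(static_cast<bool>(work)); // reject empty callables
        m_pending.push_back(std::move(work));
    }
    // After this returns, all previously enqueued work has completed.
    void sync_queue() {
        for (auto& work : m_pending) work();
        m_pending.clear();
    }
    std::size_t pending() const { return m_pending.size(); }
private:
    std::vector<std::function<void()>> m_pending;
};
```

Reading the output buffer between enqueue() and sync_queue() corresponds to reading host data before the asynchronous FFT and copy-back have finished.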

cmake example for building liFFT with clfft

export CMAKE_PREFIX_PATH=$HOME/software/clFFT-cuda8.0-gcc5.4/:/opt/cuda/include/:$CMAKE_PREFIX_PATH
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo \
      -DLiFFT_ENABLE_CUDA=0 -DLiFFT_ENABLE_OPENCL=1 \
      -DCMAKE_C_COMPILER=gcc-5 -DCMAKE_CXX_COMPILER=g++-5 ..

The .travis.yml has been updated. It now uses the trusty distribution and CUDA 8, and includes OpenCL testing (CPU, AMD OpenCL). However, there are still CUDA+gcc+boost issues, so only a few version combinations seem to work.

I hope this now provides a usable design that we can build on.

When you have some time, please review and play around with the code :)

ax3l · 2017-08-14T15:08:12Z

pretty awesome, thank you!

@Flamefire whenever you have the time, feel free to have a look! :)

tdd11235813 · 2017-08-24T11:08:38Z

The travis files and testOpenCL.cpp have been updated. If you want to have this PR as a single commit, let me know and I will squash the commits.

There is another use case which has not been shown here yet. When you want to execute the FFT on a CPU or GPU, you can specify the context target by an enum:

enum class ContextDevice {
    GPU=CL_DEVICE_TYPE_GPU,
    CPU=CL_DEVICE_TYPE_CPU,
    ACCELERATOR=CL_DEVICE_TYPE_ACCELERATOR
};

So you could request an OpenCL context for CPU and one for GPU

using Context = LiFFT::libraries::clFFT::policies::ContextLocal<true>;
Context context_cpu(ContextDevice::CPU);
Context context_gpu(ContextDevice::GPU);
// ...

If there is no GPU, OpenCL uses CL_DEVICE_TYPE_DEFAULT which depends on the OpenCL implementation.
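
A sketch of the fallback just described, written without OpenCL headers so it compiles on its own. The numeric values mirror the OpenCL cl_device_type bit flags; resolveDevice and the list of available device types are hypothetical and only illustrate the selection logic, not how the clFFT client queries devices:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same bit values as CL_DEVICE_TYPE_DEFAULT/CPU/GPU/ACCELERATOR.
enum class DeviceType : std::uint64_t {
    DEFAULT     = 1u << 0,
    CPU         = 1u << 1,
    GPU         = 1u << 2,
    ACCELERATOR = 1u << 3
};

// Hypothetical: use the requested device type if it is available,
// otherwise fall back to the implementation-defined default.
inline DeviceType resolveDevice(DeviceType requested,
                                const std::vector<DeviceType>& available)
{
    for (DeviceType d : available)
        if (d == requested) return requested;
    return DeviceType::DEFAULT;
}

// Usage: a CPU-only machine asked for a GPU ends up with the default device.
inline void exampleFallback()
{
    const std::vector<DeviceType> cpuOnly = {DeviceType::CPU};
    assert(resolveDevice(DeviceType::GPU, cpuOnly) == DeviceType::DEFAULT);
}
```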
There are two test cases in testOpenCL.cpp, TestClFFTR2CInplaceTwoArch[Async], that show the difference (after one warm-up run, the time of the FFT calls is measured, including synchronization).

$ ./test/Test --run_test=OpenCL/TestClFFTR2CInplaceTwoArch*
Running 2 test cases...
1: "ClFFT Informations","Device","Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz", <snip>
2: "ClFFT Informations","Device","Tesla P100-PCIE-16GB", <snip>
TwoArch Sync: Time = 101.061
TwoArch ASync: Time = 88.4254

*** No errors detected

I know this is not proof that both FFTs were running concurrently, but it shows the workflow of synchronous and asynchronous architecture-specific contexts. Playing around with liFFT in threaded environments is still on the to-do list.

ax3l · 2017-10-20T09:06:25Z

@Flamefire if you are interested in taking a look at the implementation of clfft or want to merge it, feel free to jump in :)

Flamefire

Just got around to reviewing this. Looks great to me except for one place with a trait. Please provide a short explanation and at least rename the trait has_type to something meaningful. Maybe something like shown here would be more readable? Note that there is a void_t already defined in the code, so C++11 compatibility is ok.

Flamefire · 2018-02-02T15:22:55Z

include/libLiFFT/traits/IntegralType.hpp

+
+  // SFINAE test if T has type member
+  template <typename T>
+  class has_type


I have trouble understanding the usage of this. I assume this trait is supposed to return true iff a member isComplex exists that is constructible from an int?
Then what is the reasoning behind using it the way it is used in IntegralTypeImpl below? If I read that correctly, IntegralTypeImpl<Foo> returns Foo for every type Foo that is either an int or float or simply does not have an isComplex member, which is true for pretty much any class. Wouldn't that make it pointless?

tdd11235813 · 2018-02-14T16:54:57Z

@Flamefire you are absolutely right, this code is too messy and void_t helps here.
Now, the motivation for a more generic version of IntegralType:
For OpenCL cl_mem pointer support I wanted to implement a LibPtrWrapper, integrated into liFFT, which accepts non-integral types that must be treated like integral types.
(Note that for float/double and the like we already have PlainPtrWrapper.)
cl_mem is such a non-integral, non-liFFT-compatible type and is library specific.
liFFT only knows floating-point and fundamental integral types; for anything else it accesses ::isComplex and ::type at compile time on the type that was wrapped by IntegralType.
I simplified the check and now only use ::type as the indicator for whether a type is liFFT compatible.
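
For reference, a void_t-based check for a nested ::type member can be written in C++11 roughly as follows. The names in namespace sketch are illustrative and are not the ones used in IntegralType.hpp:

```cpp
#include <type_traits>

namespace sketch {

// C++11-compatible void_t: maps any well-formed list of types to void.
template <typename... Ts> struct make_void { using type = void; };
template <typename... Ts> using void_t = typename make_void<Ts...>::type;

// Primary template: chosen when T::type is not a valid type.
template <typename T, typename = void>
struct has_nested_type : std::false_type {};

// Specialization: only viable when T::type names a type (SFINAE).
template <typename T>
struct has_nested_type<T, void_t<typename T::type>> : std::true_type {};

} // namespace sketch

struct WithType { using type = float; };
struct WithoutType {};

static_assert(sketch::has_nested_type<WithType>::value, "has ::type");
static_assert(!sketch::has_nested_type<WithoutType>::value, "no ::type");
static_assert(!sketch::has_nested_type<int>::value, "fundamental types have no members");
```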

edit: CI failed, I look into this (uh but it did work on my system :P)
edit2: there was a SFINAE type mismatch, now it has not worked on my system either lol .. god bless travis. doing some tests now and push an update after that.

Flamefire · 2018-02-15T11:17:45Z

IntegralType is used to determine the actual datatype used. E.g. we can have Complex types which use float, double etc. as their integral type. Based on that the backend can choose the library implementation (fftwf, fftwd for example). clMem is a problem, because we cannot get the integral type from the handle alone. This was not considered when designing this library. In the current implementation any type which does not have a nested type member is an integral type, which IMO is wrong. Maybe we'd be better off leaving IntegralType empty or unspecialised for clMem, which seems like a good way of saying "I don't know". One could then specialize over some kind of wrapper around clMem which just enhances clMem by its type and is implicitly convertible to clMem (but not from!)
You circumvented the problem by using FFT::wrapInput, which basically propagates all properties from the FFT to the pointer. The initial idea of PointerWrappers (which yours belongs to) was to enhance raw pointers by required properties so the library can use them to select codepaths and/or check conditions. So your approach is the other way round. While this shortcut might be ok, I think the LibPtrWrapper itself should not be based on the FFT but rather get all the information passed in, so one could write e.g. wrapLibHandle(myClMem, Complex<float>, myExtends) or so. FFT::wrapInput could still fill these params with the information it has, but just as a "lazy shortcut". "Lazy" because it is shorter, but as mentioned it circumvents all validity checks done later.

From the comment there seems to be a misunderstanding of what IntegralType is. It is not a "whether", as it is not a bool, but a "what". And for deciding whether it is complex or not there exists IsComplex or something like that.

Oh and while I'm on that comment: You don't need the ::type and ::isComplex members you only need a specialization of the traits. That was one of the things René or Axel strongly suggested back then to allow non-intrusive extensions.
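
A rough sketch of what this could look like: the trait stays undefined for the raw handle, is specialized non-intrusively for known types, and a typed wrapper converts to the raw handle but not back implicitly. All names live in namespace sketch and are assumptions; RawHandle merely stands in for cl_mem, and this is not the liFFT trait itself.

```cpp
#include <type_traits>

namespace sketch {

struct RawHandle { int id; };                     // stand-in for cl_mem
template <typename T> struct Complex { T real, imag; };

// Primary template is declared but not defined: asking for the integral
// type of an unknown type (e.g. RawHandle) is a compile error ("I don't know").
template <typename T, typename = void>
struct IntegralType;

// Non-intrusive specializations; the types themselves need no ::type member.
template <typename T>
struct IntegralType<T, typename std::enable_if<std::is_arithmetic<T>::value>::type> {
    using type = T;
};
template <typename T>
struct IntegralType<Complex<T>> { using type = T; };

// Raw handle enhanced by its value type. Converts implicitly TO RawHandle,
// but construction FROM RawHandle must be spelled out.
template <typename TValue>
class TypedHandle {
public:
    explicit TypedHandle(RawHandle h) : m_handle(h) {}
    operator RawHandle() const { return m_handle; }
private:
    RawHandle m_handle;
};

template <typename TValue>
struct IntegralType<TypedHandle<TValue>> : IntegralType<TValue> {};

} // namespace sketch
```

With this, sketch::IntegralType<sketch::TypedHandle<sketch::Complex<float>>>::type is float, while sketch::IntegralType<sketch::RawHandle>::type does not compile.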

tdd11235813 · 2018-02-27T16:50:57Z

Thanks for your detailed feedback. Let me try to summarize the next steps: the goal is to decouple the FFT data from the FFT executor. FFT properties exist on both sides and are validated at compile time, and that is of course what we want for the clFFT backend as well.
Thus, a type-agnostic wrapper is required. I would call it LibHandle now.
(It is probably better to use composition instead of inheritance.)

template<typename T, typename TValueType, unsigned T_numDims>
struct LibHandle : public T {
  using type = TValueType;
  using IdxType = types::Vec< T_numDims, size_t >;
protected:  
  IdxType m_extents;
};

I cannot derive from DataContainer like PlainPtrWrapper as cl_mem is not directly accessible like raw pointers. Hence, the name LibHandle instead of LibPointer... to emphasize the difference.
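
The composition variant mentioned above could look roughly like the sketch below. One reason to prefer it: cl_mem is a typedef for a pointer (struct _cl_mem*), and a pointer type cannot be used as a base class. The sketch uses a fake opaque handle and std::array instead of types::Vec so that it compiles without OpenCL or liFFT; all names are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {

struct FakeClMemImpl;                  // opaque, never defined
using FakeClMem = FakeClMemImpl*;      // pointer typedef, like cl_mem

// Holds the library handle as a member (composition) and adds the
// value type and extents that the handle alone cannot provide.
template <typename THandle, typename TValue, std::size_t T_numDims>
class LibHandle {
public:
    using type = TValue;
    using IdxType = std::array<std::size_t, T_numDims>;

    LibHandle(THandle handle, const IdxType& extents)
        : m_handle(handle), m_extents(extents)
    {
        for (std::size_t e : m_extents) assert(e > 0);
    }
    // Implicit conversion back to the raw library handle.
    operator THandle() const { return m_handle; }
    const IdxType& getExtents() const { return m_extents; }
private:
    THandle m_handle;
    IdxType m_extents;
};

// Value type is given explicitly; handle type and rank are deduced.
template <typename TValue, typename THandle, std::size_t N>
LibHandle<THandle, TValue, N> wrapLibHandle(THandle handle,
                                            const std::array<std::size_t, N>& extents)
{
    return LibHandle<THandle, TValue, N>(handle, extents);
}

} // namespace sketch
```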
The data side would be:

cl_mem cldata;
// .. this cldata will contain 2D floats ..
LibHandle<cl_mem,float,2> handle = LiFFT::mem::wrapLibHandle<float>(cldata, extents);
// or just: auto handle = LiFFT::mem::wrapLibHandle<float>(cldata, extents);
// .. implicit conversion to base type cl_mem is also possible
// things like copy do not work, as no accessor is defined for the handle
// LiFFT::policies::copy(handle, baseR2CInput);

Now the FFT part:

auto in_handle = LiFFT::mem::wrapLibHandle<float>(in_cldata, extents);
auto out_handle = LiFFT::mem::wrapLibHandle<Complex<float>>(out_cldata, extents);
using FFT_TYPE = LiFFT::FFT_2D_R2C<float>;
// add the FFT properties to the handle in a wrapper
auto in_wrapped = FFT_TYPE::wrapInputLibHandle(in_handle);
auto out_wrapped = FFT_TYPE::wrapOutputLibHandle(out_handle);
// make FFT based on wrappers' FFT properties
auto fft = LiFFT::makeFFTInQueue<ClFFTContextAPI>(in_wrapped, out_wrapped,
                                                              context);
// execute
fft(in_wrapped, out_wrapped, context);

I have not checked all the possible conflicts under the hood, but what do you think?

Flamefire · 2018-02-28T20:33:50Z

Yes, sounds great. Maybe pack in strides too, but have them available as default params? Not sure if this is required/useful; plain pointers may have strides, which are checked (I think) by the accessors.
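
One way to make strides optional, sketched under assumptions: strides are counted in elements, the layout is row-major, and the default is the contiguous case. Since a C++ default argument cannot refer to another parameter, the "default" is expressed as an overload. HandleLayout, makeLayout and contiguousStrides are hypothetical names, not liFFT API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {

template <std::size_t N>
using Idx = std::array<std::size_t, N>;

// Contiguous row-major strides in elements: the last dimension has stride 1.
template <std::size_t N>
Idx<N> contiguousStrides(const Idx<N>& extents)
{
    Idx<N> strides{};
    std::size_t s = 1;
    for (std::size_t d = N; d-- > 0;) {
        strides[d] = s;
        s *= extents[d];
    }
    return strides;
}

template <std::size_t N>
struct HandleLayout {
    Idx<N> extents;
    Idx<N> strides;
};

// Strides omitted: assume contiguous data.
template <std::size_t N>
HandleLayout<N> makeLayout(const Idx<N>& extents)
{
    return {extents, contiguousStrides(extents)};
}

// Strides given explicitly, e.g. for padded rows.
template <std::size_t N>
HandleLayout<N> makeLayout(const Idx<N>& extents, const Idx<N>& strides)
{
    for (std::size_t s : strides) assert(s > 0);
    return {extents, strides};
}

} // namespace sketch
```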

Matthias Werner and others added 5 commits August 2, 2017 14:55

Adding clFFT to LiFFT.

59b7d5b

clfft with async context api implemented.

2befb2c

travis updated and clfft support included.

6fda7ec

travis update for trusty.

ceee18b

travis fixes

bad6c53

ax3l requested a review from Flamefire August 14, 2017 15:05

ax3l assigned Flamefire Aug 14, 2017

ax3l requested a review from psychocoderHPC August 14, 2017 15:06

ax3l added the enhancement label Aug 14, 2017

Matthias Werner added 4 commits August 16, 2017 15:40

travis amd-sdk downloader updated.

1a7f4af

fixes travis amd sdk version selection.

f9b1ddc

uses travis_retry & installs only required cuda packages.

938004c

adds test for async opencl arch workflow.

8076533

Flamefire requested changes Feb 2, 2018

check in IntegralType extension for integral-like non-integral types simplified.

eb12412

Matthias Werner added 2 commits February 14, 2018 21:43

fixed SFINAE type mismatch from previous broken commit.

2f60f33

AMD switched to https.

137dcc5

tdd11235813 force-pushed the pr_dev branch from 1bb4f7c to 137dcc5 Compare February 14, 2018 21:46
