Clang transpiler integration #756

Open
wants to merge 18 commits into base: development

vyast-softserveinc

Description

This pull request integrates the occa-transpiler library to provide full C++ support in OCCA.

Added:

an option, transpiler-version, for switching between the old and new transpilers

@deukhyun-cha
Contributor

cmake -DOCCA_CLANG_BASED_TRANSPILER=ON worked for me to get the new transpiler source and generate the build.

Contributor

@deukhyun-cha deukhyun-cha left a comment

@kris-rowe, with this initial pass I think it's ready for you and others to take a look. I have tested a successful compilation of OCCA with this cmake option turned on, and without the option enabled it should have no effect. Please let us know if you can think of any other tests or changes to have, otherwise having this in would help us proceed with our new okl kernel development. Thanks!

@amikstcyr
Contributor

Hi - Any hope that this gets merged?

@kris-rowe
Member

Please take a look at this issue.

@YuraCobain

@kris-rowe the issue is addressed, please take a look and try the fix.

@amikstcyr
Contributor

amikstcyr commented Aug 27, 2024

@kris-rowe all issues were addressed, can we please have a conclusion on this?

@IuriiKobein

Hi @kris-rowe
If you have any additional comments, questions, or concerns, I am glad to resolve them so that the PR can be merged.

@thilinarmtb
Collaborator

Hi @IuriiKobein, I am planning to test this branch soon. I will let you know if I run into any issues.

@thilinarmtb
Collaborator

thilinarmtb commented Sep 23, 2024

I started testing this branch on Frontier at OLCF and I am running into a segmentation
fault when I run the 31_oklt_v3_moving_avg test.

I had to make the following changes in occa-transpiler since CMake 3.26
was not available on Frontier (I hope this is not the reason for the segfault).

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 2d9cc30..659d44f 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.26)
+cmake_minimum_required(VERSION 3.23)
 
 project(occa-transpiler VERSION 0.0.1 LANGUAGES C CXX)
 
diff --git a/lib/CMakeLists.txt b/lib/CMakeLists.txt
index 182f1e0..dd8b545 100644
--- a/lib/CMakeLists.txt
+++ b/lib/CMakeLists.txt
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.26)
+cmake_minimum_required(VERSION 3.23)
 project (occa-transpiler VERSION 0.0.1 LANGUAGES CXX)
 
 set(CMAKE_CXX_STANDARD 17)
diff --git a/tool/CMakeLists.txt b/tool/CMakeLists.txt
index 543d898..98cdb5c 100644
--- a/tool/CMakeLists.txt
+++ b/tool/CMakeLists.txt
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.26)
+cmake_minimum_required(VERSION 3.23)
 project (occa-tool VERSION 0.0.1 LANGUAGES CXX)

Then I followed the build instructions and everything built fine.
When I tried to run the test, I got the following:

[[email protected] 31_oklt_v3_moving_avg]$ export OKLT_LOG_LEVEL=trace
[[email protected] 31_oklt_v3_moving_avg]$ ./examples_cpp_oklt_v3_moving_avg
[11:50:40.179] [I] start: OKL_DIRECTIVE_EXPANSION_STAGE [stage_action_runner.cpp:32]
[11:50:40.179] [T] input source:

#include "constants.h"

template<class T,
         int THREADS,
         int WINDOW>
struct MovingAverage {
    MovingAverage(int inputSize,
                  int outputSize,
                  T *shared_input,
                  T *shared_output)
        :_inputSize(inputSize)
        ,_outputSize(outputSize)
        ,_shared_data(shared_input)
        ,_result_data(shared_output)
    {}

    void syncCopyFrom(const T *input, int block_idx, int thread_idx) {
        int linearIdx = block_idx * THREADS + thread_idx;
        //INFO: copy base chunk
        if(linearIdx < _inputSize) {
            _shared_data[thread_idx] = input[linearIdx];
        }
        //INFO: copy WINDOW chunk
        int tailIdx = (block_idx + 1) * THREADS + thread_idx;
        if(tailIdx < _inputSize && thread_idx < WINDOW) {
            _shared_data[THREADS + thread_idx] = input[tailIdx];
        }
        @barrier;
    }

    void process(int thread_idx) {
        T sum = T();
        for(int i = 0; i < WINDOW; ++i) {
            sum += _shared_data[thread_idx + i];
        }
        _result_data[thread_idx] = sum / WINDOW;
        @barrier;
    }

    void syncCopyTo(T *output, int block_idx, int thread_idx) { 
        int linearIdx = block_idx * THREADS + thread_idx;
        if(linearIdx < _outputSize) {
            output[linearIdx] = _result_data[thread_idx];
        }
        @barrier;
    }
private:
    int _inputSize;
    int _outputSize;

    //INFO: not supported
    // @shared T _data[THREADS_PER_BLOCK + WINDOW_SIZE];
    // @shared T _result[THREADS_PER_BLOCK];

    T *_shared_data;
    T *_result_data;
};

@kernel void movingAverage32f(@restrict const float *inputData, 
                              int inputSize,
                              @restrict float *outputData,
                              int outputSize)
{
    @outer(0) for (int block_idx = 0; block_idx < outputSize / THREADS_PER_BLOCK + 1; ++block_idx) {
        @shared float blockInput[THREADS_PER_BLOCK + WINDOW_SIZE];
        @shared float blockResult[THREADS_PER_BLOCK];
        MovingAverage<float, THREADS_PER_BLOCK, WINDOW_SIZE> ma{
                inputSize,
                outputSize,
                blockInput,
                blockResult
        };
        @inner(0) for(int thread_idx = 0; thread_idx < THREADS_PER_BLOCK; ++thread_idx) {
            ma.syncCopyFrom(inputData, block_idx, thread_idx);
            ma.process(thread_idx);
            ma.syncCopyTo(outputData, block_idx, thread_idx);
        }
    }
}

 [stage_action_runner.cpp:33]
Segmentation fault

This is the backtrace I get with gdb:

#0  0x00007fffe78bf121 in llvm::vfs::InMemoryFileSystem::addFile(llvm::Twine const&, long, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, std::optional<unsigned int>, std::optional<unsigned int>, std::optional<llvm::sys::fs::file_type>, std::optional<llvm::sys::fs::perms>) () from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#1  0x00007fffe77df5aa in oklt::addInstrinsicStub (session=..., compiler=...) at /ccs/home/thilina/fus166/.local/occa-transpiler/clang/include/llvm/ADT/Twine.h:285
#2  0x00007fffe782dd43 in oklt::StageAction::PrepareToExecuteAction (this=0x4c1a00, compiler=...) at /usr/include/c++/12/bits/shared_ptr_base.h:1665
#3  0x00007fffe973b398 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) () from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#4  0x00007fffe7949f0e in clang::tooling::FrontendActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
   from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#5  0x00007fffe79415ac in clang::tooling::ToolInvocation::runInvocation(char const*, clang::driver::Compilation*, std::shared_ptr<clang::CompilerInvocation>, std::shared_ptr<clang::PCHContainerOperations>) ()
   from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#6  0x00007fffe79452d8 in clang::tooling::ToolInvocation::run() () from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#7  0x00007fffe7949456 in clang::tooling::runToolOnCodeWithArgs(std::unique_ptr<clang::FrontendAction, std::default_delete<clang::FrontendAction> >, llvm::Twine const&, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, llvm::Twine const&, llvm::Twine const&, std::shared_ptr<clang::PCHContainerOperations>) ()
   from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#8  0x00007fffe794993d in clang::tooling::runToolOnCodeWithArgs(std::unique_ptr<clang::FrontendAction, std::default_delete<clang::FrontendAction> >, llvm::Twine const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, llvm::Twine const&, llvm::Twine const&, std::shared_ptr<clang::PCHContainerOperations>, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) () from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#9  0x00007fffe783266d in oklt::runStageAction (stageName=..., session=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/deps/occa-transpiler/lib/pipeline/core/stage_action_runner.cpp:68
#10 0x00007fffe7833144 in oklt::runPipeline (pipeline=..., session=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/deps/occa-transpiler/lib/pipeline/core/stage_action_runner.cpp:97
#11 0x00007fffe7829a21 in oklt::normalizeAndTranspile (input=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/deps/occa-transpiler/lib/pipeline/normalizer_and_transpiler.cpp:16
#12 0x00007fffed8eaad4 in occa::transpiler::Transpiler::run (this=this@entry=0x7fffffff5140, filename=..., mode=..., kernelProps=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/src/occa/internal/utils/transpiler_utils.cpp:135
#13 0x00007fffed8b3ee4 in occa::serial::v3::transpileFile (filename=..., outputFile=..., kernelProps=..., metadata=..., mode=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/src/occa/internal/modes/serial/device.cpp:69
#14 0x00007fffed8b7147 in occa::serial::device::buildKernel (this=this@entry=0x478460, filename=..., kernelName=..., kernelHash=..., kernelProps=..., isLauncherKernel=<optimized out>, isLauncherKernel@entry=false)
    at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/src/occa/internal/modes/serial/device.cpp:353
#15 0x00007fffed8b778d in occa::serial::device::buildKernel (this=this@entry=0x478460, filename=..., kernelName=..., kernelHash=..., kernelProps=...)
    at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/src/occa/internal/modes/serial/device.cpp:168
#16 0x00007fffed7010c6 in occa::device::buildKernel (this=this@entry=0x7fffffff5a40, filename=..., kernelName=..., props=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/src/core/device.cpp:394
#17 0x0000000000401ea2 in main (argc=<optimized out>, argv=<optimized out>) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/examples/cpp/31_oklt_v3_moving_avg/main.cpp:67

I am using gcc=12.3.0 to build OCCA and doing a release build. I will try a debug build and
see if it gives me more information.

PS: I was actually doing a release build with debug info.
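
For readers following the trace, the movingAverage32f kernel shown in the log computes, for each output index, the mean of WINDOW_SIZE consecutive input values. The sketch below is a minimal host-side reference of that computation in plain C++, the kind of result a device output could be checked against. The function names, the output-length convention (one value per full window), and the tolerance are assumptions made for illustration; they are not taken from the example's main.cpp.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not part of OCCA or this PR):
// out[i] is the mean of in[i], ..., in[i + window - 1].
std::vector<float> movingAverageReference(const std::vector<float>& in,
                                          std::size_t window) {
    if (window == 0 || in.size() < window) {
        return {};  // no full window fits in the input
    }
    std::vector<float> out(in.size() - window + 1);
    for (std::size_t i = 0; i < out.size(); ++i) {
        float sum = 0.0f;
        for (std::size_t k = 0; k < window; ++k) {
            sum += in[i + k];
        }
        out[i] = sum / static_cast<float>(window);
    }
    return out;
}

// Element-wise comparison with an absolute tolerance (the default is an assumption).
bool approximatelyEqual(const std::vector<float>& a,
                        const std::vector<float>& b,
                        float tol = 1e-5f) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tol) {
            return false;
        }
    }
    return true;
}
```

Unlike the kernel, this version takes no separate outputSize argument and uses no shared-memory staging; it only mirrors the arithmetic.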

@IuriiKobein

Thanks for the report.
I have a quick question that will help us proceed with a potential fix.

Was clang installed according to the https://github.com/libocca/occa-transpiler?tab=readme-ov-file#setup-clang-17 section?
If so, which variant exactly was used?

@thilinarmtb
Collaborator

> Thanks for the report. I have a quick question that will help us proceed with a potential fix.
>
> Was clang installed according to the https://github.com/libocca/occa-transpiler?tab=readme-ov-file#setup-clang-17 section? If so, which variant exactly was used?

I installed clang from source, checking out the llvmorg-17.0.6 tag.
Below is the commit:

commit 6009708b4367171ccdbf4b5905cb6a803753fe18 (grafted, HEAD, tag: llvmorg-17.0.6)
Author: Tobias Hieta <[email protected]>
Date:   Tue Nov 28 09:52:28 2023 +0100

    Revert "[runtimes] Add missing test dependencies to check-all (#72955)"
    
    This reverts commit e957e6dcb29d94e4e1678da9829b77009be88926.
    
    The commit was reverted on main because of issues. We will not carry
    this in the release branch for 17.x

These are the configure and build commands I used:

cmake -S llvm -B build -G "Unix Makefiles" \
  -DCMAKE_C_COMPILER=`which gcc` \
  -DCMAKE_CXX_COMPILER=`which g++` \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=~/fus166/.local/occa-transpiler/clang \
  -DLLVM_ENABLE_WERROR=OFF \
  -DLLVM_TARGETS_TO_BUILD='X86' \
  -DLLVM_PARALLEL_LINK_JOBS=1 \
  -DLLVM_ENABLE_RTTI=ON \
  -DCMAKE_POLICY_DEFAULT_CMP0094=NEW \
  -DCMAKE_VERBOSE_MAKEFILE=ON \
  -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF \
  -DLLVM_ENABLE_PROJECTS="polly;lld;lldb;clang-tools-extra;llvm;clang" \
  -DLLVM_ENABLE_RUNTIMES="libunwind;libcxx;libcxxabi;compiler-rt" \
  -DLLVM_REQUIRES_RTTI=ON \
  -DLLVM_ENABLE_RTTI=ON \
  -DLLVM_ENABLE_EH=ON \
  -DLLVM_POLLY_LINK_INTO_TOOLS=ON \
  -DLLVM_Z3_INSTALL_DIR=${Z3_INSTALL_DIR} \
  -DLLVM_ENABLE_Z3_SOLVER=OFF

make -C build install -j12

I think the only difference from the configure command in the instructions
is that I turned off the Z3 solver.

@YuraCobain

So far we have not been able to reproduce the issue on our local machines with our existing setup. Next, we will use the same CMake version and clang build options as yours to try to catch it.

@thilinarmtb
Collaborator

thilinarmtb commented Sep 23, 2024

Seems like the reason for the segfault was that I used two different versions
of gcc: one version to build clang and another version to build occa.

Once I used the same gcc version for both, I don't see a segfault anymore.
Now I can run the test but it still fails:

[[email protected] 31_oklt_v3_moving_avg]$ ./examples_cpp_oklt_v3_moving_avg 
Comparison with gold values has failed

I can attach the full log with trace on if that is helpful.
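
A mismatch like the one described in this comment (clang built with one gcc, OCCA with another) can be hard to spot after the fact. One possible way to check is to compile a small probe with each toolchain and compare what it reports. The sketch below is only an illustration: the function name is hypothetical, and it relies on predefined compiler macros plus the libstdc++ macros __GLIBCXX__ and _GLIBCXX_USE_CXX11_ABI, which are only defined when libstdc++ is the standard library in use.

```cpp
#include <string>

// Hypothetical helper: describes the compiler and standard library
// that this translation unit was built with.
std::string toolchainFingerprint() {
    std::string s;
#if defined(__clang__)
    // Checked first because clang also defines __GNUC__.
    s += "clang " + std::to_string(__clang_major__) + "." +
         std::to_string(__clang_minor__);
#elif defined(__GNUC__)
    s += "gcc " + std::to_string(__GNUC__) + "." +
         std::to_string(__GNUC_MINOR__);
#else
    s += "unknown compiler";
#endif
#if defined(__GLIBCXX__)
    // libstdc++ release date stamp, e.g. a value of the form YYYYMMDD.
    s += ", libstdc++ " + std::to_string(__GLIBCXX__);
#endif
#if defined(_GLIBCXX_USE_CXX11_ABI)
    // 1 selects the C++11 std::string/std::list ABI, 0 the old one.
    s += ", cxx11-abi=" + std::to_string(_GLIBCXX_USE_CXX11_ABI);
#endif
    return s;
}
```

Printing this string from a program built with each toolchain and comparing the two outputs shows whether the builds agree on compiler version and libstdc++ settings. It does not by itself prove that a difference caused the crash.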

@YuraCobain

YuraCobain commented Sep 23, 2024

Glad the root cause of the segfault has been found.
The test example has only been tested with the CUDA and HIP backends. You can verify it with the following option:

examples_cpp_oklt_v3_moving_avg -d "{mode: 'CUDA', device_id: 0}"

We are working on a fix for Serial mode as well, which is the default when the -d option is omitted.

@thilinarmtb
Collaborator

Thanks! Yes, the example passes with the HIP backend. I will try to test this on a few more kernels.

@IuriiKobein

Hi Thilina,

The example "31_oklt_v3_moving_avg" has been fixed to support the host-only backends: Serial and OpenMP.
Please pull the latest changes and try again.
Looking forward to your feedback.

@thilinarmtb
Collaborator

With your latest fix, the tests pass for HIP, Serial and OpenMP backends. I will test this a bit more.

@thilinarmtb
Collaborator

@IuriiKobein: I added a simple kernel which calculates the dot product of two
vectors here. It seems to fail with the transpiler. The failure is due to the transpiler not
recognizing unsigned int. I think OCCA supports unsigned int (I may be wrong).

@YuraCobain

@thilinarmtb please refer to the issue reported above for clarification.

@thilinarmtb
Collaborator

Is transpiler version 2 the same as regular OCCA? Seems like unsigned

> @thilinarmtb please refer to the issue reported above for clarification.

We will continue the discussion there till the issue is resolved.

8 participants