Buildsystem update for Frontier using the worldshared directory and rocm/5.6 #126

Merged 7 commits on Mar 14, 2024

nkoukpaizan
Collaborator

Merge request type

  • New feature
  • Resolves bug
  • Documentation
  • Other

Relates to

  • OPFLOW
  • SOPFLOW
  • SCOPFLOW
  • TCOPFLOW
  • CMake build system
  • Spack configuration
  • Manual
  • Web docs
  • Other

This MR updates

  • Header files
  • Source code
  • CMake build system
  • Spack configuration
  • Web docs
  • Manual
  • Other

Summary

This MR updates the Spack configuration and the corresponding modules on Frontier to build with rocm/5.6. The modules are built in the project's world-shared directory.
This replaces #89. Test failures remain and should be investigated.

@nkoukpaizan nkoukpaizan self-assigned this Mar 13, 2024
Contributor

@cameronrutherford cameronrutherford left a comment

LGTM, but I will wait for @pelesh to review before we can consider merging.

buildsystem/clang-hip/cache.cmake (resolved)
@pelesh
Collaborator

pelesh commented Mar 13, 2024

Are we building ExaGO with gcc or clang?

@pelesh
Collaborator

pelesh commented Mar 13, 2024

When building with clang, I get the following link error:

[ 56%] Linking CXX executable opflow
ld.lld: error: undefined symbol: mc19ad_
>>> referenced by IpEquilibrationScaling.cpp
>>>               IpEquilibrationScaling.o:(Ipopt::EquilibrationScaling::DetermineScalingParametersImpl(Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::MatrixSpace const>, Ipopt::SmartPtr<Ipopt::MatrixSpace const>, Ipopt::SmartPtr<Ipopt::SymMatrixSpace const>, Ipopt::Matrix const&, Ipopt::Vector const&, Ipopt::Matrix const&, Ipopt::Vector const&, double&, Ipopt::SmartPtr<Ipopt::Vector>&, Ipopt::SmartPtr<Ipopt::Vector>&, Ipopt::SmartPtr<Ipopt::Vector>&)) in archive /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/ipopt-3.12.10-7fp33q627rou44fzquk57llhwoqqeuho/lib/libipopt.a

It seems HSL is not found. This is for a clean checkout of develop on Frontier. I can see the HSL module is loaded:

$ ml

Currently Loaded Modules:
...
 34) coinhsl/2019.05.21-gcc-12.2.0-mixed-and6kty
 35) hipblas/5.6.0-clang-16.0.0-rocm5.6.0-mixed-pgkobjo

I used the following commands to build:

$ CC=clang CXX=clang++ FC=flang cmake ../exago
$ make

I'll investigate more.

@pelesh
Collaborator

pelesh commented Mar 13, 2024

The issue I'm seeing looks more like a bug in ExaGO's CMake config. HSL does not seem to be on the linker line:

[ 61%] Linking CXX executable tcopflow
cd /ccs/home/peles/src/exago/build-crusher/applications && /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/cmake-3.20.6-cdzi5pgrngs2wwvhesbwkkjvsftoyqia/bin/cmake -E cmake_link_script CMakeFiles/app_tcopflow.dir/link.txt --verbose=1
/opt/rocm-5.6.0/llvm/bin/clang++ CMakeFiles/app_tcopflow.dir/tcopflow_main.cpp.o -o tcopflow  -Wl,-rpath,/lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/petsc-3.20.4-zgcdpbefalitop4iud527awwmbarfrsc/lib:/opt/cray/pe/mpich/8.1.25/ofi/gnu/9.1/lib:/lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/gcc-12.2.0-mixed/openblas-0.3.20-cjm2rkdlesgzck7muxx34kwpr6d5rm7d/lib::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: ../src/tcopflow/libexago_tcopflow.a ../src/opflow/libexago_opflow.a ../src/pflow/libexago_pflow.a ../src/ps/libexago_ps.a ../src/utils/libexago_utils.a /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/petsc-3.20.4-zgcdpbefalitop4iud527awwmbarfrsc/lib/libpetsc.so /opt/cray/pe/mpich/8.1.25/ofi/gnu/9.1/lib/libmpi_gnu_91.so /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/gcc-12.2.0-mixed/openblas-0.3.20-cjm2rkdlesgzck7muxx34kwpr6d5rm7d/lib/libopenblas.so -lpthread -lm -ldl /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/ipopt-3.12.10-7fp33q627rou44fzquk57llhwoqqeuho/lib/libipopt.a 
ld.lld: error: undefined symbol: mc19ad_
>>> referenced by IpEquilibrationScaling.cpp
>>>               IpEquilibrationScaling.o:(Ipopt::EquilibrationScaling::DetermineScalingParametersImpl(Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::VectorSpace const>, Ipopt::SmartPtr<Ipopt::MatrixSpace const>, Ipopt::SmartPtr<Ipopt::MatrixSpace const>, Ipopt::SmartPtr<Ipopt::SymMatrixSpace const>, Ipopt::Matrix const&, Ipopt::Vector const&, Ipopt::Matrix const&, Ipopt::Vector const&, double&, Ipopt::SmartPtr<Ipopt::Vector>&, Ipopt::SmartPtr<Ipopt::Vector>&, Ipopt::SmartPtr<Ipopt::Vector>&)) in archive /lustre/orion/eng145/world-shared/spack-install/linux-sles15-x86_64/clang-16.0.0-rocm5.6.0-mixed/ipopt-3.12.10-7fp33q627rou44fzquk57llhwoqqeuho/lib/libipopt.a

@cameronrutherford, please let me know if you can reproduce this issue.
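
The observation above, that HSL is missing from the linker line, can be checked without rebuilding by searching the link script that CMake generated. The following is a minimal sketch and not part of this PR: the helper name `link_line_mentions` and the search pattern `hsl` are illustrative assumptions, and the `link.txt` location is taken from the verbose output above.

```shell
#!/bin/sh
# Sketch: report whether a CMake-generated link script mentions a library.
# link_line_mentions is a hypothetical helper, not part of ExaGO.
link_line_mentions() {
  link_file=$1   # e.g. applications/CMakeFiles/app_tcopflow.dir/link.txt
  pattern=$2     # e.g. hsl (matched case-insensitively)
  if grep -qi -- "$pattern" "$link_file"; then
    echo "found"
  else
    echo "missing"
  fi
}
```

Run from the build directory, `link_line_mentions applications/CMakeFiles/app_tcopflow.dir/link.txt hsl` would be expected to print `missing` for the failing configuration shown above.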

@nkoukpaizan
Collaborator Author

@pelesh I can reproduce with the build command you are using.

cmake -C ../buildsystem/clang-hip/cache.cmake ../exago; make seems to work, so it has to do with the CMake configuration and options. Some combinations (e.g., default options) seemingly don't work as expected.
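
To narrow down which options differ between the working and failing configurations, the two resulting `CMakeCache.txt` files can be compared. This is a rough sketch under assumptions: the helper name `diff_cmake_caches` and the build directory names in the usage note are hypothetical, and nothing here is part of ExaGO's build system.

```shell
#!/bin/sh
# Sketch: list cache entries that differ between two CMake build directories.
# diff_cmake_caches is a hypothetical helper, not part of ExaGO.
diff_cmake_caches() {
  dir_a=$1
  dir_b=$2
  tmp_a=$(mktemp) || return 2
  tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
  # Keep only KEY:TYPE=VALUE entries; skip '//' and '#' comment lines.
  grep -E '^[^#/][^=]*=' "$dir_a/CMakeCache.txt" | sort > "$tmp_a"
  grep -E '^[^#/][^=]*=' "$dir_b/CMakeCache.txt" | sort > "$tmp_b"
  # diff exits with 1 when the files differ; preserve that status.
  status=0
  diff "$tmp_a" "$tmp_b" || status=$?
  rm -f "$tmp_a" "$tmp_b"
  return "$status"
}
```

For example, `diff_cmake_caches build-default build-preloaded` would compare a directory configured with plain `cmake ../exago` against one configured with `cmake -C ../buildsystem/clang-hip/cache.cmake ../exago`. Lines prefixed with `<` come from the first directory and lines prefixed with `>` from the second.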

@pelesh
Collaborator

pelesh commented Mar 13, 2024

@pelesh I can reproduce with the build command you are using.

cmake -C ../buildsystem/clang-hip/cache.cmake ../exago; make seems to work, so it has to do with the CMake configuration and options. Some combinations (e.g., default options) seemingly don't work as expected.

I reproduced the same with the build system from develop, so it looks like an ExaGO bug unrelated to modules.

Collaborator

@pelesh pelesh left a comment

I suggest we merge. I will submit separate bug report(s) for issues I observed while testing these modules as they appear to be unrelated to this PR.

@cameronrutherford cameronrutherford merged commit 98d28d0 into develop Mar 14, 2024
7 of 8 checks passed
@cameronrutherford
Contributor

Merged. I didn't debug your failing build, but I assume some CMake options are missing or not configured in that minimal build. It might also be a plain CMake bug in our ExaGO code, but I would have to debug more to know for sure.

@nkoukpaizan nkoukpaizan deleted the nicholson/frontier-worldshared branch September 24, 2024 16:36