trilinos failure with superlu-dist with xsdk+rocm #235

Open
balay opened this issue Oct 13, 2023 · 31 comments

Comments

@balay
Member

balay commented Oct 13, 2023

spack-build-out.txt


In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_wrapper.h:106,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/gpu_api_utils.h:26,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_defs.h:104,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-r3bbhy5cr5g4b2xqq3slhqf7cy63ei4u/include/superlu_ddefs.h:37,
                 from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-t5tcua5g5rbji3nfnwty3l3rdjltrtga/spack-src/packages/amesos/src/Amesos_Superludist.cpp:38:
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hip-5.5.1-36grozs3lkqmnph77fzw7tfbykoccwci/include/hip/hip_runtime_api.h:7337:2: error: #error ("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
 7337 | #error("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
      |  ^~~~~

cc: @cgcgcg @lucbv
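The `#error` above comes from a preprocessor guard in `hip_runtime_api.h`. Any translation unit that includes the HIP headers must define exactly one of the two platform macros, and a HIP-aware compiler driver normally supplies it. Here, the header was reached through SuperLU_DIST's `gpu_wrapper.h` while Amesos was being compiled with a host compiler, so neither macro was set. The Python sketch below mirrors the header's rule against a compiler command line. The helper names are hypothetical (they are not Spack or HIP APIs), and the sketch only inspects explicit `-D` flags, not macros a compiler predefines.

```python
import shlex

# Platform macros named in the hip_runtime_api.h #error message.
AMD = "__HIP_PLATFORM_AMD__"
NVIDIA = "__HIP_PLATFORM_NVIDIA__"


def defined_macros(flags):
    """Collect macro names defined on a command line via -DNAME[=VALUE]."""
    names = set()
    args = shlex.split(flags)
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-D" and i + 1 < len(args):
            # Separated form: -D NAME[=VALUE]
            names.add(args[i + 1].split("=", 1)[0])
            i += 2
            continue
        if arg.startswith("-D") and len(arg) > 2:
            # Joined form: -DNAME[=VALUE]
            names.add(arg[2:].split("=", 1)[0])
        i += 1
    return names


def hip_platform_ok(flags):
    """Mirror the header's rule: exactly one platform macro is defined."""
    return len(defined_macros(flags) & {AMD, NVIDIA}) == 1


def with_hip_platform(flags, backend="amd"):
    """Hypothetical helper: append the platform define if none is present."""
    found = defined_macros(flags) & {AMD, NVIDIA}
    if len(found) > 1:
        raise ValueError("both HIP platform macros are defined")
    if found:
        return flags
    macro = AMD if backend == "amd" else NVIDIA
    return (flags + " -D" + macro).strip()
```

For a ROCm build, `with_hip_platform("-O2")` yields `-O2 -D__HIP_PLATFORM_AMD__`. Whether the define belongs in a consumer's flags or in the library's installed headers is a design choice. In this thread, the error no longer appeared after balay tried the SuperLU_DIST commit suggested later on.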

@balay balay changed the title "trilinos+hip failure with superlu-dist" to "trilinos failure with superlu-dist with xsdk+rocm" on Oct 13, 2023
@balay
Member Author

balay commented Oct 13, 2023

Here it's building trilinos~rocm:

balay@petsc-gpu-02:/scratch/balay/spack$ ./bin/spack spec xsdk+rocm amdgpu_target=gfx90a |grep trilinos@
 -       ^[email protected]%[email protected]~adelus~adios2+amesos+amesos2+anasazi+aztec~basker+belos+boost~chaco~complex~cuda~cuda_rdc~debug~dtk+epetra+epetraext~epetraextbtf~epetraextexperimental~epetraextgraphreorderings~exodus+explicit_template_instantiation~float+fortran~gtest+hdf5+hypre+ifpack+ifpack2~intrepid+intrepid2~ipo~isorropia+kokkos~mesquite~minitensor+ml+mpi+muelu~mumps+nox~openmp~panzer~phalanx~piro~python~rocm~rocm_rdc~rol~rythmos+sacado~scorec+shards+shared~shylu~stk~stokhos+stratimikos~strumpack~suite-sparse~superlu+superlu-dist~teko~tempus~test+thyra+tpetra~trilinoscouplings~wrapper~x11+zoltan+zoltan2 build_system=cmake build_type=Release cxxstd=17 generator=make gotype=int arch=linux-ubuntu22.04-zen4

@cgcgcg

cgcgcg commented Oct 13, 2023

While this probably should be fixed, is there a reason for building both old (host-only) and new solver stacks for a HIP platform?

@liuyangzhuan

@balay Can you try the latest commit
xiaoyeli/superlu_dist@0c9ea16
to see if this is fixed?

@xiaoyeli

xiaoyeli commented Nov 6, 2023

@balay
Is this good now? If so, we can close it.

@balay
Member Author

balay commented Nov 6, 2023

I no longer see the above error, but the trilinos build still fails:

 In file included from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hip-5.6.1-olxkmdjitey5gszct57gyagmg4kg33xh/include/hip/amd_detail/amd_channel_descriptor.h:28,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hip-5.6.1-olxkmdjitey5gszct57gyagmg4kg33xh/include/hip/channel_descriptor.h:32,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hip-5.6.1-olxkmdjitey5gszct57gyagmg4kg33xh/include/hip/texture_types.h:38,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hip-5.6.1-olxkmdjitey5gszct57gyagmg4kg33xh/include/hip/hip_runtime_api.h:489,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-gzzebo7ca4h6q7pjqnzv2elmjkfy66i6/include/gpu_wrapper.h:110,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-gzzebo7ca4h6q7pjqnzv2elmjkfy66i6/include/gpu_api_utils.h:26,
                 from /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/superlu-dist-8.2.0-gzzebo7ca4h6q7pjqnzv2elmjkfy66i6/include/superlu_defs.h:104,
                 from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Superludist_TypeMap.hpp:88,
                 from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Superludist_FunctionMap.hpp:63,
                 from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Superludist_decl.hpp:58,
                 from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Superludist.hpp:47,
                 from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Factory.hpp:108,
                 from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Factory.cpp:44:
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen4/gcc-11.4.0/hip-5.6.1-olxkmdjitey5gszct57gyagmg4kg33xh/include/hip/amd_detail/amd_hip_vector_types.h:144:5: error: template with C linkage
  144 |     template<typename T, unsigned int n> struct HIP_vector_base;
      |     ^~~~~~~~
In file included from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Superludist_FunctionMap.hpp:63,
                 from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Superludist_decl.hpp:58,
                 from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Superludist.hpp:47,
                 from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Factory.hpp:108,
                 from /scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Factory.cpp:44:
/scratch/balay/spack/spack-stage/spack-stage-trilinos-14.4.0-mt4rxovcrszkydmjlmmmzrahyjf6j42s/spack-src/packages/amesos2/src/Amesos2_Superludist_TypeMap.hpp:75:1: note: 'extern "C"' linkage started here
   75 | extern "C" {
      | ^~~~~~~~~~

spack-build-out.txt
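This second failure is a different problem. `Amesos2_Superludist_TypeMap.hpp` opens an `extern "C" {` block at line 75 and, inside it, reaches `superlu_defs.h` (line 88 in the include chain). In a GPU-enabled SuperLU_DIST, that header pulls in the HIP runtime headers. Those headers declare C++ templates such as `HIP_vector_base`, and C++ does not allow a template to have C linkage, which produces the error. A rough Python scanner for this pattern (an `#include` sitting inside a consumer's `extern "C"` block) might look like the sketch below. It is a line-based heuristic, not a preprocessor: it assumes braces do not appear inside comments or string literals.

```python
import re

# Opening of a brace-delimited C linkage block:  extern "C" {
EXTERN_C_OPEN = re.compile(r'^\s*extern\s+"C"\s*\{')
# An #include directive; group 1 is the header name.
INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')


def includes_inside_extern_c(source):
    """Return (line_number, header) for each #include inside extern "C" { }."""
    hits = []
    depth = 0        # brace depth measured from the extern "C" opening line
    inside = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not inside:
            if EXTERN_C_OPEN.match(line):
                depth = line.count("{") - line.count("}")
                inside = depth > 0  # a one-line block closes immediately
            continue
        match = INCLUDE.match(line)
        if match:
            hits.append((lineno, match.group(1)))
        depth += line.count("{") - line.count("}")
        if depth <= 0:
            inside = False
    return hits
```

A non-empty result flags headers that would inherit C linkage from the consumer. The fix discussed further down is to drop the consumer's wrapper and rely on the library's own guards.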

@xiaoyeli

xiaoyeli commented Nov 6, 2023

@cgcgcg
Can you or someone from the Trilinos team take a look at this line? Why does it complain about "extern ..."?

https://github.com/trilinos/Trilinos/blob/c4f035ce9aab54e50654c9a400f2b4c041331670/packages/amesos2/src/Amesos2_Superludist_TypeMap.hpp#L75

@cgcgcg

cgcgcg commented Nov 6, 2023

@srajama1 @ndellingwood Can you have a look at this Amesos2 issue?

@balay
Member Author

balay commented Nov 8, 2023

@xiaoyeli For this xsdk release, I'm continuing to disable cuda and rocm in superlu-dist and trilinos (same as in the last xsdk release) to avoid these issues.

And the builds are currently working in this mode.

https://gitlab.com/xsdk-project/spack-xsdk/-/pipelines/1064707391

@xiaoyeli

xiaoyeli commented Nov 8, 2023

@balay Two questions:

  1. Are you saying that this is a long-standing problem, and also appeared during the testing from last release a year ago?
  2. Can you enable cuda and rocm for the other packages that call superlu?

cc: @srajama1 @ndellingwood

@balay
Member Author

balay commented Nov 9, 2023

> Are you saying that this is a long-standing problem, and also appeared during the testing from last release a year ago?

@xiaoyeli I'm not sure the issues were the same, but we could not enable superlu-dist+cuda [and trilinos+cuda] in past release cycles either.

Here is one prior issue that was filed: #162

> Can you enable cuda and rocm for the other packages that call superlu?

Do you mean superlu-dist+cuda and petsc+cuda, without trilinos? Will try. Right now the build uses: petsc+superlu-dist+cuda, superlu-dist~cuda, trilinos~cuda.

@balay
Member Author

balay commented Nov 9, 2023

ref: ./bin/spack install -j64 [email protected]+rocm~trilinos amdgpu_target=gfx90a

I'm seeing failures with:

cc: @jthies @v-dobrev

@balay
Member Author

balay commented Nov 9, 2023

ref: ./bin/spack install -j24 [email protected]%[email protected]~trilinos +cuda cuda_arch=70 ^[email protected] ^openmpi

xsdk+cuda build with superlu-dist+cuda is successful when trilinos is disabled.

balay@xsdk:/data/balay/spack>./bin/spack find -v | grep superlu-dist
[email protected]~caliper~complex+cuda~debug+fortran~gptune~int64~internal-superlu~magma~mixedint+mpi~openmp~rocm+shared+superlu-dist~sycl~umpire~unified-memory build_system=autotools cuda_arch=70
[email protected]~amgx~conduit+cuda~debug+examples~exceptions~fms~ginkgo~gnutls~gslib~hiop~lapack~libceed~libunwind+metis+miniapps~mpfr+mpi~netcdf~occa~openmp+petsc~pumi~raja~rocm+shared~slepc+static~strumpack~suite-sparse+sundials+superlu-dist~threadsafe~umpire+zlib build_system=generic cuda_arch=70 patches=718f073 timer=auto
[email protected]~X~batch~cgns~complex+cuda~debug+double~exodusii~fftw+fortran~giflib+hdf5~hpddm~hwloc+hypre~int64~jpeg~knl~kokkos~libpng~libyaml~memkind+metis~mkl-pardiso~mmg~moab~mpfr+mpi~mumps~openmp~p4est~parmmg~ptscotch~random123~rocm~saws~scalapack+shared~strumpack~suite-sparse+superlu-dist~sycl~tetgen~trilinos~valgrind build_system=generic clanguage=C cuda_arch=70 memalign=none
[email protected]+ARKODE+CVODE+CVODES+IDA+IDAS+KINSOL+cuda+examples+examples-install~f2003~fcmix+generic-math+ginkgo+hypre~int64~ipo~klu~kokkos~kokkos-kernels~lapack+magma~monitoring+mpi~openmp+petsc~profiling~pthread~raja~rocm+shared+static+superlu-dist~superlu-mt~sycl~trilinos build_system=cmake build_type=Release cstd=99 cuda_arch=70 cxxstd=14 generator=make logging-level=0 logging-mpi=OFF precision=double
[email protected]+cuda~int64~ipo~openmp+parmetis~rocm+shared build_system=cmake build_type=Release cuda_arch=70 generator=make

@balay
Member Author

balay commented Nov 9, 2023

Perhaps we should set trilinos~superlu-dist for the GPU builds... will check..

@balay
Member Author

balay commented Nov 9, 2023

Regarding mfem and superlu-dist+rocm, I'm testing with:

diff --git a/var/spack/repos/builtin/packages/mfem/package.py b/var/spack/repos/builtin/packages/mfem/package.py
index f4821e63c2..75eeda7b1f 100644
--- a/var/spack/repos/builtin/packages/mfem/package.py
+++ b/var/spack/repos/builtin/packages/mfem/package.py
@@ -967,6 +967,9 @@ def find_optional_library(name, prefix):
             if "^rocthrust" in spec and not spec["hip"].external:
                 # petsc+rocm needs the rocthrust header path
                 hip_headers += spec["rocthrust"].headers
+            if "^hipblas" in spec and not spec["hip"].external:
+                # superlu-dist+rocm needs the hipblas header path
+                hip_headers += spec["hipblas"].headers
             if "%cce" in spec:
                 # We assume the proper Cray CCE module (cce) is loaded:
                 craylibs_path = env["CRAYLIBS_" + machine().upper()]

@balay
Member Author

balay commented Nov 9, 2023

> Perhaps we should set trilinos~superlu-dist for the GPU builds... will check..

This build is working, so I will update xsdk-1.0.0 with these changes.

https://gitlab.com/xsdk-project/spack-xsdk/-/pipelines/1066199514

The mfem change for superlu-dist+rocm is at spack/spack#40981

@cgcgcg

cgcgcg commented Nov 9, 2023

@balay Just to confirm: The issue is building SuperLU_dist and Trilinos with its SuperLU_dist interface enabled on Cuda&HIP? But when you disable the interface in Trilinos then everything works? So presumably something is broken in our interface?

@balay
Member Author

balay commented Nov 9, 2023

> @balay Just to confirm: The issue is building SuperLU_dist and Trilinos with its SuperLU_dist interface enabled on Cuda&HIP? But when you disable the interface in Trilinos then everything works? So presumably something is broken in our interface?

Yes. A likely spack command to reproduce (I'm checking via xsdk, which enables many of the trilinos variants):

spack install trilinos+superlu-dist ^superlu-dist+cuda

This happens irrespective of trilinos+cuda or trilinos~cuda.

So right now I'm using [with xsdk]

spack install xsdk+cuda ^trilinos~superlu-dist~cuda ^superlu-dist+cuda

[and similar for rocm]

@xiaoyeli

xiaoyeli commented Nov 9, 2023

@cgcgcg
If you look back in this thread, the complaint was about this line of the interface code:

https://github.com/trilinos/Trilinos/blob/c4f035ce9aab54e50654c9a400f2b4c041331670/packages/amesos2/src/Amesos2_Superludist_TypeMap.hpp#L75

It seems to be a long-standing problem, and it should be easy to fix. Can you ask someone on the Trilinos team to take a look and fix it?

@srajama1

srajama1 commented Nov 9, 2023

@xiaoyeli We are looking at this.

It looks like SuperLU-Dist started using extern "C" within its own headers, so we don't have to do it. Can you tell us in which version this happened? We might have to support older versions on some systems, so we could check version numbers to decide whether to include the extern "C" or not.

@srajama1

srajama1 commented Nov 9, 2023

Git blame says "extern C" came to SuperLU-Dist 6 years ago

xiaoyeli/superlu_dist@949ea75

I hope we don't need to support versions older than that :)

@xiaoyeli

xiaoyeli commented Nov 9, 2023

@srajama1
You can remove the 'extern "C"' from your side.
I am surprised that versions older than 6 years didn't have it, because without this protection a C++ compiler expects C++ (mangled) symbol names for our functions, and our code could not be used from a C++ program.
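xiaoyeli's point reflects the usual C/C++ interoperability convention. A C library's header opens `extern "C" {` itself, but only when `__cplusplus` is defined, so a C++ caller includes it directly. An extra wrapper in the caller is redundant and, as the log above shows, harmful once the header pulls in C++ code. The Python sketch below checks whether a header carries its own guard. It is a text heuristic that only recognizes the literal `#ifdef __cplusplus` spelling (not `#if defined(__cplusplus)`), and the function name is hypothetical.

```python
import re

# Opening half of the conventional guard:
#   #ifdef __cplusplus
#   extern "C" {
#   #endif
OPEN_GUARD = re.compile(
    r'^\s*#\s*ifdef\s+__cplusplus\s*\n\s*extern\s+"C"\s*\{\s*\n\s*#\s*endif',
    re.MULTILINE,
)
# Closing half:
#   #ifdef __cplusplus
#   }
#   #endif
CLOSE_GUARD = re.compile(
    r'^\s*#\s*ifdef\s+__cplusplus\s*\n\s*\}\s*\n\s*#\s*endif',
    re.MULTILINE,
)


def has_cplusplus_guard(header_text):
    """True if the header wraps its own declarations for C++ callers."""
    opened = OPEN_GUARD.search(header_text)
    if not opened:
        return False
    # The closing half must come after the opening half.
    return CLOSE_GUARD.search(header_text, opened.end()) is not None
```

Run against an installed header, for example `superlu_defs.h` under the superlu-dist prefix in the logs, this gives a quick answer to whether a consumer still needs its own wrapper, independent of version numbers. The answer depends on the exact spelling the header uses.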

@srajama1

srajama1 commented Nov 9, 2023

We will remove it. Thanks @xiaoyeli !

@cgcgcg

cgcgcg commented Dec 11, 2023

@balay A fix for this is now on Trilinos develop. Is this covered by a build that we can check? Or do we need to wait for the next Trilinos release?

@balay
Member Author

balay commented Dec 11, 2023

I'm still seeing failures with trilinos develop (listed in my build as 14.4.1) when enabling rocm or cuda.

Attaching logs.

spack-build-out.rocm.txt
spack-build-out.cuda.txt

@cgcgcg

cgcgcg commented Dec 11, 2023

Hm, the cuda build says

Explicitly disabled external packages/TPLs on input (by user or by default):  CUDA [...]

What options are set for the spack build?

@balay
Member Author

balay commented Dec 16, 2023

I reran the build, making sure trilinos+cuda is enabled.

./bin/spack spec [email protected]%[email protected]+cuda cuda_arch=80 ^[email protected]
...
[+]      ^[email protected]%[email protected]+cuda~int64~ipo~openmp+parmetis~rocm+shared build_system=cmake build_type=Release cuda_arch=80 generator=make arch=linux-ubuntu22.04-zen3
 -       ^[email protected]%[email protected]~adelus~adios2+amesos+amesos2+anasazi+aztec~basker+belos+boost~chaco~complex+cuda~cuda_rdc~debug~dtk+epetra+epetraext~epetraextbtf~epetraextexperimental~epetraextgraphreorderings~exodus+explicit_template_instantiation~float+fortran~gtest+hdf5+hypre+ifpack+ifpack2~intrepid+intrepid2~ipo~isorropia+kokkos~mesquite~minitensor+ml+mpi+muelu~mumps+nox~openmp~panzer~phalanx~piro~python~rocm~rocm_rdc~rol~rythmos+sacado~scorec+shards+shared~shylu~stk~stokhos+stratimikos~strumpack~suite-sparse~superlu+superlu-dist~teko~tempus~test+thyra+tpetra~trilinoscouplings~uvm+wrapper~x11+zoltan+zoltan2 build_system=cmake build_type=Release cuda_arch=80 cxxstd=17 generator=make gotype=int arch=linux-ubuntu22.04-zen3
...
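cgcgcg's question about which options are set, and the later +wrapper check, both come down to reading the +/~ flags out of a long `spack spec` line. The rough Python helper below pulls the boolean variants out of such a line. It is a hypothetical text parser for the printed form shown above, not Spack's own spec parser. Inside a package recipe, one would use `"+wrapper" in spec` or `spec.satisfies(...)`, as the package code quoted in this thread does.

```python
import re

# Boolean variants print as +name (on) or ~name (off); names may contain
# letters, digits, '_' and '-' (e.g. superlu-dist, cuda_rdc).
BOOL_VARIANT = re.compile(r"([+~])([A-Za-z0-9_][A-Za-z0-9_-]*)")


def bool_variants(spec_line):
    """Map each boolean variant printed in a 'spack spec' line to True/False."""
    return {name: sign == "+" for sign, name in BOOL_VARIANT.findall(spec_line)}


def not_enabled(spec_line, required):
    """Return the required variants that are missing or disabled (~)."""
    variants = bool_variants(spec_line)
    return [name for name in required if not variants.get(name, False)]
```

Applied to the trilinos line of the spec output above, `not_enabled(line, ["cuda", "superlu-dist", "wrapper"])` returns an empty list. Key=value settings such as `cuda_arch=80` are ignored because they carry no +/~ prefix.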

This is using the latest develop with the following change [to use the trilinos develop branch]:

diff --git a/var/spack/repos/builtin/packages/trilinos/package.py b/var/spack/repos/builtin/packages/trilinos/package.py
index ef335a2728..4d33265a03 100644
--- a/var/spack/repos/builtin/packages/trilinos/package.py
+++ b/var/spack/repos/builtin/packages/trilinos/package.py
@@ -42,6 +42,7 @@ class Trilinos(CMakePackage, CudaPackage, ROCmPackage):
 
     version("master", branch="master")
     version("develop", branch="develop")
+    version("15.0.1", branch="develop")
     version("15.0.0", sha256="5651f1f967217a807f2c418a73b7e649532824dbf2742fa517951d6cc11518fb")
     version("14.4.0", sha256="8e7d881cf6677aa062f7bfea8baa1e52e8956aa575d6a4f90f2b6f032632d4c6")
     version("14.2.0", sha256="c96606e5cd7fc9d25b9dc20719cd388658520d7cbbd2b4de77a118440d1e0ccb")
diff --git a/var/spack/repos/builtin/packages/xsdk/package.py b/var/spack/repos/builtin/packages/xsdk/package.py
index 6b3ec2c126..2697fad1d1 100644
--- a/var/spack/repos/builtin/packages/xsdk/package.py
+++ b/var/spack/repos/builtin/packages/xsdk/package.py
@@ -150,7 +150,6 @@ class Xsdk(BundlePackage, CudaPackage, ROCmPackage):
     xsdk_depends_on("[email protected]", when="@0.8.0")
     xsdk_depends_on("[email protected]", when="@0.7.0")
 
-    xsdk_depends_on("trilinos +superlu-dist", when="@1.0.0: +trilinos ~cuda ~rocm")
     xsdk_depends_on(
         "trilinos@develop+hypre+hdf5~mumps+boost"
         + "~suite-sparse+tpetra+nox+ifpack2+zoltan+zoltan2+amesos2"
@@ -159,11 +158,12 @@ class Xsdk(BundlePackage, CudaPackage, ROCmPackage):
         when="@develop +trilinos",
     )
     xsdk_depends_on(
-        "[email protected]+hypre+hdf5~mumps+boost"
+        "[email protected]+hypre+hdf5~mumps+boost"
         + "~suite-sparse+tpetra+nox+ifpack2+zoltan+zoltan2+amesos2"
-        + "~exodus~dtk+intrepid2+shards+stratimikos gotype=int"
-        + " cxxstd=17",
+        + "~exodus~dtk+intrepid2+shards+stratimikos+superlu-dist gotype=int"
+        + " cxxstd=17 ",
         when="@1.0.0 +trilinos",
+        cuda_var="cuda", rocm_var="rocm",
     )
     xsdk_depends_on(
         "[email protected]+hypre+superlu-dist+hdf5~mumps+boost"

using:

./bin/spack install -j32  [email protected]%[email protected]+cuda cuda_arch=80 ^[email protected]

I get

-- Check for working CXX compiler: /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.4.0/mpich-4.1.2-si2g2ajcuunt2u6oqangmpn7rq4rbqa5/bin/mpic++ - broken
CMake Error at /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.4.0/cmake-3.27.9-quq3pv7mwictuuvg7m3dm2tdet3kkjor/share/cmake-3.27/Modules/CMakeTestCXXCompiler.cmake:60 (message):
  The C++ compiler

    "/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.4.0/mpich-4.1.2-si2g2ajcuunt2u6oqangmpn7rq4rbqa5/bin/mpic++"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: '/scratch/balay/spack/spack-stage/spack-stage-trilinos-15.0.1-ejmgblrzy72khdxawps72brvshfr4g4b/spack-build-ejmgblr/CMakeFiles/CMakeScratch/TryCompile-kC1Bjz'

    Run Build Command(s): /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.4.0/cmake-3.27.9-quq3pv7mwictuuvg7m3dm2tdet3kkjor/bin/cmake -E env VERBOSE=1 /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.4.0/gmake-4.4.1-5quzins5c2jqhwgkxwpndhyivnrfgxm2/bin/gmake -f Makefile cmTC_a8f67/fast
    /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.4.0/gmake-4.4.1-5quzins5c2jqhwgkxwpndhyivnrfgxm2/bin/gmake  -f CMakeFiles/cmTC_a8f67.dir/build.make CMakeFiles/cmTC_a8f67.dir/build
    gmake[1]: Entering directory '/scratch/balay/spack/spack-stage/spack-stage-trilinos-15.0.1-ejmgblrzy72khdxawps72brvshfr4g4b/spack-build-ejmgblr/CMakeFiles/CMakeScratch/TryCompile-kC1Bjz'
    Building CXX object CMakeFiles/cmTC_a8f67.dir/testCXXCompiler.cxx.o
    /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.4.0/mpich-4.1.2-si2g2ajcuunt2u6oqangmpn7rq4rbqa5/bin/mpic++    -o CMakeFiles/cmTC_a8f67.dir/testCXXCompiler.cxx.o -c /scratch/balay/spack/spack-stage/spack-stage-trilinos-15.0.1-ejmgblrzy72khdxawps72brvshfr4g4b/spack-build-ejmgblr/CMakeFiles/CMakeScratch/TryCompile-kC1Bjz/testCXXCompiler.cxx
    g++: error: unrecognized command-line option '--expt-extended-lambda'

spack-build-out.txt

@cgcgcg

cgcgcg commented Dec 16, 2023

@balay Can you try with +wrapper as well? That should enable the nvcc_wrapper.

@balay
Member Author

balay commented Dec 16, 2023

@cgcgcg the above build has +wrapper [as listed in the spack spec output in the previous message]

And I see:

            if "+wrapper" in spec:
                flags.append("--expt-extended-lambda")

i.e. --expt-extended-lambda is added only for the +wrapper build.
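The log shows this nvcc option reaching g++ through mpic++ even though +wrapper is set, so the variant alone does not guarantee that nvcc_wrapper is the compiler that sees cxxflags. balay's next change simply drops the flag. A narrower alternative is sketched below as a hypothetical pure function, not Spack's `flag_handler` API: it adds the flag only when the compiler that will actually receive the flags is nvcc_wrapper. How a recipe would determine that underlying compiler (for example, from the MPI wrapper's environment) is left as an assumption.

```python
import os

NVCC_ONLY_FLAG = "--expt-extended-lambda"


def cuda_lambda_cxxflags(flags, wrapper_enabled, underlying_cxx):
    """Return a new cxxflags list, adding the nvcc-only lambda flag safely.

    Hypothetical sketch: the flag is appended only when +wrapper is set AND
    the compiler that will actually receive cxxflags (underlying_cxx, e.g.
    whatever mpic++ invokes) is nvcc_wrapper. A host compiler such as g++
    rejects the option, as in the CMake compiler check above.
    """
    result = list(flags)
    if wrapper_enabled and os.path.basename(underlying_cxx) == "nvcc_wrapper":
        result.append(NVCC_ONLY_FLAG)
    return result
```

With `underlying_cxx="/usr/bin/g++"`, the flag is left out even for a +wrapper build, which would avoid the "unrecognized command-line option" failure without removing the flag for builds that really go through nvcc_wrapper.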

@balay
Member Author

balay commented Dec 16, 2023

With this additional change:

diff --git a/var/spack/repos/builtin/packages/trilinos/package.py b/var/spack/repos/builtin/packages/trilinos/package.py
index 4d33265a03..e2dd097d09 100644
--- a/var/spack/repos/builtin/packages/trilinos/package.py
+++ b/var/spack/repos/builtin/packages/trilinos/package.py
@@ -510,8 +510,6 @@ def flag_handler(self, name, flags):
             if "+stk%intel" in spec:
                 # Workaround for Intel compiler segfaults with STK and IPO
                 flags.append("-no-ipo")
-            if "+wrapper" in spec:
-                flags.append("--expt-extended-lambda")
         elif name == "ldflags":
             if spec.satisfies("%cce@:14"):
                 flags.append("-fuse-ld=gold")

The trilinos build is successful (with cuda, superlu-dist).

==> Installing trilinos-15.0.1-aouk7pqbmlfktkzd4ffxs3iimuajdyug [84/100]
==> No binary for trilinos-15.0.1-aouk7pqbmlfktkzd4ffxs3iimuajdyug found: installing from source
==> No patches needed for trilinos
==> trilinos: Executing phase: 'cmake'
==> trilinos: Executing phase: 'build'
==> trilinos: Executing phase: 'install'
==> trilinos: Successfully installed trilinos-15.0.1-aouk7pqbmlfktkzd4ffxs3iimuajdyug
  Stage: 59.09s.  Cmake: 46.24s.  Build: 21m 57.47s.  Install: 9.52s.  Post-install: 2.28s.  Total: 23m 55.63s
[+] /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.4.0/trilinos-15.0.1-aouk7pqbmlfktkzd4ffxs3iimuajdyug

Now there are failures in dtk, phist, sundials

build.log.txt

@balay
Member Author

balay commented Dec 16, 2023

trilinos+rocm [with superlu_dist] build also goes through with the attached changes.

./bin/spack spec [email protected]+rocm amdgpu_target=gfx90a

[+]      ^[email protected]%[email protected]~cuda~int64~ipo~openmp+parmetis+rocm+shared amdgpu_target=gfx90a build_system=cmake build_type=Release generator=make arch=linux-ubuntu22.04-zen4
[+]      ^[email protected]%[email protected]~adelus~adios2+amesos+amesos2+anasazi+aztec~basker+belos+boost~chaco~complex~cuda~cuda_rdc~debug~dtk+epetra+epetraext~epetraextbtf~epetraextexperimental~epetraextgraphreorderings~exodus+explicit_template_instantiation~float+fortran~gtest+hdf5+hypre+ifpack+ifpack2~intrepid+intrepid2~ipo~isorropia+kokkos~mesquite~minitensor+ml+mpi+muelu~mumps+nox~openmp~panzer~phalanx~piro~python+rocm~rocm_rdc~rol~rythmos+sacado~scorec+shards+shared~shylu~stk~stokhos+stratimikos~strumpack~suite-sparse~superlu+superlu-dist~teko~tempus~test+thyra+tpetra~trilinoscouplings~wrapper~x11+zoltan+zoltan2 amdgpu_target=gfx90a build_system=cmake build_type=Release cxxstd=17 generator=make gotype=int  arch=linux-ubuntu22.04-zen4
./bin/spack install -j64 [email protected]+rocm amdgpu_target=gfx90a

trilinos-cuda-rocm.patch.txt

And subsequent dtk, phist, sundials failures

build-rocm.log.txt

@iyamazaki

iyamazaki commented Apr 4, 2024

@balay We were wondering whether Trilinos PR 12524 has resolved this issue with Amesos2. Please let us know if we can help!

Projects
None yet
Development

No branches or pull requests

8 participants