trilinos failure with superlu-dist with xsdk+rocm #235
Here it's building
|
While this probably should be fixed, is there a reason for building both old (host-only) and new solver stacks for a HIP platform? |
@balay Can you try the latest commit |
@balay |
I don't see the above error anymore - but trilinos build continues to fail.
|
@cgcgcg |
@srajama1 @ndellingwood Can you have a look at this Amesos2 issue? |
@xiaoyeli for this xsdk release - I'm continuing with disabling cuda and rocm from superlu-dist and trilinos (same as last xsdk release) - to avoid these issues. And the builds are currently working in this mode. https://gitlab.com/xsdk-project/spack-xsdk/-/pipelines/1064707391 |
@balay Two questions:
|
@xiaoyeli I'm not sure if the issues were the same - but we could not enable superlu-dist+cuda [and trilinos+cuda] in the past release cycles as well. Here is one prior issue that was filed
You mean with superlu+cuda, petsc+cuda - without trilinos? Will try. Right now the build uses: |
ref: I'm seeing failures with:
|
ref:
|
Perhaps we should set |
wrt mfem and superlu-dist+rocm - I'm testing with:
diff --git a/var/spack/repos/builtin/packages/mfem/package.py b/var/spack/repos/builtin/packages/mfem/package.py
index f4821e63c2..75eeda7b1f 100644
--- a/var/spack/repos/builtin/packages/mfem/package.py
+++ b/var/spack/repos/builtin/packages/mfem/package.py
@@ -967,6 +967,9 @@ def find_optional_library(name, prefix):
if "^rocthrust" in spec and not spec["hip"].external:
# petsc+rocm needs the rocthrust header path
hip_headers += spec["rocthrust"].headers
+ if "^hipblas" in spec and not spec["hip"].external:
+ # superlu-dist+rocm needs the hipblas header path
+ hip_headers += spec["hipblas"].headers
if "%cce" in spec:
# We assume the proper Cray CCE module (cce) is loaded:
craylibs_path = env["CRAYLIBS_" + machine().upper()] |
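The pattern in the patch above (append a dependency's header directories only when that dependency is part of the resolved spec and HIP is not external) can be sketched outside Spack in plain Python. This is only an illustration under stated assumptions: `collect_hip_include_dirs`, `to_include_flags`, the dictionary layout, and the example paths are hypothetical and are not part of the mfem package or the Spack API.

```python
def collect_hip_include_dirs(deps, hip_is_external, optional=("rocthrust", "hipblas")):
    """Return HIP include dirs plus those of optional ROCm deps that are present.

    `deps` is a hypothetical mapping: dependency name -> list of header dirs.
    Optional dependencies are only added when HIP is not an external install,
    mirroring the `not spec["hip"].external` guard in the patch.
    """
    include_dirs = list(deps.get("hip", []))
    if not hip_is_external:
        for name in optional:
            if name in deps:
                include_dirs.extend(deps[name])
    return include_dirs


def to_include_flags(include_dirs):
    """Turn directories into -I flags, dropping duplicates but keeping order."""
    seen = set()
    flags = []
    for directory in include_dirs:
        if directory not in seen:
            seen.add(directory)
            flags.append("-I" + directory)
    return flags
```

With such a helper, adding support for another ROCm library amounts to adding its name to `optional`, which is the same shape as the three-line hipblas addition in the diff.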
this build is working. So will update xsdk-1.0.0 with these changes. https://gitlab.com/xsdk-project/spack-xsdk/-/pipelines/1066199514 The mfem change for superlu-dist+rocm is at spack/spack#40981 |
@balay Just to confirm: The issue is building SuperLU_dist and Trilinos with its SuperLU_dist interface enabled on Cuda&HIP? But when you disable the interface in Trilinos then everything works? So presumably something is broken in our interface? |
yes - likely spack command to reproduce (I'm checking via xsdk - which enabled many of the trilinos variants)
This is irrespective of So right now I'm using [with xsdk]
[and similar for rocm] |
@cgcgcg It seems it is a long-standing problem, and is easy to fix. Can you ask someone on the Trilinos team to take a look and fix it? |
@xiaoyeli We are looking at this. Looks like SuperLU-Dist started using extern C within your headers so we don't have to do it. Can you tell us in which version this happened? We might have to support older versions on some systems, so we can check version numbers to decide whether to include the "extern C" or not. |
Git blame says "extern C" came to SuperLU-Dist 6 years ago. I hope we don't need to support versions older than that :) |
@srajama1 |
We will remove it. Thanks @xiaoyeli ! |
@balay A fix for this is now on Trilinos develop. Is this covered by a build that we can check? Or do we need to wait for the next Trilinos release? |
I'm still seeing failures with trilinos-develop (listed in my build as 14.4.1) - when enabling rocm or cuda. Attaching logs. |
Hm, the cuda build says
What options are set for the spack build? |
I reran the build - making sure trilinos+cuda is enabled.
This is using latest
diff --git a/var/spack/repos/builtin/packages/trilinos/package.py b/var/spack/repos/builtin/packages/trilinos/package.py
index ef335a2728..4d33265a03 100644
--- a/var/spack/repos/builtin/packages/trilinos/package.py
+++ b/var/spack/repos/builtin/packages/trilinos/package.py
@@ -42,6 +42,7 @@ class Trilinos(CMakePackage, CudaPackage, ROCmPackage):
version("master", branch="master")
version("develop", branch="develop")
+ version("15.0.1", branch="develop")
version("15.0.0", sha256="5651f1f967217a807f2c418a73b7e649532824dbf2742fa517951d6cc11518fb")
version("14.4.0", sha256="8e7d881cf6677aa062f7bfea8baa1e52e8956aa575d6a4f90f2b6f032632d4c6")
version("14.2.0", sha256="c96606e5cd7fc9d25b9dc20719cd388658520d7cbbd2b4de77a118440d1e0ccb")
diff --git a/var/spack/repos/builtin/packages/xsdk/package.py b/var/spack/repos/builtin/packages/xsdk/package.py
index 6b3ec2c126..2697fad1d1 100644
--- a/var/spack/repos/builtin/packages/xsdk/package.py
+++ b/var/spack/repos/builtin/packages/xsdk/package.py
@@ -150,7 +150,6 @@ class Xsdk(BundlePackage, CudaPackage, ROCmPackage):
xsdk_depends_on("[email protected]", when="@0.8.0")
xsdk_depends_on("[email protected]", when="@0.7.0")
- xsdk_depends_on("trilinos +superlu-dist", when="@1.0.0: +trilinos ~cuda ~rocm")
xsdk_depends_on(
"trilinos@develop+hypre+hdf5~mumps+boost"
+ "~suite-sparse+tpetra+nox+ifpack2+zoltan+zoltan2+amesos2"
@@ -159,11 +158,12 @@ class Xsdk(BundlePackage, CudaPackage, ROCmPackage):
when="@develop +trilinos",
)
xsdk_depends_on(
- "[email protected]+hypre+hdf5~mumps+boost"
+ "[email protected]+hypre+hdf5~mumps+boost"
+ "~suite-sparse+tpetra+nox+ifpack2+zoltan+zoltan2+amesos2"
- + "~exodus~dtk+intrepid2+shards+stratimikos gotype=int"
- + " cxxstd=17",
+ + "~exodus~dtk+intrepid2+shards+stratimikos+superlu-dist gotype=int"
+ + " cxxstd=17 ",
when="@1.0.0 +trilinos",
+ cuda_var="cuda", rocm_var="rocm",
)
xsdk_depends_on(
"[email protected]+hypre+superlu-dist+hdf5~mumps+boost" using:
I get
|
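The `cuda_var="cuda", rocm_var="rocm"` arguments added in the xsdk patch above ask for the accelerator variants of xsdk to be forwarded to the dependency. A rough sketch of that idea, assuming a simplified model in which each dependency is a `(spec, when)` pair, is shown below. The function `expand_accel_deps` and its parameters are hypothetical and this is not the actual Spack helper.

```python
def expand_accel_deps(spec, cuda_var="", rocm_var="", cuda_archs=(), rocm_targets=()):
    """Expand one dependency spec into (dependency_spec, when_condition) pairs.

    Hypothetical simplification: if no accelerator variant name is given, the
    spec is used unconditionally; otherwise one entry is produced for the
    disabled case and one per requested architecture or target.
    """
    if not cuda_var and not rocm_var:
        return [(spec, "")]
    pairs = []
    if cuda_var:
        pairs.append((f"{spec} ~{cuda_var}", "~cuda"))
        for arch in cuda_archs:
            pairs.append(
                (f"{spec} +{cuda_var} cuda_arch={arch}", f"+cuda cuda_arch={arch}")
            )
    if rocm_var:
        pairs.append((f"{spec} ~{rocm_var}", "~rocm"))
        for target in rocm_targets:
            pairs.append(
                (f"{spec} +{rocm_var} amdgpu_target={target}",
                 f"+rocm amdgpu_target={target}")
            )
    return pairs
```

For example, `expand_accel_deps("trilinos+superlu-dist", cuda_var="cuda", cuda_archs=("80",))` yields one entry for the `~cuda` case and one for `+cuda cuda_arch=80`, so the dependency is built with the same accelerator setting as the bundle.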
@balay Can you try with |
@cgcgcg the above build has And I see:
i.e |
With this additional change:
diff --git a/var/spack/repos/builtin/packages/trilinos/package.py b/var/spack/repos/builtin/packages/trilinos/package.py
index 4d33265a03..e2dd097d09 100644
--- a/var/spack/repos/builtin/packages/trilinos/package.py
+++ b/var/spack/repos/builtin/packages/trilinos/package.py
@@ -510,8 +510,6 @@ def flag_handler(self, name, flags):
if "+stk%intel" in spec:
# Workaround for Intel compiler segfaults with STK and IPO
flags.append("-no-ipo")
- if "+wrapper" in spec:
- flags.append("--expt-extended-lambda")
elif name == "ldflags":
if spec.satisfies("%cce@:14"):
flags.append("-fuse-ld=gold")
trilinos build is successful (with cuda, superlu-dist)
Now there are failures in dtk, phist, sundials |
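The effect of dropping the `+wrapper` branch in the trilinos patch above can be illustrated with a small stand-alone function. This is a sketch only: `extra_flags`, the `spec_tokens` set, and the `pass_extended_lambda` toggle are hypothetical stand-ins for the real `flag_handler`, and the `%cce@:14` version constraint is reduced to a plain `%cce` check.

```python
def extra_flags(name, spec_tokens, pass_extended_lambda=False):
    """Return extra flags for one flag category ("cxxflags" or "ldflags").

    `spec_tokens` is a hypothetical set of strings such as {"+stk", "%intel"}.
    `pass_extended_lambda=True` models the behavior before the patch,
    the default models the behavior after it.
    """
    flags = []
    if name == "cxxflags":
        if {"+stk", "%intel"} <= spec_tokens:
            # Workaround flag kept by the patch
            flags.append("-no-ipo")
        if pass_extended_lambda and "+wrapper" in spec_tokens:
            # Flag removed by the patch
            flags.append("--expt-extended-lambda")
    elif name == "ldflags":
        if "%cce" in spec_tokens:
            flags.append("-fuse-ld=gold")
    return flags
```

Calling it with and without the toggle for a `+wrapper` spec shows the only difference between the two variants is the `--expt-extended-lambda` entry in `cxxflags`.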
trilinos+rocm [with superlu_dist] build also goes through with the attached changes.
And subsequent dtk, phist, sundials failures |
spack-build-out.txt
cc: @cgcgcg @lucbv