Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExaGO build failures in xSDK CI #224

Closed
v-dobrev opened this issue Sep 30, 2023 · 25 comments · Fixed by spack/spack#41350
Closed

ExaGO build failures in xSDK CI #224

v-dobrev opened this issue Sep 30, 2023 · 25 comments · Fixed by spack/spack#41350

Comments

@v-dobrev
Copy link
Member

See https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5197024340

ping: @cameronrutherford

@cameronrutherford
Copy link

These failures are unique to gcc 11.3+ for some reason.

Simple workaround is to have exago build ~python. A relevant ExaGO issue has been created here to track future progress.

See pnnl/ExaGO#23

@v-dobrev
Copy link
Member Author

v-dobrev commented Oct 2, 2023

Thanks for the response @cameronrutherford!

Do you want to propose an MR to update the CI tests with the proposed workaround? The place to update these is here: https://gitlab.com/xsdk-project/spack-xsdk-ci in the file gitlab-xsdk-ci.yml. If you prefer, I can create an MR too.

@cameronrutherford
Copy link

Thanks for the response @cameronrutherford!

Do you want to propose an MR to update the CI tests with the proposed workaround? The place to update these is here: https://gitlab.com/xsdk-project/spack-xsdk-ci in the file gitlab-xsdk-ci.yml. If you prefer, I can create an MR too.

I can make the PR since it would be good for me to be a little more of a regular contributor.

Since this is an error for [email protected] and intel compilers https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5197024342, I was wondering if this should be something that we modify in the xSDK package spec, or instead just within the CI configurations.

I also noticed quite a few of the CI pipelines were failing due to spack being unable to concretize, so I was hesitant to break that even further.

@v-dobrev
Copy link
Member Author

v-dobrev commented Oct 2, 2023

The failures with Intel compilers are not due to ExaGO, I think -- just the ones that use gcc v11.3.1. Specifically, I was looking at the jobs called xd_l_g113-xsdk-080-gcc1131 and xd_l_g113-xsdk-080-linux-gcc1131-cuda.

As you said, the fix can probably be added in the xSDK spack package -- that may be a good solution too. @balay, what do you think?

@balay
Copy link
Member

balay commented Oct 9, 2023

ok pushed exago~python to xsdk spec..

@cameronrutherford
Copy link

cameronrutherford commented Oct 9, 2023

I guess my comment was lost to the void... Just merged a fix for this build error to exago@develop, so exago+python should be fine.

See pnnl/ExaGO#23 which is now closed. We have preventative checks in place now which should catch issues, and this should be easier to fix after pybind11 is a dep in spack. I assume this happened due to some compiler flags changing in Debug builds.

@balay
Copy link
Member

balay commented Oct 9, 2023

Sorry - misunderstood. I reverted this change.

Will start a pipeline to gather current failures..

https://gitlab.com/xsdk-project/spack-xsdk/-/pipelines/1030800463

@v-dobrev
Copy link
Member Author

My original post was about the failures in the xSDK 0.8.0 pipelines. We still need to do something about those -- either add a fix the xSDK spack package or in the xSDK CI spec.

@balay
Copy link
Member

balay commented Oct 12, 2023

Tried ^exago~python with [email protected] - still many failures.. [perhaps some are unrelated]

https://gitlab.com/xsdk-project/spack-xsdk/-/pipelines/1033739756

@cameronrutherford
Copy link

Okay so original issue was resolve with +python build issues, but ~python ones are still valid, and captured in:

All other pipeline failures are seemingly not exago related on the surface, and I appreciate you running this build.

Note that we have EXAGO_RUN_TESTS depend on if spack install is enabling testing, so we might need to change that if it affects xSDK testing

@cameronrutherford
Copy link

Okay I tried to track everything down. Take away is that pnnl/ExaGO#45 exago+python~mpi and exago~mpi might have issues at the moment, and will be in future spack package. exago~python+mpi bug was really useful to catch, although for some reason we could not reproduce, so thank you for documenting!!

@cameronrutherford
Copy link

Closing unless someone has a build failure to follow up on

@v-dobrev
Copy link
Member Author

I'm not sure the issue with [email protected] is resolved since it depends on [email protected] -- the latest pipeline for [email protected] still has ExaGo failures, e.g. in this job: https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5368514783.

Probably, it is possible to create a patch in Spack for [email protected] that backport the fixes in the main ExaGo repo?

@cameronrutherford
Copy link

cameronrutherford commented Oct 27, 2023

I'm not sure the issue with [email protected] is resolved since it depends on [email protected] -- the latest pipeline for [email protected] still has ExaGo failures, e.g. in this job: https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5368514783.

Probably, it is possible to create a patch in Spack for [email protected] that backport the fixes in the main ExaGo repo?

I have not made a spack patch before, but yes in principle it would be very easy in this case. Diff is shown in this commit pnnl/ExaGO@4527f0c

It's literally just a one line change, but this patch should also be back-ported through version 1.4.0.

Can you point me to an example to work from? Happy to do this as it aligns with sustainment goals, and is a simple patch.

@v-dobrev
Copy link
Member Author

Here is an example Spack PR that adds a patch for MFEM v4.6: spack/spack#40495.

I think there's a way to use a commit from any repository as the patch, however, I have not used that before. Here are some examples I found:

@balay
Copy link
Member

balay commented Nov 15, 2023

reopening this issue as [email protected] CI is still broken - and its good to get that working.

https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5538901463

>> 130    CMake Error at tests/unit/opflow/objective/CMakeLists.txt:42 (exago
            _add_test):
     131      Unknown CMake command "exago_add_test".
     132    
     133    
     134    -- Configuring incomplete, errors occurred!

A new issue?

https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/5538901445

>> 127    CMake Error at buildsystem/cmake/ExaGOCheckPython.cmake:5 (if):
     128      if given arguments:
     129    
     130        "VERSION_LESS" "3.6"
     131    
     132      Unknown arguments specified
     133    Call Stack (most recent call first):

Is is probably from the prior suggestion exago~python so that doesn't really work? So will have to recheck the original issue and fix it?

@balay balay reopened this Nov 15, 2023
@balay
Copy link
Member

balay commented Nov 15, 2023

 131      Unknown CMake command "exago_add_test".

Looks like this is triggered by spack/spack@9f0e3c0

That change I guess works with 1.5.1 but not 1.5.0.

@cameronrutherford How do we fix this?

@cameronrutherford
Copy link

cameronrutherford commented Nov 21, 2023

Here is an example Spack PR that adds a patch for MFEM v4.6: spack/spack#40495.

I think there's a way to use a commit from any repository as the patch, however, I have not used that before. Here are some examples I found:

Okay I now understand that there are multiple ways of doing this, and interestingly enough there seems to be bugs in certain configurations? I resolved pnnl/ExaGO#71 for @pelesh by pinning libffi version here, and so if we connect the error:

==> Error: Couldn't find patch for package builtin.libffi with sha256: 070b1f3aa87f2b56f83aff38afc42157e1692bfaa580276ecdbad2048b818ed7. This usually means the patch was modified or removed. To fix this, either reconcretize or use the original package repository

With the type of patch libffi is trying to provide here:

patch(
        "https://github.com/libffi/libffi/commit/ce077e5565366171aa1b4438749b0922fce887a4.patch?full_index=1",
        sha256="070b1f3aa87f2b56f83aff38afc42157e1692bfaa580276ecdbad2048b818ed7",
        when="@3.4.3:3.4.4",
    )

It would suggest that these plaintext patch files are the most sustainable way of doing things? In this case I guess the fix is to trend towards those types of patches, but not sure what's breaking down in the issue I linked...

@cameronrutherford
Copy link

Hopefully spack/spack#41350 fixes things for all builds, even xSDK specific versions

@balay
Copy link
Member

balay commented Nov 30, 2023

@balay
Copy link
Member

balay commented Nov 30, 2023

It would suggest that these plaintext patch files are the most sustainable way of doing things?

I think a URL to a repo (project) commit is preferred over adding the patchfile to spack (if possible)

@cameronrutherford
Copy link

It would suggest that these plaintext patch files are the most sustainable way of doing things?

I think a URL to a repo (project) commit is preferred over adding the patchfile to spack (if possible)

So I would be making branches that contain new commits based on each tag that apply the same patch? If this is a patch that I am happy applying to all versions regardless, should I instead maybe just apply the commit here to all the branches and create new tags and commit shas?

@balay
Copy link
Member

balay commented Nov 30, 2023

Well - if you are supporting all these versions - you might have "release" branches [for bug fixes] - for each of these versions. And the corresponding patches might be there.

But yeah - creating these commits/branches just for spack support might not be worth it.

It would be nice if a single patch file can apply cleanly to multiple pkg versions [but yeah - this doesn't always pan out]

So can wait and see if any reviewer objects to the current (patches in spack) approach.

@balay
Copy link
Member

balay commented Nov 30, 2023

BTW: I see:

balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ diff var/spack/repos/builtin/packages/exago/exago-1.1.0.patch var/spack/repos/builtin/packages/exago/exago-1.1.1.patch
2c2
< index d0e8ed18..3225509c 100644
---
> index 9d3d871b..f682e2a5 100644
balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ diff var/spack/repos/builtin/packages/exago/exago-1.1.1.patch var/spack/repos/builtin/packages/exago/exago-1.1.2.patch
balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ 

So you can use a single patch file for these 3 versions.

2c2
< index ba8e145f..89dd3327 100644
---
> index 1fcfbc97..0d055bee 100644
19c19
< @@ -392,15 +394,17 @@ else()
---
> @@ -405,15 +407,17 @@ else()

So these could be collapsed if spack allowed patches that apply with "fuzz" [I'm not sure if this can be enabled - if so the duplicates can be removed]

If this were feasible - it would reduce the patches to only 3 files [or 3 commits in repo branch].

@cameronrutherford
Copy link

BTW: I see:

balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ diff var/spack/repos/builtin/packages/exago/exago-1.1.0.patch var/spack/repos/builtin/packages/exago/exago-1.1.1.patch
2c2
< index d0e8ed18..3225509c 100644
---
> index 9d3d871b..f682e2a5 100644
balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ diff var/spack/repos/builtin/packages/exago/exago-1.1.1.patch var/spack/repos/builtin/packages/exago/exago-1.1.2.patch
balay@p1 /home/balay/git-repo/github/spack-xsdk (balay/test =)
$ 

So you can use a single patch file for these 3 versions.

2c2
< index ba8e145f..89dd3327 100644
---
> index 1fcfbc97..0d055bee 100644
19c19
< @@ -392,15 +394,17 @@ else()
---
> @@ -405,15 +407,17 @@ else()

So these could be collapsed if spack allowed patches that apply with "fuzz" [I'm not sure if this can be enabled - if so the duplicates can be removed]

If this were feasible - it would reduce the patches to only 3 files [or 3 commits in repo branch].

I also had the same question about duplicates and whether they can be removed. I decided on separate patches in this case to ensure functionality, but each patch is basically identical in quite a few ways...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants