segmentation faults with multiple aed modules #11

Open
galenholt opened this issue Aug 24, 2019 · 8 comments

Comments

@galenholt

I've posted this on the AEMON forum as well, but thought I'd put it up here too:
When I try to run AED modules beyond tracer and noncohesive, I get seg faults, although if I take those modules out, I can (sometimes) run a few more modules. This issue occurs on multiple systems:
I have GLM 3.0.1 running on a Unix cluster and on Ubuntu in Windows Subsystem for Linux, and 3.0.0 on a Mac. The two Unix versions are just running the package installed from apt-get, which provides glm built with gcc 7.4.0 and libaed version 1.3.1 built with gfortran 8.3.0. The Mac is running the package downloaded from the AED website, which provides glm built with gcc 4.2.1 and libaed2 version 1.3.0 built with gfortran 8.3.0.
I have also re-built GLM from source using AED_tools on Ubuntu, which bumped it to 3.0.2, but didn’t affect the issue.

Here's some more detail about what runs and what doesn't (I've been primarily testing with Kinneret, but I'm getting the same issues with my lake as well):

Runs
Tracer and noncohesive
Oxygen alone (sometimes have to try a few times if it failed previously)
Oxygen, carbon, silica
Oxygen, carbon, silica, nitrogen (Mac and cluster ONLY, and sometimes have to try a few times, never runs on Ubuntu in WSL)
Fails
Tracer, noncohesive, and oxygen
Tracer and oxygen
Noncohesive and oxygen
Oxygen, carbon, silica, nitrogen (always fails on Ubuntu in WSL, sometimes on Mac and Unix cluster)
Adding phosphorus to oxygen, carbon, silica, and nitrogen fails, but not with a seg fault (it needs noncohesive)
Moving tracer or noncohesive to after oxygen fails, but not with a seg fault (they need to come first)

My Fortran is far too weak to start figuring out where the modules are stomping on each other, but hopefully the lists of combinations above help narrow it down. There was a suggestion on AEMON to compile with Intel, so I'll give that a shot.
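
For anyone repeating this kind of combination testing, here is a minimal Python sketch, not part of GLM or AED2, that labels how each run ended so that seg faults can be told apart from ordinary error exits. It assumes a POSIX system, a `glm` executable on PATH, and one working directory per module combination, each with its own `aed2.nml`; the directory names in the commented usage are hypothetical.

```python
import signal
import subprocess


def classify_exit(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exit code %d" % returncode


def run_case(command, workdir):
    """Run one model configuration in workdir and classify how it ended."""
    result = subprocess.run(command, cwd=workdir, capture_output=True, text=True)
    return classify_exit(result.returncode)


# Hypothetical usage: directory names are placeholders, and "glm" is assumed
# to be on PATH.
# for case in ["tracer_noncohesive", "oxygen_only", "tracer_oxygen"]:
#     print(case, run_case(["glm"], workdir=case))
```

Because several combinations above only fail some of the time, it may be worth calling `run_case` several times per directory and counting the labels rather than trusting a single run.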

@galenholt
Author

To respond to myself: recompiling with Intel Fortran does seem to fix it, for what that's worth.

@f-baerenbold

I might have the same issue. I coupled AED2 to our own physical model "Simstrat", which worked fine until I recently updated the AED2 library to 1.3.1. Now, I get a segfault when I call "aed2_calculate_surface". I don't get the segfault when I just have tracer and noncohesive, which makes sense as they don't have atmospheric fluxes.

I use gfortran 8.1 on Windows 10.

@galenholt
Author

That's too bad, I was just thinking I might try rolling my gfortran back from 8.3 to 8.1. Everything runs fine when compiled with Intel, but each individual run seems to depend on having the Intel compiler installed: it looks for the libifcore.so.5 library every time I run unless I first source compilervars.sh intel64. That is fine until my free trial of Intel runs out. If anyone else out there is using the Intel compiler, have you run into this and found a workaround, or did you have to purchase an Intel license to continue running simulations?
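
As a diagnostic aid only, not a workaround: on Linux, `ldd` run against the executable lists the shared libraries it needs and marks unresolved ones with `=> not found`. The sketch below is a small Python helper that pulls those names out of `ldd`-style text. The sample text is illustrative and was not captured from a real GLM build.

```python
def missing_shared_libs(ldd_output):
    """Return the library names that ldd-style output marks as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # The library name is everything before the "=>" arrow.
            missing.append(line.split("=>")[0].strip())
    return missing


# Illustrative ldd-style text (an assumption, not real output from glm).
sample = """\
\tlibifcore.so.5 => not found
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)
"""
print(missing_shared_libs(sample))
```

Comparing the result with and without the Intel environment loaded should show which runtime libraries the Intel-built binary still expects to find.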

@matthipsey
Contributor

Hi All
Thank you very much for flagging these issues. We had not seen them, as we mostly use Intel.
We have checked in a small fix that will hopefully make it better ... can you try now?
Cheers
Matt

@f-baerenbold

Dear Matt,
This solves the issue for me! All modules working fine now. Thanks a lot for the prompt help!

@galenholt
Author

Hi Matt,

Agreed, thanks for the prompt help, that does seem to have fixed the segfault issue. It works for me with all modules in the Kinneret example. However, I'm still having compiler-associated issues with the lake we're working on. It's no longer a segfault, but appears to be an infinite loop: "Simulation begins..." and then just a flashing cursor. It does run when compiled with ifort. I'll do some digging and see if I can isolate what's causing it.
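
When scripting test runs, a hang like this can be caught with a wall-clock timeout instead of watching the cursor. This is a minimal Python sketch under stated assumptions: the `demo` function uses a sleeping Python child as a stand-in for a stuck run, and a real call would pass the `glm` command and a timeout longer than a normal run takes.

```python
import subprocess
import sys


def run_with_timeout(command, seconds):
    """Run a command, reporting a hang instead of waiting forever."""
    try:
        result = subprocess.run(command, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising the timeout.
        return "timed out"
    return "finished with return code %d" % result.returncode


def demo():
    """Stand-in demo: a sleeping Python child plays the role of a stuck run."""
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    return run_with_timeout(sleeper, seconds=1)
```

A timeout only shows that the run did not finish in the allotted time. It does not distinguish a true infinite loop from a run that is merely very slow.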

@robertladwig

The fix really helped a lot. My gfortran-compiled model is now running with all AED2 modules up to zooplankton.
I still occasionally get segmentation faults:
(1) Shortly after the start: Segmentation fault: 11, at 0.38% of days complete
(2) Or after the model has finished:
glm(1746,0x10ca115c0) malloc: *** error for object 0xbea36b06e70b7421: pointer being freed was not allocated
glm(1746,0x10ca115c0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

@matthipsey
Contributor

matthipsey commented Oct 2, 2019 via email
