mpirun doesn't work for np > 1 #191

Open
LydiaMak opened this issue Jun 14, 2021 · 10 comments

@LydiaMak

Hi,

When I run mpirun -np 2 (or higher) mosfit {json_file} -m {model}, it crashes. I mentioned this to Matt in private communication, but since the problem persists I thought I would open an issue in case someone else has come across it. The error I get is:

Traceback (most recent call last):
  File "/Users/lydiamakrygianni/opt/miniconda3/lib/python3.8/site-packages/schwimmbad/mpi.py", line 72, in __init__
    self.wait()
  File "/Users/lydiamakrygianni/opt/miniconda3/lib/python3.8/site-packages/schwimmbad/mpi.py", line 122, in wait
    func, arg = task
ValueError: not enough values to unpack (expected 2, got 1)
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1

The same error is printed once for every process with rank > 0.
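
The ValueError itself can be reproduced without MPI. Below is a minimal sketch, assuming the worker loop treats every received message as a (func, arg) pair, as the traceback line `func, arg = task` suggests. The name worker_step and the sample messages are hypothetical and are not taken from schwimmbad or MOSFiT.

```python
def worker_step(task):
    """Unpack one message the way the traceback line does, then run it."""
    func, arg = task  # raises ValueError unless task has exactly two items
    return func(arg)


# A well-formed message: a callable plus its single argument.
ok_result = worker_step((abs, -3))

# A malformed message with only one item (hypothetical payload).
try:
    worker_step(("not-a-task",))
except ValueError as err:
    error_text = str(err)

print(ok_result)
print(error_text)
```

Any message that is not a two-item pair would fail in this way, which is consistent with the error appearing on each worker rank.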

I have run it on several machines and have also tried the example, which fails in the same way. I use Python 3.8.

If anyone has any idea (or can confirm that the example works for them), it would be very useful for some very long runs!

Cheers,

Lydia

@wakatara

I also have the precise same problem on OSX, and several flavours of Linux.

@bmockler
Collaborator

bmockler commented Aug 16, 2021 via email

@ZhihaoChen5

Hi all, it looks like there might be a bug in schwimmbad that causes an incompatibility with newer versions of mpi4py (https://githubmemory.com/repo/adrn/schwimmbad/issues). I get the same issue when using schwimmbad v0.3.1 and mpi4py v3.0.3. When I downgrade to schwimmbad v0.3.0, it works for me. ~Brenna

Agreed! If v0.3.1 doesn't work, please try v0.3.0. At least it works for me.
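
Before starting a long run, it may help to confirm which schwimmbad release is actually installed. The following is a minimal sketch, assuming that 0.3.0 is the only release treated as known-good (it is the one reported as working in this thread). The helper names installed_version and matches_known_good are hypothetical; importlib.metadata is part of the standard library from Python 3.8.

```python
from importlib import metadata

KNOWN_GOOD = "0.3.0"  # schwimmbad version reported as working in this thread


def installed_version(package="schwimmbad"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_known_good(version):
    """True only when the version string equals the reported-good release."""
    return version == KNOWN_GOOD


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("schwimmbad is not installed")
    elif matches_known_good(found):
        print("schwimmbad", found, "matches the version reported as working")
    else:
        print("schwimmbad", found, "differs from", KNOWN_GOOD)
```

The comparison is deliberately an exact string match, since the thread only reports on v0.3.0 and v0.3.1 and says nothing about other releases.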

@dnfarias

Hi all,

I'm facing the same issue on OSX Monterey with python3.9 (anaconda env). Can you confirm the working python version, please? (I'll try to use a fresh new environment). Thanks!

@ZhihaoChen5

Python 3.6.8, MOSFiT v1.1.7, and schwimmbad 0.3.0 for me.

@dnfarias

Dear @hatter5,

I can confirm that it's working for me :-) Many thanks!

@pkgw
Contributor

pkgw commented Aug 23, 2024

See also this issue on the schwimmbad side: adrn/schwimmbad#32 (comment)

Adrian's explanation is that basically MOSFiT shouldn't be using the pool.wait() function directly — it sounds like schwimmbad users shouldn't be accessing its MPI pool directly.

@mnicholl
Collaborator

mnicholl commented Aug 28, 2024 via email

@pkgw
Contributor

pkgw commented Aug 28, 2024

No, I completed the Conda update last week or so. It looks like this issue has been around for several years at this point, so it's clearly not a showstopper for a lot of people.

Based on Adrian's advice it sounds like the relevant code should be redesigned to either use only supported schwimmbad interfaces, or to use some other mechanism for inter-process communication. I don't have a sense of how dirty of a hack the current code is — maybe it's just a matter of using some of the MPI libraries more directly, or maybe this is exposing a bigger architectural issue that needs addressing. Someone's going to need to sit down and understand the current code, and research solutions.
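
One way to read "use only supported schwimmbad interfaces" is to route all parallel work through the pool's map method, which schwimmbad pools provide, and never send or receive messages on the pool by hand. A minimal sketch under that assumption follows. The names evaluate_all and log_prob and the sample inputs are hypothetical and do not reflect how MOSFiT is actually structured. The built-in map stands in for pool.map so that the example runs without MPI.

```python
def log_prob(x):
    """Hypothetical stand-in for an expensive likelihood evaluation."""
    return -0.5 * x * x


def evaluate_all(params, map_fn=map):
    """Evaluate log_prob for every parameter using only a map-style call."""
    return list(map_fn(log_prob, params))


# Serial run: the built-in map plays the role of a pool's map method.
serial = evaluate_all([0.0, 1.0, 2.0])
print(serial)
```

With a real pool, the same function would be called as evaluate_all(params, map_fn=pool.map), which leaves task distribution entirely to the library.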

@dnfarias

dnfarias commented Aug 28, 2024

Hi all,

I just want to say that @pkgw is totally right. The latest version of schwimmbad is incompatible with the current version of MOSFiT. However, I can run MOSFiT on a cluster with mpirun under this configuration: Python 3.11.0, MOSFiT 1.1.9, schwimmbad 0.3.0.

I'm pretty sure it wouldn't be so difficult to make MOSFiT compatible with schwimmbad, but two years ago, I just left it like that.

7 participants