Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mpirun 5.0.3 has bug on parse shell args while 4.1.6 works well. #12623

Open
ladyrick opened this issue Jun 15, 2024 · 12 comments
Open

mpirun 5.0.3 has bug on parse shell args while 4.1.6 works well. #12623

ladyrick opened this issue Jun 15, 2024 · 12 comments

Comments

@ladyrick
Copy link

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

5.0.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

from source tarball: https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.3.tar.gz

Please describe the system on which you are running

  • Operating system/version: not related
  • Computer hardware: not related
  • Network type: not related

Details of the problem

mpirun 5.0.3 has bug when parse command line args. But mpirun 4.1.6 works well.

The command is bash -c '"$@"' '' echo '1 2 3'. It should output 1 2 3 with three spaces between each numbers.

image
@rhc54
Copy link
Contributor

rhc54 commented Jun 15, 2024

FWIW, the cmd line parser is fine. The problem is that the shell removes your quotes when passing the cmd line into mpirun. In version v4.x, we added quotes around every single argv before executing it. We didn't do that in v5.

I'm experimenting with adding the quotes again to see what problems (if any) it creates. IIRC, there were times when the quotes raised issues, but perhaps my memory has faded.

@ladyrick
Copy link
Author

ladyrick commented Jun 15, 2024 via email

@rhc54
Copy link
Contributor

rhc54 commented Jun 15, 2024

Well, that's an opinion and may be a fair one - but still just one opinion.

My point was only that this has nothing to do with the cmd parser - it is accurately parsing what the shell gave us. The cmd mpirun is executing is accurate:

argv[0]:   bash
argv[1]:   -c
argv[2]:   $@
argv[3]:     
argv[4]:   echo
argv[5]:   1   2   3

The problem is that echo takes that argv and echo's the characters one at a time, ignoring the spaces between them. This is an accurate reflection of the real behavior:

$ echo 1   2    3
1 2 3

So in that sense, mpirun is in fact generating the same output as the bare code would have given.

To get your behavior, I have to do:

$ echo '1   2    3'
1   2    3

which is not what the shell gave us.

The questions therefore are: should we always assume quotes for all argv strings, and what issues does that create? Or is there another way one could format your cmd line so that the shell actually gives the quotes to mpirun?

@rhc54
Copy link
Contributor

rhc54 commented Jun 15, 2024

Fiddling around a bit, I can get mpirun to replicate your desired behavior...but only if I add quotes around the $@ argv element and nothing else. Adding quotes to any other argv results in either an error (e.g., if putting quotes around bash) or the incorrect output (e.g., if putting quotes around the 1 2 3 element, which causes echo to include the quotes in its output).

We probably need to look more closely at what v4.1 is doing - could be you are just getting lucky over there. This increasingly feels like you may not have the right quoting around your cmd line elements.

@ggouaillardet
Copy link
Contributor

my 0.02US$

no Open MPI involved:

$ strace -f -e execve bash -c '"$@"' '' echo '1  2  3'
execve("/usr/bin/bash", ["bash", "-c", "\"$@\"", "", "echo", "1  2  3"], 0xffffffffdb10 /* 118 vars */) = 0
1  2  3
+++ exited with 0 +++

With Open MPI 4.1.6

$ strace -f -e execve ~/local/openmpi-4.1.6/bin/mpirun -np 1 bash -c '"$@"' '' echo '1  2  3'
execve("/home/rist/r00018/local/openmpi-4.1.6/bin/mpirun", ["/home/rist/r00018/local/openmpi-"..., "-np", "1", "bash", "-c", "\"$@\"", "", "echo", "1  2  3"], 0xffffffffdac8 /* 118 vars */) = 0
[pid 490291] execve("/usr/bin/bash", ["bash", "-c", "\"$@\"", "", "echo", "1  2  3"], 0x62fad0 /* 168 vars */) = 0
1  2  3

With Open MPI main

$ strace -f -e execve ~/local/ompi/bin/mpirun -np 1 bash -c '"$@"' '' echo '1  2  3'
execve("/home/rist/r00018/local/ompi/bin/mpirun", ["/home/rist/r00018/local/ompi/bin"..., "-np", "1", "bash", "-c", "\"$@\"", "", "echo", "1  2  3"], 0xffffffffdad8 /* 118 vars */) = 0
execve("/home/rist/r00018/local/ompi/bin/prterun", ["/home/rist/r00018/local/ompi/bin"..., "-np", "1", "bash", "-c", "\"$@\"", "", "echo", "1  2  3"], 0x4a0520 /* 123 vars */) = 0
[pid 490301] execve("/usr/bin/bash", ["bash", "-c", "$@", "", "echo", "1  2  3"], 0x58e800 /* 156 vars */) = 0
1 2 3

I note that execve("/usr/bin/bash", ...) is invoked with the exact same parameters if bash is invoked directly or via Open MPI 4.1.6 mpirun.

However, with the main branch, \"$@\" has been replaced (i guess by prrte) with \"$@\"
At first glance, this suggests the surrounding quotes have been removed, but if I read your comment correctly, they have been removed (as we used to) but they were never added back.

@ggouaillardet
Copy link
Contributor

@rhc54 from prte.c main()

    pargv = pmix_argv_copy_strip(argv); // strip any incoming quoted arguments

what is the rationale for stripping quoted arguments?

also, pmix_argv_copy_strip() assumes the last element of argv is NULL.
Are you positive this is the case forargv here (e.g. the second argument of main)?
I would think that is not the case and we need to use argc to know where argv ends.

@rhc54
Copy link
Contributor

rhc54 commented Jun 16, 2024

what is the rationale for stripping quoted arguments?

We've had problems in the past with quoted values as arguments to params - e.g., MCA params. However, I have no problem with just telling people in those scenarios to fix their cmd line. As I said above, it isn't clear to me that there is a perfect answer here. Whether it is mpirun or a simple bash script launcher, there is always a problem with quotes passing thru layers of indirection.

also, pmix_argv_copy_strip() assumes the last element of argv is NULL.
Are you positive this is the case forargv here (e.g. the second argument of main)?

Last I checked, we always assume argv is NULL terminated across the code base - AFAIK, that's part of the C standard. FWIW, this is what I found when searching for it:



Yes. The non-null pointers in the argv array point to C strings, which are by definition null terminated.

The C Language Standard simply states that the array members "shall contain pointers to strings"
(C99 §5.1.2.2.1/2). A string is "a contiguous sequence of characters terminated by and including
the first null character" (C99 §7.1.1/1), that is, they are null terminated by definition.

Further, the array element at argv[argc] is a null pointer, so the array itself is also, in a sense,
"null terminated."

If that isn't true, then there are a lot of places in PMIx, PRRTE, and OMPI that are impacted...and have been for 20+ years. 😄

@rhc54
Copy link
Contributor

rhc54 commented Jun 16, 2024

See openpmix/openpmix#3353 and openpmix/prrte#1984. As I said, we'll try it this way for now and see what problems we encounter.

@ggouaillardet
Copy link
Contributor

That's embarrassing, but this is something I have been ignoring for the past 20+ years, and every single time I wrote a parser I wished I could simply pass argv to execv()...
Well, better late than never :-)

Thanks for the pointer!

@wenduwan
Copy link
Contributor

@rhc54 Thanks for the quick fix. It would be great if we can get new prrte 3.0.x and pmix 5.0.x releases in the 1st week of July - we plan to release ompi 5.0.4 in the week following.

@rhc54
Copy link
Contributor

rhc54 commented Jun 17, 2024

I can do that, but first I'd need to see updates in OMPI so I can check MTT. Otherwise, I can't tell if the release branches are in a "good" state. Can someone "pull" the branches?

@wenduwan
Copy link
Contributor

@rhc54 Sure thing. I opened #12626

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants