mtl/ofi: call to fi_domain fails on Crusher/Frontier #12038
Comments
Are you doing this yourself, or using the install done by @naughtont3?
I am building OMPI myself using the Cray-provided libfabric module. I do not see any pre-installed Open MPI modules to look for MCA params on either machine. What parameters should I set or where can I look for them?
Here's what I set in my shell on Crusher:
Note that, for some reason, Slurm on Crusher doesn't support PMIx, so there is no srun direct-launch support with Open MPI 5 or main on that system.
Thanks @hppritcha, setting those two variables did the trick for me 👍 Is there a way to detect that automatically so that other users (and future me) don't have to bother setting them?
I was thinking of creating an MCA param file, perhaps via the platform file approach, which would install the prte MCA params file (I forget the exact name) with these params set. At a minimum I guess this is a docs issue. We'll treat this as a docs issue for now.
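For readers who want to persist such settings rather than export them in every shell, here is a minimal sketch of writing an MCA params file. The helper name write_mca_params and the parameter names in the usage comment are hypothetical placeholders, not the variables discussed in this thread. The "name = value" line format and the per-user path $HOME/.openmpi/mca-params.conf are assumptions based on Open MPI's documented per-user params file; check the docs for your version, including the separate file that the prte params would need.

```shell
#!/bin/sh
# Hypothetical helper: write_mca_params FILE NAME=VALUE...
# Appends each NAME=VALUE pair to FILE as a "name = value" line,
# creating the parent directory first if it does not exist.
write_mca_params() {
    file=$1
    shift
    mkdir -p "$(dirname "$file")" || return 1
    for pair in "$@"; do
        name=${pair%%=*}    # text before the first '='
        value=${pair#*=}    # text after the first '='
        printf '%s = %s\n' "$name" "$value" >> "$file" || return 1
    done
}

# Usage (placeholder names; substitute the parameters that apply to your system):
#   write_mca_params "$HOME/.openmpi/mca-params.conf" example_param_a=1 example_param_b=foo
```

A file like this only helps the user who owns it; the platform-file approach mentioned above would instead install the settings with Open MPI itself.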
#12150 is relevant to this discussion.
and also make a statement about the OFI BTL more accurate. Related to open-mpi#12038 Signed-off-by: Howard Pritchard <[email protected]>
and also make a statement about the OFI BTL more accurate. Related to open-mpi#12038 Signed-off-by: Howard Pritchard <[email protected]> (cherry picked from commit 2718732)
Background information
I am trying to run Open MPI 5.0 on Crusher/Frontier but I get the following error during MPI_Init:
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
Open MPI 5.0 from the release tarball.
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
On Crusher I run configure:
Please describe the system on which you are running
libfabric version: 1.15.2.0 (default module)
Details of the problem
Running OSU benchmark built against this installation on Crusher:
If I run with FI_LOG_LEVEL=Debug, I get a couple of lines like this:
and
and
Not sure if that helps and if that is the right thing to look for. I can post the full log if necessary.
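When the full debug log is too long to post, one option is to save it to a file and pull out only the lines that mention the failing call. This is a minimal sketch, assuming the log was captured to a file and that the relevant lines contain the string fi_domain; the helper name domain_lines and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: domain_lines LOGFILE
# Prints every line of LOGFILE that mentions fi_domain, prefixed with
# its line number so it can be located in the full log.
domain_lines() {
    grep -n 'fi_domain' "$1"
}

# Usage (file name is a placeholder):
#   FI_LOG_LEVEL=Debug <your launch command> 2> libfabric-debug.log
#   domain_lines libfabric-debug.log
```

grep exits with status 1 when nothing matches, so the helper's return value also indicates whether any such lines were found.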
Is there any way to get OMPI 5.0 working with this libfabric?