WeeklyTelcon_20191217
Geoffrey Paulsen edited this page Dec 17, 2019 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoffrey Paulsen (IBM)
- Noah Evans (Sandia)
- Brian Barrett (AWS)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Josh Hursey (IBM)
- Michael Heinz (Intel)
- Ralph Castain (Intel)
- Todd Kordenbrock (Sandia)
- William Zhang (AWS)
- Jeff Squyres (Cisco)
- George Bosilca (UTK)
- Thomas Naughton (ORNL)
- Artem Polyakov (Mellanox)
- David Bernhold (ORNL)
- Austen Lauria (IBM)
- Brendan Cunningham (Intel)
- Akshay Venkatesh (NVIDIA)
- Edgar Gabriel (UH)
- Matthew Dosanjh (Sandia)
- Brandon Yates (Intel)
- Charles Shereda (LLNL)
- Erik Zeiske
- Joshua Ladd (Mellanox)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Nathan Hjelm (Google)
- Xin Zhao (Mellanox)
- mohan (AWS)
- Nothing specific for Open MPI
- Big counts hit some issues, not sure if it's going in.
- Targeting Supercomputing 2020 for the official v4.0.
- Jeff Squyres is working on a Python layer for the Standard to generate bindings
- Will generate
Blockers All Open Blockers
Review v3.0.x Milestones v3.0.4
Review v3.1.x Milestones v3.1.4
- 3.0.5 and 3.1.5 have shipped
- Planning for no new fixes on 3.x, unless super critical
- BUT, looks like something was messed up with 3.1.5, not sure about 3.0.x branch
- Brian will read up on the issue and see if we need to release to address.
- May be just an issue with Fedora / RHEL 7.8 that we don't see it on earlier RHEL.
- Issue 7212: Private GlibC reference
- Why not just revert the original commit?
- If the commit doesn't work on all systems, just remove it until we can.
- on v3.0 as well.
- May be an issue in UCX.
Review v4.0.x Milestones v4.0.3
- v4.0.3 in the works.
- Schedule: End of January.
- There's a problem in Open MPI v4.0.2 that packagers will hit in UCX 1.7
- PR 1752 may drive an earlier release if UCX is released sooner.
- PR 7116
- Ensure no backwards compat issues?
- Howard will send email to ARM.
- PR 7149 - Geoff go look at.
- PRs currently open, either need reviews or
- PR 7229 - Sandia is reviewing.
- A few new enhancements desirable.
- Added a Target v4.1.x label
- Many new enhancements / features would be useful
- 7151 - This is indeed a performance enhancement.
- 7173
- Should look into amount of work back-porting features to a release branch.
- It would be a major thing, but we always say we don't take features into a release branch that's already out there.
- People continue to open PRs with features.
- Two issues:
- One - we've really stalled out v5.0.0
- Two - are performance features really an issue to pull in?
- PR 7151 - seems to be borderline bugfix / feature / risky
- PR 7151 - enhancement -
- Recommend against slipping PRRTE to a v6.0… many people ex
- JSM Direct launch - could take as a patch in SPACK
- Consensus to NOT do a v4.1.0 release
- Geoff will put a message on those PRs and then close them later this week.
- PRRTE effort.
- Got PRRTE integrated into OMPI.
- Got ORTE out
- Got OMPI calling PMIx Directly
- Need to write mpirun - possibly one
- Want to discuss HOW to bring this in to minimize disruption.
- Questions: PVars and Tvars - currently apply to both MPI and Runtime layers.
- Works because runtime calls orte_init.
- PMIx doesn't have pvar / tvar support.
- ORTE today uses the MCA system. Not aware of it explicitly using pvars/tvars.
- Today ORTE implicitly creates pvars/tvars for each MCA parameter.
- mpirun will be a link to prun(PRRTE):
- can attach to an existing system and request a new job. (Direct launch method)
- can start its own system.
- We want the option from mpirun to do either.
- Which way do we want to default this?
- If you run -host, but the existing DVM isn't on all those hosts... then launch gets delayed.
- How to discover? A rendezvous file gets dropped... look in the known locations, and find one you have permission to read.
- Currently haven't set a minimum PMIx version, but might want to set minimum PMIx v4.0
- A ton of unprefixed symbols are being spit out by MPI.
- Symbols prefixed with OMPI, OPAL, or ORTE are ours.
- Everything that starts with MCA is in there as a public symbol.
- The problem is that if another library reuses the MCA system, you hit this.
- Domain frameworks - adding mca components to a list for autoclosure, but sequencing of closing needs to be very specific.
- Want to strip out as it's causing problems.
- Might need this for sessions
- Schedule: April 2020?
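The rendezvous-file discovery described above (look in a set of locations and take the first file you have permission to read) could be sketched roughly as follows. This is only an illustration of the idea: the function name, the directory list, and the file name are hypothetical and are not taken from PRRTE.

```python
import os
from typing import Iterable, Optional


def find_rendezvous_file(candidate_dirs: Iterable[str], filename: str) -> Optional[str]:
    """Return the first readable rendezvous file found, or None.

    candidate_dirs and filename are hypothetical inputs; the notes do not
    say which locations or file names the runtime would actually use.
    """
    for directory in candidate_dirs:
        path = os.path.join(directory, filename)
        # Skip files that are missing or that we lack permission to read.
        if os.path.isfile(path) and os.access(path, os.R_OK):
            return path
    return None
```

A caller would pass its search locations in priority order and fall back to starting its own system when the result is None.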
- It's official! Portland, Oregon, Feb 17, 2020.
- Safe to begin booking travel now.
- Please register on Wiki page, since Jeff has to register you.
- Date looks good. Feb 17th right before MPI Forum
- 2pm Monday, and maybe most of Tuesday
- Cisco has a Portland facility and is happy to host.
- About a 20-30 min drive from the MPI Forum; will probably need a car.
Review Master Master Pull Requests
- IBM's PGI test has NEVER worked. Is it a real issue or local to IBM?
- Austen is looking into it.
- Absoft 32bit fortran failures.
- No discussion this week.
- See older weekly notes for prior items.
- No discussion this week.