Meeting 2018 09
(yes, the "2018 09" title on this wiki page is a bit of a lie -- cope)
- 9am, Tuesday, Oct 16, 2018 - noonish Thursday, Oct 18, 2018.
- Cisco buildings 2 and 3 (right next to each other), San Jose, CA, USA.
- Tuesday: Cisco building 2 (Google Maps link)
- NOTE 1: The Tuesday meeting is immediately after the weekly Webex. You're welcome to show up after 7:30am US Pacific to be in the Cisco conference room for the weekly Webex.
- NOTE 2: There is no Lobby Ambassador in Cisco building 2. You need to iMessage or SMS text Jeff Squyres, and I'll come escort you to the meeting room.
- Wednesday, Thursday: Cisco building 3 (Google Maps link)
- There is a Lobby Ambassador in Cisco building 3; they will alert me when you arrive (but iMessaging or SMS texting me wouldn't hurt, either).
Please put your name on this list if you plan to attend any/all of the meeting so that Jeff can register you for a Cisco guest badge+wifi.
- Jeff Squyres, Cisco
- George Bosilca, UTK
- Arm Thananon Patinyasakdikul, UTK
- Andrew Friedley, Intel
- Geoff Paulsen, IBM
- Howard Pritchard, LANL
- Edgar Gabriel, UH
- Shinji Sumimoto, Fujitsu
- Thomas Naughton, ORNL
- Matias Cabral, Intel (16th only)
- Neil Spruit, Intel (16th only)
- Brian Barrett, Amazon (16th only)
- Akshay Venkatesh, NVIDIA
- Xin Zhao, Mellanox
- Artem Polyakov, Mellanox
- Nathan/Brian: Vader bug cleanups
- Want to strengthen the recent vader fixes to be fully bulletproof
- OFI (Libfabric):
- OFI Presentation
- Scalable endpoints support in MTL and BTL
- Registering specialized communication functions based on provider capabilities (see the sketch after this list)
- m4-generated C code to avoid code duplication?
- Discussion: should OFI components set their priority based on the provider found?
- OFI Common module creation.
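For the "specialized communication functions" bullet above, a minimal sketch of the general idea, assuming plain libfabric calls and hypothetical `ofi_send_*` helpers (this is illustrative only, not the actual MTL/BTL code):

```c
/* Illustrative sketch: pick a send implementation at init time based on
 * the capabilities the libfabric provider reports.  The ofi_send_*
 * helpers are hypothetical placeholders. */
#include <stdio.h>
#include <rdma/fabric.h>

typedef int (*send_fn_t)(const void *buf, size_t len);

static int ofi_send_tagged(const void *buf, size_t len)
{
    (void) buf; (void) len;   /* would call fi_tsend() here */
    return 0;
}

static int ofi_send_msg(const void *buf, size_t len)
{
    (void) buf; (void) len;   /* would call fi_send() here */
    return 0;
}

int main(void)
{
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL;
    send_fn_t send_fn = ofi_send_msg;   /* conservative default */

    hints->caps = FI_MSG;               /* minimum capability we require */
    if (0 != fi_getinfo(FI_VERSION(1, 5), NULL, NULL, 0, hints, &info)) {
        fi_freeinfo(hints);
        return 1;
    }

    /* If the provider also supports tagged messaging, register the
     * specialized tagged path instead of the generic one. */
    if (info->caps & FI_TAGGED) {
        send_fn = ofi_send_tagged;
    }

    printf("provider %s: using %s send path\n",
           info->fabric_attr->prov_name,
           (send_fn == ofi_send_tagged) ? "tagged" : "msg");

    fi_freeinfo(info);
    fi_freeinfo(hints);
    return 0;
}
```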
- PR 5241: Add MCA param for multithread opal_progress() (George, Arm)
- Multithreading stuff (George, Arm)
- r2 / BTLs are initialized even when they are not used (Jeff)
- 4.0.x status / roadmap
- TCP bric-a-brac:
- Discuss TCP multilink support. What is possible, what we want to do and how can we do it.
- Discuss TCP multiple IP on the same interface. What we want to do, and how we plan to do it.
- TCP BTL progress thread. Does 1 IP interface vs. >1 IP interface matter?
- Should we limit the number of C compilers that can be used to compile the OMPI core (e.g., limit the amount of assembly/atomic stuff we need to support)?
- E.g., PGI doesn't give us the guarantees we need
- Probably need to add some extra wrapper glue: e.g., compile the OMPI core with C compiler X and use C compiler Y in mpicc.
- Are there any implications for Fortran? Probably not, but Jeff worries that there may be some assumption(s) about LDFLAGS/LIBS (and/or other things?) such that: "if it works for the C compiler, it works for the Fortran compiler".
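For context (not a decision): the wrapper compilers already let a user swap the underlying compiler at invocation time via environment variables, e.g. `OMPI_CC=clang mpicc hello.c -o hello`, but that does not by itself answer the assembly/atomics support question for building the core.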
- PMIx as "first class" citizen?
- Shall we remove the OPAL pmix framework and directly call PMIx functions?
- Require everyone with a non-PMIx environment to provide a plugin that implements the PMIx functions on top of their non-PMIx library
- In other words, invert the current approach, which abstracted all PMIx-related interfaces behind OPAL (a minimal sketch of a direct PMIx call follows this item)
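As a point of reference for this discussion, a minimal sketch of what a direct call into the standard PMIx client API looks like (error handling trimmed; this is illustrative, not proposed OMPI code):

```c
/* Minimal sketch (not OMPI code): a direct call into the standard PMIx
 * client API, i.e. what OMPI code would do without the opal pmix
 * framework in between. */
#include <stdio.h>
#include <pmix.h>

int main(void)
{
    pmix_proc_t myproc, wildcard;
    pmix_value_t *val = NULL;

    if (PMIX_SUCCESS != PMIx_Init(&myproc, NULL, 0)) {
        return 1;
    }

    /* Query the RM for the job size -- the kind of call the opal pmix
     * framework currently wraps. */
    PMIX_PROC_CONSTRUCT(&wildcard);
    snprintf(wildcard.nspace, sizeof(wildcard.nspace), "%s", myproc.nspace);
    wildcard.rank = PMIX_RANK_WILDCARD;

    if (PMIX_SUCCESS == PMIx_Get(&wildcard, PMIX_JOB_SIZE, NULL, 0, &val)) {
        printf("job size: %u\n", val->data.uint32);
        PMIX_VALUE_RELEASE(val);
    }

    PMIx_Finalize(NULL, 0);
    return 0;
}
```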
- ORTE support model
- See https://docs.google.com/document/d/1VwqUVAhkeJt7PmaQCBQpUXBYdy9Elg5m-TBzY-6lLQY/edit
- Should we remove ORTE from the Open MPI repo and make it a separate project?
- How do we reconcile that with the "one package" philosophy we have embraced from day one?
- Should we publish Open MPI release tarballs to https://github.com/open-mpi/ompi/releases?
- Per https://github.com/open-mpi/ompi/issues/5604, I posted a bunch of "Official releases aren't here..." notices on the GitHub releases page.
- Remove orte-dvm and redirect users to PRRTE? (Ralph)
- Public ompi-tests repository for easier sharing of test suites among collaborators (Edgar)
- Ralph+Jeff: discuss PMIx compatibility issues and how to communicate them
- https://pmix.org/support/faq/how-does-pmix-work-with-containers/ addresses (some of) PMIx-to-PMIx compatibility issues.
- But what about OMPI to PMIx compatibility?
- And what about RM to PMIx compatibility?
- How do we convey what this multi-dimensional variable space means to users in terms of delivered MPI features?
- Case in point: https://github.com/open-mpi/ompi/issues/5260#issuecomment-421407400 (OMPI v3.0.x used with external PMIx 1.2.5, which resulted in some OMPI features not working).
- Discuss memory utilization/scalability (ThomasN)
- Mellanox/Xin: Performance optimization on OMPI/OSC/UCX multithreading
- Fujitsu's status
- Fujitsu MPI for Post-K Computer
- Development Status in Fujitsu Going ARM
- QA Activity in Fujitsu
- 5.0.x roadmap
- PMIx Roadmap
- Review of v3 and v4 features
- Outline changes in OMPI required to support them
- Outline changes for minimizing footprint (modex pointers instead of copies)
- Decide which features OMPI wants to use
- Mail service (mailman, etc.) discussion - here are the lists we could consolidate down to:
- OMPI core
- OMPI devel/packagers
- OMPI users/announce
- OMPI commits
- HWLOC commits
- HWLOC devel/users/announce
- MTT users/devel
- Do we want to move to a commercial hosting site? It would cost about $21/month, or about $250/year.
- Additionally: The Mail Archive is going through some changes. It is not yet clear whether this will affect us.
- As of 4 Sep 2018, Open MPI has $575 in our account at SPI.
- Debugger transition from MPIR to PMIx
- How to orchestrate it? (The MPIR symbols to be replaced are sketched below.)
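For background, a simplified sketch of the MPIR side that would be retired: the well-known symbols the launcher exports for debuggers, per the MPIR Process Acquisition Interface (field names follow that spec; this is not the actual ORTE code):

```c
/* Simplified sketch of the classic MPIR debugger-attach symbols that the
 * launcher (mpirun/orterun) exports today and that a PMIx-based scheme
 * would replace. */
typedef struct {
    char *host_name;        /* node where the MPI process runs */
    char *executable_name;  /* path to the MPI application binary */
    int   pid;              /* pid of the MPI process on that node */
} MPIR_PROCDESC;

MPIR_PROCDESC *MPIR_proctable   = NULL;  /* filled in by the launcher */
int MPIR_proctable_size         = 0;
volatile int MPIR_being_debugged = 0;    /* set by the debugger */
int MPIR_debug_state            = 0;

/* The debugger sets a breakpoint here; the launcher calls it once the
 * proctable is ready. */
void MPIR_Breakpoint(void)
{
}
```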
- ABI-changing commit on master (after v4.0.x branch) which will affect future v5.0.x branch: https://github.com/open-mpi/ompi/commit/11ab621555876e3f116d65f954a6fe184ff9d522.
- Do we want to keep it? It's a minor update / could easily be deferred.
- Need vendors to reply to their issues on the GitHub issue tracker
- openib: persistent error reported by multiple users
- https://github.com/open-mpi/ompi/issues/5914
- Feels like this should be easy to fix...?