
WeeklyTelcon_20180508

Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Dan Topa (LANL)
  • Jeff Squyres
  • Akvenkatesh
  • Brian
  • Joshua Ladd
  • Howard Pritchard
  • Nathan Hjelm
  • Ralph
  • Todd Kordenbrock
  • Josh Hursey
  • Xin Zhao

Not there today (I keep this for easy cut-and-paste for future notes)

  • David Bernholdt
  • Geoffroy Vallee

Agenda/New Business

Minutes

Review v2.x Milestones v2.1.4

  • v2.1.4 - Targeting Oct 15th.
  • Lower priority than v3.0 and v3.1.
  • No new news on v2.1.x

Review v3.0.x Milestones v3.0.2

  • Schedule:
    • Quick turnaround on this; shooting for May 1st.
  • Going to try to post a v3.0.2 RC tomorrow.
    • A few PRs are ready to go, and a few need review.

Review v3.1.x Milestones v3.1.0

  • Shipped v3.1.0 last night.
  • Long list of outstanding PRs for the v3.1.x branch.
    • 4 or 5 need review; one has Geoff tagged for review (done).
    • Will hold off about a week in case we need to do a quick-turnaround "oops" release.
  • Schedule
    • Would like to hold to a one-month turnaround.
    • Will cut a release candidate in two weeks.
  • For v3.1.1: want to make UCX OSC a higher priority.
    • Issue 5048
    • Looking pretty good. Howard brought up some issues on a single node with xpmem.
      • UCX bug.
      • Xin will rerun and see where we stand.
    • xpmem can be disabled via an environment variable.
    • Issue with ConnectX-3 atomic support (a UCX limitation).
      • For v3.1.1, some want fallbacks or errors, but not a segv.
    • For v4.0
      • Mellanox is planning to emulate atomics on the CPU if the IB card can't do HCA atomics.
      • OMPI still needs a check in case users are running with an old UCX.

v4.0.0

  • As a heads-up, ULFM support may require PMIx v3.0.
  • Schedule: branch mid-July, release mid-September.
  • Moving to drop openib (except for iWARP)
    • More support for setting the priority of openib to 0, and copying openib to an iWARP BTL used only for iWARP.
    • Rename the openib BTL to iWARP: if UCX is the preferred path for Mellanox, go whole hog with UCX on Mellanox.
      • And change the logic so that it only works on iWARP devices.
      • If someone (Broadcom or Chelsio?) steps up for MTT testing.
      • Jeff will get contact info to the v4.0 release managers, who will reach out to iWARP providers to request MTT testing.
  • Dropping items removed from the MPI standard in Open MPI v4.0
    • Nathan sent out an email about PR 5127, to remove all MPI-2.x standard items.
    • A little weird to be able to pull back MPI-1 removed items.
      • Let's remove these at the same time too.
    • C++ bindings are a separate pull request (PR 5128). Goal is to have these removed as well.
      • Sent a poll out: 12 of 41 responded. 1 of the 12 said they're using the C++ bindings, but not sure that's accurate.
      • If the C++ bindings are sufficiently isolated, we could move them to a separate repo.
      • But if no one is really using them, let's just remove them entirely.
  • Let's turn off building more of this by default.
    • The Forum didn't REMOVE everything that was deprecated in the MPI v3.0 standard.

PMIx

  • PMIx v3.0 - finishing up debugger support
    • Open MPI needs someone to step up to pull in PMIx v3.0 changes
  • PRTE is not trying to be a production quality runtime.
    • It's a prototype layer.
  • Start with Open MPI having some runtime people.
    • Moving to PMIx v3.0 would at least require updating the ORTE/PMIX directory.
    • The biggest item would be DVM mode: additional changes would be needed to keep that working.
  • Open MPI needs a runtime plan
  • Open MPI needs runtime developers.
  • Contention because the code bases are diverging?
    • Two separate projects. In PRTE, no requirement
    • PRTE used to be ORTE (plus parts of OPAL), used by PMIx as a reference library.
  • Possible options are:
    • Continue maintaining ORTE, but someone has to maintain it.
    • Or move to PRTE (with slightly different goals), but developers are needed for this as well.
  • The scope of the ORTE changes needed to move to PMIx v3.0 is significantly smaller than the move to PMIx v2.0 was.
    • The earlier ORTE changes needed to move from PMIx v1.2 to PMIx v2.0 were quite significant.
      • Most of the code changes were in the ORTE server area.
    • Taking advantage of PMIx tooling would require some changes in the MPI-IO area.
  • If we had a statement of work, we could SWAG an estimate of the effort, which could help when requesting resources.
  • Unfortunately we don't have anyone on the team that could step up.
  • PMIx said that in Open MPI v4.0 we were going to put a deprecation warning in the news.
    • ORTE has bit-rotted a fair amount relative to PRRTE.
    • Some issues found in ORTE
  • Right now, Open MPI's only debugger interface is MPIR.
    • To move to the PMIx debugger interface, ORTE would need to upgrade to PMIx v3.0, plus some ORTE changes.

Review Master Pull Requests

  • Last week: OSHMEM v1.4. Not sure if we have to drop the deprecated APIs; curious since OMPI is dropping deprecated APIs...
    • Only remove things that were removed from the OSHMEM standard, not deprecated things, since "deprecated" means they will be removed in a future version of the standard. If some APIs were removed from the standard, ask the oshmem email list for their thoughts.
    • Xin should be able to push a first version of the OSHMEM v1.4 changes to master in the next week or so.

Other topics

MTT / Jenkins Testing Dev

  • Got compiler licenses for the NAG and Absoft compilers.
    • Both are Fortran compilers.
  • Get a copy of Perl JSON and put it on MTT.

When should we branch v4.0?

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

