
WeeklyTelcon_20180327


Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Oops, forgot to gather attendance this week, but had a good quorum.
  • Ralph not here.

Agenda/New Business

Minutes

Review v2.x Milestones v2.1.3

  • v2.1.4 - Oct 15th,
    • Merged in a bunch of stuff.
    • One-sided multithreaded bugs that came up.
      • Doesn't feel like it's worth fixing in v2.1.x, so instead pulled configury changes from v2.0 to v2.1.x.
  • Released v2.1.3 - Saint Patrick's Day release.

Review v3.0.x Milestones v3.0.1

  • Merged last-minute fix from IBM: PR4955.
    • IBM's MTT looked much better last night.
    • write memory barrier in vader.
    • PR4977 causes corruption. Nathan just discovered this; he will PR a fix today, and possibly release tomorrow.

Review v3.1.x Milestones v3.1.0

  • Brian got no useful feedback, so he is not interested in releasing yet.
    • Some users appear to be getting a segfault.
  • PR4977 causes corruption. Nathan just discovered this; he will PR a fix today, and possibly release tomorrow.
  • Cisco is seeing a number of Spawn issues on v3.1.x in MTT.
    • Most of these were oversubscription issues; needed ini changes.
    • Still seeing another 100 or so more failures; he expects the same cause, but possibly new regressions.
    • Cisco will look at these and file issues tonight.
  • Nathan says looking good, except for PR4977.
  • Josh Ladd says it looks good as well.

Review Master Pull Requests

  • Looks low on MTT results; still fighting with the MTT download problem.
    • Downloads resolved; shut off the old download.
  • OSC compiler failure (gcc v4.1.2) at Absoft - 32bit build.
    • Nathan will look and see; fixed in PR4941, just merging now.
    • He believes he fixed this, but Absoft MTT failed.
    • They may need a new Perl JSON module.
    • Jeff is contacting Absoft about MTT
  • Gave up on a warning-clean build for 32-bit builds.
  • Open MPI cleanup PR4953 - what do we want to delete from Open MPI v4.0
    • C++ bindings (have been deleted from the spec for 10 years)
    • Some are trivially yes - MPI_Address is trivial to replace.
    • Should start asking users the question.
    • Some people won't know and will just say they want to be conservative, which gives us false positives about whether users are actually using these.
    • PR is failing for real reasons; need to understand them.
    • It's good that we don't use deprecated functions internally.
    • Some tests use deprecated functions; need to take a sweep there.
    • A macro to make them not appear in mpi.h.
    • Don't compile mpiC and friends.
  1. Is anyone USING the C++ bindings? (Packagers probably turn them on by default, but could give data on what libraries use them.)
    • Not mutually exclusive: disable now, and delete 12 months later.
    • Put this on the FAQ - this is what was deleted, and here's how you change your code (see the sketch at the end of this section).
      • LB and UB, Address, C++ bindings.
  2. Here's a list of things that we will no longer build by default.
  • What about talking to MPICH about doing the same thing?
    • Everyone will be in the same room in June.
    • Nathan could bring this message to MPICH.
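
For such a FAQ entry, here is a minimal hypothetical before/after sketch (the struct and names are illustrative, not from the meeting): MPI_Address() becomes MPI_Get_address(), and the old MPI_LB/MPI_UB extent markers become MPI_Type_create_resized().

```c
/* Hypothetical migration sketch for the FAQ item discussed above.
 * The struct and function names are illustrative only. */
#include <mpi.h>

struct particle { double pos[3]; int id; };

void build_particle_type(MPI_Datatype *newtype)
{
    struct particle p;
    MPI_Aint     base, disp[2];
    int          blocklens[2] = { 3, 1 };
    MPI_Datatype types[2]     = { MPI_DOUBLE, MPI_INT };
    MPI_Datatype tmp;

    /* Removed API:  MPI_Address(&p, &base);  -- use MPI_Get_address instead */
    MPI_Get_address(&p, &base);
    MPI_Get_address(&p.pos, &disp[0]);
    MPI_Get_address(&p.id, &disp[1]);
    disp[0] -= base;
    disp[1] -= base;

    /* The old approach padded the extent with an MPI_UB marker inside
     * MPI_Type_struct(); the replacement sets the extent explicitly. */
    MPI_Type_create_struct(2, blocklens, disp, types, &tmp);
    MPI_Type_create_resized(tmp, 0, (MPI_Aint) sizeof(struct particle), newtype);
    MPI_Type_commit(newtype);
    MPI_Type_free(&tmp);
}
```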

Other topics

  • Gilles' question about our versioning compatibility.
    • He was thinking we said versions would work with other OMPI versions in a minor series.
    • We thought that MPI and OSHMEM apps can be forward compatible within the same series.
    • Use cases:
      • cluster - same version of Open MPI everywhere
        • Direct launch in containers - everything the same version.
      • Singularity containers - orted is one version, app/libs run a different one.
      • Docker style - mpirun outside of the container, app and libs all different versions than mpirun.
    • Amazon recommends that all pieces of ORTE/MPI be the same version in the container.
      • Not everyone agrees.
    • Open MPI version / required PMIx version:
      • v2.0.x - PMIx v1.1.5
      • v2.1.x - PMIx v1.2.5+
      • v3.0.x - PMIx v2.0.3+
      • v3.1.x - PMIx v2.1.x
      • master - PMIx v3.0.x
    • Cross-version mpirun interoperability: newer PMIx supports any client/server combination.
      • PMIx v2.1.x fixed cross-version compatibility for dstore and other things.
    • Nathan - we should version ORTE differently (we do for shared libs now, but not for the ORTE 'project').
    • Brian thought we agreed that for OMPI v3.0+ we were going to support cross-version for both Singularity and Docker.
    • Really talking about mpirun ONLY, not about some nodes having OMPI vX and others OMPI vY.
    • Agreed that the app has to link against the same version of libOMPI everywhere.
    • Agreed that all orteds need to be the same version everywhere.
    • Need to decide whether the version of ORTE in mpirun/launcher may differ from the ORTE in the orteds - to support Docker or not.
      • Singularity mpirun is in the same boat as the orteds.
    • PMIx is still in the picture too.
    • Brian thought we agreed that OMPI v3.0 and beyond would support both the Docker and Singularity use cases.
      • (Not necessarily what he wants.)
    • Discuss this next week.
      • A lot of mixed cases here.
      • Need to make sure someone documents this.
      • Certainly not being tested.
    • Need to start testing version compatibility: compiled with one version, run with another (see the sketch below).
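
A minimal sketch of what such a compiled-with / run-with check could look like, assuming only the OMPI_*_VERSION macros from Open MPI's mpi.h and the standard MPI_Get_library_version() call (an illustration, not an agreed-upon test):

```c
/* Print the Open MPI version this binary was compiled against versus the
 * library version it actually runs with. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char lib[MPI_MAX_LIBRARY_VERSION_STRING];
    int  len;

    MPI_Init(&argc, &argv);
    MPI_Get_library_version(lib, &len);
    printf("Compiled against Open MPI %d.%d.%d\n",
           OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);
    printf("Running with:    %s\n", lib);
    MPI_Finalize();
    return 0;
}
```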

MTT / Jenkins Testing Dev

When should we branch v4.0?

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon,
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2018 WeeklyTelcon-2018
