
WeeklyTelcon_20170327

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Brian Barrett
  • Geoffroy Vallee (ORNL)
  • David Bernholdt (ORNL)
  • Howard
  • Josh Hursey
  • Joshua Ladd
  • Ralph
  • Nathan Hjelm

Agenda

v2.0.x

  • No news is good news.
  • A couple of pull requests are in a push-back state.
    • Either the reviewer wanted something, or the work is just not done.
  • Don't have a time-line, but will need one.
  • April 8th sounds like a good arbitrary date.
  • Howard pushed some PRs into v2.x (after v2.1.0 was released)
  • Did not create a v2.1.x branch (v2.1.0 branched off of
  • Question: Can vendors backport features to 2.x branch and maintain it as a vendor specific branch?
    • We haven't done this in the past, but we could talk about it.
      • Some pushback about this.
    • Maybe the question is, should there be a v2.2 ?
    • On v1.10 we allowed PRs against the branch, even though there was no intention to release.
  • Some are not in favor of v2.2, since we now have a 4-month cycle for branching.
    • v3.0 has already diverged so much from v2.1.
    • Retesting takes a lot of effort.
  • Mellanox is not interested in releasing a v2.2, but is interested in branching v2.1 out, and then having a v2.x that they can push to.
  • Two options:
    • Could make a branch with no goal of releasing. Somewhat unclear to customers that Open MPI wouldn't release.
    • If we make a branch for v2.2 and try to have a release, that seems like a lot of extra testing on top of our 4 month release schedule.
  • If we create the v2.2 branch, and no one else chooses to push there, how does that affect others?
  • If we have a branch in OMPI tree, what do we need?
    • If we have a release branch, we need a release manager, and CI testing, etc.
  • So what is after 3.0? Is it a 3.1?
    • If we want new OSHMEM changes, it needs to be new major version change.
    • What we discussed was to do a release every 4 months (either major or minor). Whether it is major or minor is determined by backward compatibility.
    • If there are changes that break backward compatibility, it will be 4.0; if not, it will be 3.1.
  • Much confusion about where to target new features and bugfixes, etc.
  • A lot of discussion about stabilizing.
  • Branch on a date / release by a date: needed for regularly scheduled releases.
  • If someone wants to push new minor features / bugfixes on a release branch:
    • There should be some level of CI testing, and a release manager.
  • IBM described their current process, and how painful it is to maintain.
    • Changes downstream without getting them upstream first.
  • For vendors, it helps if we keep the upstream simple and date-based.
    • Time-based releases give suborganizations a little freedom to plan around.
    • A solid timeline seems like a really good idea for roadmap planning.
    • There will be bumps from transitioning to the new system.
  • Cisco is 100% upstream. Just got work from a year ago from upstream.
    • Hasn't always been, and might not be in the future.
  • If vendor could have absolute confidence of the release time of a release.
  • We are doing two things to try to hit the v3.0 schedule:
    • Whitelisted a few items on v3.0, and have been firm on that.
    • Said no to v2.2.
  • Had to end the conversation early due to time.
  • We branched for v3.x - So don't forget to PR over to v3.x when PRing over to v2.x
  • Everything is off the whitelist, except PMIx.
  • PMIx - reason we're doing an accelerated v3.0
  • Discussed PMIx PR 3194 - got pushed to v3.1
  • Whitelist Issue 3107
  • UCX got in.
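
The major/minor rule discussed above (a release every 4 months, where the next version is 4.0 if backward compatibility changes and 3.1 if it does not) can be sketched as a small function. This is only an illustration of the rule as recorded in these notes; the function name `next_version` and its parameters are hypothetical and are not part of any Open MPI tooling.

```python
def next_version(current: str, breaks_backward_compat: bool) -> str:
    """Return the next release number under the rule discussed in the notes.

    `current` is a "major.minor" string such as "3.0" (hypothetical input format).
    """
    major, minor = (int(part) for part in current.split(".")[:2])
    if breaks_backward_compat:
        # A change in backward compatibility bumps the major version.
        return f"{major + 1}.0"
    # Otherwise only the minor version is bumped.
    return f"{major}.{minor + 1}"


print(next_version("3.0", breaks_backward_compat=True))
print(next_version("3.0", breaks_backward_compat=False))
```

With "3.0" as the current release, the two calls correspond to the 4.0 and 3.1 cases mentioned in the discussion.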

PMIx v2.0 status

  • Ralph working on job control monitoring RFC.
  • Just finishing integration of this.
  • The only other major piece is the messaging compatibility piece.
  • Still on track.

AWS Testing

  • No status this week.
  • Looking pretty good.
  • The NVIDIA cluster seems to have something wrong: most unusual errors.

MTT Dev status:

  • Thoughts about removing Travis?
    • Can turn off the Mac OS parts of Travis, but nervous to turn off the rest of Travis until after AWS is online.
    • Should do a comparison of coverage of Travis and what others are testing.
    • Howard will remove Mac OS testing from Travis... we'll keep the rest in Travis for now.

Exceptional topics

  • We should begin thinking about scheduling our next face to face.
    • Geoff will put out a Doodle poll for June and July and begin to nail down a schedule.
    • Cisco has a site in Chicago.
    • Prefer not the last week of July.
    • Dallas, San Jose, Seattle

Status Updates:

Status Update Rotation

  1. Cisco, IBM, ORNL, UTK, NVIDIA, Amazon
    • IBM - Spectrum MPI based on v2.0.2 is in the field and working well.
    • ORNL - set up MTT to do some testing.
    • Amazon - build scripts for PMIx / hwloc - Some news about scripts not cleaning up correctly after themselves.
      • Want to start using S3 instead of gatorhost for storing nightly tarballs.
        • Also, eventually all tarballs will move out of ompi repo, so our main repo will be small again.
      • Release work - takes time.
      • Trying to hire staff.
  2. Mellanox, Sandia, Intel
  3. LANL, Houston, IBM, Fujitsu

Back to 2017 WeeklyTelcon-2017
