
WeeklyTelcon_20200609

Geoffrey Paulsen edited this page Jun 9, 2020 · 1 revision

Open MPI Weekly Telecon ---

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Aurelien Bouteiller (UTK)
  • Austen Lauria (IBM)
  • Barrett, Brian (AWS)
  • Brendan Cunningham (Intel)
  • Christoph Niethammer (HLRS)
  • Edgar Gabriel (UH)
  • Geoffrey Paulsen (IBM)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Josh Hursey (IBM)
  • Joshua Ladd (nVidia/Mellanox)
  • Matthew Dosanjh (Sandia)
  • Naughton III, Thomas (ORNL)
  • Todd Kordenbrock (Sandia)
  • William Zhang

not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (nVidia/Mellanox)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • David Bernhold (ORNL)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • Joseph Schuchart
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Michael Heinz (Intel)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Ralph Castain (Intel)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • William Zhang (AWS)
  • Xin Zhao (nVidia/Mellanox)
  • mohan (AWS)

New

  • Nothing new.

Release Branches

Review v4.0.x Milestones v4.0.4

  • Given the v4.1 plan (see below), we'd like to release v4.0.4 before next Tuesday.
  • shmat / shmdt - What is the problem scenario?
    • If the sky is falling, we need to know.
    • UCX, OFI MTL/BTL don't rely on these hooks.
    • openib, PAMI do rely on these hooks.
    • What about UCT, Portals?
    • Need to communicate silent data corruption risk.
      • A registration cache that mixes and matches shared memory segments and IB segments?
      • Application making sysV registration calls.
      • We need to tell customers: if you do X, Y, and Z, you are at risk.
        • openib assumes a working registration cache. We found an issue where the registration cache doesn't work.
    • How did patcher even build?
      • Looking at the code, it looks like it didn't include portions of the build that failed. :(
      • Seen on RHEL 8 with glibc 2.28-72 (CentOS 8?).
  • ACTION:
    • Write up the core problem case and what types of processes might be affected.
    • Determine which components are affected.
    • Provide a reproducer, if possible (least important).

Review v4.1.0 Milestones v4.1.0

  • Schedule:
    • The optimistic goal is to release on June 30th.
    • rc1 around June 22nd.
    • If there are things people need to get in, please reach out to Brian and/or Jeff.
  • Release Engineers: Brian Barrett (AWS), Jeff Squyres (Cisco)
  • We've reached consensus on a v4.1.0 release:
    • Not breaking ABI or backwards compatibility.
    • Blocker to moving forward: v4.1.0 will start from the v4.0.4 tag (expected tomorrow).
    • NOT touching runtime!!!
    • Not going to be pulling in a new PMIx version.

Review v5.0.0 Milestones v5.0.0

  • PMIx
    • PPN (processes per node) scaling issue: a simple algorithmic issue in one function.
      • The PMIx community discussed it. Artem might know someone who would be interested in working on it.
      • The algorithm behind one of the interfaces doesn't scale well.
      • Not a regression. Above ~4K nodes, it becomes quadratic.

master

Face to face

  • No discussion since COVID-19.

Infrastructure

  • Scale testing: PRs have to opt into it.

Review Master Pull Requests

CI status


Dependencies

PMIx Update

ORTE/PRRTE

MTT


