WeeklyTelcon_20200609
Geoffrey Paulsen edited this page Jun 9, 2020 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Aurelien Bouteiller (UTK)
- Austen Lauria (IBM)
- Barrett, Brian (AWS)
- Brendan Cunningham (Intel)
- Christoph Niethammer (HL
- Edgar Gabriel (UH)
- Geoffrey Paulsen (IBM)
- George Bosilca (UTK)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Josh Hursey (IBM)
- Joshua Ladd (nVidia/Mellanox)
- Matthew Dosanjh (Sandia)
- Naughton III, Thomas (ORNL)
- Todd Kordenbrock (Sandia)
- William Zhang
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (nVidia/Mellanox)
- Brandon Yates (Intel)
- Charles Shereda (LLNL)
- David Bernhold (ORNL)
- Erik Zeiske
- Geoffroy Vallee (ARM)
- Joseph Schuchart
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Michael Heinz (Intel)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Ralph Castain (Intel)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- William Zhang (AWS)
- Xin Zhao (nVidia/Mellanox)
- mohan (AWS)
- nothing new.
Blockers All Open Blockers
Review v4.0.x Milestones v4.0.4
- Given the v4.1 plan (see below), we'd like to release v4.0.4 before next Tuesday.
- shmat / shmdt - What is the problem scenario?
- If the sky is falling, we need to know.
- UCX, OFI MTL/BTL don't rely on these hooks.
- openib, PAMI do rely on these hooks.
- What about UCT, Portals?
- Need to communicate silent data corruption risk.
- Registration cache that mixes and matches shared memory segments and IB segments?
- Application making sysV registration calls.
- We need to describe to customers if you do X, Y, and Z.
- openib assumes a working registration cache. Found an issue where the registration cache doesn't work.
- How did patcher even build?
- Looking at the code, it looks like it didn't include portions of the build that failed. :(
- On RHEL 8 with glibc 2.28-72 (CentOS 8?)
- ACTION:
- Write-up of the core problem case and what types of processes might be affected.
- Need to know what components are affected.
- reproducer if possible (least important)
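For reference while the write-up is pending, here is a minimal sketch of the application-side call pattern discussed above: an application creating, attaching, and detaching a System V shared memory segment. This is not the requested reproducer and does not demonstrate any corruption; it only shows the `shmat` / `shmdt` calls that a hook-based registration cache would presumably need to observe. The helper name `sysv_attach_detach_cycle` is hypothetical and not part of Open MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper (not an Open MPI function): run one SysV
 * create/attach/touch/detach/remove cycle.
 * Returns 0 on success, -1 on any failure. */
int sysv_attach_detach_cycle(size_t size)
{
    assert(size > 0);

    /* Create a private segment readable/writable by the owner only. */
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id < 0) {
        return -1;
    }

    /* Attach: the kernel picks the address. */
    void *addr = shmat(id, NULL, 0);
    if (addr == (void *) -1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    /* Touch the memory, as an application buffer would be. */
    memset(addr, 0xAB, size);

    /* Detach: after this call the address range is no longer valid,
     * so anything cached for it (an assumption here: a registration
     * cache entry) would be stale. */
    int rc_dt = shmdt(addr);

    /* Remove the segment so it does not outlive the process. */
    int rc_rm = shmctl(id, IPC_RMID, NULL);

    return (rc_dt == 0 && rc_rm == 0) ? 0 : -1;
}
```

Calling `sysv_attach_detach_cycle(4096)` from a test program should return 0 on a system where System V shared memory is available.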
Review v4.1.0 Milestones v4.1.0
- Schedule:
- Optimistic goal is to release June 30th.
- rc1 around June 22nd.
- If there are things people need to get in, please reach out to Brian and/or Jeff.
- Release Engineers: Brian (AWS) and Jeff Squyres (Cisco)
- We've come to consensus for a v4.1.0 release
- Not breaking ABI or backwards compatibility.
- Blocker moving forward is to start from the v4.0.4 tag (Tomorrow)
- NOT touching runtime!!!
- Not going to be pulling in a new PMIx version.
Review v5.0.0 Milestones v5.0.0
- PMIX
- PPN scaling issue - simple algorithmic issue in this function
- PMIX talked about it. Artem might know someone who might be interested in working on it.
- Algorithm behind one of the interfaces doesn't scale well.
- Not a regression. Above ~4K nodes, it becomes quadratic.
- No discussion since COVID19
- scale-testing, PRs have to opt-into it.