ASC Q3 2020 Meeting

PMIx Standard Administrative Steering Committee (ASC) Q3 2020 Meeting

Date: July 22, 2020
Time: 11 am - 5 pm US Central Daylight Time.
Location: Virtual Meeting. WebEx information (bottom of the page): https://recaptcha.open-mpi.org/pmix-std-recaptcha/
Active Notes Link: Google Doc - Please add your name and affiliation.

Quick Links

Governance Document [PDF] [GitHub Repo]

Agenda (timeline in chart below)

Discuss 2020 quarterly meetings
- Reminder of quarterly meeting dates
- Discussion of Q4 virtual meeting logistics
- Discuss next quarterly meeting dates
Call for officer nominations (1 Co-chair and 1 Secretary position - both "odd year")
Call for new ASC Members
Vote on new ASC members
Roll call for attendance/vote
- Review voting eligibility
Governance PRs up for a vote:
- Reading and First Vote
- Reading and Second Vote
  - https://github.com/pmix/governance/pull/11
PMIx Standard PRs up for a vote:
- First Vote
  - https://github.com/pmix/pmix-standard/pull/235
  - https://github.com/pmix/pmix-standard/pull/235/commits/a1370b884f55e62b32b1e2ff1a11e82a4c40cbdd
- Second Vote
  - https://github.com/pmix/pmix-standard/pull/192
PMIx Standard PRs up for a Reading
PMIx 4.0 release update
Plenary discussion items
- Storage working group presentation and discussion
Working Group Presentations
- Client Separation / Implementation Agnostic Document Working Group
- Slicing/Grouping of functionality Working Group
- Dynamic Workflows Working Group
- Storage Working Group
Technical presentations
- Dave Solt, Josh Hursey (IBM) "Modex Exchange Modes: PMIx_Commit/Fence/Get Semantics"
- Artem Y. Polyakov, Charles Shereda, Howard Pritchard, Boris I. Karasev (Mellanox/Nvidia, LANL) "PMIx unit testing infrastructure"
Use case presentations
- Kaiyuan Hou, Quincey Koziol, Suren Byna (Northwestern U., LBNL) "TaskWorks: An HPC friendly task engine"
Open call for new Working Groups
Discussion items:
- (None at this time)

Agenda Timeline

We will try to keep to this timeline as best as we can. However, discussion items may take longer/shorter than anticipated and as a result, the agenda may need to be adjusted during the meeting.

We will start roll call promptly at 11:15 am. After this point, the co-chairs may decide, during the meeting, to adjust the timeline based on the discussion.

All times in US Central. (Last Update: July 10, 2020)

Start	End	Topic
11:00 am	11:05	Gathering
		* Slides
11:05 am	11:15	Discuss Quarterly Meetings
11:15 am	11:25	Roll Call, Vote on New ASC Members, Call for New ASC Members
		- Roll call
		- Call for Officer Nominations
		- Call for New ASC Members
11:25 am	11:35	Governance PR Votes
		* Second Vote: https://github.com/pmix/governance/pull/11
		- https://doodle.com/poll/apgnuykk45hsehc2
11:35 am	11:50	Standard PR Votes
		* First Vote: https://github.com/pmix/pmix-standard/pull/235
		- https://doodle.com/poll/awk7f9nug54efy7h
		* Second Vote: https://github.com/pmix/pmix-standard/pull/192
		- https://doodle.com/poll/x77y8qpbkk7cdke6
11:50 am	12:20	PMIx 4.0 Release Update
12:20 pm	1:00	Break
1:00 pm	2:00	Plenary: Storage Working Group
		* Shane Snyder, "Integrating storage APIs into PMIx"
		- Slides
2:00 pm	2:45	Working Group Updates
		- Client Separation / Implementation Agnostic Document Working Group
		- Slicing/Grouping of functionality Working Group
		- Dynamic Workflows Working Group
		- Storage Working Group
2:45 pm	3:00	Open Call for New Working Groups
3:00 pm	3:15	Break
3:15 pm	4:45	Technical and Use Case Presentations
		* Josh Hursey, David Solt (IBM) "Key-Value Exchange Modes: Put/Commit/Fence/Get Semantics"
		- Video Link
		- Slides
		* Artem Y. Polyakov, Charles Shereda, Howard Pritchard, Boris Karasev (Mellanox/Nvidia, LANL) "PMIx unit testing infrastructure"
		- Slides
		* Kaiyuan Hou, Quincey Koziol, Suren Byna (Northwestern U., LBNL) "TaskWorks: An HPC friendly task engine"
		- Slides
4:45 pm	5:00	Additional Discussion Items

Attendees

Kathryn Mohror (LLNL)
Josh Hursey (IBM)
Thomas Naughton (ORNL)
Ralph Castain (Intel)
Michael Karo (Altair)
Stephen Herbein (LLNL)
Shane Snyder (ANL)
Ken Raffenetti (ANL)
Bengisu Elis (TUM)
Swaroop Pophale (ORNL)
Artem Polyakov (Mellanox)
John DelSignore (Perforce)
Jai Dayal (Intel)
Kaiyuan Hou (Northwestern U.)
Suren Byna (LBNL)
Quincey Koziol (LBNL)

Notes

Discuss 2020 quarterly meetings
- Q4 will be virtual due to COVID-19
- Doodle poll for 2021 meetings (https://doodle.com/poll/uckuxpumgyhdekt3 )
Call for officer nominations (1 Co-chair and 1 Secretary position - both "odd year")
- Send an email to Co-Chairs with nominations (Josh and Kathryn)
Call for new ASC Members
- N/A
Roll call for attendance/vote
- Present: Altair, ANL, IBM, Intel, LLNL, Mellanox/Nvidia, ORNL, TUM, Perforce/RogueWave*
- Not Present: Inria, Microsoft, OSU, UTK, LANL
Governance PRs up for a vote:
- Adjust the ASC quarterly agenda finalization · Issue #11 · pmix/governance
  - Concerns from Martin about being very precise in the deadlines (e.g., AOE time zone)
    - General consensus that this can be a follow up PR if needed. Don’t want it to delay this PR vote.
  - Question about "agenda finalization" deadline
    - Martin asked if that term was intended at line 245, and the chairs believe it was
    - Artem asked what exactly is intended to be frozen
      - Josh: PRs up for votes or reading, technical talks are not subject to the agenda finalization deadline
  - Passes - Yes: 7, No: 0, Abstain: 2
    - Yes: Mellanox/Nvidia, Intel, ORNL, IBM, LLNL, ANL, Altair
    - No:
    - Abstain: TUM, Perforce
PMIx Standard PRs up for a vote:
- First Vote
  - https://github.com/pmix/pmix-standard/pull/235
  - https://github.com/pmix/pmix-standard/pull/235/commits/a1370b884f55e62b32b1e2ff1a11e82a4c40cbdd
  - Reading a diff of the changes rather than the full chapter. There were several comments to the PR last night. The plan is to read the diff and then have a discussion around the comments from last night and then proceed to the vote.
  - Discussion about which suggestions and comments are within scope of the PR and which ones are out of scope and should be spun out into separate efforts
  - Ralph: can pull in the non-controversial changes into V4 to reduce the number of conflicts on merging/rebasing
  - Kathryn: instinct would be to keep the two efforts independent and have the WG go through the normal process
  - Ralph: pulling in the changes will happen either way and it will reduce work going forward
  - Kathryn: If we preemptively pull something into V4 that we don’t all agree on and change it in V5, it could create a backwards incompatibility
  - Ralph: the incompatibility will exist either way and adding the changes to V4 won’t make anything worse
  - Kathryn: going back and forth on definitions could make the standard look less reliable. Not strictly looking to be a stickler for process
  - Josh: the new V4 definitions are already in conflict with this PR, and there seems to be hesitation about selectively pulling out definitions from this PR. Would handling this in the conflict resolution make sense? Do we want to freeze the definitions and prevent changes in V4?
  - Ralph: I don’t think that makes sense, as it restricts V4. I have posted PRs on V4 changes and people can go make comments on these changes/PRs.
  - Ralph: will attempt to improve and clarify definitions in V4, and the group can update and change things in V5. Current changes are up in PRs, and people can comment
  - Kathryn: we have a process for V5 that requires review
  - Josh: bringing V4 under the V5 process doesn’t seem like the right move. V4 does have a review process involving release candidates, and the community can provide feedback on those. The WG can also provide feedback on the release candidates. Hopefully we can meet in the middle on these points
  - John: it seems like the contention is between making things better vs. making things perfect, and you can’t make things perfect all the time
  - John: question about "tools do not have peers"
    - Ralph: a tool is a process started from the command line, so there is only one
    - Ralph and John to take discussion offline
  - Stephen: someone should go through the comments and mark which ones are within scope for the PR and which ones are out of scope. For anything out of scope, it should be given a separate issue and a separate change be added to the standardization pipeline.
  - Josh: Suggestion to not vote on this today, give WG feedback. Instead vote on it at the Q4 meeting and do reading of any WG changes (identified "in scope" items).
- Second Vote
  - https://github.com/pmix/pmix-standard/pull/192
  - Passes - Yes: 8, No: 0, Abstain: 1
    - Yes: Perforce, Mellanox/Nvidia, ANL, TUM, ORNL, Altair, IBM, LLNL
    - No:
    - Abstain: Intel
PMIx Standard PRs up for a Reading
PMIx 4.0 release update
- Almost done with the tool chapter
  - Needs to go back to the mpir-shim now that it is simplified a bit
- In a few places, people wanted clarifications
- Will spend a good chunk of August having people look at it and making finalizations
- Target is the end of August if all goes well
Discussion on pulling changes voted on for V5 into V4?
- No objections to pulling in voted on/approved changes into V4
Plenary discussion items: Storage working group presentation and discussion Slides
- Shane: Extending PMIx to support applications, IO middleware, & WLM/RM interactions with storage.
- (In progress) Query APIs & Event notification APIs
- (Future) Direct storage for commands like stage-in data, etc. Working more on these pieces going forward. Working on abstractions for example file storage vs. object storage (e.g., Lustre vs DAOS)
- Intend to have first reading at Q4 for Query/Event APIs
- Query API: Useful to discover the HPC storage hierarchy, characteristics, current state, etc. What parallel filesystems are available, how to access a specific storage system, what its current state is, capabilities (bandwidth, persistence, etc.)
- Example of PMIx_Query_info() interface to get array of query structs (pmix_query_t’s), and output array of results (pmix_info_t’s)
  - See details in slides for more examples. A pmix_query_t contains:
  - Keys - attributes (strings)
  - Qualifiers (key/val constraints to restrict scope of responses), e.g., PMIX_PROC_ID=1234
- Looking at extending this Query interface for additional storage qualifiers
  - PMIX_STORAGE_ID
    - (e.g., lustre-fs0)
    - Q: Would ID’s be system specific?
      - Could be set manually, or admins could generate, or generate by PMIx server. Intent is just that it is a unique identifier.
  - PMIX_STORAGE_PATH
    - (qualifier for query, e.g., mount point, in lieu of using Storage_ID)
  - PMIX_STORAGE_TYPE
    - (qualifier to describe different types of storage resources, to help users understand underlying characteristics, e.g., DAOS, PFS, SSD, burst buffers, etc. Still in-progress on what these types should be.)
    - Intent is to give users a notion of the "type" of the storage without needing lots of detail about the underlying system.
    - Q: Is the intent to define a fixed set of storage types, and then an additional set that users could define at runtime?
      - The "non-standard" types or user-defined/hybrid cases could be created alongside the standardized cases.
      - Discussion about possibly having an extensible method for representing capabilities/characteristics of the storage, e.g., strings w/ details that could be returned via Query. Storage systems could use a generic way to provide details w/o having lots of detailed types. Balance between detail/generic query results.
      - What is the correct balance between a more generic "higher-level" list (e.g., "PFS") and drilling in further later? Important to give a notional level of intent back to users.
      - Possibly start with minimal "types" at start and then add as needed.
      - Q: Who are the entities that will populate this data?
        - Likely the resource manager, for filling in the initial characteristics. Seems less likely that PMIx would discover DAOS, etc.
        - Maybe something like DAOS itself would know/announce this
        - An app/tool makes a request, which is relayed to the PMIx server, which would have a plugin (e.g., for Lustre) to gather and provide that info. If there is no PMIx-specific storage plugin for that info, it would fall back to more generic info from the resource manager/host environment.
  - Storage qualifiers (existing) that can be reused for storage queries
    - PMIX_USERID, PMIX_GRPID, PMIX_HOSTNAME, PMIX_PROCID
    - Using Lustre proof-of-concept to sanity check these queries to qualify queries on user/group/project/quota
    - These are used in concert with other qualifiers (e.g., PMIX_STORAGE_PATH) to further qualify queries
    - Useful to lookup things like scratch dirs for a user, quota space, etc.
    - Could possibly use PMIX_HOSTNAME to specify a Lustre OSS/OST to narrow storage target for Lustre. Possibly be able to use PMIX_PROCIDs too, but still in-progress for downselecting/restricting the storage targets.
  - New PMIx Query Attributes for storage
    - PMIX_QUERY_STORAGE_LIST -- list of identifiers for available storage systems. Accepts a qualifier for PMIX_STORAGE_TYPE to limit to a specific type of storage (e.g., "node-local", etc.)
    - PMIX_STORAGE_CAPACITY_LIMIT -- overall capacity
    - PMIX_STORAGE_CAPACITY_FREE -- free (unused) capacity
    - Q: Would limit/free be for the entire filesystem or just the user?
      - That would be based on the given qualifiers
      - Example: Could add PMIX_USERID or PMIX_GRPID qualifiers to further restrict results (i.e., limit to view of a given user, otherwise would give back a more global view)
    - PMIX_STORAGE_OBJECT_LIMIT
    - PMIX_STORAGE_OBJECT_FREE
      - Limit/tracking on things like i-nodes, but could also be way to support more object-storage.
      - Q: Is it easy/meaningful to say "number of free" objects in an object-storage system?
        - Some systems keep a table of objects, e.g., sized to the uint32_t max, so the entries still available within that table would be the "free" space
      - May want to have object_used instead of the capacity_free; the intent is to capture the available space for the given context
      - These attributes may be optional, and if the attribute is less logical/obvious for the underlying storage system they can return "unsupported".
    - PMIX_STORAGE_XFER_SIZE
      - Optimal transfer size for the storage system
      - Motivated by filesystem block size for I/O perf
    - Note - Generalizing the statfs() syscalls as the basis for these query interfaces. This simplifies implementation and also provides a commonly available basis for storage queries. Note that the per-user and per-project queries fall outside statfs()
    - PMIX_STORAGE_BW_LIMIT
    - PMIX_STORAGE_BW
      - Provides users with some expectation of peak and observed storage system bandwidth.
      - Qualifiers can be used to downselect to a specific storage server (e.g., specific OST)
    - Note - PMIX_STORAGE_TYPE likely to provide more user-friendly aggregation of these details
  - Event Notification API
    - Mechanism to respond to system changes, e.g., failures, low-capacity, servers idle/recovering/overloaded
    - Plan to use the PMIx_Notify_event()
    - Defining the pmix_status_t that are relevant for storage and associated pmix_info_t’s for details on the events
  - Next Steps
    - Create discussion on GitHub via issues and submit a PR with proposed changes
    - Start storage API work, looping in people from Cray and OLCF for DataWarp and Spectral
    - Call for participation: Meeting every other Tuesday at 4pm CST (next one on Aug 4th)
      - Also join the google group mailing list
    - Teams/people to reach out to
      - IBM’s Burst Buffer API: Tom Gooding (IBM) and Adam Moody (LLNL)
        - https://github.com/IBM/CAST
      - HPSS
    - Q: How do you anticipate exposing topology, e.g., rack/node local, etc.?
      - Not fully fleshed out, but topology info is certainly something to include. Not necessarily something useful for apps
      - There are examples of tiered storage within a single "storage" system, so likely good to expose some notion of topology/hierarchy
      - Likely something to fit in the storage type discussions
      - Regarding fault tolerance and the sharing of storage between nodes: throwing the idea out there of providing such info for shared/failure-containment/redundancy
    - Consider the bandwidth relative to the originating client; for example, the gateway/network links may be the limit rather than Lustre itself. Example: a PFS shared across machines vs. the capabilities of the Lustre storage system proper. What is a reasonable expectation of bandwidth?
    - Difficult at standard level to cope with that
    - Could look at the effective storage bandwidth at the standard level, rooted in the client perspective.
    - No specific subsystem could answer on its own, but with an attribute one could query the bandwidth and possibly compute it from the client perspective
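
The query model discussed in this plenary (a query carries keys plus qualifiers, and qualifiers such as a storage ID or user ID narrow the scope of the answer) can be sketched as a toy model. This is Python, not the C API, and it is not the PMIx implementation: `Query`, `query_info`, `STORAGE`, and every ID and size below are hypothetical stand-ins invented for illustration. Only the attribute names are taken from these notes, and they were still proposals at the time of the meeting.

```python
from dataclasses import dataclass, field

# Attribute/qualifier names as recorded in these notes (proposal stage only).
PMIX_QUERY_STORAGE_LIST = "PMIX_QUERY_STORAGE_LIST"
PMIX_STORAGE_CAPACITY_LIMIT = "PMIX_STORAGE_CAPACITY_LIMIT"
PMIX_STORAGE_CAPACITY_FREE = "PMIX_STORAGE_CAPACITY_FREE"
PMIX_STORAGE_OBJECT_FREE = "PMIX_STORAGE_OBJECT_FREE"
PMIX_STORAGE_ID = "PMIX_STORAGE_ID"
PMIX_STORAGE_TYPE = "PMIX_STORAGE_TYPE"
PMIX_USERID = "PMIX_USERID"

UNSUPPORTED = "unsupported"  # returned when a backend cannot answer a key


@dataclass
class Query:
    """Toy stand-in for pmix_query_t: requested keys plus qualifiers."""
    keys: list
    qualifiers: dict = field(default_factory=dict)


# Hypothetical inventory that a resource manager or storage plugin might hold.
STORAGE = {
    "lustre-fs0": {
        "type": "PFS",
        "limit": 1000,
        "free": 400,
        "per_user": {1234: {"limit": 100, "free": 25}},
    },
    "nvme-local": {"type": "node-local", "limit": 50, "free": 50, "per_user": {}},
}


def _resolve(key, qualifiers):
    """Answer one key, using the qualifiers to restrict the scope."""
    if key == PMIX_QUERY_STORAGE_LIST:
        wanted = qualifiers.get(PMIX_STORAGE_TYPE)
        return [sid for sid, s in STORAGE.items()
                if wanted is None or s["type"] == wanted]
    system = STORAGE.get(qualifiers.get(PMIX_STORAGE_ID))
    if system is None:
        return UNSUPPORTED
    # A user qualifier narrows the view; without it the global view is used.
    view = system
    if PMIX_USERID in qualifiers:
        view = system["per_user"].get(qualifiers[PMIX_USERID])
        if view is None:
            return UNSUPPORTED
    field_name = {
        PMIX_STORAGE_CAPACITY_LIMIT: "limit",
        PMIX_STORAGE_CAPACITY_FREE: "free",
    }.get(key)
    return view[field_name] if field_name else UNSUPPORTED


def query_info(queries):
    """Toy stand-in for PMIx_Query_info(): returns (key, value) pairs."""
    return [(key, _resolve(key, q.qualifiers)) for q in queries for key in q.keys]


if __name__ == "__main__":
    # Global view vs. the view restricted to one user on the same system.
    print(query_info([Query([PMIX_STORAGE_CAPACITY_FREE],
                            {PMIX_STORAGE_ID: "lustre-fs0"})]))
    print(query_info([Query([PMIX_STORAGE_CAPACITY_FREE],
                            {PMIX_STORAGE_ID: "lustre-fs0", PMIX_USERID: 1234})]))
```

The same key returns a different value depending on the qualifiers supplied, which mirrors the answer given to the limit/free question above, and keys the toy backend does not track come back as "unsupported".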
Working Group Presentations
- Client Separation / Implementation Agnostic Document Working Group
  - Chapter 1
    - Changes made it in!
  - Chapter 2
    - Taking changes back and will put back up for a vote
  - Key-Value Chapter
    - Discussing how to re-structure to be easier for users to interpret
    - On Github as PR 255, includes a link to Josh’s PR against that PR with changes from Chapter 5 (includes PDF rendering) https://github.com/pmix/pmix-standard/pull/255
    - Terming it "Process Related Key-Value Exchange"
    - "Non process related key-value exchange" (which is publish/lookup)
    - Put, Get, Fence (optional, but what does it mean to have it), modex, etc. Give the users the context of how these pieces connect
    - Added examples for publish and lookup
    - As they are going through Chapter 5, spinning off discussions about tweaks to interfaces to improve them
    - Stephen: what is the overlap between the business card exchange and the chapter 5 changes
      - Josh: they are closely connected, and going back to the use case might be useful
    - Ralph: going through and looking at things, clarifying, etc. You implied that you might be changing the APIs
      - Josh: The goal of this change is to clarify and describe without changing things at all. In the course of those discussions, we noticed that (for example) adding a non-blocking interface might make sense in certain places. So those will be spun off into separate issues/PRs.
- Slicing/Grouping of functionality Working Group
  - MPI Sessions Use Case
    - Swaroop reading up and working on this use case
    - Thanks to Howard and Dan for presenting their MPI sessions slides and material
    - Intersection with the dynamic WG on extensions to process sets
      - Open GitHub issue: https://github.com/pmix/pmix-standard/issues/254
    - Open Call for participation on this use case
  - Power Community Initial Meeting
    - Initially coalesced around the idea that PMIx could be a passthrough interface for PowerAPI
    - Further feedback revealed that the community hasn’t coalesced around that particular API and we don’t want PMIx to pick winners and losers.
  - Standards Integration
    - Added new function signature macros and automated the conversion to the new macro. Overcomes a significant hurdle to integration by de-duplicating code within the standard
    - Josh is looking into how to better handle code snippets/examples. Using OpenMP as a reference
    - Branch on Github: https://github.com/pmix/pmix-wg-slices/tree/hybrid-prog-models
  - Coverage Analysis
    - Updated for latest use case documents
    - 28% coverage for client interfaces, 11% for server-side interfaces
- Dynamic Workflows Working Group
  - Initially looking at Spark for the main use case driver, but administrative reasons have shelved that idea. Went back to look for new implementation drivers: K8s, ADIOS/Savannah, mOS.
  - Also looked at Radical Pilot, Swift-t, Balsam
  - Generalized GPU affinity to device affinity
    - May be multiple devices of multiple types on a node
  - ADIOS/Savannah
    - Working to use PMIx to clean up the cumbersome scripts currently used to make the workflow dynamic
    - Porting to PMIx with static support first then focusing on elasticity
    - "Faking" elasticity with PRRTE (pre-allocated 100 nodes and bursting from 50 -> 100) as a proof of concept
  - mOS
    - Need a way to specify and reconfigure affinity dynamically in dynamic jobs
    - Devices can move between Linux and mOS control w/out rebooting, which creates a dynamicity for workflows. Plus reconfiguration requires sudo, which we don’t want to give to users directly
    - PMIx/PRRTE can potentially handle the reconfiguration
      - How to handle failure conditions?
  - Next steps
    - Moving beyond concepts/discussions to implementations
  - Question
    - Stephen: will PRRTE be running as root to do the reconfiguration?
      - Yes. The initial idea is to run PRRTE as root and fall back to funneling reconfiguration requests elsewhere if that doesn’t work
- Storage Working Group
  - (See plenary notes)
Open call for new Working Groups
- Jai: Energy Efficiency WG - pinged Sid Jana from Intel
  - Stephen: not sure if there is enough difference between Energy Efficiency vs Power, but there was a Power discussion
  - Jai: will ping Sid again
  - Stephen: there was a discussion of using PMIx for cross-node communication. Maybe that still makes sense
  - Ralph: might be looking for something in GEOPM. Will talk internally to gather more details
- Kathryn: tools WG - might not have enough people with enough time
  - Ralph: the chapter being written for V4 could be a starting point/launch point
Technical presentations
- Dave Solt, Josh Hursey (IBM) "Modex Exchange Modes: PMIx_Commit/Fence/Get Semantics"
  - Video: https://youtu.be/O7Y7H27YoqA
  - Josh: We are now YouTube "influencers"! :-)
- Artem Y. Polyakov, Boris Karasev, Howard Pritchard, Charles Shereda (Mellanox/Nvidia, LANL) "PMIx unit testing infrastructure"
  - Slides
  - Q: Anyone have any experience with testing infrastructure for testing two-sided scenario like is being done here?
    - No responses or examples given for something similar/existing for this kind of testing
  - Q: Does ASC have any interest/ideas for being involved in this work, e.g., for stating intent for parts of the standard?
    - Would be good to have standardized test suite
    - Stephen’s working group with use-cases/functionality, if could distill into tests and connect to this work, that might be very useful
Use case presentations
- Kaiyuan Hou, Quincey Koziol, Suren Byna (Northwestern U., LBNL) "TaskWorks: An HPC friendly task engine"
  - Slides
  - Discussion about possible use cases, and granularity/atomicity for dynamic aspects at runtime -- is PMIx right fit?
  - Some past use with OpenMP/MPI interactions as possible use case
  - There was a past working group on OpenMP/MPI, and that work was written up and published (at least the first 4 bullets of slide #15).
  - Unsure whether this matches their use case.
  - Q: Is this the right group/context to get feedback on this?
    - Can send info on past work
    - Dynamic WG is likely the correct current work
    - Also, the Slices WG includes a write-up of the past use case and a reference to Geoffroy’s past paper
    - https://github.com/pmix/pmix-standard/issues/232
  - Q: Do you think it would be useful to have another WG for programming model interoperability in addition to the current dynamic WG?
    - There is distinction.
    - Dynamic WG is more about orchestration
    - Whereas this has to do with some event/interleaving case
    - There is overlap, and possibly not enough people to have multiple WGs
    - Dynamic WG certainly has overlap, e.g., work w/ ADIOS group relates
    - Questions about more dynamic resource use/availability on-node
    - This comes up elsewhere, e.g., network fabric manager
    - Possibly a resource coordination WG?
    - Jai can send invite to Dynamic WG to join and see if overlap/match is good. Possibly a reasonable place to start
    - Ralph will look into the hybrid programming model group to see if there’s enough critical mass there
    - Quincey: We want to play nice so would like to make sure we connect to right spot
    - Other thought is to reinvigorate OpenMP/MPI case to see if valid to "get the gang back together" :-)
  - Also asked whether LBNL might be interested in joining PMIx / ASC! :-)
Discussion items:
- (None at this time)