 
WeeklyTelcon_20220201
Geoffrey Paulsen edited this page Mar 4, 2022
    - Dialup Info: (Do not post to public mailing list or public wiki)
 
- Geoffrey Paulsen (IBM)
- Austen Lauria (IBM)
- Jeff Squyres (Cisco)
- Brendan Cunningham (Cornelis Networks)
- Brian Barrett (AWS)
- Christoph Niethammer (HLRS)
- David Bernholdt (ORNL)
- Hessam Mirsadeghi (UCX/nVidia)
- Howard Pritchard (LANL)
- Josh Hursey (IBM)
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- Tomislav Janjusic (nVidia)
- William Zhang (AWS)
 
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (nVidia)
- Aurelien Bouteiller (UTK)
- Brandon Yates (Intel)
- Charles Shereda (LLNL)
- Edgar Gabriel (UoH)
- Erik Zeiske
- Geoffroy Vallee (ARM)
- George Bosilca (UTK)
- Harumi Kuno (HPE)
- Joseph Schuchart
- Joshua Ladd (nVidia)
- Marisa Roman (Cornelis)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Matthew Dosanjh (Sandia)
- Michael Heinz (Cornelis Networks)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Raghu Raja (AWS)
- Ralph Castain (Intel)
- Sam Gutierrez (LLNL)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Xin Zhao (nVidia)
 
- Two hwloc issues
- PRRTE/PMIx hwloc issue: https://github.com/openpmix/prrte/pull/1185 and https://github.com/openpmix/openpmix/pull/2445
- hwloc, when built with CUDA support, hard-links against the CUDA libraries.
- This doesn't work in the common case where CUDA isn't installed on login nodes.

- hwloc v2.5 - v2.7.0 puts variables that live in read-only memory into environ, but PRRTE tries to modify them and segfaults.
- PMIx and PRRTE have block-listed hwloc versions 2.5 - 2.7.0.
- putstr(env) is segv-ing.
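The read-only environ problem can be illustrated with a small C sketch. It assumes, as a hypothetical reconstruction rather than hwloc's or PRRTE's actual code, that a library hands a string literal to putenv(). Because putenv() stores the caller's pointer instead of a copy, any later in-place edit of that environ entry writes into read-only memory. Replacing the entry with setenv(), which allocates its own copy, avoids the write. EXAMPLE_VAR, library_sets_default() and override_env() are made-up names for illustration only.

```c
#define _XOPEN_SOURCE 700 /* expose putenv() and setenv() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a library that registers an environment
 * variable with putenv() using a string literal. putenv() does not copy
 * its argument, so the environ entry now points at read-only storage. */
void library_sets_default(void)
{
    putenv((char *) "EXAMPLE_VAR=library-default");
}

/* Unsafe pattern (deliberately not implemented here): editing the
 * environ entry in place, e.g. strcpy(getenv("EXAMPLE_VAR"), "x"),
 * writes into that read-only literal and typically dies with SIGSEGV. */

/* Safer pattern: setenv() builds its own NAME=VALUE copy and swaps the
 * pointer stored in environ, so the old string is never written to. */
int override_env(const char *name, const char *value)
{
    return setenv(name, value, 1); /* 1 = overwrite an existing entry */
}
```

Calling library_sets_default() and then override_env("EXAMPLE_VAR", "runtime-value") leaves getenv("EXAMPLE_VAR") returning the new value without ever touching the literal.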
 
- Discussions about minimizing mpirun/mpicc to link against only a subset of OPAL.
- That makes things slightly better, but doesn't solve the problem: CUDA is still present on some nodes and not on others.
- The projected solution is to use hwloc plugins (dlopen the CUDA libraries at runtime).
- A while back, hwloc changed its default to NOT load components as plugins.
- He did this for Open MPI (some cyclic dependencies).
- This is no longer an issue for us.

- hwloc now has reasonable defaults for building some components as plugins (dlopened at runtime).
- Customers usually install in local filesystems.
- This gets us around the dependencies.
- Whenever this is actually fixed, Jeff will write docs, and we can cover these points there.
- Regarding Josh's hwloc PR: if there are any other suggestions or modifications, please post them on the hwloc PR.
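The plugin approach comes down to loading an optional library with dlopen() at runtime instead of linking it at build time, so a node without that library just loses the feature instead of failing to start. The sketch below is a generic illustration of that idea, not hwloc's plugin loader; optional_feature_available() is a hypothetical helper, and the library and symbol names are caller-supplied examples.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if SONAME can be dlopen()ed at runtime
 * and exports SYMBOL, 0 otherwise. Nothing is linked at build time, so a
 * missing library is a soft failure rather than a loader error at
 * program startup. */
int optional_feature_available(const char *soname, const char *symbol)
{
    void *handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        /* dlerror() describes why the load failed (e.g. file not found) */
        fprintf(stderr, "optional library not loaded: %s\n", dlerror());
        return 0;
    }
    int found = dlsym(handle, symbol) != NULL;
    dlclose(handle);
    return found;
}
```

For example, optional_feature_available("libcuda.so.1", "cuInit") would be expected to return 0 on a login node without the CUDA driver, letting the caller skip CUDA discovery. On glibc older than 2.34, link with -ldl.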
 
 
- Resuming MTT development
- Would like to have a monthly call.
- Christoph Niethammer is interested.
- Might need a new cleanup mechanism when rolling out lots of versions.

- Find out who's using the Python client, and what problems they're having.
- The IU database plugin (which is what gets data into the MTT viewer) has a number of issues.
 
- OMPI businessy things
- Usually happens in the summer
- Auditing
- Coverity
- GitHub
 
 
- Schedule: No schedule for v4.0.8 yet.
- Bugfixes on a case-by-case basis.

- Winding down v4.0.x; it will stop after v5.0.x is released.
- Really only want small changes reported by users.
- Otherwise, point users to the v4.1.x release.
- Howard and Geoff will meet Jan 28th.
 
- Schedule: Shooting for v4.1.3 at the end of March (Q1).
- No other update.
 
- CI is back.
- Need a full ROMIO update [Geoff to file issue].
- Open an issue to track this.
 
 - https://github.com/openpmix/prrte/pull/1176
 - Sessions - https://github.com/open-mpi/ompi/pull/9097
- Howard will rebase (again)
 
- PRRTE has long had a schizo component that tries to provide an interface based on which implementation the user is using. The CLI was still centralized, and this was leading to difficulties. Example: disagreement about how ranks should be placed with the -N option. So some of these decisions were moved down into a framework that has an OMPI component.
- Some questions about whether we should bring this into v5.0 for OMPI. There is a PRRTE PR up with some early work.
- This would be backported to the PRRTE release branch for our OMPI v5.
- Blocker v5.0 items are in Project/2.
- Schedule is Q1.
 
- Thinking about an RC before and after Sessions.
- As far as tracking goes, we have nightly tarballs, and it'll be clear in git.
 
- Docs rework
- We made a lot of progress on revamping the docs with reStructuredText.
- Might actually be able to get this done by v5.0.x.
- Don't go review yet, but lots of good progress.
 
 
- No new Gnus
 
- A fix is pending to work around the IBM XL MTT build failure (compiler abort).
- Issue 9919 - Thinks this common component should still be built.
- Commons get built when it's likely there is a dependency.
- Commons self-select whether they should be built or not.