Notes on MPI+X vs Chapel
Some random notes, no particular point or order.
MPI+X: how to evolve MPI for “exascale”. Do we just engineer more scalable primitives/hardware, or do we also consider other abstractions?
Modeling MPI perf: the simple cost model is linear, roughly T(n) ≈ α + β·n (latency plus per-byte cost), so fewer bytes communicated means less time, and vice versa. But small messages are dominated by the latency term α, so they are a lot more expensive per byte.
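A minimal ping-pong sketch for measuring that model (my own illustration, not from the notes; message sizes and repetition counts are arbitrary). The intercept of time vs. size gives α, the slope gives β:

```c
/* Ping-pong between ranks 0 and 1 to estimate T(n) ~= alpha + beta*n. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 1000;
    for (int n = 1; n <= (1 << 20); n *= 16) {
        char *buf = calloc(n, 1);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = (MPI_Wtime() - t0) / (2.0 * reps); /* one-way time */
        if (rank == 0)
            printf("n=%8d bytes  one-way ~ %.3f us\n", n, t * 1e6);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}
```

Run with at least two ranks, e.g. `mpirun -np 2 ./pingpong`.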
The traditional model is BSP (Bulk Synchronous Parallel); MPI is designed around it, hence the heavy use of collectives. Users can get comm/compute overlap from the asynchronous (non-blocking) APIs. Extensions (one-sided notification, non-blocking synchronization) are being considered.
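A sketch of the overlap idea with non-blocking point-to-point calls; the halo-exchange framing, buffer layout, and neighbor ranks are hypothetical stand-ins rather than anything from the notes:

```c
#include <mpi.h>

/* Post communication early, compute on data it does not touch, wait late. */
void exchange_and_compute(double *halo, int n, int left, int right,
                          MPI_Comm comm) {
    MPI_Request reqs[2];
    /* Post the receive and send up front... */
    MPI_Irecv(&halo[0],     1, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Isend(&halo[n - 1], 1, MPI_DOUBLE, right, 0, comm, &reqs[1]);

    /* ...then do work on interior cells only, so the transfer can
     * proceed in the background. */
    for (int i = 1; i < n - 1; i++)
        halo[i] *= 0.5;  /* placeholder for real interior computation */

    /* Block only once the overlapping work is done. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}
```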
MPI+X = hybrid approaches, i.e. “post-BSP programming”: neighbor collectives and non-blocking collectives on the MPI side, with X = OpenMP, CUDA, or Global Arrays for the on-node part.
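A minimal sketch of one such hybrid, assuming MPI+OpenMP with the OpenMP threads doing node-local work and an MPI-3 non-blocking collective (MPI_Iallreduce) combining the results; the loop body and problem size are placeholders:

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    /* FUNNELED: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (i + 1);          /* placeholder node-local work */

    double global = 0.0;
    MPI_Request req;
    /* Start the reduction, overlap other work, then complete it. */
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);
    /* ... other independent work could go here ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) printf("global sum = %f\n", global);

    MPI_Finalize();
    return 0;
}
```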
Some outstanding problems: I/O performance, topology awareness. (Side note: MPI_Cart_create is an interesting idea; see the sketch below.)
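A small sketch of what MPI_Cart_create buys you, assuming a 2D periodic grid; the dimensionality and the use of MPI_Dims_create are my choices for illustration:

```c
/* Let MPI lay ranks out on a 2D grid so the implementation can (in
 * principle) reorder them to match the physical network topology. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dims[2] = {0, 0};            /* let MPI pick a factorization */
    MPI_Dims_create(size, 2, dims);
    int periods[2] = {1, 1};         /* periodic in both dimensions */
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1 /* reorder */, &cart);

    MPI_Comm_rank(cart, &rank);
    int coords[2];
    MPI_Cart_coords(cart, rank, 2, coords);

    int left, right;
    MPI_Cart_shift(cart, 1, 1, &left, &right);  /* neighbors along dim 1 */
    printf("rank %d at (%d,%d), neighbors %d/%d\n",
           rank, coords[0], coords[1], left, right);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```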
Chapel
PGAS (Partitioned Global Address Space), specifically async PGAS: globally addressable memory partitioned across locales, plus asynchronous tasking on top. Largely what you’d think.
High-level, top-down, Python-like are words that have been used to describe it. Developed and supported by Cray.
Continued reading: https://dl.acm.org/doi/pdf/10.1145/2780584
(One-sided puts/gets in MPI-3 help bridge the mismatch between two-sided message passing and RDMA hardware; sketch below.)
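A minimal sketch of MPI-3 one-sided communication using a fence epoch; the window size and neighbor pattern are illustrative only:

```c
/* Each rank exposes one double through an RMA window; every rank then
 * puts its rank number into the next rank's window, with no matching
 * receive on the target side. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *base;
    MPI_Win win;
    MPI_Win_allocate(sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &base, &win);
    *base = -1.0;

    MPI_Win_fence(0, win);                    /* open access epoch */
    double val = (double)rank;
    int target = (rank + 1) % size;
    MPI_Put(&val, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);                    /* close epoch, data visible */

    printf("rank %d received %.0f\n", rank, *base);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```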