Notes on MPI+X vs Chapel
Some random notes, no particular point or order.
MPI+X: how to evolve MPI for “exascale”. Do we just engineer more scalable primitives/hardware, or do we also consider other abstractions?
Modeling MPI perf: the simple cost model is linear, roughly T(n) ≈ α + β·n (latency plus per-byte cost), so fewer bytes communicated means less time, and vice versa. But small messages are dominated by the latency term α, so they are a lot more expensive per byte.
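A minimal ping-pong sketch for measuring that model (my own illustration, not from the notes; message sizes and repetition counts are arbitrary). The intercept of time vs. size gives α, the slope gives β:

```c
/* Ping-pong between ranks 0 and 1 to estimate T(n) ~= alpha + beta*n. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 1000;
    for (int n = 1; n <= (1 << 20); n *= 16) {
        char *buf = calloc(n, 1);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = (MPI_Wtime() - t0) / (2.0 * reps); /* one-way time */
        if (rank == 0)
            printf("n=%8d bytes  one-way ~ %.3f us\n", n, t * 1e6);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}
```

Run with at least two ranks, e.g. `mpirun -np 2 ./pingpong`.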
The traditional model is BSP (Bulk Synchronous Parallel); MPI is designed around it, hence the heavy use of collectives. Users can get comm/compute overlap from the asynchronous (non-blocking) APIs. Extensions (one-sided notification, non-blocking synchronization) are being considered.
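A sketch of the overlap idea with non-blocking point-to-point calls; the halo-exchange framing, buffer layout, and neighbor ranks are hypothetical stand-ins rather than anything from the notes:

```c
#include <mpi.h>

/* Post communication early, compute on data it does not touch, wait late. */
void exchange_and_compute(double *halo, int n, int left, int right,
                          MPI_Comm comm) {
    MPI_Request reqs[2];
    /* Post the receive and send up front... */
    MPI_Irecv(&halo[0],     1, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Isend(&halo[n - 1], 1, MPI_DOUBLE, right, 0, comm, &reqs[1]);

    /* ...then do work on interior cells only, so the transfer can
     * proceed in the background. */
    for (int i = 1; i < n - 1; i++)
        halo[i] *= 0.5;  /* placeholder for real interior computation */

    /* Block only once the overlapping work is done. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}
```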
MPI+X = hybrid approaches, i.e. “post-BSP programming”: neighbor collectives and non-blocking collectives on the MPI side, with X = OpenMP, CUDA, or Global Arrays for the on-node part.
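A minimal sketch of one such hybrid, assuming MPI+OpenMP with the OpenMP threads doing node-local work and an MPI-3 non-blocking collective (MPI_Iallreduce) combining the results; the loop body and problem size are placeholders:

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    /* FUNNELED: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (i + 1);          /* placeholder node-local work */

    double global = 0.0;
    MPI_Request req;
    /* Start the reduction, overlap other work, then complete it. */
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);
    /* ... other independent work could go here ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) printf("global sum = %f\n", global);

    MPI_Finalize();
    return 0;
}
```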
Some outstanding problems: I/O performance, topology awareness. (Side note: MPI_Cart_create is an interesting idea; see the sketch below.)
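A small sketch of what MPI_Cart_create buys you, assuming a 2D periodic grid; the dimensionality and the use of MPI_Dims_create are my choices for illustration:

```c
/* Let MPI lay ranks out on a 2D grid so the implementation can (in
 * principle) reorder them to match the physical network topology. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dims[2] = {0, 0};            /* let MPI pick a factorization */
    MPI_Dims_create(size, 2, dims);
    int periods[2] = {1, 1};         /* periodic in both dimensions */
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1 /* reorder */, &cart);

    MPI_Comm_rank(cart, &rank);
    int coords[2];
    MPI_Cart_coords(cart, rank, 2, coords);

    int left, right;
    MPI_Cart_shift(cart, 1, 1, &left, &right);  /* neighbors along dim 1 */
    printf("rank %d at (%d,%d), neighbors %d/%d\n",
           rank, coords[0], coords[1], left, right);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```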
Chapel
PGAS (Partitioned Global Address Space), specifically async PGAS: globally addressable memory partitioned across locales, plus asynchronous tasking on top. Largely what you’d think.
High-level, top-down, Python-like are words that have been used to describe it. Developed and supported by Cray.
Continued reading: https://dl.acm.org/doi/pdf/10.1145/2780584
(One-sided puts/gets in MPI-3 help bridge the mismatch between two-sided message passing and RDMA hardware; sketch below.)
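A minimal sketch of MPI-3 one-sided communication using a fence epoch; the window size and neighbor pattern are illustrative only:

```c
/* Each rank exposes one double through an RMA window; every rank then
 * puts its rank number into the next rank's window, with no matching
 * receive on the target side. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *base;
    MPI_Win win;
    MPI_Win_allocate(sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &base, &win);
    *base = -1.0;

    MPI_Win_fence(0, win);                    /* open access epoch */
    double val = (double)rank;
    int target = (rank + 1) % size;
    MPI_Put(&val, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);                    /* close epoch, data visible */

    printf("rank %d received %.0f\n", rank, *base);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```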