Ankush Jain
38 Followers

Sep 9

Playing With Perf Probes

Moving my note-taking to Jekyll + GitHub Pages because Medium is stupid. https://anku94.github.io/blog/2023/perf-probes/

1 min read


Sep 5

Profiling MPI with Perf

The goal here is to profile MPI-based applications. The key difference is that we would also like to profile MPI itself. Most high-level HPC tracing and profiling tools stop at MPI-level calls, so we use perf. perf does not care whether the code is MPI or not; it will profile the entire…

1 min read


Aug 31

Notes on MPI+X vs Chapel

Some random notes, no particular point or order. MPI+X: how to evolve MPI for “exascale”. Do we just engineer scalable primitives/hardware, or do we consider other abstractions too? Modeling MPI perf: the simple cost model is linear. Fewer bytes communicated = less time, and vice versa. …

1 min read
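
The linear cost model mentioned in the MPI+X notes can be sketched as a fixed per-message latency plus a per-byte cost. This is only an illustration of "fewer bytes = less time": the function name and the `alpha`/`beta` values are assumptions, not numbers taken from the post.

```python
def transfer_time(num_bytes, alpha=1e-6, beta=1e-9):
    """Estimated seconds to send one message of num_bytes.

    alpha: assumed fixed per-message latency, in seconds.
    beta:  assumed cost per byte, in seconds (1 / bandwidth).
    Both defaults are made-up illustrative values.
    """
    return alpha + beta * num_bytes


# Under this model, time grows linearly with message size.
for size in (1_000, 1_000_000):
    print(f"{size} bytes -> {transfer_time(size):.6f} s")
```

A model this simple ignores contention and topology, which is presumably part of why the notes call it the "simple" model.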


Jul 13

On MMIO, DMA, and PCIe 3.0

The goal of this post is to reconcile these two papers and some slides from the Linux Plumbers Conference: https://www.usenix.org/system/files/conference/atc16/atc16_paper-kalia.pdf, and https://dl.acm.org/doi/10.1145/3230543.3230560 The ATC16 paper seems to make some confusing claims in Section 2.1. The bumps for DMA in Fig 2 of ATC16 should not be at C_rc multiples, but at…

2 min read


Nov 18, 2022

On Taylor Swift, Tickets, and Scale

The purpose of this post is to explore the design options for a ticket booking service, and see how it aligns with what TicketMaster (TM) does. …

5 min read


Aug 27, 2022

notes on linux graphics

Previously: the X server multiplexed access to the GPU. All apps went through X. Later OpenGL came in, and OpenGL commands were translated to the X11 protocol and passed on to the GPU. This is Indirect Rendering. Wayland is a more modern display protocol. Indirect Rendering is obviously slow because you have a stupid…

1 min read


Aug 24, 2022

On doorbells/NVMe etc.

libfabric queue pairs, and ops on them, are thread safe. One queue pair can be shared between threads (although there might be a performance cost). NVMe queue pairs are to be assigned to threads: they are not thread safe (and user- or kernel-level locks would be too slow). Wonder why. NVMe…

2 min read


Aug 24, 2022

On B+ Trees vs LSM Trees

B Tree: tree with a fanout of k (lower depth means shorter chains of dependent lookups), also self-balancing. Insert/Search/Delete are all O(log N). An extension of the binary tree, suitable for storage devices, because storage latency sucks and you want shorter lookup chains. Variants used in NTFS/APFS/btrfs/ext4/Reiser4. B+ Tree: popularly used in…

2 min read
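
The point about fanout and lookup depth can be made concrete with a small calculation. This is a rough sketch that assumes an idealized, completely full tree; `lookup_depth` is a hypothetical helper name, and real B-trees only guarantee nodes that are at least half full, so actual depths can be somewhat larger.

```python
def lookup_depth(num_keys, fanout):
    """Smallest depth d such that fanout**d >= num_keys.

    Assumes an idealized tree where every node is completely full.
    """
    if num_keys < 1 or fanout < 2:
        raise ValueError("need num_keys >= 1 and fanout >= 2")
    depth, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout  # one more level multiplies reach by the fanout
        depth += 1
    return depth


# Higher fanout means fewer dependent lookups for the same number of keys.
for k in (2, 16, 256):
    print(f"fanout {k}: depth {lookup_depth(1_000_000, k)}")
```

Each level is one dependent storage access, so cutting the depth directly cuts the length of the lookup chain.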


Aug 24, 2022

notes on ebpf/kernel bypass/storage latency

Modern storage stacks: 2–7 GB/s, with 4–5 us latencies. Half the latency comes from the software stack, which is bad. Papers criticize SPDK/kernel bypass as having a bunch of problems (polling, wasted CPU, etc.). Instead, use eBPF to inject functions into kernelspace. Breakdown of a 6.27 us read() syscall, 512B, Intel Optane: …

2 min read


Jul 22, 2022

Python multiprocessing OOM handling

Situation: want to join 500 dataframes against a large dataframe. multiprocessing.Pool seems to keep getting stuck. What’s happening: a subprocess is being killed by the Linux OOM killer (check syslog), and multiprocessing doesn’t handle it well. Another unrelated problem with multiprocessing: if a task raises an exception, it will be thrown on the…

1 min read
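
One way to make a killed worker visible, sketched here as an alternative and not necessarily what the post goes on to recommend, is `concurrent.futures.ProcessPoolExecutor`, which raises `BrokenProcessPool` when a worker dies abruptly. The `work` and `run_tasks` names are hypothetical, and the OOM kill is only imitated by having a worker send itself SIGKILL.

```python
import multiprocessing
import os
import signal
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def work(task_id):
    # Hypothetical task. A negative id kills its own worker with SIGKILL,
    # imitating what the Linux OOM killer does to a process.
    if task_id < 0:
        os.kill(os.getpid(), signal.SIGKILL)
    return task_id * 2


def run_tasks(task_ids):
    """Run tasks in a process pool; return (results, pool_broke)."""
    results, pool_broke, futures = {}, False, {}
    # "fork" keeps the sketch runnable without a __main__ guard (POSIX only).
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        for task_id in task_ids:
            try:
                futures[task_id] = pool.submit(work, task_id)
            except BrokenProcessPool:
                pool_broke = True  # pool already broke while submitting
                break
        for task_id, future in futures.items():
            try:
                results[task_id] = future.result()
            except BrokenProcessPool:
                # A worker died abruptly; the executor reports it
                # as an exception that the caller can handle.
                pool_broke = True
    return results, pool_broke


print(run_tasks([1, 2]))
print(run_tasks([-1]))
```

Once the pool is broken, the pending futures fail too, so a caller that wants to continue would need to create a fresh executor and resubmit the unfinished tasks.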
