Ankush Jain
Jul 1, 2021

memory-mapped files, basics:

  1. if the page is already resident in memory, the access just works. easy peasy
  2. if not, the access page-faults. and then what?

both mmap and buffered syscalls (read/write) go through the page cache. on a page cache miss, both take the same I/O path down the stack:

page cache -> filesystems -> block layer -> driver -> disk
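
a minimal userspace sketch of the point above: the same file bytes come back whether they are fetched with a buffered read() or through a mapping, since both are served from the page cache. the helper names (`read_both_ways`, `demo`) and the temp file are made up for illustration.

```python
import mmap
import os
import tempfile


def read_both_ways(path, length):
    """Return the first `length` bytes of `path` via read() and via mmap."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # buffered syscall path: the kernel copies from the page cache
        via_read = os.read(fd, length)
        # mmap path: the page cache pages are mapped and faulted in on access
        with mmap.mmap(fd, length, access=mmap.ACCESS_READ) as m:
            via_mmap = m[:length]
    finally:
        os.close(fd)
    return via_read, via_mmap


def demo():
    # hypothetical scratch file, 16 KiB of repeated text
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"page cache demo\n" * 1024)
        path = f.name
    try:
        return read_both_ways(path, 4096)
    finally:
        os.unlink(path)
```

this only shows that both paths see the same data; it does not show which kernel functions ran.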

— — — — — — — — — -

block layer: works with I/O requests

  • each request carries a starting sector, a length, and an operation (read/write/special)
  • can have hints (SYNC) and other flags (FUA, FLUSH)

block request lifecycle:

  • created in block layer when I/O submitted by FS
  • can be delayed/merged/reorganized
  • dispatched to device driver
  • completed when I/O is finished
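
a toy model of the "delayed/merged/reorganized" step, to make it concrete. this is not kernel code: `Request` and `merge_requests` are invented names, and the rule used here (sort by sector, merge back-to-back requests with the same operation and flags) is a simplification of what a real I/O scheduler does.

```python
from dataclasses import dataclass


@dataclass
class Request:
    sector: int                    # starting sector
    nr_sectors: int                # length in sectors
    op: str                        # "read" or "write"
    flags: frozenset = frozenset() # e.g. {"SYNC"}, {"FUA"}

    @property
    def end(self):
        # first sector past the end of this request
        return self.sector + self.nr_sectors


def merge_requests(pending):
    """Sort by sector, then merge requests that are exactly adjacent."""
    merged = []
    for req in sorted(pending, key=lambda r: r.sector):
        last = merged[-1] if merged else None
        if (last is not None and last.op == req.op
                and last.flags == req.flags and last.end == req.sector):
            # back-to-back on disk: grow the previous request instead
            merged[-1] = Request(last.sector,
                                 last.nr_sectors + req.nr_sectors,
                                 last.op, last.flags)
        else:
            merged.append(req)
    return merged
```

for example, reads at sectors 0-7 and 8-15 collapse into one 16-sector read, while a write at sector 16 stays separate.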

(credits: https://www.slideshare.net/ennael/kernel-recipes-2015-linux-kernel-io-subsystem-how-it-works-and-how-can-i-see-what-is-it-doing)

— — — — — — — — — -

linux memory management:

  • a program’s mapped virtual address space is organized as a red-black tree of VMA structs (virtual memory areas)
  • each VMA is a contiguous, non-overlapping span with metadata
  • task_struct->(mm_struct*)mm->(vm_area_struct*)mmap
  • task_struct is the process-level struct. mm_struct holds all memory management state. mmap is a linked list of VMAs. (mm->mm_rb is the RB tree of VMAs). this is the v5.13 layout, the version linked in these notes.
  • some of the VMAs are file-backed. VMA size is a multiple of page size.
  • mm->pgd = top-level page table for the process. each PTE maps one page (4K here)
  • pages -> virtual memory, frames -> physical memory
  • page cache -> in-memory cache of pages holding file data
  • contiguous virtual pages can point to arbitrary, non-contiguous frames. (that’s why mapping is per page)
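
a small sketch of the lookup these structures support, assuming 4K pages. `VMA`, `AddressSpace` and `split_address` are illustrative names, and a sorted list plus bisect stands in for the kernel's tree. like find_vma(), the lookup returns the first VMA whose end lies above the address, so the caller still has to check that the address is not in a hole below its start.

```python
import bisect
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4K pages


@dataclass
class VMA:
    start: int    # inclusive, page-aligned
    end: int      # exclusive, page-aligned
    backing: str  # "anon" or a (hypothetical) file name


class AddressSpace:
    def __init__(self, vmas):
        # VMAs do not overlap, so sorting by start also sorts by end
        self.vmas = sorted(vmas, key=lambda v: v.start)
        self.ends = [v.end for v in self.vmas]

    def find_vma(self, addr):
        """Return the first VMA with end > addr, or None."""
        i = bisect.bisect_right(self.ends, addr)
        return self.vmas[i] if i < len(self.vmas) else None


def split_address(addr):
    """Split a virtual address into (virtual page number, offset in page)."""
    return addr // PAGE_SIZE, addr % PAGE_SIZE
```

the page number is what a page table lookup translates to a frame; the offset carries over unchanged.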

— — — — — — — — —

pages and I/O

  • all buffered I/O goes through page cache
  • linux does read-ahead for mmap’ed I/O
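  • madvise() hints tune the read-ahead: MADV_NORMAL/MADV_RANDOM/MADV_SEQUENTIAL
  • mappings can be MAP_PRIVATE or MAP_SHARED. the difference only applies to UPDATES: private mappings share the same page cache pages while they only read, and get their own copy of a page (CoW) on the first write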
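
a quick way to see the PRIVATE vs SHARED difference from userspace, as a sketch for Unix-like systems only (the `flags=`/`prot=` arguments of Python's mmap are not available on Windows). the helper names and the scratch file are made up. the madvise() call is guarded because it needs Python 3.8+ and platform support.

```python
import mmap
import os
import tempfile


def write_through_mapping(path, flags):
    """Map `path`, overwrite its first byte, return the file contents afterwards."""
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        m = mmap.mmap(fd, size, flags=flags,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE)
        try:
            # optional access-pattern hint, only where available
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            m[0:1] = b"X"  # private: CoW copy; shared: dirties the cached page
            if flags == mmap.MAP_SHARED:
                m.flush()  # write the dirty page back to the file
        finally:
            m.close()
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()


def demo():
    # hypothetical one-page scratch file filled with b"a"
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"a" * mmap.PAGESIZE)
        path = f.name
    try:
        after_private = write_through_mapping(path, mmap.MAP_PRIVATE)
        after_shared = write_through_mapping(path, mmap.MAP_SHARED)
    finally:
        os.unlink(path)
    return after_private[:1], after_shared[:1]
```

the write through the private mapping never reaches the file, while the write through the shared mapping does.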

— — — — — — — — -

page fault handling:

  • do_page_fault() -> locate the VMA using find_vma()
  • check the PTE, and accordingly call one of:
      • do_anonymous_page() (first touch of anonymous memory)
      • do_swap_page() (page was swapped out: major fault)
      • filemap_fault(): invoked via the VMA’s operations vector (vm_ops->fault) for an mmap’ed file region, to read in file data during a page fault
  • inside filemap_fault():
      • consult the page cache using find_get_page(); if found, do_async_mmap_readahead()
      • if not found, MAJOR FAULT: do_sync_mmap_readahead() + pagecache_get_page()
      • do_sync_mmap_readahead() -> do_page_cache_ra() (the actual READ)
      • page_cache_ra_unbounded(): allocate memory + submit I/O using read_pages()
      • read_pages() glues the mm module to the fs module, by invoking the filesystem’s struct address_space_operations{} functions (->readahead(), ->readpages() or ->readpage())
  • the address_space_operations{} suite is implemented by all filesystems (https://elixir.bootlin.com/linux/v5.13/source/include/linux/fs.h#L370)
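
the filemap_fault() steps above can be mimicked with a toy simulation. this is a model, not the kernel's logic: `ToyPageCache`, its fields, and the 4-page window are assumptions for illustration. a hit in the cache counts as a minor fault; a miss counts as a major fault and "reads" a window of pages around the faulting index.

```python
class ToyPageCache:
    """Toy model of one file's page cache with read-around on a miss."""

    def __init__(self, file_pages, ra_pages=4):
        self.file_pages = file_pages  # pages in the backing "file"
        self.ra_pages = ra_pages      # hypothetical read-around window
        self.cached = set()           # page indexes currently cached
        self.minor = 0
        self.major = 0
        self.reads = []               # (start, count) I/O requests issued

    def fault(self, index):
        """Handle a fault on page `index`; return "minor" or "major"."""
        if index in self.cached:
            # page cache hit: no I/O needed
            self.minor += 1
            return "minor"
        # miss: synchronous read-around centred on the faulting page.
        # simplification: re-reads pages that are already cached.
        start = max(0, index - self.ra_pages // 2)
        end = min(self.file_pages, start + self.ra_pages)
        self.reads.append((start, end - start))
        self.cached.update(range(start, end))
        self.major += 1
        return "major"
```

faulting on page 5 of a 16-page file brings in pages 3 to 6, so later faults on pages 4 and 6 are minor.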

https://elixir.bootlin.com/linux/latest/source/mm/filemap.c
