Jul 1, 2021
memory mapped files, basics:
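- if the page is already in memory and mapped, easy peasy: the MMU translates the access and the kernel isn’t involved
- if not, page fault, and then what?
both mmap and buffered read()/write() syscalls go through the page cache. on a page cache miss, both use the same I/O stack below it: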
page cache -> filesystems -> block layer -> driver -> disk
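a quick user-space way to poke at the "both paths hit the page cache" point. this is a minimal sketch, assuming Linux and Python 3; read_both_ways() is a hypothetical helper. it reads one file with a plain read() and through mmap(). on a cache miss either access goes down the page cache -> fs -> block -> driver stack above, and both return the same bytes.

```python
import mmap
import os
import tempfile
from typing import Tuple


def read_both_ways(path: str) -> Tuple[bytes, bytes]:
    """Hypothetical helper: read a file via read() and via mmap()."""
    with open(path, "rb") as f:
        # read(): the kernel copies data out of the page cache into our buffer
        via_read = f.read()
        # mmap(): page-cache pages are mapped into our address space instead;
        # touching a not-yet-mapped page faults and is resolved from the page cache
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            via_mmap = m[:]
    return via_read, via_mmap


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello page cache\n" * 512)
        path = tmp.name
    try:
        a, b = read_both_ways(path)
        print(a == b, len(a))  # True 8704
    finally:
        os.remove(path)
```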
— — — — — — — — — -
block layer: works with I/O requests
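- each request has a starting sector, a length, and an op (read/write/special, e.g. flush)
- can carry hints (SYNC) and other flags (FUA: complete only once the data is on stable media; FLUSH: flush the device’s volatile write cache)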
block request lifecycle:
- created in the block layer when the filesystem submits I/O
- can be delayed (plugged), merged with adjacent requests, or reordered by the I/O scheduler
- dispatched to the device driver
- completed when the device finishes the I/O
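the request fields and lifecycle above as a toy model. this is a minimal sketch: Request, ToyQueue and their fields are hypothetical and far simpler than the kernel’s struct request, and the flag names only loosely echo REQ_SYNC/REQ_FUA/REQ_PREFLUSH. it shows requests being held, back-merged when contiguous, sorted at dispatch, and completed.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto
from typing import List


class Op(Enum):
    READ = auto()
    WRITE = auto()
    FLUSH = auto()        # a "special" request that carries no data


class ReqFlag(Flag):
    NONE = 0
    SYNC = 1              # hint: someone is waiting on this I/O
    FUA = 2               # force unit access: done only once on stable media
    PREFLUSH = 4          # flush the device write cache before this request


@dataclass
class Request:
    sector: int           # starting sector
    nr_sectors: int       # length in sectors
    op: Op
    flags: ReqFlag = ReqFlag.NONE

    @property
    def end(self) -> int:
        return self.sector + self.nr_sectors


class ToyQueue:
    """Hypothetical queue: submit -> (delay/merge) -> dispatch -> complete."""

    def __init__(self) -> None:
        self.pending: List[Request] = []
        self.completed: List[Request] = []

    def submit(self, req: Request) -> None:
        # back-merge: grow a pending request that ends exactly where this one starts
        for p in self.pending:
            if (p.op == req.op and req.op != Op.FLUSH
                    and p.flags == req.flags and p.end == req.sector):
                p.nr_sectors += req.nr_sectors
                return
        self.pending.append(req)  # otherwise hold it (delayed) for possible merges

    def dispatch(self) -> List[Request]:
        # reorganize: sort by sector, elevator-style, then hand to the "driver"
        batch = sorted(self.pending, key=lambda r: r.sector)
        self.pending = []
        return batch

    def complete(self, req: Request) -> None:
        self.completed.append(req)  # the driver reports the I/O is finished


if __name__ == "__main__":
    q = ToyQueue()
    q.submit(Request(100, 8, Op.READ))
    q.submit(Request(0, 8, Op.READ))
    q.submit(Request(108, 8, Op.READ))  # contiguous with sector 100: back-merged
    batch = q.dispatch()
    print([(r.sector, r.nr_sectors) for r in batch])  # [(0, 8), (100, 16)]
    for r in batch:
        q.complete(r)
```

only back-merges are modeled; a real I/O scheduler also front-merges and has many more rules about what may be merged or reordered.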
— — — — — — — — — -
linux memory management:
- a program’s mapped virtual address space is organized as a red-black tree of VMA (virtual memory area) structs
- each VMA is a contiguous, non-overlapping span with metadata
- task_struct->mm (struct mm_struct *) ->mmap (struct vm_area_struct *)
- task_struct is the per-task (process/thread) struct. mm_struct holds all memory-management state for an address space. mm->mmap is a linked list of the VMAs, sorted by address (mm->mm_rb is the RB tree of the same VMAs).
- some VMAs are file-backed; the rest are anonymous (heap, stack). VMA size is a multiple of the page size.
- mm->pgd = top-level directory of the process’s multi-level page table. each PTE maps one page (4K by default on x86)
- pages are units of virtual memory, frames are units of physical memory
- page cache: in-memory cache of file data, kept in page-sized units
- contiguous virtual pages can point to arbitrary, non-contiguous frames (that’s why the mapping is kept per page)
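a toy model of the structures above, assuming 4K pages; all names are hypothetical. it keeps VMAs sorted, implements find_vma() with the kernel’s semantics (first VMA with vm_end > addr, which may not contain addr, so the caller must check vm_start), and uses a flat per-page table from virtual page numbers to frames. the real page table is multi-level, starting at mm->pgd.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT  # 4K


class SegFault(Exception):
    """Hypothetical: no VMA covers the address."""


@dataclass
class Vma:
    vm_start: int                 # inclusive, page aligned
    vm_end: int                   # exclusive, page aligned
    file: Optional[str] = None    # None -> anonymous mapping


@dataclass
class ToyMm:
    vmas: List[Vma] = field(default_factory=list)             # sorted, non-overlapping
    page_table: Dict[int, int] = field(default_factory=dict)  # virtual page no. -> frame no.

    def add_vma(self, vma: Vma) -> None:
        if vma.vm_start % PAGE_SIZE or vma.vm_end % PAGE_SIZE or vma.vm_start >= vma.vm_end:
            raise ValueError("VMA must be a non-empty, page-aligned span")
        i = bisect.bisect_left([v.vm_start for v in self.vmas], vma.vm_start)
        overlaps_prev = i > 0 and self.vmas[i - 1].vm_end > vma.vm_start
        overlaps_next = i < len(self.vmas) and self.vmas[i].vm_start < vma.vm_end
        if overlaps_prev or overlaps_next:
            raise ValueError("VMAs must not overlap")
        self.vmas.insert(i, vma)

    def find_vma(self, addr: int) -> Optional[Vma]:
        # kernel semantics: first VMA with vm_end > addr; it may start above addr
        i = bisect.bisect_right([v.vm_end for v in self.vmas], addr)
        return self.vmas[i] if i < len(self.vmas) else None

    def translate(self, addr: int) -> Optional[int]:
        vma = self.find_vma(addr)
        if vma is None or addr < vma.vm_start:
            raise SegFault(hex(addr))
        vpn, offset = addr >> PAGE_SHIFT, addr & (PAGE_SIZE - 1)
        frame = self.page_table.get(vpn)
        if frame is None:
            return None  # no PTE yet: this access would page-fault
        return (frame << PAGE_SHIFT) | offset


if __name__ == "__main__":
    mm = ToyMm()
    mm.add_vma(Vma(0x400000, 0x402000, file="a.out"))  # two file-backed pages
    mm.add_vma(Vma(0x7F0000, 0x7F1000))                 # one anonymous page
    mm.page_table[0x400] = 7   # virtually adjacent pages 0x400 and 0x401...
    mm.page_table[0x401] = 3   # ...sit in unrelated frames 7 and 3
    print(hex(mm.translate(0x400123)), hex(mm.translate(0x401010)))  # 0x7123 0x3010
    print(mm.translate(0x7F0000))  # None: inside a VMA, but no PTE yet
```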
— — — — — — — — —
pages and I/O
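- all buffered I/O goes through the page cache
- linux also does read-ahead for mmap’ed I/O (on page faults)
- madvise() tunes that read-ahead: MADV_NORMAL (default), MADV_RANDOM (little or no read-ahead), MADV_SEQUENTIAL (aggressive read-ahead)
- mmaps can be MAP_PRIVATE or MAP_SHARED. the difference only matters for updates: a private mapping shares the page-cache pages read-only until it writes, then copy-on-write gives it a private copy; writes to a shared mapping go to the page cache and eventually back to the file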
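a user-space sketch of MAP_PRIVATE vs MAP_SHARED plus an madvise() hint, assuming Linux and Python 3.8+. in python’s mmap module, ACCESS_COPY maps with MAP_PRIVATE and ACCESS_WRITE with MAP_SHARED; the MADV_* constants only exist where the platform provides them, hence the guard. private_vs_shared() is a hypothetical helper.

```python
import mmap
import os
import tempfile
from typing import Tuple


def private_vs_shared(path: str) -> Tuple[bytes, bytes]:
    """Hypothetical helper: write through a private and a shared mapping.

    Returns (file contents afterwards, what the private mapping saw)."""
    with open(path, "r+b") as f:
        # ACCESS_COPY -> MAP_PRIVATE: the write triggers copy-on-write, the file never sees it
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as priv:
            if hasattr(mmap, "MADV_SEQUENTIAL"):
                priv.madvise(mmap.MADV_SEQUENTIAL)  # hint: read ahead aggressively
            priv[0:1] = b"J"
            private_view = priv[:]
        # ACCESS_WRITE -> MAP_SHARED: the write lands in the page cache, visible to read()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as shared:
            shared[0:1] = b"H"
            shared.flush()  # msync(): push the dirty page back to the file
    with open(path, "rb") as f:
        return f.read(), private_view


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world")
        path = tmp.name
    try:
        print(private_vs_shared(path))  # (b'Hello world', b'Jello world')
    finally:
        os.remove(path)
```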
— — — — — — — — -
page fault handling:
- do_page_fault() -> locate VMA using find_vma()
- check the PTE and dispatch accordingly:
- do_anonymous_page() (anonymous memory, first touch)
- do_swap_page() (PTE holds a swap entry; usually a major fault)
- filemap_fault(): invoked via the VMA’s operations vector (vma->vm_ops->fault) for an mmap’ed file region, to read in file data during a page fault
- consult the page cache with find_get_page(); if found, do_async_mmap_readahead() (may start async read-ahead of the following pages)
- if not found: MAJOR FAULT, do_sync_mmap_readahead(), then pagecache_get_page()
- do_sync_mmap_readahead() -> do_page_cache_ra() (issues the actual read)
- do_page_cache_ra_unbounded(): allocate pages, add them to the page cache, submit I/O using read_pages()
- read_pages() glues the mm module to the fs module by invoking the mapping’s struct address_space_operations{} functions (e.g. ->readahead())
- address_space_operations{} suite implemented by all filesystems (https://elixir.bootlin.com/linux/v5.13/source/include/linux/fs.h#L370)
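the fault path above as a toy simulation. everything here is hypothetical; the function names mirror the kernel’s only to show where each step sits. handle_fault() looks up the VMA, then takes the swap, anonymous, or file path. the file path checks the page cache and, on a miss, reads a whole read-ahead window through the mapping’s ops table, which stands in for the read_pages() -> address_space_operations hop.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol, Tuple

PAGE_SIZE = 4096


class AddressSpaceOps(Protocol):
    """Hypothetical stand-in for struct address_space_operations."""

    def readahead(self, index: int, nr_pages: int) -> Dict[int, bytes]: ...


class ToyFs:
    """Hypothetical filesystem: file data is a dict of page index -> bytes."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.blocks = blocks
        self.ios = 0  # how many times we were asked to do I/O

    def readahead(self, index: int, nr_pages: int) -> Dict[int, bytes]:
        self.ios += 1
        return {i: self.blocks.get(i, bytes(8)) for i in range(index, index + nr_pages)}


@dataclass
class ToyMapping:
    a_ops: AddressSpaceOps
    page_cache: Dict[int, bytes] = field(default_factory=dict)
    ra_pages: int = 4  # hypothetical read-ahead window, in pages

    def read_pages(self, index: int, nr_pages: int) -> None:
        # the mm -> fs glue: call through the ops table, fill the page cache
        self.page_cache.update(self.a_ops.readahead(index, nr_pages))


def filemap_fault(mapping: ToyMapping, index: int) -> Tuple[bytes, str]:
    page = mapping.page_cache.get(index)         # find_get_page()
    if page is not None:
        return page, "minor"                     # (async read-ahead not modeled)
    mapping.read_pages(index, mapping.ra_pages)  # major fault: sync read-ahead
    return mapping.page_cache[index], "major"


@dataclass
class Vma:
    vm_start: int
    vm_end: int
    mapping: Optional[ToyMapping] = None  # None -> anonymous memory
    pgoff: int = 0                        # file page index at vm_start


def handle_fault(vmas: List[Vma], ptes: Dict[int, bytes],
                 swap: Dict[int, bytes], addr: int) -> str:
    # find_vma() plus the vm_start check, done as a linear scan here
    vma = next((v for v in vmas if v.vm_start <= addr < v.vm_end), None)
    if vma is None:
        return "SIGSEGV"
    vpn = addr // PAGE_SIZE
    if vpn in ptes:
        return "none"                             # PTE present: no fault at all
    if vpn in swap:
        ptes[vpn] = swap.pop(vpn)                 # do_swap_page(): read back from swap
        return "major"
    if vma.mapping is None:
        ptes[vpn] = bytes(PAGE_SIZE)              # do_anonymous_page(): zero-filled page
        return "minor"
    index = vma.pgoff + (addr - vma.vm_start) // PAGE_SIZE
    data, kind = filemap_fault(vma.mapping, index)  # vma->vm_ops->fault
    ptes[vpn] = data
    return kind


if __name__ == "__main__":
    fs = ToyFs({i: bytes([i]) * 8 for i in range(16)})
    vmas = [Vma(0x10000, 0x20000, ToyMapping(fs)), Vma(0x40000, 0x41000)]
    ptes: Dict[int, bytes] = {}
    swap: Dict[int, bytes] = {}
    for addr in (0x10000, 0x11000, 0x40010, 0x30000):
        print(hex(addr), handle_fault(vmas, ptes, swap, addr))
    print("fs I/Os:", fs.ios)  # page 1 arrived with page 0's read-ahead window
```

the filesystem is only ever reached through a_ops, the same indirection that lets every filesystem plug into the generic read-ahead code.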