mmap vs read/write: Choosing the Right File Access Strategy

When should you use mmap, and when are classic read/write loops better? Learn page cache behavior, page faults, random access, and safe patterns in C, Zig,...

There are two common ways to read a file on Unix-like systems:

  • Streaming with read()/write()
  • Mapping the file into memory with mmap()

Both ultimately go through the page cache (unless the file is opened with O_DIRECT), but the control surface and failure modes differ.

1) How read() typically works

  • You ask for N bytes.
  • Kernel copies data from the page cache into your buffer.
  • If the pages aren’t cached, the kernel reads from disk into the cache first.

Pros:

  • Explicit control over buffering
  • Clear error handling (the syscall returns -1 and sets errno)
  • Works well for sequential streaming
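
The steps above can be sketched as a plain read() loop. This is a minimal illustration, not a definitive implementation: the helper name sum_with_read and the 64 KiB buffer size are assumptions made for the example. It shows the two details a streaming loop usually has to handle, short reads and EINTR.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Sum every byte of `path` with a plain read() loop.
 * Returns 0 on success, -1 on error (errno describes the failing call).
 * sum_with_read is a hypothetical helper name, not a standard API. */
int sum_with_read(const char *path, uint64_t *out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    unsigned char buf[64 * 1024];   /* buffer size is an arbitrary choice */
    uint64_t sum = 0;

    for (;;) {
        ssize_t got = read(fd, buf, sizeof buf);
        if (got == 0) break;                 /* end of file */
        if (got < 0) {
            if (errno == EINTR) continue;    /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;                   /* report the read() error, not close()'s */
            return -1;
        }
        /* read() may return fewer bytes than requested, so only use `got`. */
        for (ssize_t i = 0; i < got; i++) sum += buf[i];
    }

    close(fd);
    *out = sum;
    return 0;
}
```

Every failure here comes back as a return value, which is the main contrast with the mapped version later in the article.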

2) How mmap() works

  • You map the file into your virtual address space.
  • Accessing bytes triggers page faults if pages aren’t present.
  • Reads become memory loads.

Pros:

  • Great for random access
  • Avoids an extra copy into your buffer (you read from the cache directly)
  • Simplifies parsing formats that want pointer-based access
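
To make the random-access point concrete, here is a small sketch of fetching a fixed-size record from a mapped region. The record layout and the helper name record_at are invented for this example, and the fields are assumed to be stored in host byte order. The base pointer would normally be the address returned by mmap(), but any byte buffer works.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record layout, invented for this example. */
struct record {
    uint32_t id;
    uint32_t value;
};

/* Copy record `idx` out of a region of `len` bytes starting at `base`.
 * Returns 0 on success, -1 if the record would extend past the region. */
int record_at(const unsigned char *base, size_t len, size_t idx,
              struct record *out) {
    /* Dividing instead of multiplying keeps the bounds check overflow-free. */
    if (idx >= len / sizeof *out) return -1;

    /* memcpy avoids alignment and aliasing problems with a direct cast. */
    memcpy(out, base + idx * sizeof *out, sizeof *out);
    return 0;
}
```

With read() the same lookup needs a seek or pread() per record; with a mapping it is pointer arithmetic plus a bounds check.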

Cons:

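  • Errors surface as signals instead of a returned error: SIGBUS for I/O errors or access past the end of a truncated file, SIGSEGV for access outside the mapping or against its protection
  • You must guard against the file being truncated while it is mapped
  • Page-fault latency can be spiky: the first touch of an uncached page blocks on disk I/O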

3) C: map a file and sum bytes

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 2;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "open: %s\n", strerror(errno));
        return 1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0) {
        fprintf(stderr, "fstat: %s\n", strerror(errno));
        close(fd);
        return 1;
    }

    /* mmap() fails with EINVAL for a zero length, so empty files are handled first. */
    size_t n = (size_t)st.st_size;
    if (n == 0) {
        close(fd);
        return 0;
    }

    void *p = mmap(NULL, n, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        fprintf(stderr, "mmap: %s\n", strerror(errno));
        close(fd);
        return 1;
    }

    /* Reads are plain memory loads; the first touch of each page may fault it in. */
    const unsigned char *b = (const unsigned char *)p;
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += b[i];

    munmap(p, n);
    close(fd);

    printf("sum=%llu\n", (unsigned long long)sum);
    return 0;
}

4) Zig: mmap and slice the bytes

const std = @import("std");
// Uses the std.os API as in Zig 0.11; later releases expose these calls under std.posix.
const os = std.os;

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const a = gpa.allocator();

    var args = try std.process.argsAlloc(a);
    defer std.process.argsFree(a, args);
    if (args.len != 2) return error.InvalidArgs;

    const path = args[1];
    var f = try std.fs.cwd().openFile(path, .{});
    defer f.close();

    const st = try f.stat();
    if (st.size == 0) return;

    const p = try os.mmap(
        null,
        @as(usize, @intCast(st.size)),
        os.PROT.READ,
        os.MAP.PRIVATE,
        f.handle,
        0,
    );
    defer os.munmap(p);

    // os.mmap already returns a slice of the mapped length, so no pointer cast is needed.
    const bytes: []const u8 = p;
    var sum: u64 = 0;
    for (bytes) |v| sum += v;

    std.debug.print("sum={}\n", .{sum});
}

5) Rust: mmap with libc

// Requires the `libc` crate as a dependency in Cargo.toml.
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;

fn main() -> io::Result<()> {
    let path = std::env::args().nth(1).expect("path");
    let f = File::open(path)?;
    let n = f.metadata()?.len() as usize;
    if n == 0 { return Ok(()); }

    unsafe {
        let p = libc::mmap(
            std::ptr::null_mut(),
            n,
            libc::PROT_READ,
            libc::MAP_PRIVATE,
            f.as_raw_fd(),
            0,
        );
        if p == libc::MAP_FAILED {
            return Err(io::Error::last_os_error());
        }

        let bytes = std::slice::from_raw_parts(p as *const u8, n);
        let mut sum: u64 = 0;
        for &b in bytes { sum += b as u64; }

        libc::munmap(p, n);
        println!("sum={sum}");
    }

    Ok(())
}

6) Practical decision guide

Use read() when:

  • you’re streaming sequentially
  • you want explicit error handling and predictable control flow
  • you want to avoid SIGBUS/SIGSEGV failure modes

Use mmap() when:

  • you need random access / binary parsing
  • you want many independent readers to share the same cache pages without each keeping a private copy
  • you can tolerate page fault behavior and you understand mapping safety
