mmap vs read/write: Choosing the Right File Access Strategy
When should you use mmap, and when is a classic read/write loop better? This article covers page cache behavior, page faults, random access, and safe patterns, with examples in C, Zig, and Rust.
Two common ways to read a file on Unix-like systems:
- Streaming with read()/write()
- Mapping the file into memory with mmap()
Both ultimately go through the page cache, but they differ in how much control you get and in how failures show up.
1) How read() typically works
- You ask for N bytes; read() may return fewer (a short read), which is not an error.
- The kernel copies data from the page cache into your buffer.
- If the pages aren’t cached, the kernel first reads them from disk into the cache.
Pros:
- Explicit control over buffering
- Clear error handling (the syscall returns -1 and sets errno)
- Works well for sequential streaming
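For comparison with the mmap programs later in the article, here is a minimal sketch of the read() side: a fixed-buffer loop that sums a file's bytes. It shows the points above in code: the caller owns the buffer, a return of 0 means end of file, -1 comes back with errno set, and a short read is handled by using only the bytes actually returned. The function name sum_file_read and the 64 KiB buffer size are illustrative choices, not part of the original article.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sum every byte of the file at path with a plain
   read() loop. Returns 0 on success, -1 on error (errno describes it). */
int sum_file_read(const char *path, uint64_t *out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    unsigned char buf[64 * 1024];   /* buffer size is an arbitrary choice */
    uint64_t sum = 0;
    for (;;) {
        ssize_t r = read(fd, buf, sizeof buf);
        if (r == 0) break;                  /* end of file */
        if (r < 0) {
            if (errno == EINTR) continue;   /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;                  /* report read()'s error, not close()'s */
            return -1;
        }
        /* A short read (r < sizeof buf) is normal: use only the r bytes returned. */
        for (ssize_t i = 0; i < r; i++) sum += buf[i];
    }
    close(fd);
    *out = sum;
    return 0;
}
```

Calling sum_file_read(path, &sum) on a file should give the same total as the mmap program in section 3, with every failure arriving as a return value.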
2) How mmap() works
- You map the file into your virtual address space.
- Accessing bytes triggers page faults if pages aren’t present.
- Reads become memory loads.
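Because reads become memory loads, random access into a mapped file is just pointer arithmetic plus a bounds check. The sketch below assumes a hypothetical file format (a flat array of uint32_t records in native byte order) and contrasts a lookup through a mapping, record_at_mapped, with the read-family equivalent, pread(2), which reads at an explicit offset without moving the file position. Both function names are illustrative.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* open(), O_RDONLY: how callers obtain fd */
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical format: record i is the uint32_t at byte offset i * 4. */

/* mmap style: base/len come from mmap(); a lookup is a bounds check and
   a load. memcpy avoids assuming base + i * 4 is suitably aligned. */
int record_at_mapped(const unsigned char *base, size_t len, size_t i,
                     uint32_t *out) {
    if (i >= len / sizeof *out) return -1;   /* past the end of the mapping */
    memcpy(out, base + i * sizeof *out, sizeof *out);
    return 0;
}

/* read style: pread() at the record's offset, retrying short reads. */
int record_at_pread(int fd, size_t i, uint32_t *out) {
    unsigned char *dst = (unsigned char *)out;
    size_t got = 0;
    while (got < sizeof *out) {
        ssize_t r = pread(fd, dst + got, sizeof *out - got,
                          (off_t)(i * sizeof *out + got));
        if (r == 0) return -1;               /* record lies past end of file */
        if (r < 0) {
            if (errno == EINTR) continue;    /* interrupted: retry */
            return -1;
        }
        got += (size_t)r;
    }
    return 0;
}
```

The mapped version costs one memory load per lookup (plus a page fault the first time a page is touched); the pread version costs one system call and one copy per lookup.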
Pros:
- Great for random access
- Avoids an extra copy into your buffer (your loads read the page-cache pages mapped into your address space)
- Simplifies parsing formats that want pointer-based access
Cons:
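- Errors surface as SIGBUS/SIGSEGV signals instead of a returned error (an I/O error during a page fault, for example, raises SIGBUS)
- You must be careful with file truncation while mapped: touching pages beyond the new end of the file raises SIGBUS
- Page fault behavior can be spiky: the first touch of each uncached page stalls while the kernel reads it from disk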
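One common way to turn the SIGBUS failure mode into an ordinary error return is to install a SIGBUS handler only for the duration of the access and siglongjmp(3) out of it. The sketch below assumes a single-threaded program: the handler and jump buffer are process-wide, so concurrent guarded accesses would need a different design. guarded_sum and map_file are hypothetical helper names, not a library API.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* One jump buffer for the whole process: NOT thread-safe. */
static sigjmp_buf bus_jmp;
static volatile sig_atomic_t bus_armed = 0;

static void on_sigbus(int sig) {
    if (bus_armed) siglongjmp(bus_jmp, 1);   /* back into guarded_sum() */
    signal(sig, SIG_DFL);                     /* unexpected SIGBUS: default action */
    raise(sig);
}

/* Sum n bytes of a mapping; return -1 instead of crashing if a page fault
   raises SIGBUS (for example, the file was truncated after mmap()). */
int guarded_sum(const unsigned char *p, size_t n, uint64_t *out) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) return -1;

    uint64_t sum = 0;   /* only read on the path where no jump happened */
    int rc = 0;
    if (sigsetjmp(bus_jmp, 1) == 0) {         /* 1: restore the signal mask on jump */
        bus_armed = 1;
        for (size_t i = 0; i < n; i++) sum += p[i];
    } else {
        rc = -1;                              /* reached via siglongjmp */
    }
    bus_armed = 0;
    sigaction(SIGBUS, &old, NULL);            /* restore the previous handler */
    if (rc == 0) *out = sum;
    return rc;
}

/* Map a whole file read-only; returns NULL on error or for an empty file. */
const unsigned char *map_file(const char *path, size_t *len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return NULL; }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return NULL;
    *len = (size_t)st.st_size;
    return p;
}
```

If another process truncates the file after map_file() returns, guarded_sum() reports -1 instead of killing the process; the caller still owns the mapping and must munmap() it.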
3) C: map a file and sum bytes
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 2;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "open: %s\n", strerror(errno));
        return 1;
    }

    /* The mapping length is the file size at this moment. */
    struct stat st;
    if (fstat(fd, &st) != 0) {
        fprintf(stderr, "fstat: %s\n", strerror(errno));
        close(fd);
        return 1;
    }
    size_t n = (size_t)st.st_size;

    /* mmap() rejects a zero length (EINVAL), so handle empty files up front. */
    if (n == 0) {
        close(fd);
        return 0;
    }

    void *p = mmap(NULL, n, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        fprintf(stderr, "mmap: %s\n", strerror(errno));
        close(fd);
        return 1;
    }
    /* The mapping keeps its own reference to the file; fd is no longer needed. */
    close(fd);

    /* Each first touch of an uncached page is a page fault. If the file is
       truncated while we read, the access raises SIGBUS instead of failing. */
    const unsigned char *b = (const unsigned char *)p;
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += b[i];

    munmap(p, n);
    printf("sum=%llu\n", (unsigned long long)sum);
    return 0;
}
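The C example reads the mapping strictly front to back. madvise(2) lets a program pass that access pattern to the kernel, which may then read ahead more aggressively; MADV_RANDOM and MADV_WILLNEED are other documented hints, and posix_madvise() is the POSIX spelling. The sketch below is one assumed way to wire a hint into the summing loop, not part of the original program; sum_mapped_sequential is a hypothetical name.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hint that the mapping [p, p + n) will be read front to back, then sum it.
   p must be the page-aligned address returned by mmap(). */
uint64_t sum_mapped_sequential(const void *p, size_t n) {
    /* madvise() is only a hint: if it fails, the bytes are still readable,
       so this sketch ignores the return value. */
    (void)madvise((void *)p, n, MADV_SEQUENTIAL);

    const unsigned char *b = p;
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += b[i];
    return sum;
}
```

In the program above, the summing loop could be replaced with sum = sum_mapped_sequential(p, n); before the munmap() call.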
4) Zig: mmap and slice the bytes
const std = @import("std");
const os = std.os;

// Written against the Zig 0.11 standard library, where std.os.mmap returns
// a page-aligned slice; later releases reorganized this API (std.posix).

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const a = gpa.allocator();

    const args = try std.process.argsAlloc(a);
    defer std.process.argsFree(a, args);
    if (args.len != 2) return error.InvalidArgs;
    const path = args[1];

    const f = try std.fs.cwd().openFile(path, .{});
    defer f.close();

    const st = try f.stat();
    if (st.size == 0) return; // mmap() rejects a zero length
    const len: usize = @intCast(st.size);

    const p = try os.mmap(null, len, os.PROT.READ, os.MAP.PRIVATE, f.handle, 0);
    defer os.munmap(p);

    // The returned slice coerces to []const u8; no pointer cast is needed.
    const bytes: []const u8 = p;
    var sum: u64 = 0;
    for (bytes) |v| sum += v;
    std.debug.print("sum={}\n", .{sum});
}
5) Rust: mmap with libc
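// Requires the `libc` crate as a dependency (Cargo.toml: libc = "0.2").
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;

fn main() -> io::Result<()> {
    let path = std::env::args().nth(1).expect("usage: <program> <file>");
    let f = File::open(path)?;
    let n = f.metadata()?.len() as usize;
    if n == 0 {
        return Ok(()); // mmap() rejects a zero length
    }

    // SAFETY: the slice below assumes the file is not truncated or modified
    // while mapped; a truncation makes the access raise SIGBUS.
    let sum = unsafe {
        let p = libc::mmap(
            std::ptr::null_mut(),
            n,
            libc::PROT_READ,
            libc::MAP_PRIVATE,
            f.as_raw_fd(),
            0,
        );
        if p == libc::MAP_FAILED {
            return Err(io::Error::last_os_error());
        }
        // Keep the borrowed slice in its own scope so it ends before munmap().
        let sum: u64 = {
            let bytes = std::slice::from_raw_parts(p as *const u8, n);
            bytes.iter().map(|&b| b as u64).sum()
        };
        libc::munmap(p, n);
        sum
    };
    println!("sum={sum}");
    Ok(())
}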
6) Practical decision guide
Use read() when:
- you’re streaming sequentially
- you want explicit error handling and predictable control flow
- you want to avoid SIGBUS/SIGSEGV failure modes
Use mmap() when:
- you need random access / binary parsing
- you want many processes to read the same file through the same page-cache pages, without each keeping its own copy in a private buffer
- you can tolerate page-fault latency and understand mapping safety (truncation and SIGBUS)
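Whether page-fault latency is tolerable depends partly on how much of the file is already cached. mincore(2) reports, per page, whether a mapping is resident in memory. The sketch below assumes Linux, where the vector argument is unsigned char * (mincore() is not part of POSIX and other systems declare it differently); resident_pages is a hypothetical helper name. The result is only a snapshot: pages can be evicted or read in right afterward.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count resident pages in the mapping [p, p + n); p must be page-aligned
   (as returned by mmap()). Stores the total page count in *total and
   returns the resident count, or -1 on error. */
long resident_pages(void *p, size_t n, size_t *total) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t pages = (n + (size_t)page - 1) / (size_t)page;
    if (pages == 0) { *total = 0; return 0; }

    /* Linux: one byte per page; the least significant bit means "resident". */
    unsigned char *vec = malloc(pages);
    if (vec == NULL) return -1;
    if (mincore(p, n, vec) != 0) { free(vec); return -1; }

    long resident = 0;
    for (size_t i = 0; i < pages; i++)
        if (vec[i] & 1) resident++;
    free(vec);
    *total = pages;
    return resident;
}
```

Run against a freshly mapped file, a low resident count suggests the first pass will mostly be page faults that wait on disk.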
References
- mmap(2): https://man7.org/linux/man-pages/man2/mmap.2.html
- mincore(2) (page residency): https://man7.org/linux/man-pages/man2/mincore.2.html
- madvise(2): https://man7.org/linux/man-pages/man2/madvise.2.html
- Brendan Gregg on caches/page cache: http://www.brendangregg.com/linuxperf.html