The Page Cache and Writeback: Why Disk I/O Doesn't Behave Like You Think
When you call write(), your data usually goes into the page cache, not directly to disk.
That design makes the system faster, but it creates confusion about:
- why a program can "write" gigabytes quickly
- why recently written data can disappear after power loss
- why fsync can stall for a long time
1) Page cache in one paragraph
The page cache is memory the kernel uses to cache file contents.
- Reads: often served from RAM.
- Writes: often update cached pages (mark them dirty).
Later, a background flusher writes dirty pages to disk (writeback).
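To make "served from RAM" concrete, the sketch below asks the kernel which pages of a file are currently resident in memory, using mmap and mincore(2). The function name resident_pages is hypothetical and only used for illustration; treat it as a minimal sketch for Linux, not a complete tool.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns how many pages of `path` are resident in
 * memory, or -1 on error. Stores the file's page count in *total_pages. */
long resident_pages(const char *path, long *total_pages) {
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); *total_pages = 0; return 0; }

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + page - 1) / page;

    /* Mapping the file does not read it; it only gives mincore an address range. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED) return -1;

    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set means the page is in RAM */
    }
    free(vec);
    munmap(map, len);
    *total_pages = (long)npages;
    return resident;
}
```

Calling it on a file you have just written or read will typically report most pages as resident; after memory pressure evicts them, the count drops.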
2) Dirty pages and writeback
Dirty pages accumulate until:
- background writeback triggers
- memory pressure triggers
- the application calls fsync/fdatasync
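Because dirty pages are flushed in batches, writeback can become bursty:
- a large backlog of dirty pages is written at once
- disk activity jumps suddenly
- latency spikes for other I/O on the same device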
3) C: measure buffered writes vs fsync
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in nanoseconds; unaffected by wall-clock adjustments. */
static unsigned long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (unsigned long long)ts.tv_sec * 1000000000ull + (unsigned long long)ts.tv_nsec;
}

int main(void) {
    /* If /tmp is tmpfs (RAM-backed) on your system, fsync will look nearly
       free; use a path on a disk-backed filesystem for a realistic result. */
    int fd = open("/tmp/pagecache.bin", O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
    if (fd < 0) {
        fprintf(stderr, "open: %s\n", strerror(errno));
        return 1;
    }

    static unsigned char buf[1024 * 1024];
    memset(buf, 0xAB, sizeof(buf));

    /* Phase 1: 256 x 1 MiB buffered writes. These mostly copy into the page cache. */
    unsigned long long t0 = now_ns();
    for (int i = 0; i < 256; i++) {
        ssize_t n = write(fd, buf, sizeof(buf));
        if (n != (ssize_t)sizeof(buf)) {
            /* errno is only meaningful when write() returned -1. */
            fprintf(stderr, "write: %s\n", n < 0 ? strerror(errno) : "short write");
            close(fd);
            return 1;
        }
    }
    unsigned long long t1 = now_ns();

    /* Phase 2: fsync blocks until the dirty pages have reached storage. */
    unsigned long long t2 = now_ns();
    if (fsync(fd) != 0) {
        fprintf(stderr, "fsync: %s\n", strerror(errno));
        close(fd);
        return 1;
    }
    unsigned long long t3 = now_ns();

    printf("write ns: %llu\n", t1 - t0);
    printf("fsync ns: %llu\n", t3 - t2);
    close(fd);
    return 0;
}
You’ll often observe:
- writes look fast (page cache)
- fsync is the slow part (waits for storage)
4) Zig: sync after buffered writes
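const std = @import("std");

pub fn main() !void {
    // const: the handle is never reassigned (newer Zig rejects an unmutated var).
    const file = try std.fs.cwd().createFile("/tmp/pagecache.bin", .{ .truncate = true });
    defer file.close();

    var buf: [1024 * 1024]u8 = undefined;
    @memset(&buf, 0xAB);

    // 256 x 1 MiB buffered writes; these normally land in the page cache.
    for (0..256) |_| {
        try file.writeAll(&buf);
    }

    // sync() asks the OS to flush the file to storage (fsync on POSIX systems).
    try file.sync();
}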
5) Rust: sync_all and durability
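use std::fs::OpenOptions;
use std::io::{self, Write};

fn main() -> io::Result<()> {
    let mut f = OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open("/tmp/pagecache.bin")?;

    // 256 x 1 MiB buffered writes; these normally land in the page cache.
    let buf = vec![0xABu8; 1024 * 1024];
    for _ in 0..256 {
        f.write_all(&buf)?;
    }

    // sync_all() flushes file data and metadata (fsync on Linux).
    // sync_data() is the counterpart that may skip some metadata.
    f.sync_all()?;
    Ok(())
}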
6) Tuning knobs (overview)
Linux exposes writeback tuning in /proc/sys/vm/.
Examples:
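- dirty_ratio: percentage of available memory that may be dirty before a writing process is made to do writeback itself
- dirty_background_ratio: percentage of available memory that may be dirty before the background flusher starts writing
- dirty_expire_centisecs: how old dirty data must be, in hundredths of a second, before it becomes eligible for writeback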
These affect writeback behavior system-wide.
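Before changing any of these, it helps to see what the current values are. The sketch below reads the knobs listed above from /proc/sys/vm/ and prints them. The helper names read_vm_knob and print_writeback_knobs are hypothetical, and the code assumes each file holds a single integer.

```c
#include <stdio.h>

/* Hypothetical helper: reads one integer-valued knob from /proc/sys/vm/<name>.
 * Returns 0 and stores the value in *out, or returns -1 on failure. */
int read_vm_knob(const char *name, long *out) {
    char path[256];
    int n = snprintf(path, sizeof(path), "/proc/sys/vm/%s", name);
    if (n < 0 || (size_t)n >= sizeof(path)) return -1;

    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Prints the three knobs; an unreadable knob is reported, not treated as fatal. */
void print_writeback_knobs(void) {
    static const char *knobs[] = {
        "dirty_ratio", "dirty_background_ratio", "dirty_expire_centisecs",
    };
    for (size_t i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++) {
        long v;
        if (read_vm_knob(knobs[i], &v) == 0)
            printf("%s = %ld\n", knobs[i], v);
        else
            printf("%s: unavailable\n", knobs[i]);
    }
}
```

Reading is unprivileged; writing new values to the same files requires root and changes behavior for every process on the machine.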
7) Observability: how to see page cache and writeback
If you want to understand what is happening, look at:
- /proc/meminfo (Cached, Dirty, Writeback)
- vmstat 1 (blocks written out, I/O wait, context switches)
- iostat -x 1 (device utilization, queue depth)
Typical patterns:
- Dirty grows quickly during buffered writes
- Writeback increases later (background flush)
- fsync often correlates with a spike in disk utilization and latency
If you see latency spikes on a server, bursty writeback is a common suspect.
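To watch the Dirty and Writeback counters from your own program rather than by eye, you can parse /proc/meminfo directly. The sketch below assumes the usual "Name:   value kB" line format; meminfo_kb is a hypothetical helper name, and the path is a parameter so the parser can be pointed at a saved copy.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: looks up one field (e.g. "Dirty") in a meminfo-style file.
 * Returns the value in kB, or -1 if the file or the field is missing. */
long meminfo_kb(const char *path, const char *field) {
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;

    char line[256];
    size_t flen = strlen(field);
    long value = -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Require "Field:" exactly, so "Writeback" does not match "WritebackTmp". */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            long v;
            if (sscanf(line + flen + 1, "%ld", &v) == 1) value = v;
            break;
        }
    }
    fclose(f);
    return value;
}
```

Sampling meminfo_kb("/proc/meminfo", "Dirty") and meminfo_kb("/proc/meminfo", "Writeback") once per second while a write-heavy program runs should show the patterns described above.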
8) Direct I/O vs buffered I/O
Buffered I/O uses the page cache.
Direct I/O (often via O_DIRECT) tries to bypass the page cache and issue I/O closer to the device.
Tradeoffs:
- buffered I/O is usually faster for small reads and re-reads
- direct I/O can reduce double-buffering for databases that manage their own cache
- direct I/O complicates alignment and short I/O handling
Even with direct I/O, you still need explicit durability calls if you require crash safety.
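The alignment and durability points can be sketched as follows. This is an illustration under assumptions, not a reference implementation: the 4096-byte alignment is a guess that suits many devices but is not guaranteed, the names dio_round_up and write_direct are hypothetical, and some filesystems reject O_DIRECT entirely (open fails, often with EINVAL).

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment; the real requirement depends on filesystem and device. */
#define DIO_ALIGN 4096

/* Rounds len up to the next multiple of DIO_ALIGN. */
size_t dio_round_up(size_t len) {
    return (len + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN;
}

/* Writes at least `len` bytes of 0xAB with O_DIRECT, padded to a whole
 * number of aligned blocks, then calls fdatasync.
 * Returns 0 on success, -1 on failure. */
int write_direct(const char *path, size_t len) {
    void *buf = NULL;
    /* Direct I/O needs an aligned buffer, not just an aligned length. */
    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0) return -1;
    memset(buf, 0xAB, DIO_ALIGN);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_CLOEXEC, 0644);
    if (fd < 0) { free(buf); return -1; }

    int rc = 0;
    size_t blocks = dio_round_up(len) / DIO_ALIGN;
    for (size_t i = 0; i < blocks && rc == 0; i++) {
        ssize_t n = write(fd, buf, DIO_ALIGN);
        if (n != DIO_ALIGN) rc = -1;   /* failed or short write */
    }
    /* Bypassing the page cache is not a durability guarantee; still sync. */
    if (rc == 0 && fdatasync(fd) != 0) rc = -1;
    if (close(fd) != 0) rc = -1;
    free(buf);
    return rc;
}
```

A real caller would handle short writes by retrying from an aligned offset instead of giving up, which is part of why direct I/O is harder to use correctly than buffered I/O.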
9) Durability semantics: what fsync actually guarantees
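fsync returns only after the file's modified data and metadata have been transferred to the storage device and the device reports completion. That wait is why it is expensive.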
Common misconceptions:
- "close() implies durability" (it does not)
- "write() means on disk" (usually false)
Practical guidance:
- use fdatasync if you only need the file data, plus the metadata required to read it back (such as file size), and not other metadata such as timestamps
- if you create a new file and need the name to be durable, also sync the directory
The correct durability boundary depends on your threat model (process crash vs power loss).
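The "sync the directory" step is easy to miss, so here is a minimal sketch of creating a file whose contents and name should both survive power loss. The function names write_all and create_durable are hypothetical; the calls themselves (openat, fsync on a directory descriptor) are standard on Linux.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Writes all of buf, retrying on short writes and EINTR. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Creates dir/name with the given contents and returns only after both the
 * file and its directory entry have been synced. Returns 0 or -1. */
int create_durable(const char *dir, const char *name, const char *data, size_t len) {
    int dfd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dfd < 0) return -1;

    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
    if (fd < 0) { close(dfd); return -1; }

    int rc = 0;
    if (write_all(fd, data, len) != 0) rc = -1;
    if (rc == 0 && fsync(fd) != 0) rc = -1;    /* file data and metadata */
    if (close(fd) != 0) rc = -1;
    if (rc == 0 && fsync(dfd) != 0) rc = -1;   /* directory entry (the name) */
    close(dfd);
    return rc;
}
```

If you only need to survive a process crash, the two fsync calls are unnecessary, since data already handed to the kernel by write() outlives the process. They matter when the threat is power loss or a kernel crash.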
References
- fsync(2): https://man7.org/linux/man-pages/man2/fsync.2.html
- Linux admin guide: writeback: https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html
- Brendan Gregg Linux performance: http://www.brendangregg.com/linuxperf.html