The Page Cache and Writeback: Why Disk I/O Doesn't Behave Like You Think

This article explains the Linux page cache, buffered I/O, dirty pages, writeback, and how fsync interacts with durability, with experiments and code.

When you call write(), your data usually goes into the page cache, not directly to disk.

That design makes the system faster, but it creates confusion about:

why a program can "write" gigabytes quickly
why data disappears after power loss
why fsync can stall for a long time

1) Page cache in one paragraph

The page cache is memory the kernel uses to cache file contents.

Reads: often served from RAM.
Writes: often update cached pages (mark them dirty).

Later, a background flusher writes dirty pages to disk (writeback).

2) Dirty pages and writeback

Dirty pages accumulate until:

background writeback triggers (a dirty threshold is crossed or the pages are old enough)
memory pressure forces the kernel to reclaim pages
the application calls fsync/fdatasync

Because dirty pages are flushed in batches rather than one at a time, writeback can become bursty:

large bursts of dirty pages
sudden disk activity
latency spikes
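
One way an application can keep these bursts small is to flush in steps instead of leaving everything to the flusher. The following is a minimal sketch, not a tuned implementation: the helper name write_in_synced_steps and its parameters are hypothetical, and it trades some throughput for a bound on how much dirty data the file accumulates between flushes.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `total` bytes of filler to `path`, calling
 * fdatasync() after every `step` bytes so the file never has much more than
 * `step` bytes of dirty data waiting. Returns the number of fdatasync()
 * calls made, or -1 on error. */
int write_in_synced_steps(const char *path, size_t total, size_t step) {
    static unsigned char chunk[4096];
    if (step == 0) return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    memset(chunk, 0xAB, sizeof(chunk));

    size_t written = 0, since_sync = 0;
    int syncs = 0;
    while (written < total) {
        size_t want = total - written;
        if (want > sizeof(chunk)) want = sizeof(chunk);
        ssize_t n = write(fd, chunk, want);
        if (n <= 0) { close(fd); return -1; }
        written += (size_t)n;
        since_sync += (size_t)n;
        if (since_sync >= step) {
            /* Flush now so the backlog stays near `step` bytes. */
            if (fdatasync(fd) != 0) { close(fd); return -1; }
            since_sync = 0;
            syncs++;
        }
    }
    if (since_sync > 0) {   /* flush the tail that did not fill a whole step */
        if (fdatasync(fd) != 0) { close(fd); return -1; }
        syncs++;
    }
    return close(fd) == 0 ? syncs : -1;
}
```

A smaller step gives smoother disk activity but more time spent waiting in fdatasync; a larger step behaves more like a single fsync at the end.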

3) C: measure buffered writes vs fsync

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static unsigned long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (unsigned long long)ts.tv_sec * 1000000000ull + (unsigned long long)ts.tv_nsec;
}

int main(void) {
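    /* If /tmp is a tmpfs mount, nothing here reaches a disk and fsync returns
       almost immediately; use a path on a disk-backed filesystem instead. */
    int fd = open("/tmp/pagecache.bin", O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);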
    if (fd < 0) {
        fprintf(stderr, "open: %s\n", strerror(errno));
        return 1;
    }

    static unsigned char buf[1024 * 1024];
    memset(buf, 0xAB, sizeof(buf));

    unsigned long long t0 = now_ns();
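    for (int i = 0; i < 256; i++) {
        /* A short write is not an error and does not set errno, so report it separately. */
        ssize_t n = write(fd, buf, sizeof(buf));
        if (n < 0) {
            fprintf(stderr, "write: %s\n", strerror(errno));
            return 1;
        }
        if ((size_t)n != sizeof(buf)) {
            fprintf(stderr, "short write: %zd of %zu bytes\n", n, sizeof(buf));
            return 1;
        }
    }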
    unsigned long long t1 = now_ns();

    unsigned long long t2 = now_ns();
    if (fsync(fd) != 0) {
        fprintf(stderr, "fsync: %s\n", strerror(errno));
        return 1;
    }
    unsigned long long t3 = now_ns();

    printf("write ns: %llu\n", t1 - t0);
    printf("fsync ns: %llu\n", t3 - t2);

    close(fd);
    return 0;
}

You’ll often observe:

writes look fast (page cache)
fsync is the slow part (waits for storage)
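
To compare the two timings, it helps to turn them into throughput. The sketch below is only an illustration: mib_per_sec is a hypothetical helper, not part of the program above. Dividing by the write time alone gives the apparent speed of copying into the page cache; dividing by write time plus fsync time gives the rate at which data actually reached storage.

```c
/* Hypothetical helper: convert a byte count and an elapsed time in
 * nanoseconds into MiB per second. Returns 0.0 if no time elapsed. */
double mib_per_sec(unsigned long long bytes, unsigned long long ns) {
    if (ns == 0) return 0.0;
    double mib = (double)bytes / (1024.0 * 1024.0);
    return mib * 1e9 / (double)ns;
}
```

For the 256 MiB written above, mib_per_sec(256ull << 20, t1 - t0) is the buffered rate and mib_per_sec(256ull << 20, (t1 - t0) + (t3 - t2)) is the durable rate. With made-up timings of 0.1 s for the writes and 1.9 s for fsync, those come to 2560 MiB/s and 128 MiB/s.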

4) Zig: sync after buffered writes

const std = @import("std");

pub fn main() !void {
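    // const rather than var: the handle is never reassigned, and recent Zig
    // versions reject a var that is never mutated.
    const file = try std.fs.cwd().createFile("/tmp/pagecache.bin", .{ .truncate = true });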
    defer file.close();

    var buf: [1024 * 1024]u8 = undefined;
    @memset(&buf, 0xAB);

    for (0..256) |_| {
        try file.writeAll(&buf);
    }

    try file.sync();
}

5) Rust: sync_all and durability

use std::fs::OpenOptions;
use std::io::{self, Write};

fn main() -> io::Result<()> {
    let mut f = OpenOptions::new().create(true).write(true).truncate(true).open("/tmp/pagecache.bin")?;
    let buf = vec![0xABu8; 1024 * 1024];
    for _ in 0..256 {
        f.write_all(&buf)?;
    }
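    // sync_all flushes data and metadata (fsync on Linux); sync_data may skip
    // metadata that is not needed to read the data back.
    f.sync_all()?;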
    Ok(())
}

6) Tuning knobs (overview)

Linux exposes writeback tuning in /proc/sys/vm/.

Examples:

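dirty_ratio: percentage of available memory that may be dirty before a writing process is itself forced to write data back
dirty_background_ratio: percentage at which the background flusher threads start writing dirty pages
dirty_expire_centisecs: how old dirty data must be, in hundredths of a second, before the flusher writes it out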

These affect writeback behavior system-wide.
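
Before changing anything, it is worth reading the current values. The following sketch assumes a Linux system with /proc mounted; read_vm_knob and print_writeback_knobs are hypothetical helper names, and each file is assumed to hold a single integer.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer setting from /proc/sys/vm/<name>.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int read_vm_knob(const char *name, long *out) {
    char path[256];
    int n = snprintf(path, sizeof(path), "/proc/sys/vm/%s", name);
    if (n < 0 || (size_t)n >= sizeof(path)) return -1;

    FILE *f = fopen(path, "r");
    if (!f) return -1;
    int ok = fscanf(f, "%ld", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Print the three knobs listed above, or "unavailable" if one cannot be read. */
void print_writeback_knobs(void) {
    const char *names[] = {
        "dirty_ratio", "dirty_background_ratio", "dirty_expire_centisecs"
    };
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        long v;
        if (read_vm_knob(names[i], &v) == 0)
            printf("%s = %ld\n", names[i], v);
        else
            printf("%s = unavailable\n", names[i]);
    }
}
```

Note the unit when reading the output: dirty_expire_centisecs is in hundredths of a second, so a value of 3000 means 30 seconds.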

7) Observability: how to see page cache and writeback

If you want to understand what is happening, look at:

/proc/meminfo (Cached, Dirty, Writeback)
/proc/vmstat (nr_dirty and nr_writeback page counters)
vmstat 1 (blocks written out in the bo column, I/O wait, context switches)
iostat -x 1 (device utilization, queue depth)

Typical patterns:

Dirty grows quickly during buffered writes
Writeback increases later (background flush)
fsync often correlates with a spike in disk utilization and latency

If you see latency spikes on a server, bursty writeback is a common suspect.
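
The Dirty and Writeback fields can also be sampled from a program, for example once per second while a workload runs. This is a minimal sketch under the assumption that /proc/meminfo uses its usual "Key:   value kB" layout; meminfo_field_kb and read_dirty_writeback_kb are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Find "<key>: <value> kB" in meminfo-formatted text. The ':' check keeps
 * "Writeback" from matching "WritebackTmp". Returns 0 on success, -1 if the
 * key is missing or has no number. */
int meminfo_field_kb(const char *text, const char *key, long *out) {
    size_t klen = strlen(key);
    const char *line = text;
    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%ld", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');   /* advance to the next line */
        if (line) line++;
    }
    return -1;
}

/* Read /proc/meminfo once and report Dirty and Writeback in kB. */
int read_dirty_writeback_kb(long *dirty, long *writeback) {
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    if (meminfo_field_kb(buf, "Dirty", dirty) != 0) return -1;
    return meminfo_field_kb(buf, "Writeback", writeback);
}
```

Calling read_dirty_writeback_kb in a loop during the buffered-write experiment should show Dirty climbing first and Writeback rising once flushing starts.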

8) Direct I/O vs buffered I/O

Buffered I/O uses the page cache.

Direct I/O (often via O_DIRECT) tries to bypass the page cache and issue I/O closer to the device.

Tradeoffs:

buffered I/O is usually faster for small reads and re-reads
direct I/O can reduce double-buffering for databases that manage their own cache
direct I/O requires aligned buffers, offsets, and lengths, and complicates short I/O handling

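Even with direct I/O, you still need explicit durability calls (fsync/fdatasync) if you require crash safety: bypassing the page cache does not by itself flush file metadata or the device's write cache.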
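
The alignment requirement is the part that most often trips people up. The sketch below shows only that part: a pre-flight check and an aligned allocation, using hypothetical helpers direct_io_aligned and alloc_direct_buffer. It does not open a file with O_DIRECT. The alignment value is supplied by the caller because the real requirement depends on the filesystem and device; 512 and 4096 bytes are common, and 4096 in the usage note is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical pre-flight check: returns 1 if the buffer address, file
 * offset, and length are all multiples of `align`, otherwise 0. */
int direct_io_aligned(const void *buf, off_t offset, size_t len, size_t align) {
    if (align == 0 || offset < 0) return 0;
    return (uintptr_t)buf % align == 0
        && (uintmax_t)offset % align == 0
        && len % align == 0;
}

/* Allocate `len` bytes whose address is a multiple of `align`. `align` must
 * be a power of two and a multiple of sizeof(void *). Release with free().
 * Returns NULL on failure. */
void *alloc_direct_buffer(size_t align, size_t len) {
    void *p = NULL;
    if (posix_memalign(&p, align, len) != 0) return NULL;
    return p;
}
```

A caller would allocate with alloc_direct_buffer(4096, len), confirm direct_io_aligned(buf, offset, len, 4096) before each request, and fall back to buffered I/O or pad the request when the check fails.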

9) Durability semantics: what fsync actually guarantees

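fsync flushes a file's modified data and metadata to the storage device and blocks until the device reports that the transfer has completed. That wait is what makes it expensive.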

Common misconceptions:

"close() implies durability" (it does not)
"write() means on disk" (usually false)

Practical guidance:

use fdatasync if you only need the file data plus the metadata required to read it back (such as the file size), and not timestamps
if you create a new file and need the name to be durable, also sync the directory

The correct durability boundary depends on your threat model: data in the page cache survives a process crash, because the kernel still holds it, but not a kernel crash or power loss.
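
The guidance about new files can be sketched as follows. create_durable is a hypothetical helper: it fsyncs the file to cover the contents, then fsyncs the parent directory to cover the name. It assumes the directory path and file name are passed separately, and it does not clean up a partially written file on failure.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create dir/name with the given contents and return
 * only after both the data and the directory entry have been synced.
 * Returns 0 on success, -1 on any error. */
int create_durable(const char *dir, const char *name, const void *data, size_t len) {
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path)) return -1;

    /* Step 1: write the file and make its contents durable. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);   /* loop to handle short writes */
        if (w <= 0) { close(fd); return -1; }
        p += w;
        left -= (size_t)w;
    }
    if (fsync(fd) != 0) { close(fd); return -1; }
    if (close(fd) != 0) return -1;

    /* Step 2: sync the directory so the new name itself is durable. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0) return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc == 0 ? 0 : -1;
}
```

The order matters: syncing the directory first could leave a durable name pointing at a file whose contents were never flushed.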

References

fsync(2): https://man7.org/linux/man-pages/man2/fsync.2.html
Linux kernel admin guide, /proc/sys/vm (writeback sysctls): https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html
Brendan Gregg Linux performance: http://www.brendangregg.com/linuxperf.html