File I/O Fundamentals: read/write, Buffers, and the Cost of Syscalls

This article covers the practical mechanics of file I/O: file descriptors, syscalls, buffering, partial reads/writes, and robust patterns in C, Zig, and Rust.

File I/O looks simple: call read() and write(). In practice, robust and fast file I/O requires understanding:

  • file descriptors and offsets
  • partial reads/writes
  • buffering (user-space and kernel-space)
  • syscall overhead and batching

1) File descriptors and offsets

On POSIX systems, opening a file gives you a file descriptor (fd): a small integer indexing a per-process table.

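Each descriptor refers to an open file description that holds state, including the current file offset: the position where the next read() or write() happens. Descriptors duplicated with dup() or inherited across fork() share that offset. pread() and pwrite() instead take an explicit offset and neither use nor update it.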
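A minimal sketch of these offset rules, assuming a POSIX system and a regular file of at least 104 bytes opened for reading with its offset at 0. The helper name offset_demo is hypothetical. It shows that read() advances the offset, pread() leaves it alone, and a second descriptor obtained with dup() sees the same offset because both refer to one open file description.

```c
#define _XOPEN_SOURCE 700   /* expose pread() under strict C modes */
#include <fcntl.h>          /* open(), used by callers to obtain fd */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper. Starting from fd's current offset, it stores
 * the offset observed after each step in the out-parameters.
 * Returns 0 on success, -1 on error (errno is set). */
static int offset_demo(int fd, off_t *after_read, off_t *after_pread,
                       off_t *seen_by_dup) {
    char buf[4];

    /* read() consumes bytes at the current offset and advances it. */
    if (read(fd, buf, sizeof buf) < 0) return -1;
    *after_read = lseek(fd, 0, SEEK_CUR);

    /* pread() uses an explicit offset (100 here) and does not move it. */
    if (pread(fd, buf, sizeof buf, 100) < 0) return -1;
    *after_pread = lseek(fd, 0, SEEK_CUR);

    /* dup() creates another descriptor for the same open file
     * description, so it observes (and could move) the same offset. */
    int copy = dup(fd);
    if (copy < 0) return -1;
    *seen_by_dup = lseek(copy, 0, SEEK_CUR);
    close(copy);
    return 0;
}
```

Called on a freshly opened 200-byte file, all three out-parameters come back as 4.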

2) Partial reads/writes are normal

  • read(fd, buf, n) may return fewer than n bytes.
  • write(fd, buf, n) may write fewer than n bytes.

This is common for:

  • pipes and sockets
  • non-blocking I/O
  • signals interrupting syscalls

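Regular files rarely return short counts, but they can: a read() near end of file returns only the bytes that remain, and a write() can transfer fewer bytes than requested if the disk fills up partway. Code defensively either way.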
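The read-side counterpart of a write-all loop keeps calling read() until the buffer is full or EOF arrives. A minimal sketch, assuming blocking I/O; read_full is a hypothetical name:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to n bytes, retrying on short reads and
 * EINTR. Returns the number of bytes read (less than n only at EOF),
 * or -1 on error with errno set. */
static ssize_t read_full(int fd, void *buf, size_t n) {
    unsigned char *p = buf;
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, p + got, n - got);
        if (r == 0) break;                    /* EOF: return what we have */
        if (r < 0) {
            if (errno == EINTR) continue;     /* interrupted: just retry */
            return -1;
        }
        got += (size_t)r;                     /* short read: keep going */
    }
    return (ssize_t)got;
}
```

If a pipe receives "hello" and " world" in two separate writes, a single read() may return only the first chunk; once the writer closes its end, read_full(fd, buf, 64) returns 11. On an error after some bytes have arrived, this sketch returns -1 and the partial count is lost.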

3) A robust “copy file” loop

C: copy using read/write with retry

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

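// Write all n bytes: loop because write() may accept fewer bytes than
// requested, and retry when a signal interrupts the call (EINTR).
// Returns 0 on success, -1 on error with errno set.
static int write_all(int fd, const unsigned char *buf, size_t n) {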
    size_t off = 0;
    while (off < n) {
        ssize_t w = write(fd, buf + off, n - off);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        off += (size_t)w;
    }
    return 0;
}

int main(int argc, char **argv) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 2;
    }

    int in = open(argv[1], O_RDONLY);
    if (in < 0) {
        fprintf(stderr, "open src: %s\n", strerror(errno));
        return 1;
    }

    int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        fprintf(stderr, "open dst: %s\n", strerror(errno));
        close(in);
        return 1;
    }

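    int rc = 0; // exit status: nonzero if anything failed
    unsigned char buf[64 * 1024];
    for (;;) {
        ssize_t r = read(in, buf, sizeof(buf));
        if (r == 0) break; // EOF
        if (r < 0) {
            if (errno == EINTR) continue; // interrupted by a signal: retry
            fprintf(stderr, "read: %s\n", strerror(errno));
            rc = 1;
            break;
        }

        if (write_all(out, buf, (size_t)r) != 0) {
            fprintf(stderr, "write: %s\n", strerror(errno));
            rc = 1;
            break;
        }
    }

    // close() can report errors from earlier writes (e.g. on NFS), so check it.
    if (close(out) != 0) {
        fprintf(stderr, "close dst: %s\n", strerror(errno));
        rc = 1;
    }
    close(in);
    return rc;
}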

Key points:

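  • Buffer size is a trade-off: too small → many syscalls; too big → more memory and cache pressure for diminishing returns.
  • EINTR handling matters: if a signal handler runs before a blocking read() or write() has transferred any data, the call can fail with errno set to EINTR, and it should simply be retried.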
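To put numbers on the first point: copying an S-byte regular file with a B-byte buffer takes ⌈S/B⌉ read() calls that return data plus one final read() that returns 0 at EOF, and ⌈S/B⌉ write() calls if each write_all() finishes in a single write(), as is typical for regular files. For a 1 GiB file (S = 2^30):

```latex
N(S, B) = 2\left\lceil \frac{S}{B} \right\rceil + 1,
\qquad
N\!\left(2^{30}, 2^{12}\right) = 2 \cdot 2^{18} + 1 = 524\,289,
\qquad
N\!\left(2^{30}, 2^{16}\right) = 2 \cdot 2^{14} + 1 = 32\,769
```

Moving from a 4 KiB to a 64 KiB buffer saves 491,520 syscalls; moving on to 1 MiB (N = 2,049) saves only 30,720 more, which is where the returns start to diminish.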

Zig: buffered copying

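Zig’s standard library provides buffered reader and writer wrappers that turn many small reads or writes into fewer, larger syscalls. Zig's I/O API has changed across releases; this example uses the std.io.bufferedReader/bufferedWriter interface from pre-0.15 versions.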

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const a = gpa.allocator();

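    const args = try std.process.argsAlloc(a);
    defer std.process.argsFree(a, args);
    if (args.len != 3) return error.InvalidArgs;

    const src_path = args[1];
    const dst_path = args[2];

    const cwd = std.fs.cwd();
    const src = try cwd.openFile(src_path, .{});
    defer src.close();

    const dst = try cwd.createFile(dst_path, .{ .truncate = true });
    defer dst.close();

    // Wrap the raw files so small operations are batched into larger syscalls.
    var br = std.io.bufferedReader(src.reader());
    var bw = std.io.bufferedWriter(dst.writer());
    const reader = br.reader();
    const writer = bw.writer();

    var buf: [64 * 1024]u8 = undefined;
    while (true) {
        const n = try reader.read(&buf);
        if (n == 0) break; // EOF
        try writer.writeAll(buf[0..n]);
    }
    try bw.flush(); // push out any bytes still sitting in the write buffer
}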
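The mechanism behind buffered wrappers like bufferedWriter can be sketched in C: small writes are copied into a user-space array, and write() is issued only when the array is full or on an explicit flush. This is a simplified, hypothetical illustration (struct wbuf, its 4 KiB capacity, and the syscall counter are assumptions made for demonstration), not how Zig or Rust actually implement their writers.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define WBUF_CAP 4096   /* hypothetical capacity for this sketch */

/* Hypothetical user-space write buffer over a file descriptor. */
struct wbuf {
    int fd;
    size_t len;                     /* bytes currently buffered */
    unsigned long syscalls;         /* write() calls made, for illustration */
    unsigned char data[WBUF_CAP];
};

/* Drain the buffer with write(), handling partial writes and EINTR. */
static int wbuf_flush(struct wbuf *b) {
    size_t off = 0;
    while (off < b->len) {
        ssize_t w = write(b->fd, b->data + off, b->len - off);
        b->syscalls++;
        if (w < 0) {
            if (errno == EINTR) continue;    /* interrupted: retry */
            /* keep the unwritten tail so a later flush can retry it */
            memmove(b->data, b->data + off, b->len - off);
            b->len -= off;
            return -1;
        }
        off += (size_t)w;                    /* partial write: keep going */
    }
    b->len = 0;
    return 0;
}

/* Append bytes; only enters the kernel when the buffer is already full. */
static int wbuf_put(struct wbuf *b, const void *src, size_t n) {
    const unsigned char *p = src;
    while (n > 0) {
        if (b->len == WBUF_CAP && wbuf_flush(b) != 0) return -1;
        size_t room = WBUF_CAP - b->len;
        size_t chunk = n < room ? n : room;
        memcpy(b->data + b->len, p, chunk);
        b->len += chunk;
        p += chunk;
        n -= chunk;
    }
    return 0;
}
```

Pushing 10,000 single bytes through this buffer and then calling wbuf_flush() costs 3 write() calls (4096 + 4096 + 1808 bytes) instead of 10,000, assuming each write() completes in full. As with any buffered writer, data still in the buffer is lost if the program exits without flushing.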

Rust: copy with std::io

use std::env;
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};

fn main() -> io::Result<()> {
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: {} <src> <dst>", args[0]);
        std::process::exit(2);
    }

    let src = File::open(&args[1])?;
    let dst = File::create(&args[2])?;

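    // BufReader/BufWriter batch small operations into larger syscalls. A read at
    // least as large as BufReader's internal buffer (8 KiB by default) bypasses it,
    // so with the 64 KiB buffer below BufReader adds nothing; io::copy(&mut r, &mut w)
    // is a shorter alternative to this loop.
    let mut r = BufReader::new(src);
    let mut w = BufWriter::new(dst);

    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = match r.read(&mut buf) {
            Ok(0) => break, // EOF
            Ok(n) => n,
            // Like EINTR in C: an interrupted read is safe to retry.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        w.write_all(&buf[..n])?;
    }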
    w.flush()?;

    Ok(())
}

4) When to use pread/pwrite

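pread()/pwrite() take an explicit offset and neither use nor update the descriptor's shared file offset, which makes them a natural fit for concurrent access.

  • Multiple threads can read different parts of the same file through one descriptor without locking around a shared offset; with lseek() followed by read(), another thread could move the offset between the two calls.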
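A minimal sketch of a positional read helper, assuming a POSIX system: it loops on short reads like the copy loop above, but passes an explicit offset to pread() on each call, so several threads can use it on one descriptor at different offsets without coordinating. The name pread_full is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose pread()/pwrite() */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to n bytes starting at absolute offset
 * `off`, retrying on short reads and EINTR. The descriptor's own file
 * offset is never used or changed. Returns the bytes read (less than n
 * only if EOF is reached) or -1 with errno set. */
static ssize_t pread_full(int fd, void *buf, size_t n, off_t off) {
    unsigned char *p = buf;
    size_t got = 0;
    while (got < n) {
        ssize_t r = pread(fd, p + got, n - got, off + (off_t)got);
        if (r == 0) break;                  /* EOF */
        if (r < 0) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            return -1;
        }
        got += (size_t)r;                   /* short read: advance position */
    }
    return (ssize_t)got;
}
```

For instance, each worker thread could call pread_full(fd, buf, block_size, i * block_size) for its own block i on a shared descriptor (block_size and i are illustrative names), with no lock around the offset.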

5) Advanced options (preview)

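  • mmap: map a file into memory (well suited to random reads, but performance depends heavily on page-fault patterns).
  • sendfile/copy_file_range: kernel-assisted copies that avoid moving data through a user-space buffer.
  • O_DIRECT: bypass the page cache (specialized, imposes alignment requirements, and is easy to misuse).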
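As a taste of the first option, a minimal mmap sketch, assuming a POSIX system: it maps a whole file read-only and counts occurrences of one byte by indexing memory directly, with no read() calls; the kernel pages data in as it is first touched. The name count_byte_mmap is hypothetical, and empty files are special-cased because mmap() rejects a length of 0.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count how many bytes in the file at `path`
 * equal `target`. Returns the count, or -1 on error. */
static long long count_byte_mmap(const char *path, unsigned char target) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }   /* mmap(len = 0) fails */

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED) return -1;

    long long count = 0;
    for (size_t i = 0; i < len; i++)    /* first touch of each page may fault */
        if (p[i] == target) count++;

    munmap((void *)p, len);
    return count;
}
```

On a file containing "banana", count_byte_mmap(path, 'a') returns 3.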
