Futexes and Parking: How Fast Locks Sleep Witho

This article covers Futexes and Parking: How Fast Locks Sleep Without Syscalls. Futexes are the kernel primitive behind fast mutexes and condition variables on Linux. Learn the "fast path" vs "slow path", and build a tiny mutex using C,...

Linux synchronization is usually designed around a principle:

fast path in user space (no syscalls)
slow path in the kernel only when contended

A futex (fast userspace mutex) is the kernel primitive enabling this.

1) What a futex is

A futex is not a mutex by itself.

It’s a kernel syscall that lets threads:

sleep waiting for a user-space integer to change
wake sleepers when they should retry

The integer is in your address space. The kernel only gets involved when you call futex().

2) Why it’s fast

In the uncontended case, a lock/unlock can be:

an atomic CAS
no syscalls

Only when the CAS fails (someone else holds the lock) do you use futex to sleep.

One important nuance: futex wait is typically structured as:

check the user-space state
if it still indicates "locked", call futex_wait

This check-then-sleep structure is what prevents missed wakeups.

3) Minimal futex mutex (educational)

State:

0 = unlocked
1 = locked, no known waiters
2 = locked, contended

Algorithm sketch:

lock: try CAS 0→1; if fail, set contended and futex_wait
unlock: set to 0; if contended, futex_wake

C: futex syscall wrapper + mutex

#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static int futex_wait(atomic_int *addr, int expected) {
    return (int)syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static int futex_wake(atomic_int *addr, int n) {
    return (int)syscall(SYS_futex, addr, FUTEX_WAKE, n, NULL, NULL, 0);
}

typedef struct {
    atomic_int state;
} futex_mutex_t;

static void fm_init(futex_mutex_t *m) {
    atomic_store_explicit(&m->state, 0, memory_order_relaxed);
}

static void fm_lock(futex_mutex_t *m) {
    int c = 0;
    if (atomic_compare_exchange_strong_explicit(
            &m->state, &c, 1,
            memory_order_acquire, memory_order_relaxed)) {
        return;
    }

    for (;;) {
        int s = atomic_load_explicit(&m->state, memory_order_relaxed);
        if (s == 2 || atomic_exchange_explicit(&m->state, 2, memory_order_acquire) != 0) {
            futex_wait(&m->state, 2);
        }

        c = 0;
        if (atomic_compare_exchange_strong_explicit(
                &m->state, &c, 2,
                memory_order_acquire, memory_order_relaxed)) {
            return;
        }
    }
}

static void fm_unlock(futex_mutex_t *m) {
    int prev = atomic_exchange_explicit(&m->state, 0, memory_order_release);
    if (prev == 2) {
        futex_wake(&m->state, 1);
    }
}

This is educational, not production-grade.

3.1) Spurious wakeups and the retry loop

Even with futexes, you must code as if wakeups can be spurious:

the kernel can return early
signals can interrupt waits
other threads can win the race before you

That is why futex-based locks always re-check state in a loop.

4) Zig: calling futex syscall

const std = @import("std");

const SYS_futex = std.os.linux.SYS.futex;
const FUTEX_WAIT: usize = 0;
const FUTEX_WAKE: usize = 1;

fn futex_wait(addr: *i32, expected: i32) void {
    _ = std.os.linux.syscall6(SYS_futex, @intFromPtr(addr), FUTEX_WAIT, @bitCast(@as(usize, @intCast(expected))), 0, 0, 0);
}

fn futex_wake(addr: *i32, n: i32) void {
    _ = std.os.linux.syscall6(SYS_futex, @intFromPtr(addr), FUTEX_WAKE, @bitCast(@as(usize, @intCast(n))), 0, 0, 0);
}

In Zig, you’d normally use std.Thread.Mutex, but understanding the syscall clarifies what it’s doing.

5) Rust: futex via libc syscall

Rust codebases typically rely on parking_lot or std primitives, but futex is accessible.

use std::sync::atomic::{AtomicI32, Ordering};

fn futex_wait(addr: &AtomicI32, expected: i32) {
    unsafe {
        libc::syscall(
            libc::SYS_futex,
            addr as *const _ as *const i32,
            libc::FUTEX_WAIT,
            expected,
            std::ptr::null::<libc::timespec>(),
        );
    }
}

fn futex_wake(addr: &AtomicI32, n: i32) {
    unsafe {
        libc::syscall(
            libc::SYS_futex,
            addr as *const _ as *const i32,
            libc::FUTEX_WAKE,
            n,
        );
    }
}

(Production implementations include careful error handling and robust state machines.)

6.1) Fairness, convoying, and priority inversion

Real locks care about latency under contention.

Problems you can encounter:

convoying: many threads wake and fight, causing cache thrash
priority inversion: a low-priority thread holds the lock while high-priority threads wait
thundering herd: waking too many waiters wastes CPU

This is why production locks implement strategies like:

wake one waiter (not all)
handoff/fairness queues
adaptive spinning before sleeping

6) Where you see futex in practice

pthread_mutex on Linux
Rust parking_lot
many custom lock implementations

If you ever strace a contended program and see futex(...), that’s normal.

futex(2): https://man7.org/linux/man-pages/man2/futex.2.html
Ulrich Drepper futex paper (historical background): https://www.akkadia.org/drepper/futex.pdf
Rust parking_lot (uses futex on Linux): https://crates.io/crates/parking_lot