Futexes and Parking: How Fast Locks Sleep Without Syscalls

This article covers Futexes and Parking: How Fast Locks Sleep Without Syscalls. Futexes are the kernel primitive behind fast mutexes and condition variables on Linux. Learn the "fast path" vs "slow path", and build a tiny mutex using C,...

Linux synchronization is usually designed around a principle:

  • fast path in user space (no syscalls)
  • slow path in the kernel only when contended

A futex (fast userspace mutex) is the kernel primitive enabling this.

1) What a futex is

A futex is not a mutex by itself.

It’s a kernel syscall that lets threads:

  • sleep waiting for a user-space integer to change
  • wake sleepers when they should retry

The integer is in your address space. The kernel only gets involved when you call futex().

2) Why it’s fast

In the uncontended case, a lock/unlock can be:

  • an atomic CAS
  • no syscalls

Only when the CAS fails (someone else holds the lock) do you use futex to sleep.

One important nuance: futex wait is typically structured as:

  • check the user-space state
  • if it still indicates "locked", call futex_wait

This check-then-sleep structure is what prevents missed wakeups.

3) Minimal futex mutex (educational)

State:

  • 0 = unlocked
  • 1 = locked, no known waiters
  • 2 = locked, contended

Algorithm sketch:

  • lock: try CAS 0→1; if fail, set contended and futex_wait
  • unlock: set to 0; if contended, futex_wake

C: futex syscall wrapper + mutex

#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static int futex_wait(atomic_int *addr, int expected) {
    return (int)syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static int futex_wake(atomic_int *addr, int n) {
    return (int)syscall(SYS_futex, addr, FUTEX_WAKE, n, NULL, NULL, 0);
}

typedef struct {
    atomic_int state;
} futex_mutex_t;

static void fm_init(futex_mutex_t *m) {
    atomic_store_explicit(&m->state, 0, memory_order_relaxed);
}

static void fm_lock(futex_mutex_t *m) {
    int c = 0;
    if (atomic_compare_exchange_strong_explicit(
            &m->state, &c, 1,
            memory_order_acquire, memory_order_relaxed)) {
        return;
    }

    for (;;) {
        int s = atomic_load_explicit(&m->state, memory_order_relaxed);
        if (s == 2 || atomic_exchange_explicit(&m->state, 2, memory_order_acquire) != 0) {
            futex_wait(&m->state, 2);
        }

        c = 0;
        if (atomic_compare_exchange_strong_explicit(
                &m->state, &c, 2,
                memory_order_acquire, memory_order_relaxed)) {
            return;
        }
    }
}

static void fm_unlock(futex_mutex_t *m) {
    int prev = atomic_exchange_explicit(&m->state, 0, memory_order_release);
    if (prev == 2) {
        futex_wake(&m->state, 1);
    }
}

This is educational, not production-grade.

3.1) Spurious wakeups and the retry loop

Even with futexes, you must code as if wakeups can be spurious:

  • the kernel can return early
  • signals can interrupt waits
  • other threads can win the race before you

That is why futex-based locks always re-check state in a loop.

4) Zig: calling futex syscall

const std = @import("std");

const SYS_futex = std.os.linux.SYS.futex;
const FUTEX_WAIT: usize = 0;
const FUTEX_WAKE: usize = 1;

fn futex_wait(addr: *i32, expected: i32) void {
    _ = std.os.linux.syscall6(SYS_futex, @intFromPtr(addr), FUTEX_WAIT, @bitCast(@as(usize, @intCast(expected))), 0, 0, 0);
}

fn futex_wake(addr: *i32, n: i32) void {
    _ = std.os.linux.syscall6(SYS_futex, @intFromPtr(addr), FUTEX_WAKE, @bitCast(@as(usize, @intCast(n))), 0, 0, 0);
}

In Zig, you’d normally use std.Thread.Mutex, but understanding the syscall clarifies what it’s doing.

5) Rust: futex via libc syscall

Rust codebases typically rely on parking_lot or std primitives, but futex is accessible.

use std::sync::atomic::{AtomicI32, Ordering};

fn futex_wait(addr: &AtomicI32, expected: i32) {
    unsafe {
        libc::syscall(
            libc::SYS_futex,
            addr as *const _ as *const i32,
            libc::FUTEX_WAIT,
            expected,
            std::ptr::null::<libc::timespec>(),
        );
    }
}

fn futex_wake(addr: &AtomicI32, n: i32) {
    unsafe {
        libc::syscall(
            libc::SYS_futex,
            addr as *const _ as *const i32,
            libc::FUTEX_WAKE,
            n,
        );
    }
}

(Production implementations include careful error handling and robust state machines.)

6.1) Fairness, convoying, and priority inversion

Real locks care about latency under contention.

Problems you can encounter:

  • convoying: many threads wake and fight, causing cache thrash
  • priority inversion: a low-priority thread holds the lock while high-priority threads wait
  • thundering herd: waking too many waiters wastes CPU

This is why production locks implement strategies like:

  • wake one waiter (not all)
  • handoff/fairness queues
  • adaptive spinning before sleeping

6) Where you see futex in practice

  • pthread_mutex on Linux
  • Rust parking_lot
  • many custom lock implementations

If you ever strace a contended program and see futex(...), that’s normal.

Related posts:

References