Futexes and Parking: How Fast Locks Sleep Without Syscalls
This article covers Futexes and Parking: How Fast Locks Sleep Without Syscalls. Futexes are the kernel primitive behind fast mutexes and condition variables on Linux. Learn the "fast path" vs "slow path", and build a tiny mutex using C,...
Linux synchronization is usually designed around a principle:
- fast path in user space (no syscalls)
- slow path in the kernel only when contended
A futex (fast userspace mutex) is the kernel primitive enabling this.
1) What a futex is
A futex is not a mutex by itself.
It’s a kernel syscall that lets threads:
- sleep waiting for a user-space integer to change
- wake sleepers when they should retry
The integer is in your address space. The kernel only gets involved when you call futex().
2) Why it’s fast
In the uncontended case, a lock/unlock can be:
- an atomic CAS
- no syscalls
Only when the CAS fails (someone else holds the lock) do you use futex to sleep.
One important nuance: futex wait is typically structured as:
- check the user-space state
- if it still indicates "locked", call
futex_wait
This check-then-sleep structure is what prevents missed wakeups.
3) Minimal futex mutex (educational)
State:
0= unlocked1= locked, no known waiters2= locked, contended
Algorithm sketch:
- lock: try CAS 0→1; if fail, set contended and futex_wait
- unlock: set to 0; if contended, futex_wake
C: futex syscall wrapper + mutex
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>
static int futex_wait(atomic_int *addr, int expected) {
return (int)syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}
static int futex_wake(atomic_int *addr, int n) {
return (int)syscall(SYS_futex, addr, FUTEX_WAKE, n, NULL, NULL, 0);
}
typedef struct {
atomic_int state;
} futex_mutex_t;
static void fm_init(futex_mutex_t *m) {
atomic_store_explicit(&m->state, 0, memory_order_relaxed);
}
static void fm_lock(futex_mutex_t *m) {
int c = 0;
if (atomic_compare_exchange_strong_explicit(
&m->state, &c, 1,
memory_order_acquire, memory_order_relaxed)) {
return;
}
for (;;) {
int s = atomic_load_explicit(&m->state, memory_order_relaxed);
if (s == 2 || atomic_exchange_explicit(&m->state, 2, memory_order_acquire) != 0) {
futex_wait(&m->state, 2);
}
c = 0;
if (atomic_compare_exchange_strong_explicit(
&m->state, &c, 2,
memory_order_acquire, memory_order_relaxed)) {
return;
}
}
}
static void fm_unlock(futex_mutex_t *m) {
int prev = atomic_exchange_explicit(&m->state, 0, memory_order_release);
if (prev == 2) {
futex_wake(&m->state, 1);
}
}
This is educational, not production-grade.
3.1) Spurious wakeups and the retry loop
Even with futexes, you must code as if wakeups can be spurious:
- the kernel can return early
- signals can interrupt waits
- other threads can win the race before you
That is why futex-based locks always re-check state in a loop.
4) Zig: calling futex syscall
const std = @import("std");
const SYS_futex = std.os.linux.SYS.futex;
const FUTEX_WAIT: usize = 0;
const FUTEX_WAKE: usize = 1;
fn futex_wait(addr: *i32, expected: i32) void {
_ = std.os.linux.syscall6(SYS_futex, @intFromPtr(addr), FUTEX_WAIT, @bitCast(@as(usize, @intCast(expected))), 0, 0, 0);
}
fn futex_wake(addr: *i32, n: i32) void {
_ = std.os.linux.syscall6(SYS_futex, @intFromPtr(addr), FUTEX_WAKE, @bitCast(@as(usize, @intCast(n))), 0, 0, 0);
}
In Zig, you’d normally use std.Thread.Mutex, but understanding the syscall clarifies what it’s doing.
5) Rust: futex via libc syscall
Rust codebases typically rely on parking_lot or std primitives, but futex is accessible.
use std::sync::atomic::{AtomicI32, Ordering};
fn futex_wait(addr: &AtomicI32, expected: i32) {
unsafe {
libc::syscall(
libc::SYS_futex,
addr as *const _ as *const i32,
libc::FUTEX_WAIT,
expected,
std::ptr::null::<libc::timespec>(),
);
}
}
fn futex_wake(addr: &AtomicI32, n: i32) {
unsafe {
libc::syscall(
libc::SYS_futex,
addr as *const _ as *const i32,
libc::FUTEX_WAKE,
n,
);
}
}
(Production implementations include careful error handling and robust state machines.)
6.1) Fairness, convoying, and priority inversion
Real locks care about latency under contention.
Problems you can encounter:
- convoying: many threads wake and fight, causing cache thrash
- priority inversion: a low-priority thread holds the lock while high-priority threads wait
- thundering herd: waking too many waiters wastes CPU
This is why production locks implement strategies like:
- wake one waiter (not all)
- handoff/fairness queues
- adaptive spinning before sleeping
6) Where you see futex in practice
pthread_mutexon Linux- Rust
parking_lot - many custom lock implementations
If you ever strace a contended program and see futex(...), that’s normal.
Related posts:
- Threads and Synchronization: Mutexes, Condition Variables, and Semaphores
- Atomics and Memory Ordering
References
futex(2): https://man7.org/linux/man-pages/man2/futex.2.html- Ulrich Drepper futex paper (historical background): https://www.akkadia.org/drepper/futex.pdf
- Rust parking_lot (uses futex on Linux): https://crates.io/crates/parking_lot