Lock-Free SPSC Ring Buffer: A High-Throughput Queue Between Two Threads
Build a correct single-producer single-consumer ring buffer using atomics, and understand why SPSC is simpler than MPMC. Includes C, Zig, and Rust implementations.
A single-producer single-consumer (SPSC) queue is one of the most useful concurrency building blocks:
- a producer thread generates work
- a consumer thread processes work
- both operate concurrently
SPSC is special because there is exactly:
- one writer for `tail` (the producer)
- one writer for `head` (the consumer)
That ownership eliminates many races and lets us build a fast queue without locks.
1) The mental model
We store items in a fixed-size array (capacity is typically a power of two).
- `head` points to the next element to read
- `tail` points to the next slot to write
The queue is empty when head == tail.
To distinguish full from empty, a common trick is to keep one slot unused:
- the queue is full when advancing `tail` would make it equal to `head`
2) Correctness: what ordering do we need?
Producer steps:
- write the item into `buf[tail]`
- publish the new `tail`

Consumer steps:
- read `tail`
- read the item from `buf[head]`
- publish the new `head`
We need:
- the producer's item write to become visible before the consumer sees the updated `tail`
- the consumer's `head` update to become visible before the producer checks for full
This is exactly what release/acquire ordering provides: the writer publishes its index with a release store, and the reader observes it with an acquire load.
3) C (C11 atomics) implementation
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#define CAP 1024u
typedef struct {
uint64_t buf[CAP];
atomic_uint head;
atomic_uint tail;
} spsc_t;
static inline void spsc_init(spsc_t *q) {
atomic_store_explicit(&q->head, 0u, memory_order_relaxed);
atomic_store_explicit(&q->tail, 0u, memory_order_relaxed);
}
static inline int spsc_push(spsc_t *q, uint64_t v) {
unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
unsigned next = (tail + 1u) % CAP;
if (next == head) return 0; // full
q->buf[tail] = v;
// Publish the element.
atomic_store_explicit(&q->tail, next, memory_order_release);
return 1;
}
static inline int spsc_pop(spsc_t *q, uint64_t *out) {
unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
if (head == tail) return 0; // empty
*out = q->buf[head];
unsigned next = (head + 1u) % CAP;
atomic_store_explicit(&q->head, next, memory_order_release);
return 1;
}
Why relaxed sometimes works:
- each index has a single writer, so that writer can reload its own index with relaxed ordering and never needs read-modify-write (RMW) operations
- but we still need acquire/release to connect data visibility to index updates
4) Zig implementation
const std = @import("std");
pub fn SpscQueue(comptime T: type, comptime CAP: usize) type {
return struct {
buf: [CAP]T = undefined,
head: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),
tail: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),
pub fn push(self: *@This(), v: T) bool {
const tail = self.tail.load(.monotonic);
const head = self.head.load(.acquire);
const next = (tail + 1) % CAP;
if (next == head) return false;
self.buf[tail] = v;
self.tail.store(next, .release);
return true;
}
pub fn pop(self: *@This()) ?T {
const head = self.head.load(.monotonic);
const tail = self.tail.load(.acquire);
if (head == tail) return null;
const v = self.buf[head];
const next = (head + 1) % CAP;
self.head.store(next, .release);
return v;
}
};
}
5) Rust implementation
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
pub struct Spsc<T, const CAP: usize> {
buf: [UnsafeCell<Option<T>>; CAP],
head: AtomicUsize,
tail: AtomicUsize,
}
unsafe impl<T: Send, const CAP: usize> Send for Spsc<T, CAP> {}
unsafe impl<T: Send, const CAP: usize> Sync for Spsc<T, CAP> {}
impl<T, const CAP: usize> Spsc<T, CAP> {
pub fn new() -> Self {
let buf: [UnsafeCell<Option<T>>; CAP] = std::array::from_fn(|_| UnsafeCell::new(None));
Self { buf, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) }
}
pub fn push(&self, v: T) -> bool {
let tail = self.tail.load(Ordering::Relaxed);
let head = self.head.load(Ordering::Acquire);
let next = (tail + 1) % CAP;
if next == head { return false; }
unsafe { *self.buf[tail].get() = Some(v); }
self.tail.store(next, Ordering::Release);
true
}
pub fn pop(&self) -> Option<T> {
let head = self.head.load(Ordering::Relaxed);
let tail = self.tail.load(Ordering::Acquire);
if head == tail { return None; }
let v = unsafe { (*self.buf[head].get()).take() };
let next = (head + 1) % CAP;
self.head.store(next, Ordering::Release);
v
}
}
Rust notes:
- This uses `UnsafeCell` to allow interior mutation through a shared reference.
- The SPSC guarantees (exactly one producer thread calls `push`, exactly one consumer thread calls `pop`) are required for safety.
6) Performance notes
- Prefer power-of-two capacities and use `idx & (CAP - 1)` instead of `idx % CAP`.
- Avoid false sharing: place `head` and `tail` on separate cache lines (padding) when you optimize further.
- The queue is not MPMC; don't share it across multiple producers or multiple consumers.
References
- C11 atomics: https://en.cppreference.com/w/c/atomic
- Rust atomics: https://doc.rust-lang.org/std/sync/atomic/
- Zig atomics: https://ziglang.org/documentation/master/std/#std.atomic
- Linux Kernel memory barriers (background): https://www.kernel.org/doc/Documentation/memory-barriers.txt