Atomics and Memory Ordering: Building Correct Lock-Free Code

This article covers Atomics and Memory Ordering: Building Correct Lock-Free Code. A practical guide to atomic operations and memory ordering (relaxed/acquire/release/seq_cst). Learn what the CPU and compiler are allowed to do, with C, Zi...

Lock-free programming isn’t “no locks = faster”. It’s a different correctness model.

If you don’t understand memory ordering, it’s very easy to write code that passes tests on your machine and fails under load, on a different CPU, or with a different compiler.

This post aims to make memory ordering concrete.

1) The problem: reordering

Modern compilers and CPUs reorder operations for performance.

Example intent:

  • Thread A: write data, then publish a flag.
  • Thread B: wait for flag, then read data.

Without ordering, Thread B might see the flag but still read stale data.

The core idea:

  • Atomics prevent data races on the atomic variable.
  • Ordering determines visibility relationships between different memory locations.

2) Vocabulary

  • Atomic operation: an indivisible load/store/RMW.
  • RMW: read-modify-write (e.g., fetch_add, compare_exchange).
  • Happens-before: a guaranteed ordering/visibility relation.
  • Acquire: prevents later operations from moving before the acquire.
  • Release: prevents earlier operations from moving after the release.

3) The “publish data” pattern

This is the canonical acquire/release example.

C (C11 atomics)

#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    uint64_t value;
} Message;

static Message msg;
static atomic_int ready = 0;

void producer(void) {
    msg.value = 0xdeadbeef;

    // Publish: all writes before this become visible to an acquire load.
    atomic_store_explicit(&ready, 1, memory_order_release);
}

uint64_t consumer(void) {
    // Spin until we observe ready == 1.
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        // spin
    }

    // After acquire, reading msg.value is guaranteed to observe producer's write.
    return msg.value;
}

Important note:

  • msg.value is not atomic.
  • This is still data-race-free because the release/acquire pair establishes ordering and the consumer reads msg only after synchronization.

Zig

const std = @import("std");

const Message = struct { value: u64 };
var msg: Message = .{ .value = 0 };
var ready: std.atomic.Value(u32) = std.atomic.Value(u32).init(0);

fn producer() void {
    msg.value = 0xdeadbeef;
    ready.store(1, .release);
}

fn consumer() u64 {
    while (ready.load(.acquire) == 0) {}
    return msg.value;
}

Rust

use std::sync::atomic::{AtomicU32, Ordering};

static READY: AtomicU32 = AtomicU32::new(0);
static mut MSG: u64 = 0;

fn producer() {
    unsafe { MSG = 0xdeadbeef; }
    READY.store(1, Ordering::Release);
}

fn consumer() -> u64 {
    while READY.load(Ordering::Acquire) == 0 {}
    unsafe { MSG }
}

Rust note:

  • static mut is unsafe and used only to keep the example minimal.
  • In real Rust, you’d typically wrap shared data in something safe (or use UnsafeCell with strict discipline).

4) Choosing an ordering

Relaxed

Use when you only need atomicity for the variable itself (counters/statistics), not ordering.

Example: increment a counter.

  • C: memory_order_relaxed
  • Zig: .monotonic
  • Rust: Ordering::Relaxed

Acquire/Release

Use for “publish/consume” patterns.

  • Release store when publishing.
  • Acquire load when consuming.

SeqCst

Strongest, simplest to reason about, often slower.

Use when:

  • you need a single global order across atomics
  • you’re debugging correctness and want to rule ordering out first

5) compare_exchange and the ABA footgun

Many lock-free algorithms use CAS (compare-and-swap):

  • “If the value is what I expect, replace it with new value.”

ABA problem (high-level):

  • Thread sees A
  • Another thread changes A → B → A
  • First thread’s CAS sees A again and incorrectly assumes nothing changed

Mitigations:

  • tagged pointers / version counters
  • hazard pointers / epoch reclamation
  • algorithm-specific constraints

6) One practical lock-free structure: once-init

A common use case: initialize something once without a mutex.

Pattern:

  • 0 = uninitialized
  • 1 = initializing
  • 2 = initialized

C

#include <stdatomic.h>

static atomic_int state = 0;

void init_once(void (*init_fn)(void)) {
    int expected = 0;
    if (atomic_compare_exchange_strong_explicit(
            &state, &expected, 1,
            memory_order_acq_rel,
            memory_order_acquire)) {
        init_fn();
        atomic_store_explicit(&state, 2, memory_order_release);
        return;
    }

    while (atomic_load_explicit(&state, memory_order_acquire) != 2) {
        // spin
    }
}

7) How to debug ordering bugs

  • Prefer ThreadSanitizer (TSan) for data races (it can’t prove ordering is correct, but it helps).
  • Build mental models around “publish/consume” and “ownership transfer”.
  • Reduce to small litmus tests.

References