Building a Memory Allocator from Scratch in C

This article covers Building a Memory Allocator from Scratch in C. Learn how memory allocation works by implementing your own malloc() and free() functions in C. A deep dive into heap management, fragmentation, and optimization techniques.

Memory allocation is one of those fundamental concepts in programming that we often take for granted. We call malloc() and free() without thinking about what happens under the hood. But understanding how allocators work is crucial for writing efficient, reliable software, especially in performance-critical applications.

In this post, we'll build a complete memory allocator from scratch in C, exploring the challenges and trade-offs involved in heap management.

Understanding the Memory Landscape

Before we dive into implementation, let's understand how memory is organized in a typical C program:

+----------------------+
|       Stack          |  ← Local variables, function calls
+----------------------+
|         ↓            |
|       Heap           |  ← Dynamic allocation (malloc/free)
+----------------------+
|         ↓            |
|       BSS            |  ← Uninitialized global/static data
+----------------------+
|       Data           |  ← Initialized global/static data
+----------------------+
|       Text           |  ← Program code
+----------------------+

The heap grows upward from lower addresses, while the stack grows downward. Our allocator will manage the heap region.

The Challenge of Memory Allocation

A memory allocator faces several challenges:

  1. Tracking free blocks - Knowing which memory regions are available
  2. Finding suitable blocks - Choosing the right free block for a request
  3. Handling fragmentation - Dealing with scattered free memory
  4. Alignment requirements - Ensuring proper memory alignment for different types
  5. Overhead minimization - Keeping metadata small while maintaining functionality

Basic Allocator Design

We'll implement a simple but effective allocator using a free list approach. Here's the basic idea:

  1. Request large memory chunks from the operating system
  2. Split these chunks into smaller blocks as needed
  3. Maintain a linked list of free blocks
  4. Coalesce adjacent free blocks when freeing

Let's start with our header structure:

#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

// Block header for tracking allocated and free blocks
typedef struct block {
    size_t size;           // Size of the block (excluding header)
    struct block *next;    // Next block in free list (only for free blocks)
    int is_free;           // 1 if block is free, 0 if allocated
    // Data follows immediately after this header
} block_t;

// Global variables for our heap
static block_t *free_list = NULL;  // Head of free list
static block_t *heap_start = NULL; // Start of our heap

Finding Free Blocks

When malloc() is called, we need to find a suitable free block. We'll use the first-fit strategy - the first free block that's large enough:

// Find a free block using first-fit strategy
static block_t *find_free_block(size_t size) {
    block_t *current = free_list;
    
    while (current) {
        if (current->is_free && current->size >= size) {
            return current;
        }
        current = current->next;
    }
    
    return NULL;  // No suitable block found
}

Requesting Memory from the OS

When we can't find a suitable free block, we need to request more memory from the operating system using sbrk():

// Request more memory from the OS
static block_t *request_memory(size_t size) {
    // Calculate total size needed (header + data + alignment padding)
    size_t total_size = sizeof(block_t) + size;
    
    // Align to page size for efficiency
    size_t page_size = sysconf(_SC_PAGESIZE);
    size_t aligned_size = ((total_size + page_size - 1) / page_size) * page_size;
    
    // Request memory from OS
    void *block = sbrk(aligned_size);
    if (block == (void*)-1) {
        return NULL;  // sbrk failed
    }
    
    // Initialize the block
    block_t *new_block = (block_t*)block;
    new_block->size = aligned_size - sizeof(block_t);
    new_block->is_free = 0;
    new_block->next = NULL;
    
    // If this is our first allocation, set heap_start
    if (heap_start == NULL) {
        heap_start = new_block;
    }
    
    return new_block;
}

Splitting Blocks

If we find a free block that's much larger than needed, we should split it to avoid wasting memory:

// Split a block if it's significantly larger than needed
static void split_block(block_t *block, size_t size) {
    // Only split if the remaining space is large enough for a new block
    if (block->size - size > sizeof(block_t) + 16) {
        // Create new block for the remaining space
        block_t *new_block = (block_t*)((char*)block + sizeof(block_t) + size);
        new_block->size = block->size - size - sizeof(block_t);
        new_block->is_free = 1;
        new_block->next = block->next;
        
        // Update original block
        block->size = size;
        block->next = new_block;
        
        // Add new block to free list
        new_block->next = free_list;
        free_list = new_block;
    }
}

Implementing malloc()

Now let's put it all together in our malloc() implementation:

void *malloc(size_t size) {
    if (size == 0) {
        return NULL;
    }
    
    // Align size to 8-byte boundary for performance
    size = (size + 7) & ~7;
    
    block_t *block = find_free_block(size);
    
    if (block) {
        // Found a suitable free block
        split_block(block, size);
        block->is_free = 0;
    } else {
        // Need to request more memory from OS
        block = request_memory(size);
        if (!block) {
            return NULL;  // Allocation failed
        }
    }
    
    // Return pointer to data area (after header)
    return (void*)((char*)block + sizeof(block_t));
}

Implementing free()

Freeing memory is more complex than it seems. We need to:

  1. Mark the block as free
  2. Add it back to the free list
  3. Coalesce with adjacent free blocks to reduce fragmentation
// Coalesce adjacent free blocks
static void coalesce_blocks(block_t *block) {
    // Coalesce with next block if it's free
    if (block->next && block->next->is_free) {
        block->size += sizeof(block_t) + block->next->size;
        block->next = block->next->next;
    }
    
    // Coalesce with previous block if it's free
    block_t *current = free_list;
    while (current && current->next != block) {
        current = current->next;
    }
    
    if (current && current->is_free) {
        current->size += sizeof(block_t) + block->size;
        current->next = block->next;
        block = current;
    }
}

void free(void *ptr) {
    if (!ptr) {
        return;
    }
    
    // Get block header from pointer
    block_t *block = (block_t*)((char*)ptr - sizeof(block_t));
    
    // Mark as free
    block->is_free = 1;
    
    // Add to beginning of free list
    block->next = free_list;
    free_list = block;
    
    // Coalesce with adjacent blocks
    coalesce_blocks(block);
}

Complete Implementation

Let's put everything together in a complete, working allocator:

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>

typedef struct block {
    size_t size;
    struct block *next;
    int is_free;
} block_t;

static block_t *free_list = NULL;
static block_t *heap_start = NULL;

static block_t *find_free_block(size_t size) {
    block_t *current = free_list;
    while (current) {
        if (current->is_free && current->size >= size) {
            return current;
        }
        current = current->next;
    }
    return NULL;
}

static void split_block(block_t *block, size_t size) {
    if (block->size - size > sizeof(block_t) + 16) {
        block_t *new_block = (block_t*)((char*)block + sizeof(block_t) + size);
        new_block->size = block->size - size - sizeof(block_t);
        new_block->is_free = 1;
        new_block->next = block->next;
        
        block->size = size;
        block->next = new_block;
        
        new_block->next = free_list;
        free_list = new_block;
    }
}

static block_t *request_memory(size_t size) {
    size_t total_size = sizeof(block_t) + size;
    size_t page_size = sysconf(_SC_PAGESIZE);
    size_t aligned_size = ((total_size + page_size - 1) / page_size) * page_size;
    
    void *block = sbrk(aligned_size);
    if (block == (void*)-1) {
        return NULL;
    }
    
    block_t *new_block = (block_t*)block;
    new_block->size = aligned_size - sizeof(block_t);
    new_block->is_free = 0;
    new_block->next = NULL;
    
    if (heap_start == NULL) {
        heap_start = new_block;
    }
    
    return new_block;
}

static void coalesce_blocks(block_t *block) {
    if (block->next && block->next->is_free) {
        block->size += sizeof(block_t) + block->next->size;
        block->next = block->next->next;
    }
    
    block_t *current = free_list;
    while (current && current->next != block) {
        current = current->next;
    }
    
    if (current && current->is_free) {
        current->size += sizeof(block_t) + block->size;
        current->next = block->next;
        block = current;
    }
}

void *malloc(size_t size) {
    if (size == 0) return NULL;
    
    size = (size + 7) & ~7;  // Align to 8 bytes
    
    block_t *block = find_free_block(size);
    
    if (block) {
        split_block(block, size);
        block->is_free = 0;
    } else {
        block = request_memory(size);
        if (!block) return NULL;
    }
    
    return (void*)((char*)block + sizeof(block_t));
}

void free(void *ptr) {
    if (!ptr) return;
    
    block_t *block = (block_t*)((char*)ptr - sizeof(block_t));
    block->is_free = 1;
    
    block->next = free_list;
    free_list = block;
    
    coalesce_blocks(block);
}

// Utility function to print allocator state
void print_allocator_state() {
    printf("=== Allocator State ===\n");
    printf("Free list:\n");
    
    block_t *current = free_list;
    while (current) {
        printf("  Block at %p: size=%zu, free=%s\n", 
               (void*)current, current->size, current->is_free ? "yes" : "no");
        current = current->next;
    }
    printf("\n");
}

Testing Our Allocator

Let's create a test program to verify our allocator works correctly:

int main() {
    printf("Testing custom allocator...\n\n");
    
    // Test basic allocation and freeing
    printf("1. Basic allocation test:\n");
    int *array = (int*)malloc(10 * sizeof(int));
    if (array) {
        for (int i = 0; i < 10; i++) {
            array[i] = i * i;
        }
        printf("   Allocated and filled array\n");
        print_allocator_state();
        
        free(array);
        printf("   Freed array\n");
        print_allocator_state();
    }
    
    // Test multiple allocations
    printf("\n2. Multiple allocation test:\n");
    char *str1 = (char*)malloc(100);
    char *str2 = (char*)malloc(200);
    char *str3 = (char*)malloc(50);
    
    printf("   Allocated three blocks\n");
    print_allocator_state();
    
    free(str1);
    printf("   Freed first block\n");
    print_allocator_state();
    
    free(str3);
    printf("   Freed third block\n");
    print_allocator_state();
    
    free(str2);
    printf("   Freed second block\n");
    print_allocator_state();
    
    // Test large allocation
    printf("\n3. Large allocation test:\n");
    void *large = malloc(10000);
    printf("   Allocated large block\n");
    print_allocator_state();
    
    free(large);
    printf("   Freed large block\n");
    print_allocator_state();
    
    printf("All tests completed!\n");
    return 0;
}

Advanced Topics and Optimizations

Our basic allocator works, but there are many ways to improve it:

1. Different Fit Strategies

Instead of first-fit, we could implement:

  • Best-fit: Find the smallest block that fits
  • Worst-fit: Find the largest block (good for variable-sized allocations)
  • Next-fit: Start searching from where we left off

2. Separate Free Lists

Maintain separate free lists for different size ranges:

#define NUM_BUCKETS 8
static block_t *free_buckets[NUM_BUCKETS];

int get_bucket(size_t size) {
    // Map size to appropriate bucket
    for (int i = 0; i < NUM_BUCKETS; i++) {
        if (size <= (1 << (i + 4))) return i;
    }
    return NUM_BUCKETS - 1;
}

3. Memory Alignment

Different data types have different alignment requirements:

#define ALIGNMENT 16  // For SSE instructions

size_t aligned_size = (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1);

4. Debugging Features

Add debugging capabilities:

#ifdef DEBUG
#define CANARY_VALUE 0xDEADBEEF

void add_canary(block_t *block) {
    uint32_t *canary = (uint32_t*)((char*)block + sizeof(block_t) + block->size);
    *canary = CANARY_VALUE;
}

int check_canary(block_t *block) {
    uint32_t *canary = (uint32_t*)((char*)block + sizeof(block_t) + block->size);
    return *canary == CANARY_VALUE;
}
#endif

5. Memory Pools

For applications with many small, similar-sized allocations, memory pools can be much more efficient:

typedef struct pool {
    size_t block_size;
    size_t pool_size;
    void *memory;
    void *free_list;
} pool_t;

pool_t *create_pool(size_t block_size, size_t num_blocks) {
    pool_t *pool = malloc(sizeof(pool_t));
    pool->block_size = block_size;
    pool->pool_size = block_size * num_blocks;
    pool->memory = malloc(pool->pool_size);
    
    // Initialize free list
    pool->free_list = pool->memory;
    for (size_t i = 0; i < num_blocks - 1; i++) {
        void **next = (void**)((char*)pool->free_list + i * block_size);
        *next = (void*)((char*)pool->free_list + (i + 1) * block_size);
    }
    
    return pool;
}

Performance Considerations

Fragmentation

There are two types of fragmentation:

  1. External fragmentation: Free memory is scattered in small pieces
  2. Internal fragmentation: Allocated blocks are larger than needed

Our coalescing strategy helps with external fragmentation, but there are trade-offs between different strategies.

Cache Performance

Memory allocators can significantly impact cache performance. Consider:

  • Keeping frequently used data together
  • Minimizing metadata overhead
  • Aligning blocks to cache line boundaries

Thread Safety

Our current allocator is not thread-safe. To make it thread-safe, we'd need to add synchronization:

#include <pthread.h>

static pthread_mutex_t alloc_mutex = PTHREAD_MUTEX_INITIALIZER;

void *malloc(size_t size) {
    pthread_mutex_lock(&alloc_mutex);
    // ... allocation logic ...
    pthread_mutex_unlock(&alloc_mutex);
    return result;
}

Real-World Allocators

Production allocators like jemalloc, tcmalloc, and mimalloc use sophisticated techniques:

  • Thread-local caching to avoid contention
  • Size classes for efficient small allocations
  • Lazy coalescing to balance performance and fragmentation
  • Huge pages for large allocations
  • NUMA awareness for multi-socket systems

Conclusion

Building a memory allocator teaches us valuable lessons about:

  1. Memory management fundamentals - How the heap really works
  2. Data structures - Linked lists, trees, and other structures in action
  3. Trade-offs - Performance vs. memory usage vs. complexity
  4. System programming - Interfacing with the operating system

Our simple allocator demonstrates the core concepts, but production allocators are much more sophisticated. They handle edge cases, optimize for specific workloads, and provide advanced features like debugging and profiling.

Understanding how allocators work helps us write better code - we can make more informed decisions about memory usage, recognize allocation patterns, and even choose the right allocator for our specific needs.

The next time you call malloc(), you'll have a deeper appreciation for the complex dance happening behind the scenes to manage your program's memory efficiently.


Want to dive deeper? Try implementing different allocation strategies, add debugging features, or explore how modern allocators like jemalloc handle massive workloads efficiently.