Note:
This article was published prematurely and is still undergoing work. Some additional sections need to be added to complete the article. This message will be removed when the article is complete. The editorial staff apologizes for the mix up!

Fast memory allocation, along with memory leak detection, can have a big impact on a game's performance.

C++ provides two well-known ways to allocate dynamic (heap) memory: malloc and new. Both are usually slow because they are general purpose and, in some implementations, may require a switch from user mode into kernel mode when more memory has to be requested from the operating system. Neither provides any kind of memory leak detection natively.

With custom allocators we can rely on well-defined usage patterns and optimize the allocation process accordingly.

Base Allocator

Every allocator in this article series derives from the class Allocator, which declares two virtual functions (allocate and deallocate) that each allocator must define.

Allocator.h

#include <cstddef> //size_t
#include <new>     //placement new

//u32 and ASSERT are assumed to be defined elsewhere in the project

class Allocator
{
public:
    Allocator()
    {
        _UsedMemory     = 0;
        _NumAllocations = 0;
    }

    //Virtual so derived allocators are destroyed correctly through an Allocator*
    virtual ~Allocator()
    {
        //Check memory leaks
        ASSERT(_NumAllocations == 0 && _UsedMemory == 0);
    }

    virtual void* allocate(size_t size, size_t alignment) = 0;

    virtual void deallocate(void* p) = 0;

    //Helper functions
    template <class T> T* allocateNew()
    {
        return new (allocate(sizeof(T), alignof(T))) T;
    }

    template <class T> T* allocateNew(const T& t)
    {
        return new (allocate(sizeof(T), alignof(T))) T(t);
    }

    template<class T> T* allocateArray(size_t maxNumObjects)
    {
        if(maxNumObjects == 0)
            return nullptr;

        T* pArray = (T*)allocate(sizeof(T)*maxNumObjects, alignof(T));

        //Construct each element in place; placement new[] may need extra bookkeeping space
        for(size_t i = 0; pArray != nullptr && i < maxNumObjects; i++)
            new (&pArray[i]) T;

        return pArray;
    }

    template<class T> void deallocateDelete(T* pObject)
    {
        if (pObject != nullptr)
        {
            pObject->~T();
            deallocate(pObject);
        }
    }

    u32 getUsedMemory() const
    {
        return _UsedMemory;
    }

    u32 getNumAllocations() const
    {
        return _NumAllocations;
    }

protected:
    u32        _UsedMemory;
    u32        _NumAllocations;
};

Memory leak detection (1)

In the code above you can see an assert in the destructor. This is a simple way to check whether you forgot to deallocate any memory, and it adds no overhead and takes no extra memory.

This simple method won't tell you which allocation you forgot to deallocate, but it will pinpoint the allocator in which the leak occurred so you can find the leak faster (especially if you use Proxy Allocators like I suggest later in this article).
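The counter-based check can be sketched in isolation. The snippet below is only an illustration: CountingAllocator, hasLeaks() and leakIsDetected() are hypothetical names that do not appear in the article, and the allocator simply forwards to malloc/free. Instead of asserting in a destructor, it exposes the condition so the effect of a forgotten deallocation is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an Allocator subclass: it forwards to malloc/free
// and keeps only the allocation counter that the destructor assert inspects.
class CountingAllocator
{
public:
    void* allocate(std::size_t size)
    {
        void* p = std::malloc(size);
        if (p != nullptr)
            ++_NumAllocations;
        return p;
    }

    void deallocate(void* p)
    {
        if (p != nullptr)
        {
            std::free(p);
            --_NumAllocations;
        }
    }

    // True while at least one allocation has not been returned.
    bool hasLeaks() const { return _NumAllocations != 0; }

private:
    std::size_t _NumAllocations = 0;
};

// Returns true if the counter flags a "forgotten" deallocation and clears
// again once that allocation is finally released.
inline bool leakIsDetected()
{
    CountingAllocator allocator;
    void* a = allocator.allocate(16);
    void* b = allocator.allocate(32);

    allocator.deallocate(a);
    const bool leaked = allocator.hasLeaks(); // b is still outstanding

    allocator.deallocate(b);                  // clean up so the demo itself does not leak
    return leaked && !allocator.hasLeaks();
}
```

In a real allocator the same condition would sit in the destructor assert, so the leak is reported when the allocator is destroyed.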

Aligned Allocations

Processors access memory in word-sized blocks. When a processor reads from an unaligned address, it may have to access more word-sized blocks than it would for aligned data, and then mask and shift the results to assemble the required value in a register.

Example:
A processor accesses memory in 4-byte words, so it can only directly access the words starting at 0x00, 0x04, 0x08, 0x0C, and so on.

If it needs to access 4 bytes of data stored at address 0x0B, it has to read two word-sized blocks (the ones at 0x08 and 0x0C) because the data crosses the boundary from one block to the other.

If the data were stored at an aligned address like 0x0C, the processor could read it in a single memory access.
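The example can be checked with a little arithmetic. The helper below is a sketch written for this illustration (wordsTouched is not part of the article's code). It counts how many aligned words an access spans, assuming the word size is a power of 2 and the access is at least one byte long.

```cpp
#include <cassert>
#include <cstdint>

// Number of aligned words of `wordSize` bytes touched by an access of `size`
// bytes starting at `address`. Assumes wordSize is a power of 2 and size > 0.
inline std::uint32_t wordsTouched(std::uint32_t address, std::uint32_t size, std::uint32_t wordSize)
{
    const std::uint32_t firstWord = address & ~(wordSize - 1);              // word holding the first byte
    const std::uint32_t lastWord  = (address + size - 1) & ~(wordSize - 1); // word holding the last byte
    return (lastWord - firstWord) / wordSize + 1;
}
```

With 4-byte words, a 4-byte read at 0x0B spans the words at 0x08 and 0x0C, while the same read at 0x0C stays inside a single word.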

Aligned data definition

Primitive data is said to be aligned if the memory address where it is stored is a multiple of the size of the primitive.

A data aggregate is said to be aligned if each primitive element in the aggregate is aligned.
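As a rough sketch of these two definitions, the check below tests a primitive address against an alignment and then applies the same test to each member of a small aggregate. The names isAligned, Vertex and aggregateIsAligned are made up for this example. It relies on the compiler laying out the struct so that every member meets its own alignof requirement, which standard C++ guarantees for ordinary (non-packed) structs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if the address is a multiple of `alignment` (alignment must be a power of 2).
inline bool isAligned(const void* p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical aggregate used only for this illustration.
struct Vertex
{
    float         x, y, z;
    std::uint32_t color;
};

// An aggregate is aligned when each primitive member is aligned.
inline bool aggregateIsAligned()
{
    Vertex v{};
    return isAligned(&v.x, alignof(float))
        && isAligned(&v.y, alignof(float))
        && isAligned(&v.z, alignof(float))
        && isAligned(&v.color, alignof(std::uint32_t));
}
```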

Implementation

To n-byte align a memory address x we need to mask off the log2(n) least significant bits from x.

Simply masking off bits returns the first n-byte aligned address at or before x, so to find the first aligned address at or after x we add alignment-1 to x and then mask the result.

void* nextAlignedAddress(void* pAddress, u8 alignment)
{
	//Do the arithmetic on the integer address, then cast the result back to a pointer
	return (void*)( ( (uptr)pAddress + (alignment-1) ) & ~(uptr)(alignment-1) );
}

Sometimes it is useful (as we'll see later) to calculate by how many bytes an address must be adjusted to align it.

u8 adjustment(void* pAddress, u8 alignment)
{
    u8 adjustment =  alignment - ( (uptr)pAddress & (alignment-1) );
    
    if(adjustment == alignment)
        return 0; //already aligned
    
    return adjustment;
}

Note: The alignment must be a power of 2!
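To make the masking concrete, here is a small self-contained sketch that works on plain integer addresses. The function names (isPowerOfTwo, alignBackward, alignForward, alignAdjustment) are chosen for this example and use std::uintptr_t in place of the article's uptr and u8 typedefs.

```cpp
#include <cassert>
#include <cstdint>

// A power of 2 has exactly one bit set, so n & (n - 1) clears it to zero.
inline bool isPowerOfTwo(std::uintptr_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// First aligned address at or before `address`.
inline std::uintptr_t alignBackward(std::uintptr_t address, std::uintptr_t alignment)
{
    return address & ~(alignment - 1);
}

// First aligned address at or after `address`.
inline std::uintptr_t alignForward(std::uintptr_t address, std::uintptr_t alignment)
{
    return (address + (alignment - 1)) & ~(alignment - 1);
}

// Number of bytes to add to `address` to align it (0 if already aligned).
inline std::uintptr_t alignAdjustment(std::uintptr_t address, std::uintptr_t alignment)
{
    return alignForward(address, alignment) - address;
}
```

The mask trick only works when alignment-1 is a run of low set bits. For instance, rounding 13 up with an "alignment" of 12 yields 16, which is not a multiple of 12, so checking isPowerOfTwo(alignment) in an assert is a cheap safeguard.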

Linear Allocator

A Linear Allocator is the simplest and fastest type of allocator. Pointers to the start of the allocator, to the first free address and the total size of the allocator are maintained.

Allocations

New allocations simply move the pointer to the first free address forward.

Deallocations

Individual deallocations cannot be made in linear allocators. Instead, use clear() to release all the memory used by the allocator at once.

Implementation

LinearAllocator.h

#ifndef LINEARALLOCATOR_H
#define LINEARALLOCATOR_H

/////////////////////////////////////////////////////////////////////////////////////////////
///////////////// Tiago Costa, 2013              
/////////////////////////////////////////////////////////////////////////////////////////////

#include "Allocator.h"

#include <new>

class LinearAllocator : public Allocator
{
    public:
    LinearAllocator(u32 size, void* pStart);
    ~LinearAllocator();
    
    //Signatures match the pure virtual functions declared in Allocator
    void* allocate(size_t size, size_t alignment) override;
    
    void deallocate(void* p) override;
    
    void clear();
    
    private:
    LinearAllocator(const LinearAllocator&) = delete; //Prevent copies because it might cause errors
    LinearAllocator& operator=(const LinearAllocator&) = delete;
    
    void* _pInitialPosition;
    
    void* _pCurrentPosition;
    
    u32   _Size;
};

#endif

LinearAllocator.cpp

#include "LinearAllocator.h"

LinearAllocator::LinearAllocator(u32 size, void* pStart) : Allocator()
{
	ASSERT(size > 0);

	_pInitialPosition = pStart;

	_Size             = size;

	_pCurrentPosition = pStart;

	clear();
}

LinearAllocator::~LinearAllocator()
{
	_pInitialPosition   = nullptr;
	_pCurrentPosition   = nullptr;

	_Size               = 0;
}

void* LinearAllocator::allocate(size_t size, size_t alignment)
{
	ASSERT(size != 0);

	//Bytes needed to move the current position up to the next aligned address
	size_t adjustment = alignment - ( (uptr)_pCurrentPosition & (alignment-1) );
    
	if(adjustment == alignment)
		adjustment = 0; //already aligned

	if(_UsedMemory + adjustment + size > _Size)
		return nullptr;

	uptr alignedAddress = (uptr)_pCurrentPosition + adjustment;

	_pCurrentPosition = (void*)(alignedAddress + size);

	_UsedMemory += (u32)(size + adjustment);
	_NumAllocations++;

	return (void*)alignedAddress;
}

void LinearAllocator::deallocate(void* /*p*/)
{
	ASSERT( false && "Use clear() instead" );
}

void LinearAllocator::clear()
{
	_NumAllocations     = 0;
	_UsedMemory         = 0;

	_pCurrentPosition   = _pInitialPosition;
}
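A typical use for a linear allocator is per-frame scratch memory: allocate freely during the frame, then clear everything at once. The sketch below is a trimmed-down, self-contained variant written for illustration only. MiniLinearAllocator and linearDemo are hypothetical names, and it drops the base class, the counters and the asserts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, trimmed-down linear allocator over a caller-provided buffer.
class MiniLinearAllocator
{
public:
    MiniLinearAllocator(void* pStart, std::size_t size)
        : _start(reinterpret_cast<std::uintptr_t>(pStart)), _current(_start), _size(size) {}

    // Aligns the current position forward, then bumps it by `size`.
    void* allocate(std::size_t size, std::size_t alignment)
    {
        const std::uintptr_t mask    = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (_current + mask) & ~mask;

        if (aligned + size > _start + _size)
            return nullptr; // not enough room left

        _current = aligned + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Releases everything at once by rewinding to the start.
    void clear() { _current = _start; }

    std::size_t usedMemory() const { return static_cast<std::size_t>(_current - _start); }

private:
    std::uintptr_t _start;
    std::uintptr_t _current;
    std::size_t    _size;
};

inline bool linearDemo()
{
    alignas(16) unsigned char buffer[64];
    MiniLinearAllocator frame(buffer, sizeof(buffer));

    void* a = frame.allocate(10, 1); // occupies offsets 0..9
    void* b = frame.allocate(8, 8);  // skips 6 padding bytes, starts at offset 16
    void* c = frame.allocate(64, 1); // does not fit in the remaining space

    const bool ok = a == buffer && b == buffer + 16 && c == nullptr
                 && frame.usedMemory() == 24;

    frame.clear();                   // end of frame: everything is free again
    return ok && frame.usedMemory() == 0 && frame.allocate(4, 4) == buffer;
}
```

Note that the 6 padding bytes in front of the second allocation count as used memory, just like the adjustment does in the article's version.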

Stack Allocator

A Stack Allocator, like the name says, works like a stack. Along with the stack size, three pointers are maintained:

Pointer to the start of the stack.
Pointer to the top of the stack.
Pointer to the last allocation made. (This is optional in release builds)

Allocations

New allocations move the pointer up by the requested number of bytes plus the adjustment needed to align the address and store the allocation header.

The allocation header provides the following information:

Adjustment used in this allocation
Pointer to the previous allocation.

Deallocations

Note: Memory must be deallocated in the reverse order of allocation! If you allocate object A and then object B, you must free object B's memory before you can free object A's memory.

To deallocate memory, the allocator checks that the address being deallocated matches the address of the last allocation made.

If it does, the allocator reads the allocation header so that it can also free the memory used for alignment and for the header itself, and then replaces the pointer to the last allocation made with the one stored in the header.

Implementation

StackAllocator.h

#ifndef STACKALLOCATOR_H
#define STACKALLOCATOR_H

/////////////////////////////////////////////////////////////////////////////////////////////
///////////////// Tiago Costa, 2013              
/////////////////////////////////////////////////////////////////////////////////////////////

#include "Allocator.h"

#include <new>

class StackAllocator : public Allocator
{
    public:
    StackAllocator(u32 size, void* pStart);
    ~StackAllocator();
    
    //Signatures match the pure virtual functions declared in Allocator
    void* allocate(size_t size, size_t alignment) override;
    
    void deallocate(void* p) override;
    
    void clear();
    
    private:
    StackAllocator(const StackAllocator&) = delete; //Prevent copies because it might cause errors
    StackAllocator& operator=(const StackAllocator&) = delete;
    
    struct AllocationHeader
    {
        #if _DEBUG
            void* pPrevAddress;
        #endif
            u8 adjustment;
    };
    
    void* _pInitialPosition;
    
    void* _pPrevPosition;
    void* _pCurrentPosition;
    
    u32   _Size;
};

#endif

StackAllocator.cpp

#include "StackAllocator.h"

StackAllocator::StackAllocator(u32 size, void* pStart) : Allocator()
{
	ASSERT(size > 0);

	_pInitialPosition = pStart;

	_Size             = size;

	_pPrevPosition    = pStart;
	_pCurrentPosition = pStart;

	clear();
}

StackAllocator::~StackAllocator()
{
	_pInitialPosition   = nullptr;
	_pPrevPosition      = nullptr;
	_pCurrentPosition   = nullptr;

	_Size               = 0;
}

void* StackAllocator::allocate(size_t size, size_t alignment)
{
	ASSERT(size != 0);

	//Bytes needed to reach the next aligned address (0 if already aligned)
	size_t adjustment = alignment - ( (uptr)_pCurrentPosition & (alignment-1) );

	if(adjustment == alignment)
		adjustment = 0;

	//Increase adjustment so we can store the Allocation Header
	size_t neededSpace = sizeof(AllocationHeader);

	if(adjustment < neededSpace)
	{
		neededSpace -= adjustment;

		adjustment += alignment * (neededSpace / alignment);

		if(neededSpace % alignment > 0)
			adjustment += alignment;
	}

	if(_UsedMemory + adjustment + size > _Size)
		return nullptr;

	uptr alignedAddress = (uptr)_pCurrentPosition + adjustment;

	//Add Allocation Header just before the aligned address
	AllocationHeader* pHeader = (AllocationHeader*)(alignedAddress - sizeof(AllocationHeader));

	pHeader->adjustment   = (u8)adjustment; //fits in a u8 for alignments up to 128

	#if _DEBUG
	pHeader->pPrevAddress = _pPrevPosition;

	_pPrevPosition    = (void*)alignedAddress;
	#endif

	_pCurrentPosition = (void*)(alignedAddress + size);

	_UsedMemory += (u32)(size + adjustment);
	_NumAllocations++;

	return (void*)alignedAddress;
}

void StackAllocator::deallocate(void* p)
{
	ASSERT( p == _pPrevPosition );

	//Access the Allocation Header stored just before the allocation
	AllocationHeader* pHeader = (AllocationHeader*)( (uptr)p - sizeof(AllocationHeader) );

	//Release the allocation, its header and the alignment padding
	_UsedMemory -= (u32)( (uptr)_pCurrentPosition - (uptr)p + pHeader->adjustment );

	_pCurrentPosition = (void*)( (uptr)p - pHeader->adjustment );

	#if _DEBUG
	_pPrevPosition = pHeader->pPrevAddress;
	#endif

	_NumAllocations--;
}

void StackAllocator::clear()
{
	_NumAllocations     = 0;
	_UsedMemory         = 0;

	_pPrevPosition      = nullptr;
	_pCurrentPosition   = _pInitialPosition;
}

Note: Chaining each allocation to the previous one through the header and checking that chain before each deallocation is not mandatory, so it can be disabled in release builds. It just helps catch out-of-order deallocations before they lead to memory being overwritten.
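The header mechanism can be seen in a compact, self-contained sketch. This is a simplified variant written for illustration, not the article's class: MiniStackAllocator and stackDemo are hypothetical names, the header holds only the adjustment (no previous-allocation pointer, so out-of-order frees are not detected), and the header is copied with memcpy so the sketch does not depend on the header address being aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, trimmed-down stack allocator over a caller-provided buffer.
class MiniStackAllocator
{
public:
    MiniStackAllocator(void* pStart, std::size_t size)
        : _start(reinterpret_cast<std::uintptr_t>(pStart)), _current(_start), _size(size) {}

    void* allocate(std::size_t size, std::size_t alignment)
    {
        // First aligned address that leaves room for the header in front of it.
        const std::uintptr_t mask    = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (_current + sizeof(Header) + mask) & ~mask;

        if (aligned + size > _start + _size)
            return nullptr;

        // Remember how far the top moved before the user data starts.
        Header header;
        header.adjustment = static_cast<std::size_t>(aligned - _current);
        std::memcpy(reinterpret_cast<void*>(aligned - sizeof(Header)), &header, sizeof(Header));

        _current = aligned + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Must be called on the most recent live allocation (LIFO order).
    void deallocate(void* p)
    {
        const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);

        Header header;
        std::memcpy(&header, reinterpret_cast<const void*>(address - sizeof(Header)), sizeof(Header));

        // Rewind past the data, the header and the alignment padding.
        _current = address - header.adjustment;
    }

    std::size_t usedMemory() const { return static_cast<std::size_t>(_current - _start); }

private:
    struct Header { std::size_t adjustment; };

    std::uintptr_t _start;
    std::uintptr_t _current;
    std::size_t    _size;
};

inline bool stackDemo()
{
    alignas(16) unsigned char buffer[128];
    MiniStackAllocator stack(buffer, sizeof(buffer));

    void* a = stack.allocate(24, 8);
    const std::size_t usedAfterA = stack.usedMemory();
    void* b = stack.allocate(10, 16);

    stack.deallocate(b);                                  // B was allocated last, so it goes first
    const bool restored = stack.usedMemory() == usedAfterA;

    stack.deallocate(a);
    return a != nullptr && b != nullptr && restored && stack.usedMemory() == 0;
}
```

Freeing B returns the top of the stack to exactly where it was after A was allocated, which is what makes the reverse-order rule work.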

Proxy Allocator

A Proxy Allocator is a special kind of allocator. It is just used to help with memory leak and subsystem memory usage tracking.

It will simply redirect all allocations/deallocations to the allocator passed as argument in the constructor while keeping track of how many allocations it made and how much memory it is "using".

Example:
Two subsystems use the same allocator A.
If you want to show in the debugging user interface how much memory each subsystem is using, you create a proxy allocator, that redirects all allocations/deallocations to A, in each subsystem and track their memory usage.

It will also help in memory leak tracking because the assert in the proxy allocator destructor of the subsystem that is leaking memory will fail.
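The two-subsystem scenario could look roughly like the sketch below. Everything in it is a simplified stand-in made up for this example: IAllocator, HeapAllocator, Proxy and proxyDemo are hypothetical names, the interface omits the alignment parameter, and the shared allocator is malloc-backed with the block size stored in front of each block so it can report its used memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Simplified allocator interface with the two counters used for tracking.
class IAllocator
{
public:
    virtual ~IAllocator() {}
    virtual void* allocate(std::size_t size) = 0;
    virtual void  deallocate(void* p) = 0;

    std::size_t usedMemory() const     { return _used; }
    std::size_t numAllocations() const { return _count; }

protected:
    std::size_t _used  = 0;
    std::size_t _count = 0;
};

// Room in front of each block for its size, keeping the block max-aligned.
const std::size_t kSizeHeader = sizeof(std::max_align_t);

// Shared allocator "A": malloc-backed, remembers each block's size.
class HeapAllocator : public IAllocator
{
public:
    void* allocate(std::size_t size) override
    {
        unsigned char* raw = static_cast<unsigned char*>(std::malloc(kSizeHeader + size));
        if (raw == nullptr)
            return nullptr;
        std::memcpy(raw, &size, sizeof(size));
        _used += size;
        ++_count;
        return raw + kSizeHeader;
    }

    void deallocate(void* p) override
    {
        unsigned char* raw = static_cast<unsigned char*>(p) - kSizeHeader;
        std::size_t size = 0;
        std::memcpy(&size, raw, sizeof(size));
        _used -= size;
        --_count;
        std::free(raw);
    }
};

// Forwards to the target and records how much the target's usage changed.
class Proxy : public IAllocator
{
public:
    explicit Proxy(IAllocator& target) : _target(target) {}

    void* allocate(std::size_t size) override
    {
        const std::size_t before = _target.usedMemory();
        void* p = _target.allocate(size);
        if (p != nullptr)
        {
            _used += _target.usedMemory() - before;
            ++_count;
        }
        return p;
    }

    void deallocate(void* p) override
    {
        const std::size_t before = _target.usedMemory();
        _target.deallocate(p);
        _used -= before - _target.usedMemory();
        --_count;
    }

private:
    IAllocator& _target;
};

inline bool proxyDemo()
{
    HeapAllocator shared;
    Proxy renderer(shared);
    Proxy audio(shared);

    void* r  = renderer.allocate(100);
    void* a1 = audio.allocate(30);
    void* a2 = audio.allocate(20);

    const bool ok = renderer.usedMemory() == 100 && audio.usedMemory() == 50
                 && audio.numAllocations() == 2 && shared.usedMemory() == 150;

    renderer.deallocate(r);
    audio.deallocate(a1);
    audio.deallocate(a2);
    return ok && shared.numAllocations() == 0 && audio.usedMemory() == 0;
}
```

Each proxy sees only its own subsystem's share, while the shared allocator still reports the total.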

Implementation

ProxyAllocator.h

#ifndef PROXYALLOCATOR_H
#define PROXYALLOCATOR_H

/////////////////////////////////////////////////////////////////////////////////////////////
///////////////// Tiago Costa, 2013              
/////////////////////////////////////////////////////////////////////////////////////////////

#include "Allocator.h"

#include <new>

namespace Aqua
{
	class ProxyAllocator : public Allocator
	{
	public:
		ProxyAllocator(Allocator* pAllocator);
		~ProxyAllocator();

		void* allocate(size_t size, size_t alignment);
		
		void deallocate(void* p);

	private:
		ProxyAllocator(const ProxyAllocator&) = delete; //Prevent copies because it might cause errors
		ProxyAllocator& operator=(const ProxyAllocator&) = delete;

		Allocator* _pAllocator;
	};
}

#endif

ProxyAllocator.cpp

#include "ProxyAllocator.h"
#include "Debug.h"

using namespace Aqua;

ProxyAllocator::ProxyAllocator(Allocator* pAllocator) : Allocator()
{
	ASSERT(pAllocator != NULL);

	_pAllocator = pAllocator;
	_UsedMemory = 0;
	_NumAllocations = 0;
}

ProxyAllocator::~ProxyAllocator()
{
	_pAllocator = nullptr;
}

void* ProxyAllocator::allocate(size_t size, size_t alignment)
{
	ASSERT(_pAllocator != NULL);

	u32 mem = _pAllocator->getUsedMemory();

	void* p = _pAllocator->allocate(size, alignment);

	//Only count allocations that actually succeeded
	if(p != nullptr)
	{
		_NumAllocations++;
		_UsedMemory += _pAllocator->getUsedMemory() - mem;
	}

	return p;
}
		
void ProxyAllocator::deallocate(void* p)
{
	ASSERT(_pAllocator != NULL);

	_NumAllocations--;

	u32 mem = _pAllocator->getUsedMemory();

	_pAllocator->deallocate(p);

	_UsedMemory -= mem - _pAllocator->getUsedMemory();
}

Allocator Management

A large block of memory should be allocated with malloc when the program starts (and this should be the only malloc made). This large block of memory is managed by a global allocator (for example a stack allocator).

Each subsystem should then allocate the block of memory it needs from the global allocator, and create allocators that will manage that memory.

Example:

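int ResourceManager::init(Allocator* pGlobalAllocator)
{
    //The block must be suitably aligned for the PoolAllocator object constructed at its start
    void* pMem = pGlobalAllocator->allocate(RESOURCE_MANAGER_MEMORY_SIZE, alignof(PoolAllocator));
    
    //The allocator object lives at the start of the block and manages the rest of it
    pResourcePoolAllocator = new (pMem) PoolAllocator(RESOURCE_MANAGER_MEMORY_SIZE-sizeof(PoolAllocator), (char*)pMem+sizeof(PoolAllocator));
    
    //...
}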
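The overall flow (one malloc at startup, a global allocator over that block, and each subsystem carving its own region out of it) can be sketched without any of the article's classes. The Arena struct, arenaAllocate and managementDemo below are hypothetical helpers for this illustration, and the sizes are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Minimal bump-style region: a start address, a size and a used-byte count.
struct Arena
{
    unsigned char* start;
    std::size_t    size;
    std::size_t    used;
};

// Aligns forward inside the arena and bumps `used`. Returns nullptr when full.
inline void* arenaAllocate(Arena& arena, std::size_t size, std::size_t alignment)
{
    const std::uintptr_t mask       = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t current    = reinterpret_cast<std::uintptr_t>(arena.start) + arena.used;
    const std::uintptr_t aligned    = (current + mask) & ~mask;
    const std::size_t    adjustment = static_cast<std::size_t>(aligned - current);

    if (arena.used + adjustment + size > arena.size)
        return nullptr;

    arena.used += adjustment + size;
    return reinterpret_cast<void*>(aligned);
}

inline bool managementDemo()
{
    // The only malloc in the program: one large block at startup.
    const std::size_t kTotalSize = 1024 * 1024;
    void* pBlock = std::malloc(kTotalSize);
    if (pBlock == nullptr)
        return false;

    Arena global = { static_cast<unsigned char*>(pBlock), kTotalSize, 0 };

    // A subsystem asks the global allocator for the memory it needs...
    const std::size_t kResourceSize = 256 * 1024;
    void* pResourceMem = arenaAllocate(global, kResourceSize, 16);

    // ...and manages that region with its own allocator.
    Arena resources = { static_cast<unsigned char*>(pResourceMem), kResourceSize, 0 };
    void* pTexture = arenaAllocate(resources, 4096, 16);

    // The subsystem's allocation must lie inside the region it was given.
    unsigned char* p = static_cast<unsigned char*>(pTexture);
    const bool ok = pResourceMem != nullptr && pTexture != nullptr
                 && p >= resources.start && p + 4096 <= resources.start + kResourceSize
                 && global.used >= kResourceSize;

    std::free(pBlock); // released once, at shutdown
    return ok;
}
```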

Tips & Tricks

Depending on the type of allocator, keep the number of individual allocations to a minimum to reduce the memory wasted by allocation headers.
Prefer allocateArray() over individual allocations when it makes sense. Most allocators use extra memory in each allocation to store an allocation header, and an array needs only a single header.
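As a back-of-the-envelope illustration, the two helpers below compare the header bytes spent in each case. They are hypothetical and deliberately ignore alignment padding. The 16-byte header size used in the example is an assumption, not a figure from the article.

```cpp
#include <cassert>
#include <cstddef>

// Header bytes spent when `count` objects are allocated one by one,
// assuming every allocation stores one header of `headerSize` bytes.
inline std::size_t headerBytesIndividual(std::size_t count, std::size_t headerSize)
{
    return count * headerSize;
}

// Header bytes spent when the same objects are allocated as a single array.
inline std::size_t headerBytesArray(std::size_t headerSize)
{
    return headerSize;
}
```

With a 16-byte header, 1000 individual allocations spend 16000 bytes on headers, while one array of 1000 objects spends 16.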

Performance Comparison

To test the performance of each allocator compared to malloc, I wrote a program that measures how long it takes to make 20000 allocations (you can download the program at the end of the article). The tests were made in release mode and the results are averages of 3 runs.
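The author's test program is not reproduced here. As a rough idea of how such a measurement might be set up, the sketch below times a batch of malloc calls and the same number of bump allocations with std::chrono. The function names are hypothetical, and the numbers it produces depend on the compiler, optimization level and platform, so they are not comparable to the tables in this article.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Seconds taken by `count` calls to malloc(size). Frees happen outside the timed region.
inline double timeMalloc(std::size_t count, std::size_t size)
{
    std::vector<void*> blocks(count, nullptr);

    const auto begin = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i)
        blocks[i] = std::malloc(size);
    const auto end = std::chrono::steady_clock::now();

    for (std::size_t i = 0; i < count; ++i)
        std::free(blocks[i]);

    return std::chrono::duration<double>(end - begin).count();
}

// Seconds taken by `count` bump allocations of `size` bytes from one
// preallocated buffer (a stand-in for a linear allocator, without alignment).
inline double timeLinear(std::size_t count, std::size_t size)
{
    std::vector<unsigned char> buffer(count * size);
    std::vector<void*> blocks(count, nullptr);
    std::size_t offset = 0;

    const auto begin = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i)
    {
        blocks[i] = buffer.data() + offset; // hand out the next free bytes
        offset += size;
    }
    const auto end = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(end - begin).count();
}
```

Storing every returned pointer keeps the compiler from discarding the allocations, and averaging several runs, as the article does, smooths out noise.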

Malloc vs Linear Allocator

Allocator	Time (s)
Malloc	1
Linear	0.5

Malloc vs Stack Allocator

Allocator	Time (s)
Malloc	1
Stack	0.5

Malloc vs FreeList Allocator

Allocator	Time (s)
Malloc	1
FreeList	0.5

Malloc vs Pool Allocator

Allocator	Time (s)
Malloc	1
Pool	0.5

Conclusion

There isn't a single best allocator - it's important to think about how the memory will be allocated/accessed/deallocated and choose the right allocator for each situation.

Reference

http://bitsquid.blogspot.pt/2010/09/custom-memory-allocation-in-c.html
Game Engine Architecture, Jason Gregory 2009

Article Update Log

06 April 2013: Initial release
