Wait-Free Programming From Scratch

Jatin Chowdhury
10 min read · Jan 27, 2025


When you’ve worked in audio programming for as long as I have, you hear a lot about lock-free and wait-free programming. The fundamental idea is very simple: our audio process needs to run in real-time, meaning it needs to be wait-free; however, we still need to be able to communicate with our audio thread from other (non-realtime) threads. This inter-thread communication includes sharing parameter data, metering data, MIDI data, sample/impulse response data, and more.

So how do we do wait-free programming? Well you may have heard terms like “atomics” and “lock-free queues” thrown around, but what do they mean? How do we use them to solve our problems? My goal with this article is to build up a simple understanding of wait-free programming from scratch to the point that we can solve a relatively complex problem with a few simple building blocks.

The Problem

I’m going to use a similar example to the one that Timur Doumler used in his ADC talk on Read-Copy-Update. Imagine we have a synthesizer with an ADSR envelope. In our code, the envelope is defined as:

struct Envelope_Params
{
    double attack;
    double decay;
    double sustain;
    double release;
};

We have some GUI that the user can use to set the envelope parameters but we need to pass the parameters through to the audio thread so that the audio process can correctly apply the envelope to the synthesizer’s voices.

Envelope parameters from the Surge XT synthesizer.

Let’s talk through some ways that we might approach this problem…

Atomics

The first and most useful way to approach a wait-free programming problem is to use atomics. Atomic machine instructions allow operations to be performed on data shared by multiple threads without locking or waiting. However, CPUs only support atomic machine instructions for data up to a certain size (usually 64 or 128 bits). The C++ standard library “works around” this limitation by inserting locks for std::atomic variables larger than that size, but those aren’t really helpful for our use case, since we want to pass our data from the UI thread to the audio thread without locking or waiting.

Sure enough, most C++ compilers will fail to compile a std::atomic<Envelope_Params> if we assert that the atomic must also be lock-free.
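We can check this at compile time with std::atomic’s lock-free query (a minimal sketch; the exact results are platform-dependent, but they hold on mainstream 64-bit targets):

```cpp
#include <atomic>
#include <cassert>

struct Envelope_Params
{
    double attack;
    double decay;
    double sustain;
    double release;
};

// A 32-byte struct is too large for native atomic instructions, so the
// standard library would fall back to a lock. Un-commenting this assertion
// should fail to compile on typical targets:
// static_assert (std::atomic<Envelope_Params>::is_always_lock_free);

// A single 64-bit value, by contrast, is lock-free on mainstream CPUs:
static_assert (std::atomic<double>::is_always_lock_free);
```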

Now a reasonable solution could be to make the Envelope_Params type smaller (e.g. by using floats instead of doubles), or to make the members of Envelope_Params atomic rather than the whole struct. However, for the sake of this example, let’s assume that neither work-around is acceptable for this use-case.
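For reference, the member-wise work-around might look like the sketch below. Each member is individually lock-free, but the four values no longer update as a single unit, so the audio thread could observe a “torn” mix of old and new parameters, which is one reason this work-around may be unacceptable:

```cpp
#include <atomic>
#include <cassert>

// Work-around sketch: atomic members instead of an atomic struct.
// Each individual load/store is lock-free, but there is no way to update
// all four values atomically -- the audio thread might read a new attack
// value alongside a stale release value.
struct Atomic_Envelope_Params
{
    std::atomic<double> attack { 0.0 };
    std::atomic<double> decay { 0.0 };
    std::atomic<double> sustain { 1.0 };
    std::atomic<double> release { 0.0 };
};
```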

Atomic Pointers

So we know that std::atomic<Envelope_Params> won’t be lock-free, but what about std::atomic<Envelope_Params*>? Since pointers are 64 bits on most modern CPUs, this type will typically be lock-free. So maybe we can do something like this:

std::atomic<Envelope_Params*> params {};

// ui thread
void set_envelope_params()
{
    const auto [attack, decay, sustain, release] = get_params_from_ui();
    auto* new_params = new Envelope_Params {attack, decay, sustain, release};
    auto* old_params = params.exchange (new_params);
    delete old_params;
}

// audio thread
void use_envelope_params()
{
    auto* latest_params = params.load();
    auto envelope_buffer = generate_envelope_buffer(
        latest_params->attack,
        latest_params->decay,
        latest_params->sustain,
        latest_params->release);
}

Unfortunately, this approach won’t work. Since set_envelope_params() and use_envelope_params() are called on different threads, they could be called concurrently. That could lead to the latest_params pointer being freed from the UI thread while the audio thread is still referencing it!

One option here could be to use something like a compare-swap loop to avoid deleting the “live” pointer while it’s in use by the audio thread. However, this would lead to the UI thread “waiting” for the audio thread. In some cases this wait might be acceptable, but let’s say that in this case we want our envelope UI to animate smoothly for the user. In order to make that happen we would like our UI thread to be wait-free as well.
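To sketch where that wait comes from, here’s one possible arrangement (a hazard-pointer-style variant of the idea rather than a literal compare-swap loop; the in_use pointer is my own addition, which the audio thread would publish while reading):

```cpp
#include <atomic>
#include <cassert>

struct Envelope_Params { double attack, decay, sustain, release; };

std::atomic<Envelope_Params*> params {};
std::atomic<Envelope_Params*> in_use {}; // published by the audio thread while it reads

// ui thread (sketch): swap in the new pointer, then spin until the audio
// thread has let go of the old one before deleting it. The spin is exactly
// the "waiting" that makes the UI thread no longer wait-free.
void set_envelope_params_waiting (Envelope_Params* new_params)
{
    auto* old_params = params.exchange (new_params);
    while (old_params != nullptr && in_use.load() == old_params)
    {
        // busy-wait on the audio thread's progress
    }
    delete old_params;
}
```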

Lock-Free Queues

So now it’s time to bring out the heavy-duty solutions! A lock-free queue allows data to be passed from one thread to another without using locks, and for a single-producer/single-consumer queue, this can be implemented with a wait-free guarantee.

For the sake of time, I’m not going to attempt to explain the inner workings of a lock-free queue in this article… Instead I’ll recommend this blog post from “moodycamel”, who created one of the most widely used open-source lock-free queue implementations.
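That said, the core of a bounded single-producer/single-consumer queue can be sketched in a few lines (the names, capacity handling, and memory orderings here are my own simplifications; moodycamel’s implementation is far more sophisticated):

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// A minimal bounded SPSC ring buffer (a sketch only). push() must be
// called from a single producer thread, pop() from a single consumer
// thread. One slot is left empty to distinguish "full" from "empty".
template <typename T, std::size_t Capacity>
class SPSC_Queue
{
public:
    bool push (T value) // producer thread only
    {
        const auto w = write_pos.load (std::memory_order_relaxed);
        const auto next = (w + 1) % Capacity;
        if (next == read_pos.load (std::memory_order_acquire))
            return false; // queue full
        buffer[w] = std::move (value);
        write_pos.store (next, std::memory_order_release);
        return true;
    }

    std::optional<T> pop() // consumer thread only
    {
        const auto r = read_pos.load (std::memory_order_relaxed);
        if (r == write_pos.load (std::memory_order_acquire))
            return std::nullopt; // queue empty
        auto value = std::move (buffer[r]);
        read_pos.store ((r + 1) % Capacity, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> buffer {};
    std::atomic<std::size_t> write_pos { 0 };
    std::atomic<std::size_t> read_pos { 0 };
};
```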

Solving our problem with a lock-free queue might look something like this:

Lock_Free_Queue<Envelope_Params*> params_queue {};

// ui thread
void set_envelope_params()
{
    const auto [attack, decay, sustain, release] = get_params_from_ui();
    params_queue.push (new Envelope_Params {attack, decay, sustain, release});
}

// audio thread
void use_envelope_params()
{
    const auto* latest_params = params_queue.pop();
    auto envelope_buffer = generate_envelope_buffer(
        latest_params->attack,
        latest_params->decay,
        latest_params->sustain,
        latest_params->release);
}

However, there are still problems with this code! The first problem is that all the memory used to construct the Envelope_Params is leaked. Since we don’t want to free any memory from the audio thread, we could use a second lock-free queue to send “zombie” pointers back to the UI thread to be freed later.

Another option could be to use a queue of objects rather than pointers (i.e. Lock_Free_Queue<Envelope_Params>). This should work fine in this case, since Envelope_Params is trivially constructible and destructible, and is relatively small. However, for larger and more complicated objects this strategy may not work! For example, imagine passing a 1-minute-long audio sample through a lock-free queue… popping the sample from the queue might require copying all the data from the sample, which would take a long time and could cause the audio process to miss its real-time deadline.

One more issue with lock-free queues is that they require both threads to be making regular progress, otherwise the queue might fill up. If the queue is full, we can deal with it either by writing over the old data, or by “growing” the queue which typically requires allocating additional memory. In the world of audio plugins, the host may stop calling the plugin’s audio callback (e.g. if there’s no incoming audio), in which case we either have to freeze the UI as well, or deal with the potential for our lock-free queues to fill up.

Solving From Scratch

So what if we try to solve this problem from scratch, using the information that we’ve built up so far?

Writing

Let’s start on the UI thread, which in this case is our “writer” thread. Let’s make a pointer that the UI thread can write to, and since we expect that the audio thread is going to want to “read” this pointer, let’s make it atomic.

std::atomic<Envelope_Params*> ui_thread_params {};

// ui thread
void set_envelope_params()
{
    // construct new parameters
    const auto [attack, decay, sustain, release] = get_params_from_ui();
    auto* new_params = new Envelope_Params {attack, decay, sustain, release};

    // write new parameters to the UI thread pointer
    auto* old_params_ptr = ui_thread_params.exchange (new_params);

    // if there was already a UI thread pointer, delete it (no memory leaks!)
    delete old_params_ptr;
}

Reading

Now on the audio thread, we can check if ui_thread_params is not nullptr, and pull out the pointer if so. We’ll need another pointer to manage the “live” object on the audio thread, but since we’re only going to read/write from this pointer on the audio thread it doesn’t need to be atomic.

std::atomic<Envelope_Params*> ui_thread_params {};
Envelope_Params* audio_thread_params {};

// ui thread
void set_envelope_params()
{
    const auto [attack, decay, sustain, release] = get_params_from_ui();
    auto* new_params = new Envelope_Params {attack, decay, sustain, release};
    auto* old_params_ptr = ui_thread_params.exchange (new_params);
    delete old_params_ptr;
}

// audio thread
void use_envelope_params()
{
    // check if there are new UI params available
    if (ui_thread_params.load() != nullptr)
    {
        // store the new params in the audio thread pointer.
        // store nullptr in the UI thread pointer so that it
        // will be null the next time we check it.
        audio_thread_params = ui_thread_params.exchange (nullptr);
    }

    // use the parameters on the audio thread!
    auto envelope_buffer = generate_envelope_buffer(
        audio_thread_params->attack,
        audio_thread_params->decay,
        audio_thread_params->sustain,
        audio_thread_params->release);
}

Now, it is possible for the UI thread to write to ui_thread_params in between when the audio thread checks if it’s nullptr and when the audio thread does its pointer exchange. However, this is not an issue, so long as the UI thread always writes a non-null value.

However, there is still a bug in this code… currently, when audio_thread_params is over-written, the previous pointer is leaked!

Zombies!

Once the audio thread pointer is no longer needed, we can’t free it immediately, because we don’t want to free memory on the audio thread, so we’ll call it a “zombie” pointer… it’s no longer alive, but it’s still out and about, and potentially causing problems. So now what we need is a “zombie list” to take the zombie pointer back to the UI thread where it can be killed properly! We could use a lock-free queue for this purpose, but I want to demonstrate a different strategy here that’s a little bit more lightweight.

Let’s start by wrapping our envelope parameters as a node in a linked-list, and pushing zombies into the linked list from the audio thread.

struct Node
{
    Envelope_Params params {};
    Node* next {};
};
std::atomic<Node*> ui_thread_node {};
Node* audio_thread_node {};

Node* zombie_list_head {};
std::atomic<Node*> zombie_list_tail {};

// audio thread
void use_envelope_params()
{
    if (ui_thread_node.load() != nullptr)
    {
        // the old audio thread pointer is now a zombie!
        auto* new_zombie = audio_thread_node;
        audio_thread_node = ui_thread_node.exchange (nullptr);

        // push the new zombie pointer onto the zombie list!
        // first, point the current list tail to the new zombie
        zombie_list_tail.load()->next = new_zombie;
        // then set the new zombie as the new list tail
        zombie_list_tail.store (new_zombie);
    }

    auto envelope_buffer = generate_envelope_buffer(
        audio_thread_node->params.attack,
        audio_thread_node->params.decay,
        audio_thread_node->params.sustain,
        audio_thread_node->params.release);
}

As it stands right now, the zombie list will keep growing forever. At some point, we also need to kill the zombies on the UI thread. In theory this could happen any time the UI thread feels like it, but for this example let’s kill the zombies before we set the new envelope parameters.

// ui thread
void kill_zombies()
{
    // kill every pointer in the zombie list!
    while (zombie_list_head != nullptr)
    {
        auto* zombie_to_kill = zombie_list_head;
        zombie_list_head = zombie_list_head->next;
        delete zombie_to_kill;
    }
    // the old tail has already been killed, so reset it to nullptr
    zombie_list_tail.store (nullptr);
}

// ui thread
void set_envelope_params()
{
    // kill the zombies before writing new parameters!
    kill_zombies();

    const auto [attack, decay, sustain, release] = get_params_from_ui();
    auto* new_node = new Node {{attack, decay, sustain, release}};
    auto* old_node = ui_thread_node.exchange (new_node);
    delete old_node;
}

However, there’s a sneaky little bug in this implementation. If the UI thread clears the entire zombie list, then when the audio thread goes to push a new zombie into the list, zombie_list_tail will be null. There’s probably a way to work around this by checking if the list is empty and doing some other weird atomic business, but in this specific case there’s a much easier fix. Instead of killing every zombie in the list, what if we always leave the tail alive?

// ui thread
void kill_zombies()
{
    // kill every zombie in the list except the tail!
    while (zombie_list_head != zombie_list_tail.load())
    {
        auto* zombie_to_kill = zombie_list_head;
        zombie_list_head = zombie_list_head->next;
        delete zombie_to_kill;
    }
}

This ensures that our zombie_list_tail will always be a valid pointer for the audio thread to append to! And with that our implementation is complete… well, sort of.

Memory Management

The informed reader will notice that I’ve been using new and delete for memory allocation/deallocation throughout this article, rather than modern C++’s “managed” pointers (e.g. std::unique_ptr). The reason for this choice is that I actually don’t want to use “managed” allocations for this use-case… or more accurately, I don’t want to use managed individual allocations. For this type of use-case, I find it’s both simpler and more efficient to think in terms of group allocations rather than individual allocations, and personally, I would rather implement my own allocator for this purpose than use new/delete or std::unique_ptr. I think the best choice of allocation strategy for this example would be a “pool” allocator, although other allocator types might work as well.
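To make that concrete, here’s a minimal sketch of what such a pool allocator could look like (the Pool_Allocator name and allocate/free API are my own guess at the shape of the allocator used below; this sketch assumes allocate and free are only ever called from one thread at a time):

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// A minimal fixed-size pool allocator (a sketch, not production code).
// All slots are allocated up front; allocate()/free() just pop/push an
// intrusive free list, so they never touch the system allocator.
template <typename T>
class Pool_Allocator
{
public:
    explicit Pool_Allocator (std::size_t count) : slots (count)
    {
        // thread every slot onto the free list
        for (auto& slot : slots)
        {
            slot.next_free = free_list;
            free_list = &slot;
        }
    }

    template <typename U>
    U* allocate (U&& init) // e.g. allocator.allocate<Node> ({...})
    {
        static_assert (std::is_same_v<std::remove_reference_t<U>, T>, "pool only holds T");
        assert (free_list != nullptr && "pool exhausted!");
        auto* slot = free_list;
        free_list = slot->next_free;
        return new (slot->bytes) T (std::forward<U> (init));
    }

    void free (T* ptr)
    {
        if (ptr == nullptr)
            return;
        ptr->~T();
        auto* slot = reinterpret_cast<Slot*> (ptr); // T lives at offset 0 of the union
        slot->next_free = free_list;
        free_list = slot;
    }

private:
    union Slot
    {
        alignas (T) unsigned char bytes[sizeof (T)];
        Slot* next_free;
    };

    std::vector<Slot> slots;
    Slot* free_list = nullptr;
};
```

Note that freed slots are recycled in LIFO order, which keeps recently-used memory warm in the cache.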

Incidentally, when implementing this pattern with a pool allocator, I noticed that the pool only needs enough space to allocate 4 Envelope_Params objects. I’m pretty sure that this will always be the case, so long as the zombie list is cleared every time before writing to the UI thread pointer, but I’ll leave that proof as an exercise for the reader :).

Full Implementation

Here’s a full implementation of the example discussed above:

struct Envelope_Params
{
    double attack;
    double decay;
    double sustain;
    double release;
};

struct Node
{
    Envelope_Params params {};
    Node* next {};
};

Pool_Allocator<Node> allocator {4}; // we only need enough space for 4 nodes!

std::atomic<Node*> ui_thread_node {};
Node* audio_thread_node {};

Node* zombie_list_head {};
std::atomic<Node*> zombie_list_tail {};

void init()
{
    ui_thread_node.store (nullptr);

    audio_thread_node = allocator.allocate<Node> ({});

    // allocate an extra node to start the zombie list
    zombie_list_head = allocator.allocate<Node> ({});
    zombie_list_tail.store (zombie_list_head);
}

// ui thread
void kill_zombies()
{
    while (zombie_list_head != zombie_list_tail.load())
    {
        auto* zombie_to_kill = zombie_list_head;
        zombie_list_head = zombie_list_head->next;
        allocator.free (zombie_to_kill);
    }
}

// ui thread
void set_envelope_params()
{
    kill_zombies();

    const auto [attack, decay, sustain, release] = get_params_from_ui();
    auto* new_node = allocator.allocate<Node> ({{attack, decay, sustain, release}});
    auto* old_node = ui_thread_node.exchange (new_node);
    allocator.free (old_node);
}

// audio thread
void use_envelope_params()
{
    if (ui_thread_node.load() != nullptr)
    {
        auto* new_zombie = audio_thread_node;
        audio_thread_node = ui_thread_node.exchange (nullptr);

        zombie_list_tail.load()->next = new_zombie;
        zombie_list_tail.store (new_zombie);
    }

    auto envelope_buffer = generate_envelope_buffer(
        audio_thread_node->params.attack,
        audio_thread_node->params.decay,
        audio_thread_node->params.sustain,
        audio_thread_node->params.release);
}

Conclusion

I hope this article has provided some insight into wait-free programming as well as an appreciation for the sorts of bugs and edge-cases that tend to show up when solving these types of problems. If you think you’ve spotted a bug or other issue with my implementation please let me know! I’d love to hear about it :).
